public class PDFHighlighter extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, LINE_SEPARATOR, output
| Constructor and Description |
|---|
| PDFHighlighter(): Default constructor. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | endPage(PDPage pdPage): End a page. |
| void | generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput): Generate an XML highlight string based on the PDF. |
| void | generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput): Generate an XML highlight string based on the PDF. |
| static void | main(String[] args): Command line application. |
| protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement): This method was originally written by Ben Litchfield for PDFStreamEngine. |
Methods inherited from class PDFTextStripper: endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
Methods inherited from class PDFStreamEngine: addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
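The generateXMLHighlight methods listed in the summary can be exercised roughly as follows. This is a minimal sketch, not a reference implementation: it assumes PDFBox 2.0.x is on the classpath and that PDFHighlighter lives in the package org.apache.pdfbox.examples.util (the package is not stated on this page, so adjust the import if it differs). The class name HighlightExample and the helper highlight are hypothetical names introduced for illustration. The sketch builds a one-page PDF in memory so that it does not depend on an input file, and it makes no claim about the exact XML that is produced.

```java
import java.io.IOException;
import java.io.StringWriter;

// Assumed package; this page does not state where PDFHighlighter lives.
import org.apache.pdfbox.examples.util.PDFHighlighter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Hypothetical example class, not part of PDFBox.
public class HighlightExample
{
    // Builds a one-page PDF containing pageText, then asks PDFHighlighter
    // to write highlight XML for the given words into a String.
    static String highlight(String pageText, String[] words) throws IOException
    {
        try (PDDocument doc = new PDDocument())
        {
            PDPage page = new PDPage();
            doc.addPage(page);

            // Draw a single line of text on the page.
            try (PDPageContentStream cs = new PDPageContentStream(doc, page))
            {
                cs.beginText();
                cs.setFont(PDType1Font.HELVETICA, 12);
                cs.newLineAtOffset(72, 700);
                cs.showText(pageText);
                cs.endText();
            }

            // Any Writer can receive the output; a StringWriter keeps it in memory.
            StringWriter xml = new StringWriter();
            new PDFHighlighter().generateXMLHighlight(doc, words, xml);
            return xml.toString();
        }
    }

    public static void main(String[] args) throws IOException
    {
        System.out.println(highlight("Hello PDFBox world", new String[] { "pdfbox" }));
    }
}
```

For a single search term, the overload that takes a String highlightWord can be called the same way, passing the word directly instead of an array.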
public PDFHighlighter() throws IOException
IOException
- If there is an error constructing this class.

public void generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) throws IOException
pdDocument
- The PDF to find words in.
highlightWord
- The word to search for.
xmlOutput
- The resulting output XML file.
IOException
- If there is an error reading from the PDF, or writing to the XML.

public void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) throws IOException
pdDocument
- The PDF to find words in.
sWords
- The words to search for.
xmlOutput
- The resulting output XML file.
IOException
- If there is an error reading from the PDF, or writing to the XML.

protected void endPage(PDPage pdPage) throws IOException
Overrides: endPage in class PDFTextStripper
pdPage
- The page we are about to process.
IOException
- If there is any error writing to the stream.

public static void main(String[] args) throws IOException
args
- The command line arguments to the application.
IOException
- If there is an error generating the highlight file.

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Overrides: showGlyph in class PDFStreamEngine
textRenderingMatrix
- the current text rendering matrix, Trm
font
- the current font
code
- internal PDF character code for the glyph
unicode
- the Unicode text for this glyph, or null if the PDF does not provide it
displacement
- the displacement (i.e. advance) of the glyph in text space
IOException
- if the glyph cannot be processed

Copyright © 2002–2016 The Apache Software Foundation. All rights reserved.