public class PDFHighlighter extends PDFTextStripper
Fields inherited from class PDFTextStripper:
charactersByArticle, document, LINE_SEPARATOR, output

| Constructor and Description |
|---|
| PDFHighlighter() - Default constructor. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | endPage(PDPage pdPage) - End a page. |
| void | generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) - Generate an XML highlight string based on the PDF. |
| void | generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) - Generate an XML highlight string based on the PDF. |
| static void | main(String[] args) - Command line application. |
| protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) - This method was originally written by Ben Litchfield for PDFStreamEngine. |
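
The generateXMLHighlight overloads take one or more search words and write the match locations to a Writer. As a rough illustration of that idea, here is a minimal, self-contained sketch. It does not use PDFBox. The class HighlightSketch, the method writeHighlights, the list of already-extracted page strings, and the element and attribute names (highlights, loc, pg, pos, len) are all hypothetical, and the output is not claimed to match the XML that PDFHighlighter actually produces.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of PDFBox.
class HighlightSketch {

    // Writes one <loc> entry per case-insensitive occurrence of each word.
    // pageTexts holds the already-extracted text of each page, in page order
    // (an assumption; a real implementation would extract text from the PDF).
    // Offsets are character positions within the lower-cased page text.
    static void writeHighlights(List<String> pageTexts, String[] words, Writer xmlOutput)
            throws IOException {
        xmlOutput.write("<highlights>\n");
        for (int page = 0; page < pageTexts.size(); page++) {
            String text = pageTexts.get(page).toLowerCase(Locale.ROOT);
            for (String word : words) {
                String needle = word.toLowerCase(Locale.ROOT);
                if (needle.isEmpty()) {
                    continue; // an empty word would match at every position
                }
                int pos = text.indexOf(needle);
                while (pos >= 0) {
                    xmlOutput.write("  <loc pg=\"" + page + "\" pos=\"" + pos
                            + "\" len=\"" + needle.length() + "\"/>\n");
                    pos = text.indexOf(needle, pos + 1);
                }
            }
        }
        xmlOutput.write("</highlights>\n");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeHighlights(
                Arrays.asList("The cat sat.", "A cat and a dog."),
                new String[] {"cat", "dog"},
                out);
        System.out.print(out);
    }
}
```

Passing a Writer instead of returning a String mirrors the signatures listed above and lets the caller decide whether the result goes to a file, a socket, or an in-memory buffer.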
Methods inherited from class PDFTextStripper:
endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator

Methods inherited from class PDFStreamEngine:
addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

public PDFHighlighter() throws IOException

Throws:
IOException - If there is an error constructing this class.

public void generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput) throws IOException
Parameters:
pdDocument - The PDF to find words in.
highlightWord - The word to search for.
xmlOutput - The resulting output XML file.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.

public void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput) throws IOException
Parameters:
pdDocument - The PDF to find words in.
sWords - The words to search for.
xmlOutput - The resulting output XML file.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.

protected void endPage(PDPage pdPage) throws IOException
Overrides:
endPage in class PDFTextStripper
Parameters:
pdPage - The page we are about to process.
Throws:
IOException - If there is any error writing to the stream.

public static void main(String[] args) throws IOException
Parameters:
args - The command line arguments to the application.
Throws:
IOException - If there is an error generating the highlight file.

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed

Copyright © 2002–2018 The Apache Software Foundation. All rights reserved.