org.apache.pdfbox.util
Class PDFHighlighter

java.lang.Object
  extended by org.apache.pdfbox.util.PDFStreamEngine
      extended by org.apache.pdfbox.util.PDFTextStripper
          extended by org.apache.pdfbox.util.PDFHighlighter

public class PDFHighlighter
extends PDFTextStripper

Generates an XML highlight file that marks the occurrences of search words in a PDF document.

Version:
$Revision: 1.7 $
Author:
slagraulet (slagraulet@cardiweb.com), Ben Litchfield
See Also:
Adobe Highlight File Format

Field Summary
 
Fields inherited from class org.apache.pdfbox.util.PDFTextStripper
charactersByArticle, document, lineSeparator, output, outputEncoding
 
Constructor Summary
PDFHighlighter()
          Default constructor.
 
Method Summary
protected  void endPage(PDPage pdPage)
          End a page.
 void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput)
          Generate XML highlight data for each of several search words found in the PDF.
 void generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput)
          Generate XML highlight data for a single search word found in the PDF.
static void main(String[] args)
          Command line application.
 
Methods inherited from class org.apache.pdfbox.util.PDFTextStripper
endArticle, endDocument, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getSpacingTolerance, getStartBookmark, getStartPage, getText, getText, getWordSeparator, inspectFontEncoding, processPage, processPages, processTextPosition, resetEngine, setAverageCharTolerance, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageSeperator, writeString, writeText, writeText, writeWordSeparator
 
Methods inherited from class org.apache.pdfbox.util.PDFStreamEngine
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFHighlighter

public PDFHighlighter()
               throws IOException
Default constructor.

Throws:
IOException - If there is an error constructing this class.
Method Detail

generateXMLHighlight

public void generateXMLHighlight(PDDocument pdDocument,
                                 String highlightWord,
                                 Writer xmlOutput)
                          throws IOException
Generate XML highlight data for a single search word found in the PDF, and write it to the given writer.

Parameters:
pdDocument - The PDF to find words in.
highlightWord - The word to search for.
xmlOutput - The writer that receives the resulting XML.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.

generateXMLHighlight

public void generateXMLHighlight(PDDocument pdDocument,
                                 String[] sWords,
                                 Writer xmlOutput)
                          throws IOException
Generate XML highlight data for each of several search words found in the PDF, and write it to the given writer.

Parameters:
pdDocument - The PDF to find words in.
sWords - The words to search for.
xmlOutput - The writer that receives the resulting XML.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.
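
Example (illustrative sketch):
The following is a minimal sketch of the kind of output the generateXMLHighlight methods are described as producing. It does not use PDFBox and is not the class's actual implementation. The class name HighlightSketch, the method writeHighlights, and the pageTexts array (standing in for text already extracted from each page) are hypothetical. The element and attribute names (XML, Body, Highlight, loc, pg, pos, len), the zero-based page index, and the case-insensitive matching are assumptions that should be checked against the Adobe Highlight File Format.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightSketch {

    /**
     * Writes one location entry for every match of every word on every page.
     * pageTexts[i] is a stand-in for the text extracted from page i (zero-based).
     */
    public static void writeHighlights(String[] pageTexts, String[] words, Writer xmlOutput)
            throws IOException {
        // Assumed header layout; verify against the highlight file format specification.
        xmlOutput.write("<XML>\n<Body units=characters version=2>\n<Highlight>\n");
        for (int page = 0; page < pageTexts.length; page++) {
            for (String word : words) {
                // Pattern.quote treats the word literally; ignoring case is an assumption.
                Pattern pattern = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
                Matcher matcher = pattern.matcher(pageTexts[page]);
                while (matcher.find()) {
                    int length = matcher.end() - matcher.start();
                    // Report the match as page index, character offset, and length.
                    xmlOutput.write("<loc pg=" + page + " pos=" + matcher.start()
                            + " len=" + length + ">\n");
                }
            }
        }
        xmlOutput.write("</Highlight>\n</Body>\n</XML>\n");
        xmlOutput.flush();
    }

    public static void main(String[] args) throws IOException {
        String[] pageTexts = {
            "PDFBox reads PDF files.", "Nothing to find here.", "A pdf again."
        };
        // Any Writer works as the sink; a StringWriter keeps the result in memory.
        StringWriter out = new StringWriter();
        writeHighlights(pageTexts, new String[] { "pdf" }, out);
        System.out.print(out);
    }
}
```

As with the xmlOutput parameter documented above, the sketch accepts any Writer, so the same call can target a file, a servlet response, or an in-memory buffer.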

endPage

protected void endPage(PDPage pdPage)
                throws IOException
End a page. Default implementation is to do nothing. Subclasses may provide additional information.

Overrides:
endPage in class PDFTextStripper
Parameters:
pdPage - The page that has just been processed.
Throws:
IOException - If there is any error writing to the stream.

main

public static void main(String[] args)
                 throws IOException
Command line application.

Parameters:
args - The command line arguments to the application.
Throws:
IOException - If there is an error generating the highlight file.


Copyright © 2002-2010 The Apache Software Foundation. All Rights Reserved.