public class PDFTextStripperByArea extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, output, outputEncoding, systemLineSeparator

| Constructor | Description |
|---|---|
| PDFTextStripperByArea() | Instantiate a new PDFTextStripperByArea object with the default properties. |
| PDFTextStripperByArea(Properties props) | Instantiate a new PDFTextStripperByArea object that uses the given operator properties. |
| PDFTextStripperByArea(String encoding) | Instantiate a new PDFTextStripperByArea object that writes its output in the given encoding. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | addRegion(String regionName, Rectangle2D rect) | Add a new region to group text by. |
| void | extractRegions(PDPage page) | Process the page to extract the region text. |
| List<String> | getRegions() | Get the list of regions that have been set up. |
| String | getTextForRegion(String regionName) | Get the text for the region. Call this only after extractRegions(). |
| protected void | processTextPosition(TextPosition text) | Process a TextPosition object and add its text to the list of characters on a page. |
| void | removeRegion(String regionName) | Remove a region that text is grouped by. |
| protected void | writePage() | Write the processed page text to the output stream. |
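Taken together, the methods above imply a three-step workflow: register named rectangles with addRegion, call extractRegions for a page, then read each region's text with getTextForRegion. The sketch below is a minimal, JDK-only model of that workflow, not PDFBox's implementation. Glyph is a hypothetical stand-in for TextPosition and RegionTextModel is an illustrative class. The membership rule (a character joins every region whose rectangle contains its x, y position, tested with Rectangle2D.contains) is this sketch's assumption about how processTextPosition groups text. The real writePage also applies PDFTextStripper's line, word, and paragraph formatting, which the model replaces with plain concatenation.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for PDFBox's TextPosition: one character and its page position. */
final class Glyph {
    final String text;
    final double x;
    final double y;

    Glyph(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

/**
 * Simplified, PDFBox-free model of region-based extraction.
 * Mirrors the call order addRegion -> extractRegions -> getTextForRegion.
 */
final class RegionTextModel {
    private final Map<String, Rectangle2D> regionArea = new LinkedHashMap<>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regionArea.put(regionName, rect);
    }

    /** Models extractRegions: clears every region's text, then routes each glyph of the page. */
    void extractRegions(List<Glyph> page) {
        regionText.clear();
        for (String name : regionArea.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph glyph : page) {
            processTextPosition(glyph);
        }
    }

    /** Models processTextPosition: a glyph is appended to every region that contains it. */
    private void processTextPosition(Glyph glyph) {
        for (Map.Entry<String, Rectangle2D> entry : regionArea.entrySet()) {
            if (entry.getValue().contains(glyph.x, glyph.y)) {
                regionText.get(entry.getKey()).append(glyph.text);
            }
        }
    }

    /** Models getTextForRegion; in this model it returns null if the region was never extracted. */
    String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        return text == null ? null : text.toString();
    }
}
```

In this model, overlapping regions each receive a copy of any character that falls inside both, and calling extractRegions again resets every region's text so one instance can be reused page by page. Whether the real class behaves identically in both respects is an assumption here; the summary only states that getTextForRegion must follow extractRegions.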
Methods inherited from class PDFTextStripper: endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageSeparator, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getText, getWordSeparator, handleLineSeparation, inspectFontEncoding, isParagraphSeparation, matchListItemPattern, matchPattern, processPage, processPages, resetEngine, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageSeparator, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageSeperator, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeText, writeWordSeparator

Methods inherited from class PDFStreamEngine: getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix

public PDFTextStripperByArea() throws IOException
Throws:
IOException - If there is an error loading properties.

public PDFTextStripperByArea(Properties props) throws IOException
Parameters:
props - The properties containing the mapping of operators to PDFOperator classes.
Throws:
IOException - If there is an error reading the properties.

public PDFTextStripperByArea(String encoding) throws IOException
Parameters:
encoding - The encoding that the output will be written in.
Throws:
IOException - If there is an error reading the properties.

public void addRegion(String regionName, Rectangle2D rect)
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.

public void removeRegion(String regionName)
Parameters:
regionName - The name of the region to delete.

public List<String> getRegions()
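The region-management methods addRegion, removeRegion, and getRegions behave like a map from region name to rectangle. The following is a minimal JDK-only registry that models them. RegionRegistry, getArea, fromInches, and UNITS_PER_INCH are hypothetical names, not PDFBox API. The sketch assumes region rectangles use PDF user-space units (72 per inch) with y measured downward from the page's top-left corner, the convention it assumes for TextPosition coordinates; verify this against the PDFBox version in use, especially for rotated pages. It keeps names in insertion order, which this documentation does not guarantee for the real getRegions. It also exercises a detail of Rectangle2D.contains that matters for any containment test: points on the left and top edges count as inside, while points on the right and bottom edges do not.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry modeling addRegion, removeRegion and getRegions (not PDFBox code). */
final class RegionRegistry {
    /** Assumed coordinate unit: PDF user space, 72 units per inch. */
    static final double UNITS_PER_INCH = 72.0;

    // Keyed by region name; LinkedHashMap reports names in insertion order.
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    void removeRegion(String regionName) {
        regions.remove(regionName);
    }

    /** Returns a copy of the region names so callers cannot modify the registry. */
    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    /** Hypothetical accessor for the rectangle registered under a name. */
    Rectangle2D getArea(String regionName) {
        return regions.get(regionName);
    }

    /**
     * Hypothetical helper: converts inch measurements to user-space units.
     * x and y are assumed to be measured from the page's top-left corner.
     */
    static Rectangle2D fromInches(double x, double y, double width, double height) {
        return new Rectangle2D.Double(x * UNITS_PER_INCH, y * UNITS_PER_INCH,
                width * UNITS_PER_INCH, height * UNITS_PER_INCH);
    }
}
```

Under the assumed units, fromInches(0, 0, 8.5, 1) covers the top inch of a US Letter page, which measures 612 × 792 units.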
public String getTextForRegion(String regionName)
Parameters:
regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws IOException
Parameters:
page - The page to extract the regions from.
Throws:
IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)
Overrides:
processTextPosition in class PDFTextStripper
Parameters:
text - The text to process.

protected void writePage() throws IOException
Overrides:
writePage in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.

Copyright © 2002–2016 The Apache Software Foundation. All rights reserved.