public class PDFTextStripperByArea extends PDFTextStripper

Fields inherited from class PDFTextStripper: charactersByArticle, document, output, outputEncoding, systemLineSeparator

| Constructor | Description |
|---|---|
| PDFTextStripperByArea() | Constructor. |
| PDFTextStripperByArea(Properties props) | Instantiate a new PDFTextStripperByArea object. |
| PDFTextStripperByArea(String encoding) | Instantiate a new PDFTextStripperByArea object. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | addRegion(String regionName, Rectangle2D rect) | Add a new region to group text by. |
| void | extractRegions(PDPage page) | Process the page to extract the region text. |
| List<String> | getRegions() | Get the list of regions that have been set up. |
| String | getTextForRegion(String regionName) | Get the text for the region. This should be called after extractRegions(). |
| protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page. |
| void | removeRegion(String regionName) | Delete a region to group text by. |
| protected void | writePage() | This will print the processed page text to the output stream. |
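
The methods summarized above imply a three-step workflow: register named rectangles, process a page, then read back the text collected for each name. The following is a minimal, library-free sketch of that idea, not PDFBox's own code. The class `RegionGrouperSketch`, its `Glyph` type, and the `extract` method are hypothetical stand-ins for illustration. The sketch assumes that a piece of text belongs to a region when its anchor point lies inside the region's rectangle.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in that mimics the region bookkeeping; not PDFBox code. */
final class RegionGrouperSketch {

    /** Hypothetical glyph: a piece of text anchored at the point (x, y). */
    static final class Glyph {
        final double x;
        final double y;
        final String text;

        Glyph(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Region name -> rectangle, kept in insertion order.
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    // Region name -> text collected by the most recent extract() call.
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    void removeRegion(String regionName) {
        regions.remove(regionName);
        regionText.remove(regionName);
    }

    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    /** Stand-in for extractRegions(page): assigns each glyph to every region containing it. */
    void extract(List<Glyph> glyphs) {
        regionText.clear();
        for (String name : regions.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph glyph : glyphs) {
            for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
                if (entry.getValue().contains(glyph.x, glyph.y)) {
                    regionText.get(entry.getKey()).append(glyph.text);
                }
            }
        }
    }

    /** Returns null if extract() has not run yet (a choice made for this sketch only). */
    String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        return text == null ? null : text.toString();
    }

    public static void main(String[] args) {
        RegionGrouperSketch sketch = new RegionGrouperSketch();
        sketch.addRegion("header", new Rectangle2D.Double(0, 0, 200, 50));
        sketch.addRegion("body", new Rectangle2D.Double(0, 50, 200, 100));

        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph(10, 10, "Invoice"));
        glyphs.add(new Glyph(60, 10, " 42"));
        glyphs.add(new Glyph(10, 80, "Total"));

        sketch.extract(glyphs);
        System.out.println(sketch.getTextForRegion("header")); // prints "Invoice 42"
        System.out.println(sketch.getTextForRegion("body"));   // prints "Total"
    }
}
```

With the real class, the corresponding calls would be addRegion, then extractRegions with a PDPage, then getTextForRegion for each registered name. The sketch only shows why getTextForRegion is meaningful after extraction has run: until then, no text has been collected for any region.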

Methods inherited from class PDFTextStripper: endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageSeparator, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getText, getWordSeparator, handleLineSeparation, inspectFontEncoding, isParagraphSeparation, matchListItemPattern, matchPattern, processPage, processPages, resetEngine, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageSeparator, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageSeperator, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeText, writeWordSeparator

Methods inherited from class PDFStreamEngine: getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getTotalCharCnt, getValidCharCnt, getXObjects, isForceParsing, processEncodedText, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, setColorSpaces, setFonts, setForceParsing, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix

public PDFTextStripperByArea() throws IOException

Throws: IOException - If there is an error loading properties.

public PDFTextStripperByArea(Properties props) throws IOException

Parameters: props - The properties containing the mapping of operators to PDFOperator classes.

Throws: IOException - If there is an error reading the properties.

public PDFTextStripperByArea(String encoding) throws IOException

Parameters: encoding - The encoding that the output will be written in.

Throws: IOException - If there is an error reading the properties.

public void addRegion(String regionName, Rectangle2D rect)

Parameters: regionName - The name of the region. rect - The rectangle area to retrieve the text from.

public void removeRegion(String regionName)

Parameters: regionName - The name of the region to delete.

public List<String> getRegions()

public String getTextForRegion(String regionName)

Parameters: regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws IOException

Parameters: page - The page to extract the regions from.

Throws: IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)

Overrides: processTextPosition in class PDFTextStripper

Parameters: text - The text to process.

protected void writePage() throws IOException

Overrides: writePage in class PDFTextStripper

Throws: IOException - If there is an error writing the text.

Copyright © 2002-2016 The Apache Software Foundation. All Rights Reserved.