public class PDFTextStripperByArea extends PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output| Constructor and Description |
|---|
PDFTextStripperByArea()
Constructor.
|
| Modifier and Type | Method and Description |
|---|---|
void |
addRegion(String regionName,
Rectangle2D rect)
Add a new region to group text by.
|
void |
extractRegions(PDPage page)
Process the page to extract the region text.
|
List<String> |
getRegions()
Get the list of regions that have been setup.
|
String |
getTextForRegion(String regionName)
Get the text for the region, this should be called after extractRegions().
|
protected void |
processTextPosition(TextPosition text)
This will process a TextPosition object and add the text to the list of characters on a page.
|
void |
removeRegion(String regionName)
Delete a region to group text by.
|
void |
setShouldSeparateByBeads(boolean aShouldSeparateByBeads)
This method does nothing in this derived class, because beads and regions are incompatible.
|
protected void |
showGlyph(Matrix textRenderingMatrix,
PDFont font,
int code,
String unicode,
Vector displacement)
This method was originally written by Ben Litchfield for PDFStreamEngine.
|
protected void |
writePage()
This will print the processed page text to the output stream.
|
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparatoraddOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperatorpublic PDFTextStripperByArea()
throws IOException
IOException - If there is an error loading properties.public final void setShouldSeparateByBeads(boolean aShouldSeparateByBeads)
setShouldSeparateByBeads in class PDFTextStripperaShouldSeparateByBeads - The new grouping of beads.public void addRegion(String regionName, Rectangle2D rect)
regionName - The name of the region.rect - The rectangle area to retrieve the text from.public void removeRegion(String regionName)
regionName - The name of the region to delete.public List<String> getRegions()
public String getTextForRegion(String regionName)
regionName - The name of the region to get the text from.public void extractRegions(PDPage page) throws IOException
page - The page to extract the regions from.IOException - If there is an error while extracting text.protected void processTextPosition(TextPosition text)
processTextPosition in class PDFTextStrippertext - The text to process.protected void writePage()
throws IOException
writePage in class PDFTextStripperIOException - If there is an error writing the text.protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
showGlyph in class PDFStreamEnginetextRenderingMatrix - the current text rendering matrix, Trmfont - the current fontcode - internal PDF character code for the glyphunicode - the Unicode text for this glyph, or null if the PDF does provide itdisplacement - the displacement (i.e. advance) of the glyph in text spaceIOException - if the glyph cannot be processedCopyright © 2002–2016 The Apache Software Foundation. All rights reserved.