public class PDFTextStripperByArea extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, LINE_SEPARATOR, output
Constructor and Description |
---|
PDFTextStripperByArea(): Constructor. |
Modifier and Type | Method and Description |
---|---|
void | addRegion(String regionName, Rectangle2D rect): Add a new region to group text by. |
void | extractRegions(PDPage page): Process the page to extract the region text. |
List<String> | getRegions(): Get the list of regions that have been set up. |
String | getTextForRegion(String regionName): Get the text for the region. This should be called after extractRegions(). |
protected void | processTextPosition(TextPosition text): Process a TextPosition object and add the text to the list of characters on a page. |
void | removeRegion(String regionName): Delete a region to group text by. |
void | setShouldSeparateByBeads(boolean aShouldSeparateByBeads): This method does nothing in this derived class, because beads and regions are incompatible. |
protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement): This method was originally written by Ben Litchfield for PDFStreamEngine. |
protected void | writePage(): Print the processed page text to the output stream. |
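The region methods summarized above suggest a three-step flow: register named rectangles with addRegion, process a page with extractRegions, then read each region's text with getTextForRegion. The sketch below models that flow using only the Java standard library. `RegionGrouper` and its `process` method are hypothetical stand-ins written for illustration, not PDFBox code, and the containment rule (a piece of text belongs to every region that contains its origin point) is an assumption about how the grouping might work.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not PDFBox code.
class RegionGrouper {
    // Region name -> rectangle, kept in insertion order.
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<String, Rectangle2D>();
    // Region name -> text collected so far.
    private final Map<String, StringBuilder> text = new LinkedHashMap<String, StringBuilder>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
        text.put(regionName, new StringBuilder());
    }

    void removeRegion(String regionName) {
        regions.remove(regionName);
        text.remove(regionName);
    }

    List<String> getRegions() {
        return new ArrayList<String>(regions.keySet());
    }

    // Assumed rule: text is appended to every region whose rectangle contains (x, y).
    void process(double x, double y, String s) {
        for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
            if (e.getValue().contains(x, y)) {
                text.get(e.getKey()).append(s);
            }
        }
    }

    // Returns null for an unknown region name (a choice made for this sketch).
    String getTextForRegion(String regionName) {
        StringBuilder sb = text.get(regionName);
        return sb == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        RegionGrouper g = new RegionGrouper();
        g.addRegion("header", new Rectangle2D.Double(0, 0, 200, 50));
        g.addRegion("body", new Rectangle2D.Double(0, 50, 200, 300));
        g.process(10, 20, "Invoice");    // falls inside "header" only
        g.process(10, 100, "Total: 42"); // falls inside "body" only
        System.out.println(g.getTextForRegion("header")); // prints "Invoice"
        System.out.println(g.getTextForRegion("body"));   // prints "Total: 42"
    }
}
```

Overlapping regions each receive the same text in this model, which is why the text is stored per region rather than per page.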
Methods inherited from class PDFTextStripper:
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
Methods inherited from class PDFStreamEngine:
addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
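The Rectangle2D passed to addRegion has to be expressed in the coordinate system the stripper uses for text positions. This page does not state that system, so the sketch below rests on an assumption: that positions are measured from the upper-left corner of the page with y increasing downward, whereas PDF user space places the origin at the lower left. The helper `toTopLeftOrigin` is a hypothetical name introduced here. It ignores page rotation and any crop box offset, so treat it as a starting point and check it against your PDFBox version.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper for illustration; not part of PDFBox.
class RegionCoordinates {
    // Converts a rectangle given with a bottom-left origin (y up)
    // into the equivalent rectangle with a top-left origin (y down).
    // Assumes an unrotated page whose visible area starts at (0, 0).
    static Rectangle2D toTopLeftOrigin(Rectangle2D r, double pageHeight) {
        double flippedY = pageHeight - (r.getY() + r.getHeight());
        return new Rectangle2D.Double(r.getX(), flippedY, r.getWidth(), r.getHeight());
    }

    public static void main(String[] args) {
        // A 200 x 50 box whose lower edge sits 700 units above the bottom
        // of a 792-unit-high (US Letter) page.
        Rectangle2D bottomLeft = new Rectangle2D.Double(72, 700, 200, 50);
        Rectangle2D topLeft = toTopLeftOrigin(bottomLeft, 792);
        System.out.println(topLeft.getX() + ", " + topLeft.getY()); // prints "72.0, 42.0"
    }
}
```

The width and height are unchanged by the conversion; only the y coordinate moves, because the rectangle's anchor switches from its lower edge to its upper edge.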
public PDFTextStripperByArea() throws IOException
Throws:
IOException - If there is an error loading properties.

public final void setShouldSeparateByBeads(boolean aShouldSeparateByBeads)
Overrides:
setShouldSeparateByBeads in class PDFTextStripper
Parameters:
aShouldSeparateByBeads - The new grouping of beads.

public void addRegion(String regionName, Rectangle2D rect)
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.

public void removeRegion(String regionName)
Parameters:
regionName - The name of the region to delete.

public List<String> getRegions()
public String getTextForRegion(String regionName)
Parameters:
regionName - The name of the region to get the text from.

public void extractRegions(PDPage page) throws IOException
Parameters:
page - The page to extract the regions from.
Throws:
IOException - If there is an error while extracting text.

protected void processTextPosition(TextPosition text)
Overrides:
processTextPosition in class PDFTextStripper
Parameters:
text - The text to process.

protected void writePage() throws IOException
Overrides:
writePage in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed

Copyright © 2002–2018 The Apache Software Foundation. All rights reserved.