public class PDFMarkedContentExtractor extends PDFStreamEngine
Constructor | Description |
---|---|
PDFMarkedContentExtractor() | Instantiate a new PDFMarkedContentExtractor object. |
PDFMarkedContentExtractor(String encoding) | Instantiate a new PDFMarkedContentExtractor object that uses the given output encoding. |
Modifier and Type | Method | Description |
---|---|---|
void | beginMarkedContentSequence(COSName tag, COSDictionary properties) | |
void | endMarkedContentSequence() | |
List<PDMarkedContent> | getMarkedContents() | |
void | processPage(PDPage page) | This will initialise and process the contents of the stream. |
protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page. |
protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) | This method was originally written by Ben Litchfield for PDFStreamEngine. |
void | xobject(PDXObject xobject) | |
Methods inherited from class PDFStreamEngine:
addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
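
The summary above suggests a simple flow: construct an extractor, call processPage(PDPage) for a page, then read the result from getMarkedContents(). The following is a minimal sketch of that flow, not a definitive implementation. It assumes PDFBox 2.0.x on the classpath (PDDocument.load(File) and the org.apache.pdfbox.text package). The class name MarkedContentDump and the helper describe are hypothetical names chosen for illustration, and creating a fresh extractor for each page is an assumption made to keep each page's results separate.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDMarkedContent;
import org.apache.pdfbox.text.PDFMarkedContentExtractor;

// Hypothetical example class; assumes PDFBox 2.0.x.
public class MarkedContentDump {

    // Hypothetical helper: returns one summary line for each marked-content
    // sequence that getMarkedContents() reports on each page.
    static List<String> describe(PDDocument document) throws IOException {
        List<String> lines = new ArrayList<>();
        int pageNumber = 0;
        for (PDPage page : document.getPages()) {
            pageNumber++;
            // Assumption: a fresh extractor per page keeps results separate.
            PDFMarkedContentExtractor extractor = new PDFMarkedContentExtractor();
            extractor.processPage(page);
            for (PDMarkedContent content : extractor.getMarkedContents()) {
                lines.add("page " + pageNumber
                        + ": tag=" + content.getTag()
                        + ", items=" + content.getContents().size());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a PDF file supplied by the caller.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            for (String line : describe(document)) {
                System.out.println(line);
            }
        }
    }
}
```

A page without a content stream yields no marked-content sequences, so describe returns an empty list for a document that holds only a blank page.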
public PDFMarkedContentExtractor() throws IOException
Throws:
IOException

public PDFMarkedContentExtractor(String encoding) throws IOException
Parameters:
encoding - The encoding that the output will be written in.
Throws:
IOException

public void beginMarkedContentSequence(COSName tag, COSDictionary properties)

public void endMarkedContentSequence()

public void xobject(PDXObject xobject)

protected void processTextPosition(TextPosition text)
Parameters:
text - The text to process.

public List<PDMarkedContent> getMarkedContents()
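
Because processTextPosition(TextPosition) is protected, a subclass can observe each TextPosition before delegating to the inherited behaviour. The sketch below illustrates that idea under the same assumption of PDFBox 2.0.x. RecordingExtractor and getSeen are hypothetical names introduced here, not part of the library.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.text.PDFMarkedContentExtractor;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical subclass; assumes PDFBox 2.0.x.
public class RecordingExtractor extends PDFMarkedContentExtractor {

    // Unicode strings of the text positions seen so far, in processing order.
    private final List<String> seen = new ArrayList<>();

    public RecordingExtractor() throws IOException {
        super();
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        // Record the text first, then let the parent class handle it as usual.
        seen.add(text.getUnicode());
        super.processTextPosition(text);
    }

    // Hypothetical accessor for the recorded strings.
    public List<String> getSeen() {
        return seen;
    }
}
```

Calling super.processTextPosition(text) keeps the documented behaviour intact, so getMarkedContents() should still be populated as before. The subclass only adds a side record.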
public void processPage(PDPage page) throws IOException
Overrides:
processPage in class PDFStreamEngine
Parameters:
page - the page to process
Throws:
IOException - if there is an error accessing the stream.

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed

Copyright © 2002–2017 The Apache Software Foundation. All rights reserved.