public class PDFText2HTML extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, LINE_SEPARATOR, output
Constructor and Description |
---|
`PDFText2HTML()` Constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | `endArticle()` Write out the article separator. |
void | `endDocument(PDDocument document)` This method is available for subclasses of this class. |
protected String | `getTitle()` This method will attempt to guess the title of the document using either the document properties or the first lines of text. |
protected void | `showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement)` This method was originally written by Ben Litchfield for PDFStreamEngine. |
protected void | `startArticle(boolean isLTR)` Write out the article separator (div tag) with proper text direction information. |
protected void | `startDocument(PDDocument document)` This method is available for subclasses of this class. |
protected void | `writeHeader()` Deprecated. Use `startDocument(PDDocument)` instead. |
protected void | `writeParagraphEnd()` Writes the paragraph end `</p>` to the output. |
protected void | `writeString(String chars)` Write a string to the output stream and escape some HTML characters. |
protected void | `writeString(String text, List<TextPosition> textPositions)` Write a string to the output stream, maintain font state, and escape some HTML characters. |
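The `writeString` entries above say that some HTML characters are escaped before text reaches the output. The following is a minimal, standalone sketch of that kind of escaping. It is not the PDFBox source: the class name `HtmlEscapeSketch`, the method `escape`, and the exact set of escaped characters (`&`, `<`, `>`, `"`) are assumptions chosen for illustration.

```java
// Illustrative sketch only; names and the escaped character set are assumptions,
// not taken from PDFText2HTML.
final class HtmlEscapeSketch {

    /** Replaces the characters that are significant in HTML markup with entities. */
    static String escape(String chars) {
        StringBuilder sb = new StringBuilder(chars.length());
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    // Every other character is passed through unchanged.
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; &quot;c&quot;
        System.out.println(escape("a < b && \"c\""));
    }
}
```

Escaping `&` along with the other characters in a single pass avoids double-escaping the entities the method itself produces.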
Methods inherited from class PDFTextStripper: endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphSeparator, writeParagraphStart, writeText, writeWordSeparator
Methods inherited from class PDFStreamEngine: addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
public PDFText2HTML() throws IOException
Throws:
IOException - If there is an error during initialization.

protected void writeHeader() throws IOException
Deprecated. Use startDocument(PDDocument) instead.
Throws:
IOException - If there is a problem writing out the header to the document.

protected void startDocument(PDDocument document) throws IOException
Overrides:
startDocument in class PDFTextStripper
Parameters:
document - The PDF document that is being processed.
Throws:
IOException - If an IO error occurs.

public void endDocument(PDDocument document) throws IOException
Overrides:
endDocument in class PDFTextStripper
Parameters:
document - The PDF document that is being processed.
Throws:
IOException - If an IO error occurs.

protected String getTitle()
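The summary for `getTitle()` describes a two-step guess: use the document properties, otherwise fall back to the first lines of text. The sketch below shows one way such a fallback could look in isolation. It is a hypothetical illustration, not the PDFBox implementation: `TitleGuessSketch`, `guessTitle`, and both parameters are invented names, and the rule "first non-blank line" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a "properties first, text second" title guess.
// Not the PDFBox implementation; all names here are assumptions.
final class TitleGuessSketch {

    /**
     * Returns the metadata title when it is present and non-blank,
     * otherwise the first non-blank line of text, otherwise an empty string.
     */
    static String guessTitle(String metadataTitle, List<String> firstLines) {
        if (metadataTitle != null && !metadataTitle.trim().isEmpty()) {
            return metadataTitle.trim();
        }
        for (String line : firstLines) {
            if (line != null && !line.trim().isEmpty()) {
                return line.trim();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("   ", "Annual Report", "Section 1");
        System.out.println(guessTitle(null, lines));          // falls back to the text
        System.out.println(guessTitle("Report 2017", lines)); // metadata title wins
    }
}
```

Returning an empty string rather than null keeps the caller free of null checks when the result is written into an HTML `<title>` element.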
protected void startArticle(boolean isLTR) throws IOException
Overrides:
startArticle in class PDFTextStripper
Parameters:
isLTR - true if direction of text is left to right
Throws:
IOException - If there is an error writing to the stream.

protected void endArticle() throws IOException
Overrides:
endArticle in class PDFTextStripper
Throws:
IOException - If there is an error writing to the stream.

protected void writeString(String text, List<TextPosition> textPositions) throws IOException
Overrides:
writeString in class PDFTextStripper
Parameters:
text - The text to write to the stream.
textPositions - the corresponding text positions
Throws:
IOException - If there is an error writing to the stream.

protected void writeString(String chars) throws IOException
Overrides:
writeString in class PDFTextStripper
Parameters:
chars - String to be written to the stream
Throws:
IOException - If there is an error writing to the stream.

protected void writeParagraphEnd() throws IOException
Overrides:
writeParagraphEnd in class PDFTextStripper
Throws:
IOException - if something went wrong

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed

Copyright © 2002–2017 The Apache Software Foundation. All rights reserved.