public class PDFText2HTML extends PDFTextStripper
Fields inherited from class PDFTextStripper: charactersByArticle, document, LINE_SEPARATOR, output

| Constructor and Description |
|---|
| PDFText2HTML() Constructor. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | endArticle() Write out the article separator. |
| void | endDocument(PDDocument document) This method is available for subclasses of this class. |
| protected String | getTitle() This method will attempt to guess the title of the document using either the document properties or the first lines of text. |
| protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) This method was originally written by Ben Litchfield for PDFStreamEngine. |
| protected void | startArticle(boolean isLTR) Write out the article separator (div tag) with proper text direction information. |
| protected void | startDocument(PDDocument document) This method is available for subclasses of this class. |
| protected void | writeHeader() Deprecated. Use startDocument(PDDocument) instead. |
| protected void | writeParagraphEnd() Writes the paragraph end "</p>" to the output. |
| protected void | writeString(String chars) Write a string to the output stream and escape some HTML characters. |
| protected void | writeString(String text, List<TextPosition> textPositions) Write a string to the output stream, maintain font state, and escape some HTML characters. |

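The summary for writeString(String chars) says that it escapes "some HTML characters" but does not list them. The sketch below is a standalone illustration of that kind of escaping, not the code PDFText2HTML runs. It assumes that only the four characters &, <, > and " are escaped, and the names HtmlEscapeSketch and escape are hypothetical, chosen for this example.

```java
// Hypothetical helper, not part of PDFBox: shows one way to escape
// HTML-significant characters before writing text into an HTML document.
final class HtmlEscapeSketch {

    // Assumption: only &, <, > and " are escaped. The set of characters
    // that PDFText2HTML itself escapes is not stated in this page.
    static String escape(String chars) {
        StringBuilder builder = new StringBuilder(chars.length());
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            switch (c) {
                case '&':
                    builder.append("&amp;");
                    break;
                case '<':
                    builder.append("&lt;");
                    break;
                case '>':
                    builder.append("&gt;");
                    break;
                case '"':
                    builder.append("&quot;");
                    break;
                default:
                    // Every other character is copied through unchanged.
                    builder.append(c);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Escape a short string that contains all four special characters.
        System.out.println(escape("a < b && \"c\" > d"));
    }
}
```

Escaping the ampersand in the same pass as the other characters matters: replacing & in a separate step after the others would double-escape the entities that were just written.
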
Methods inherited from class PDFTextStripper: endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphSeparator, writeParagraphStart, writeText, writeWordSeparator

Methods inherited from class PDFStreamEngine: addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

public PDFText2HTML() throws IOException
Throws: IOException - If there is an error during initialization.

protected void writeHeader() throws IOException
Deprecated. Use startDocument(PDDocument) instead.
Throws: IOException - If there is a problem writing out the header to the document.

protected void startDocument(PDDocument document) throws IOException
Overrides: startDocument in class PDFTextStripper
Parameters: document - The PDF document that is being processed.
Throws: IOException - If an IO error occurs.

public void endDocument(PDDocument document) throws IOException
Overrides: endDocument in class PDFTextStripper
Parameters: document - The PDF document that is being processed.
Throws: IOException - If an IO error occurs.

protected String getTitle()

protected void startArticle(boolean isLTR) throws IOException
Overrides: startArticle in class PDFTextStripper
Parameters: isLTR - true if the direction of the text is left to right.
Throws: IOException - If there is an error writing to the stream.

protected void endArticle() throws IOException
Overrides: endArticle in class PDFTextStripper
Throws: IOException - If there is an error writing to the stream.

protected void writeString(String text, List<TextPosition> textPositions) throws IOException
Overrides: writeString in class PDFTextStripper
Parameters:
text - The text to write to the stream.
textPositions - The corresponding text positions.
Throws: IOException - If there is an error writing to the stream.

protected void writeString(String chars) throws IOException
Overrides: writeString in class PDFTextStripper
Parameters: chars - String to be written to the stream.
Throws: IOException - If there is an error writing to the stream.

protected void writeParagraphEnd() throws IOException
Overrides: writeParagraphEnd in class PDFTextStripper
Throws: IOException - If something went wrong.

protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Overrides: showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws: IOException - if the glyph cannot be processed

Copyright © 2002–2016 The Apache Software Foundation. All rights reserved.