public class NonSequentialPDFParser extends PDFParser
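
A PDFParser variant that first reads the trailer, the startxref offset and all xref tables, so that it knows the offset of every object in the file, and only then parses the objects it needs, instead of reading the file sequentially.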
 
This class can be used as a replacement for PDFParser. parse() must be called first, before page objects can be retrieved (e.g. via getPDDocument()).
 
This class is a much enhanced version of QuickParser presented in PDFBOX-1104 by Jeremy Villalobos.

| Modifier and Type | Field and Description |
|---|---|
| protected static int | DEFAULT_TRAIL_BYTECOUNT |
| protected static char[] | EOF_MARKER: EOF marker (%%EOF). |
| protected static char[] | OBJ_MARKER: obj marker. |
| protected SecurityHandler | securityHandler: the security handler. |
| protected static char[] | STARTXREF_MARKER: startxref marker. |
| static String | SYSPROP_EOFLOOKUPRANGE |
| static String | SYSPROP_PARSEMINIMAL |
| static String | TMP_FILE_PREFIX |

Fields inherited from class PDFParser: isFDFDocment, xrefTrailerResolver

Fields inherited from class BaseParser: DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE

| Constructor and Description |
|---|
| NonSequentialPDFParser(File file, RandomAccess raBuf): Constructs a parser for the given file, using the given buffer for temporary storage. |
| NonSequentialPDFParser(File file, RandomAccess raBuf, String decryptionPassword): Constructs a parser for the given file, using the given buffer for temporary storage. |
| NonSequentialPDFParser(InputStream input): Constructs a parser for the PDF read from the given input stream. |
| NonSequentialPDFParser(InputStream input, RandomAccess raBuf, String decryptionPassword): Constructs a parser for the PDF read from the given input stream, using the given buffer and decryption password. |
| NonSequentialPDFParser(String filename): Constructs a parser for the given file using a memory buffer. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | decrypt(COSBase pb, int objNr, int objGenNr): Decrypts the given object. |
| protected void | decryptDictionary(COSDictionary dict, long objNr, long objGenNr) |
| protected void | decryptString(COSString str, long objNr, long objGenNr): Decrypts the given COSString. |
| protected void | deleteTempFile(): Removes the temporary file. |
| PDPage | getPage(int pageNr): Returns the requested page with all its objects loaded. |
| int | getPageNumber(): Returns the number of pages in the document. |
| PDDocument | getPDDocument(): Returns the PD document that was parsed. |
| protected File | getPdfFile(): Returns the PDF file. |
| SecurityHandler | getSecurityHandler(): Returns the security handler of the document, or null if the document is not encrypted or parse() has not been called yet. |
| protected long | getStartxrefOffset(): Looks for and parses startxref. |
| protected void | initialParse(): The initial parse first parses only the trailer, the startxref offset and all xref tables, so that it has a pointer (offset) to each of the PDF's objects. |
| boolean | isLenient(): Returns true if the parser is lenient. |
| protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches for the last occurrence of pattern within buf. |
| void | parse(): Parses the stream and populates the COSDocument object. |
| protected COSStream | parseCOSStream(COSDictionary dic, RandomAccess file): Reads a COSStream from the input stream using the length attribute within the dictionary. |
| protected COSBase | parseObjectDynamically(COSObject obj, boolean requireExistingNotCompressedObj): Parses the next object from the stream and adds it to the local state. |
| protected COSBase | parseObjectDynamically(int objNr, int objGenNr, boolean requireExistingNotCompressedObj): Parses the next object from the stream and adds it to the local state. |
| protected void | readPattern(char[] pattern): Reads the given pattern from BaseParser.pdfSource. |
| protected void | releasePdfSourceInputStream(): Enables handling of an alternative pdfSource implementation. |
| void | setEOFLookupRange(int byteCount): Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. |
| void | setLenient(boolean lenient): Changes the parser leniency flag. |
| protected void | setPdfSource(long fileOffset): Sets BaseParser.pdfSource to start the next parsing at the given file offset. |

Methods inherited from class PDFParser: clearResources, getDocument, getFDFDocument, isContinueOnError, parseHeader, parseStartXref, parseTrailer, parseXrefStream, parseXrefStream, parseXrefTable, readVersionInTrailer, setTempDirectory

Methods inherited from class BaseParser: isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseCOSString, parseDirObject, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces

public static final String SYSPROP_PARSEMINIMAL

public static final String SYSPROP_EOFLOOKUPRANGE

protected static final int DEFAULT_TRAIL_BYTECOUNT

protected static final char[] EOF_MARKER

protected static final char[] STARTXREF_MARKER

protected static final char[] OBJ_MARKER

protected SecurityHandler securityHandler

public static final String TMP_FILE_PREFIX
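
public NonSequentialPDFParser(String filename) throws IOException

Constructs a parser for the given file using a memory buffer.

Parameters:
- filename: the filename of the PDF to be parsed

Throws:
- IOException: if something went wrong.

public NonSequentialPDFParser(File file, RandomAccess raBuf) throws IOException

Constructs a parser for the given file, using the given buffer for temporary storage.

Parameters:
- file: the PDF to be parsed
- raBuf: the buffer to be used for parsing

Throws:
- IOException: if something went wrong.

public NonSequentialPDFParser(File file, RandomAccess raBuf, String decryptionPassword) throws IOException

Constructs a parser for the given file, using the given buffer for temporary storage.

Parameters:
- file: the PDF to be parsed
- raBuf: the buffer to be used for parsing
- decryptionPassword: password to be used for decryption

Throws:
- IOException: if something went wrong.

public NonSequentialPDFParser(InputStream input) throws IOException

Constructs a parser for the PDF read from the given input stream.

Parameters:
- input: input stream representing the PDF

Throws:
- IOException: if something went wrong.

public NonSequentialPDFParser(InputStream input, RandomAccess raBuf, String decryptionPassword) throws IOException

Constructs a parser for the PDF read from the given input stream, using the given buffer and decryption password.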

Parameters:
- input: input stream representing the PDF
- raBuf: the buffer to be used for parsing
- decryptionPassword: password to be used for decryption

Throws:
- IOException: if something went wrong.

public void setEOFLookupRange(int byteCount)

Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. If it is not set, DEFAULT_TRAIL_BYTECOUNT is used.

If the system property SYSPROP_EOFLOOKUPRANGE is defined, its value is applied on initialization but can be overwritten later.
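
The precedence described for setEOFLookupRange(int) (a built-in default, then the SYSPROP_EOFLOOKUPRANGE system property on initialization, then any later explicit call) can be sketched as follows. This is a hypothetical illustration, not PDFBox code: the class EofLookupConfig, the default of 2048 bytes, passing the properties in explicitly, and silently ignoring non-numeric or non-positive values are all assumptions.

```java
import java.util.Properties;

/** Hypothetical sketch: an EOF lookup range seeded from a property and overridable later. */
final class EofLookupConfig {
    // Assumed default for illustration; the real DEFAULT_TRAIL_BYTECOUNT may differ.
    static final int DEFAULT_TRAIL_BYTECOUNT = 2048;

    private int readTrailBytes = DEFAULT_TRAIL_BYTECOUNT;

    /** On initialization, apply the property value if it is present and numeric. */
    EofLookupConfig(Properties props, String propertyKey) {
        String value = props.getProperty(propertyKey);
        if (value != null) {
            try {
                setEOFLookupRange(Integer.parseInt(value.trim()));
            } catch (NumberFormatException e) {
                // Assumption: a malformed value is ignored and the default is kept.
            }
        }
    }

    /** A later call overwrites the property-derived value; non-positive values are ignored (assumption). */
    void setEOFLookupRange(int byteCount) {
        if (byteCount > 0) {
            readTrailBytes = byteCount;
        }
    }

    int getEOFLookupRange() {
        return readTrailBytes;
    }
}
```

In real use the properties would be System.getProperties() and the key would be the value of SYSPROP_EOFLOOKUPRANGE; passing both in keeps the sketch testable without touching global state.
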
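
Parameters:
- byteCount: number of trailing bytes

protected void initialParse() throws IOException

The initial parse first parses only the trailer, the startxref offset and all xref tables, so that it has a pointer (offset) to each of the PDF's objects.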
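
For a classic (non-stream) cross-reference table, having "a pointer (offset) to each object" amounts to a map from object number to byte offset. In the PDF format (ISO 32000-1, 7.5.4), each xref subsection starts with a line holding the first object number and the entry count, and each entry holds a 10-digit byte offset, a 5-digit generation number and the keyword n (in use) or f (free). The hypothetical XrefTableSketch below builds such a map from the text of one table. It is not PDFBox's implementation: it ignores generation numbers, does not handle xref streams, and does not merge the tables of incremental updates (where newer sections take precedence).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: turns the body of one classic xref table into objNr -> byte offset. */
final class XrefTableSketch {
    /**
     * @param xrefBody the text after the "xref" keyword and before "trailer"
     * @return byte offsets of all in-use ('n') entries, keyed by object number
     */
    static Map<Integer, Long> parse(String xrefBody) {
        Map<Integer, Long> offsets = new HashMap<>();
        String[] lines = xrefBody.trim().split("\\r\\n|\\r|\\n");
        int i = 0;
        while (i < lines.length) {
            // Subsection header: "<first object number> <entry count>"
            String[] header = lines[i++].trim().split("\\s+");
            if (header.length != 2) {
                throw new IllegalArgumentException("Bad subsection header: " + lines[i - 1]);
            }
            int firstObjNr = Integer.parseInt(header[0]);
            int count = Integer.parseInt(header[1]);
            for (int k = 0; k < count; k++) {
                if (i >= lines.length) {
                    throw new IllegalArgumentException("Truncated subsection starting at " + firstObjNr);
                }
                // Entry: 10-digit offset, 5-digit generation number, 'n' (in use) or 'f' (free)
                String[] entry = lines[i++].trim().split("\\s+");
                if (entry.length != 3) {
                    throw new IllegalArgumentException("Bad xref entry: " + lines[i - 1]);
                }
                if (entry[2].equals("n")) {
                    offsets.put(firstObjNr + k, Long.parseLong(entry[0]));
                }
            }
        }
        return offsets;
    }
}
```

With this map in hand, any object can be parsed by seeking straight to its offset, which is what makes the parser non-sequential.
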
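
Throws:
- IOException: if something went wrong.

protected final void setPdfSource(long fileOffset) throws IOException

Sets BaseParser.pdfSource to start the next parsing at the given file offset.

Parameters:
- fileOffset: file offset

Throws:
- IOException: if something went wrong.

protected final void releasePdfSourceInputStream() throws IOException

Enables handling of an alternative pdfSource implementation.

Throws:
- IOException: if something went wrong.

protected final long getStartxrefOffset() throws IOException

Looks for and parses startxref: the last EOF marker is searched for within the last DEFAULT_TRAIL_BYTECOUNT bytes (or the range set via setEOFLookupRange(int)), and the parser then goes back from there to find startxref.

Throws:
- IOException: if something went wrong.

protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)

Searches for the last occurrence of pattern within buf, looking backwards from endOff.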
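
A minimal sketch of such a backwards search, together with how it can locate the last EOF marker and the startxref keyword in front of it in the trailing bytes, as getStartxrefOffset() describes. TailSearch, startxrefValue and ascii are hypothetical names. startxrefValue returns the number written after startxref, which is not necessarily what getStartxrefOffset() itself returns, and pattern chars are compared as single ASCII bytes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a backwards pattern search over a byte buffer. */
final class TailSearch {
    /**
     * Returns the start index of the last occurrence of pattern that ends before endOff
     * (exclusive), or -1 if there is none. Pattern chars are compared as single ASCII bytes.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        for (int start = Math.min(endOff, buf.length) - pattern.length; start >= 0; start--) {
            int k = 0;
            while (k < pattern.length && buf[start + k] == (byte) pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return start;
            }
        }
        return -1;
    }

    /** Finds the last %%EOF, then the last "startxref" before it, and parses the number in between. */
    static long startxrefValue(byte[] tail) {
        int eof = lastIndexOf("%%EOF".toCharArray(), tail, tail.length);
        if (eof < 0) {
            throw new IllegalArgumentException("no %%EOF marker in the searched range");
        }
        char[] keyword = "startxref".toCharArray();
        int pos = lastIndexOf(keyword, tail, eof);
        if (pos < 0) {
            throw new IllegalArgumentException("no startxref before %%EOF");
        }
        pos += keyword.length;
        while (pos < eof && isWhitespace(tail[pos])) {
            pos++;
        }
        long value = 0;
        int digits = 0;
        while (pos < eof && tail[pos] >= '0' && tail[pos] <= '9') {
            value = value * 10 + (tail[pos++] - '0');
            digits++;
        }
        if (digits == 0) {
            throw new IllegalArgumentException("startxref is not followed by a number");
        }
        return value;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t' || b == '\f' || b == 0;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

Searching from the end matters for files with incremental updates: only the last %%EOF and the startxref in front of it point at the newest xref section.
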

Parameters:
- pattern: pattern to search for
- buf: buffer to search the pattern in
- endOff: offset (exclusive) where the lookup starts

Returns: the start offset of pattern within buf, or -1 if pattern could not be found

protected final void readPattern(char[] pattern) throws IOException

Reads the given pattern from BaseParser.pdfSource, skipping whitespace at the start and end.

Parameters:
- pattern: pattern to be skipped

Throws:
- IOException: if pattern could not be read

public void parse() throws IOException

Parses the stream and populates the COSDocument object.

Overrides: parse in class PDFParser

Throws:
- IOException: if there is an error reading from the stream or corrupt data is found.

protected File getPdfFile()

Returns the PDF file.

public boolean isLenient()

Returns true if the parser is lenient.

public void setLenient(boolean lenient) throws IllegalArgumentException

Changes the parser leniency flag.
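
The documented contract of setLenient(boolean), namely that leniency may only be changed before parsing, can be sketched as a simple guard. LeniencyGuardSketch is hypothetical, and its default of lenient = true is an assumption rather than a documented PDFBox default.

```java
/** Hypothetical sketch: a leniency flag that may only change before parsing. */
final class LeniencyGuardSketch {
    private boolean lenient = true; // assumed default, not taken from PDFBox
    private boolean parsed;

    boolean isLenient() {
        return lenient;
    }

    void setLenient(boolean lenient) {
        if (parsed) {
            throw new IllegalArgumentException("Cannot change leniency after parsing");
        }
        this.lenient = lenient;
    }

    void parse() {
        // A real parser would consult 'lenient' while reading here.
        parsed = true;
    }
}
```
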
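
Parameters:
- lenient: true to make the parser lenient, false otherwise

Throws:
- IllegalArgumentException: if the method is called after parsing.

protected void deleteTempFile()

Removes the temporary file.

public SecurityHandler getSecurityHandler()

Returns the security handler of the document, or null if the document is not encrypted or parse() has not been called yet.

public PDDocument getPDDocument() throws IOException

Returns the PD document that was parsed.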
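
Overrides: getPDDocument in class PDFParser

Throws:
- IOException: if there is an error getting the document.

public int getPageNumber() throws IOException

Returns the number of pages in the document.

Throws:
- IOException: if PAGES or another needed object is missing

public PDPage getPage(int pageNr) throws IOException

Returns the requested page with all its objects loaded.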
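
One common way to find a page by zero-based index in a PDF page tree is to walk down from the root, skipping every subtree whose leaf count (the /Count entry in a real PDF) lies entirely before the requested index. Whether getPage(int) walks the tree exactly this way is an assumption; PageTreeSketch and its Node type are hypothetical, and counts are recomputed here instead of being read from /Count.

```java
import java.util.List;

/** Hypothetical sketch: zero-based page lookup in a page tree, skipping whole subtrees. */
final class PageTreeSketch {
    static final class Node {
        final String name;      // stands in for a page object; null for intermediate nodes
        final List<Node> kids;  // empty for leaf pages
        final boolean isPage;

        private Node(String name, List<Node> kids, boolean isPage) {
            this.name = name;
            this.kids = kids;
            this.isPage = isPage;
        }

        static Node page(String name) {
            return new Node(name, List.of(), true);
        }

        static Node pages(Node... kids) {
            return new Node(null, List.of(kids), false);
        }

        /** Number of leaf pages below this node; a real PDF stores this as /Count. */
        int count() {
            if (isPage) {
                return 1;
            }
            int total = 0;
            for (Node kid : kids) {
                total += kid.count();
            }
            return total;
        }
    }

    static Node getPage(Node root, int pageNr) {
        if (pageNr < 0 || pageNr >= root.count()) {
            throw new IndexOutOfBoundsException("pageNr " + pageNr + " out of range");
        }
        Node node = root;
        int remaining = pageNr;
        while (!node.isPage) {
            Node next = null;
            for (Node kid : node.kids) {
                int c = kid.count();
                if (remaining < c) {
                    next = kid;
                    break;
                }
                remaining -= c; // the whole subtree lies before the requested page
            }
            if (next == null) {
                throw new IllegalStateException("inconsistent page counts");
            }
            node = next;
        }
        return node;
    }
}
```

Only the nodes on the path to the requested page are visited, which fits a parser that loads objects on demand.
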

Parameters:
- pageNr: zero-based page index, from 0 to the number of pages minus 1

Throws:
- IOException: if something went wrong.

protected final COSBase parseObjectDynamically(COSObject obj, boolean requireExistingNotCompressedObj) throws IOException

Parses the next object from the stream and adds it to the local state. This is adapted from PDFParser and reduced to parsing an indirect object.

Parameters:
- obj: object to be parsed (only its object number and generation number are used to look up the start offset)
- requireExistingNotCompressedObj: if true, the object to be parsed must not be contained within a compressed stream

Throws:
- IOException: if an IO error occurs.

protected COSBase parseObjectDynamically(int objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException

Parses the next object from the stream and adds it to the local state.
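
The loop guard that requireExistingNotCompressedObj provides can be sketched with a toy resolver: when an object lives in an object stream, the stream itself is resolved with the flag set, so any object stream that is itself recorded as compressed (including one recorded as living inside itself) is rejected with an exception instead of causing endless recursion. ObjectResolverSketch, XrefEntry and the strings standing in for parsed objects are hypothetical and only mirror the documented purpose of the flag, not PDFBox's code. Objects missing from the xref resolve to the null object, as the PDF specification requires.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of on-demand object resolution with a guard against object-stream loops. */
final class ObjectResolverSketch {
    /** A toy xref entry: either a plain byte offset or "compressed inside object stream N". */
    static final class XrefEntry {
        final long offset;       // used when streamObjNr < 0
        final int streamObjNr;   // object stream holding this object, or -1

        private XrefEntry(long offset, int streamObjNr) {
            this.offset = offset;
            this.streamObjNr = streamObjNr;
        }

        static XrefEntry atOffset(long offset) {
            return new XrefEntry(offset, -1);
        }

        static XrefEntry inObjectStream(int streamObjNr) {
            return new XrefEntry(-1, streamObjNr);
        }

        boolean isCompressed() {
            return streamObjNr >= 0;
        }
    }

    private final Map<Integer, XrefEntry> xref;
    private final Map<Integer, String> cache = new HashMap<>();

    ObjectResolverSketch(Map<Integer, XrefEntry> xref) {
        this.xref = xref;
    }

    /** Strings stand in for parsed objects; "null" stands in for the PDF null object. */
    String resolve(int objNr, boolean requireExistingNotCompressedObj) throws IOException {
        String cached = cache.get(objNr);
        if (cached != null) {
            return cached;
        }
        XrefEntry entry = xref.get(objNr);
        if (entry == null) {
            if (requireExistingNotCompressedObj) {
                throw new IOException("object " + objNr + " is not in the xref");
            }
            return "null"; // undefined objects are treated as the null object
        }
        String result;
        if (entry.isCompressed()) {
            if (requireExistingNotCompressedObj) {
                throw new IOException("object " + objNr + " must not be inside an object stream");
            }
            // The containing object stream must itself be a plain, uncompressed object.
            result = "obj " + objNr + " from " + resolve(entry.streamObjNr, true);
        } else {
            result = "obj " + objNr + " @" + entry.offset; // stands in for parsing at the offset
        }
        cache.put(objNr, result);
        return result;
    }
}
```
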

This is adapted from PDFParser and reduced to parsing an indirect object.

Parameters:
- objNr: object number of the object to be parsed
- objGenNr: generation number of the object to be parsed
- requireExistingNotCompressedObj: if true, the object to be parsed must be defined in the xref (note: null objects may be missing from the xref) and must not be a compressed object within an object stream (this is used to avoid getting stuck in a loop in a malicious PDF)

Throws:
- IOException: if an IO error occurs.

protected final void decryptDictionary(COSDictionary dict, long objNr, long objGenNr) throws IOException

Parameters:
- dict: the dictionary to be decrypted
- objNr: the object number
- objGenNr: the object generation number

Throws:
- IOException: if something went wrong

protected final void decryptString(COSString str, long objNr, long objGenNr) throws IOException

Decrypts the given COSString.

Parameters:
- str: the string to be decrypted
- objNr: the object number
- objGenNr: the object generation number

Throws:
- IOException: if something went wrong

protected final void decrypt(COSBase pb, int objNr, int objGenNr) throws IOException

Decrypts the given object.
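
decryptDictionary(), decryptString() and decrypt() all take the object number and generation number. In the PDF standard security handler with RC4 or 128-bit AES (revisions 2 to 4), that is because every string and stream is encrypted with its own key, derived from the document key and those two numbers as defined in ISO 32000-1, 7.6.2, Algorithm 1. The hypothetical ObjectKeySketch below computes only that per-object key; it is not PDFBox's SecurityHandler code, and AES-256 (revisions 5 and 6), which uses the document key directly, is not covered.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical sketch of the per-object key derivation (ISO 32000-1, 7.6.2, Algorithm 1). */
final class ObjectKeySketch {
    private static final byte[] AES_SALT = {0x73, 0x41, 0x6C, 0x54}; // "sAlT"

    static byte[] objectKey(byte[] fileKey, long objNr, long objGenNr, boolean aes)
            throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(fileKey);
        // Low-order 3 bytes of the object number, then low-order 2 bytes of the generation, LSB first.
        md5.update(new byte[] {
            (byte) objNr, (byte) (objNr >>> 8), (byte) (objNr >>> 16),
            (byte) objGenNr, (byte) (objGenNr >>> 8)
        });
        if (aes) {
            md5.update(AES_SALT);
        }
        // Use the first n + 5 bytes of the digest, at most 16.
        return Arrays.copyOf(md5.digest(), Math.min(fileKey.length + 5, 16));
    }
}
```
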
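
Parameters:
- pb: the object to be decrypted
- objNr: the object number
- objGenNr: the object generation number

Throws:
- IOException: if something went wrong

protected COSStream parseCOSStream(COSDictionary dic, RandomAccess file) throws IOException

Reads a COSStream from the input stream using the length attribute within the dictionary.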
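
parseCOSStream() is documented to read the stream data using the length attribute of its dictionary and to fail when the stream is too short or does not end with 'endstream' after the data. The hypothetical StreamReadSketch below shows that contract on an in-memory byte array. Unlike the real method, it does not write into a RandomAccess buffer, and it makes no attempt to recover from a wrong /Length.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: read stream data by /Length, then require the endstream keyword. */
final class StreamReadSketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /**
     * @param file      the whole file as bytes (a stand-in for the parser's input)
     * @param dataStart index of the first data byte, i.e. just after the EOL that follows "stream"
     * @param length    value of the stream dictionary's /Length entry
     */
    static byte[] readStreamData(byte[] file, int dataStart, int length) throws IOException {
        if (dataStart < 0 || length < 0 || (long) dataStart + length > file.length) {
            throw new IOException("stream too short for /Length " + length);
        }
        byte[] data = Arrays.copyOfRange(file, dataStart, dataStart + length);
        int pos = dataStart + length;
        while (pos < file.length && isWhitespace(file[pos])) {
            pos++; // skip the EOL (and any other whitespace) before "endstream"
        }
        if (pos + ENDSTREAM.length > file.length
                || !Arrays.equals(Arrays.copyOfRange(file, pos, pos + ENDSTREAM.length), ENDSTREAM)) {
            throw new IOException("stream does not end with 'endstream' after " + length + " bytes");
        }
        return data;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t' || b == '\f' || b == 0;
    }
}
```
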

Overrides: parseCOSStream in class BaseParser

Parameters:
- dic: dictionary that goes with this stream
- file: file to write the stream to when reading

Throws:
- IOException: if an error occurred reading the stream, e.g. problems reading the length attribute, the stream not ending with 'endstream' after the data was read, or the stream being too short.

Copyright © 2002–2016 The Apache Software Foundation. All rights reserved.