public class COSParser extends BaseParser

PDFParser.parse() or FDFParser.parse() must be called before page objects can be retrieved, e.g. via PDFParser.getPDDocument().

This class is a much enhanced version of the QuickParser presented in PDFBOX-1104 by Jeremy Villalobos.

| Modifier and Type | Field and Description |
|---|---|
| protected static char[] | EOF_MARKER: EOF-marker. |
| protected long | fileLen: file length. |
| protected boolean | initialParseDone |
| protected static char[] | OBJ_MARKER: obj-marker. |
| protected SecurityHandler | securityHandler: The security handler. |
| protected RandomAccessRead | source |
| static String | SYSPROP_EOFLOOKUPRANGE: The range within which the %%EOF marker will be searched. |
| static String | SYSPROP_PARSEMINIMAL: Only parse the PDF file minimally, allowing access to basic information. |
| static String | TMP_FILE_PREFIX: The prefix for the temp file being used. |
| protected XrefTrailerResolver | xrefTrailerResolver: Collects all Xref/trailer objects and resolves them into a single object using the startxref reference. |

Fields inherited from class BaseParser:
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, seqSource, STREAM_STRING, T

| Constructor and Description |
|---|
| COSParser(RandomAccessRead source): Default constructor. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | checkPages(COSDictionary root): Check if all entries of the pages dictionary are present. |
| COSDocument | getDocument(): This will get the document that was parsed. |
| protected long | getStartxrefOffset(): Looks for and parses startxref. |
| protected boolean | isCatalog(COSDictionary dictionary): Tell if the dictionary is a PDF catalog. |
| boolean | isLenient(): Return true if parser is lenient. |
| protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches for the last appearance of pattern within the buffer. |
| protected COSStream | parseCOSStream(COSDictionary dic): This will read a COSStream from the input stream using the length attribute within the dictionary. |
| protected void | parseDictObjects(COSDictionary dict, COSName... excludeObjects): Will parse every object necessary to load a single page from the PDF document. |
| protected boolean | parseFDFHeader(): Parse the header of an FDF. |
| protected COSBase | parseObjectDynamically(COSObject obj, boolean requireExistingNotCompressedObj): This will parse the next object from the stream and add it to the local state. |
| protected COSBase | parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj): This will parse the next object from the stream and add it to the local state. |
| protected boolean | parsePDFHeader(): Parse the header of a PDF. |
| protected COSBase | parseTrailerValuesDynamically(COSDictionary trailer): Parse the values of the trailer dictionary and return the root object. |
| protected COSDictionary | parseXref(long startXRefOffset): Parses cross reference tables. |
| protected boolean | parseXrefTable(long startByteOffset): This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored. |
| protected COSDictionary | rebuildTrailer(): Rebuild the trailer dictionary if startxref can't be found. |
| protected COSDictionary | retrieveTrailer(): Read the trailer information and provide a COSDictionary containing the trailer information. |
| void | setEOFLookupRange(int byteCount): Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. |
| void | setLenient(boolean lenient): Change the parser leniency flag. |

Methods inherited from class BaseParser:
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces

protected final RandomAccessRead source
public static final String SYSPROP_PARSEMINIMAL
public static final String SYSPROP_EOFLOOKUPRANGE
protected static final char[] EOF_MARKER
protected static final char[] OBJ_MARKER
protected long fileLen
protected boolean initialParseDone
protected SecurityHandler securityHandler
protected XrefTrailerResolver xrefTrailerResolver
public static final String TMP_FILE_PREFIX

public COSParser(RandomAccessRead source)
Default constructor.

public void setEOFLookupRange(int byteCount)
Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. If not set, the default value DEFAULT_TRAIL_BYTECOUNT is used.

The new value is checked to be at least 16. For practical use, however, it should not be lower than 1000; even 2000 was found to be insufficient in some cases where trailing garbage such as HTML snippets followed the EOF marker.

If the system property SYSPROP_EOFLOOKUPRANGE is defined, its value is applied on initialization but can be overridden later.

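The interplay of default, system property and setter described for setEOFLookupRange can be illustrated with a minimal standalone sketch. It is not PDFBox code: the class name EofLookupRangeSketch, the property key example.eofLookupRange and the default of 2048 are placeholders chosen for illustration, and silently ignoring values below 16 is an assumption about how the minimum check behaves.

```java
// Hypothetical stand-in for the parser's lookup-range handling; not PDFBox code.
final class EofLookupRangeSketch {
    // Placeholder property key and default value (assumptions for this sketch).
    static final String PROP_KEY = "example.eofLookupRange";
    static final int DEFAULT_TRAIL_BYTECOUNT = 2048;
    static final int MIN_TRAIL_BYTECOUNT = 16;

    private int readTrailBytes = DEFAULT_TRAIL_BYTECOUNT;

    EofLookupRangeSketch() {
        // Apply the system property on initialization if it holds a valid integer.
        String value = System.getProperty(PROP_KEY);
        if (value != null) {
            try {
                setEOFLookupRange(Integer.parseInt(value.trim()));
            } catch (NumberFormatException e) {
                // Not a number: keep the default.
            }
        }
    }

    // Assumption: values below the minimum are ignored rather than rejected.
    void setEOFLookupRange(int byteCount) {
        if (byteCount >= MIN_TRAIL_BYTECOUNT) {
            readTrailBytes = byteCount;
        }
    }

    int getEOFLookupRange() {
        return readTrailBytes;
    }

    public static void main(String[] args) {
        System.setProperty(PROP_KEY, "4096");
        EofLookupRangeSketch parser = new EofLookupRangeSketch(); // picks up 4096
        parser.setEOFLookupRange(8);    // ignored: below the minimum of 16
        parser.setEOFLookupRange(8192); // a later call overrides the property value
        System.out.println(parser.getEOFLookupRange());
    }
}
```

In this sketch the property only supplies the initial value; any later valid call to the setter wins, which mirrors the "set on initialization but can be overridden later" behaviour described above.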
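Parameters:
byteCount - number of trailing bytes

protected COSDictionary retrieveTrailer() throws IOException
Read the trailer information and provide a COSDictionary containing the trailer information.
Throws:
IOException - if something went wrong

protected COSDictionary parseXref(long startXRefOffset) throws IOException
Parses cross reference tables.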
Parameters:
startXRefOffset - start offset of the first table
Throws:
IOException - if something went wrong

protected final long getStartxrefOffset() throws IOException
Looks for and parses startxref. The last %%EOF marker is searched for within the last DEFAULT_TRAIL_BYTECOUNT bytes (or the range set via setEOFLookupRange(int)); the parser then goes back from there to find startxref.
Throws:
IOException - If something went wrong.

protected int lastIndexOf(char[] pattern,
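              byte[] buf,
              int endOff)
Searches for the last appearance of pattern within the buffer.
Parameters:
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
Returns:
-1 if pattern could not be found

public boolean isLenient()
Return true if parser is lenient.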
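
The backward search described for lastIndexOf and the trailing-range lookup described for getStartxrefOffset can be sketched together in a few lines of standalone Java. This is a simplified illustration, not the PDFBox implementation: the class StartxrefLookupSketch and the method findStartxrefOffset are hypothetical names, the whole file is assumed to be available as a byte array, and the sketch returns -1 where a real parser might throw an exception or attempt a recovery.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of a backward marker search; not PDFBox code.
final class StartxrefLookupSketch {
    static final char[] EOF_MARKER = "%%EOF".toCharArray();
    static final char[] STARTXREF_MARKER = "startxref".toCharArray();

    // Returns the start index of the last occurrence of pattern that ends
    // at or before endOff (exclusive), or -1 if there is none.
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        for (int start = endOff - pattern.length; start >= 0; start--) {
            int i = 0;
            while (i < pattern.length && buf[start + i] == pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return start;
            }
        }
        return -1;
    }

    // Looks for the last %%EOF within the trailing bytes, goes back to the
    // preceding startxref keyword and reads the decimal offset that follows it.
    static long findStartxrefOffset(byte[] file, int trailByteCount) {
        int from = Math.max(0, file.length - trailByteCount);
        byte[] tail = Arrays.copyOfRange(file, from, file.length);
        int eof = lastIndexOf(EOF_MARKER, tail, tail.length);
        if (eof < 0) {
            return -1;
        }
        int keyword = lastIndexOf(STARTXREF_MARKER, tail, eof);
        if (keyword < 0) {
            return -1;
        }
        int pos = keyword + STARTXREF_MARKER.length;
        // Skip whitespace between the keyword and the number.
        while (pos < eof && (tail[pos] == ' ' || tail[pos] == '\t'
                || tail[pos] == '\r' || tail[pos] == '\n')) {
            pos++;
        }
        long offset = 0;
        int digits = 0;
        while (pos < eof && tail[pos] >= '0' && tail[pos] <= '9') {
            offset = offset * 10 + (tail[pos] - '0');
            pos++;
            digits++;
        }
        return digits > 0 ? offset : -1;
    }

    public static void main(String[] args) {
        String pdf = "%PDF-1.4\n...objects...\nstartxref\n1234\n%%EOF\n<html>garbage</html>";
        byte[] bytes = pdf.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartxrefOffset(bytes, 2048)); // whole sample is inside the range
        System.out.println(findStartxrefOffset(bytes, 16));   // range covers only the trailing garbage
    }
}
```

The second call in main uses a range so small that only the trailing HTML snippet is searched, so no %%EOF marker is found. This is the situation that the advice on choosing a sufficiently large lookup range is meant to avoid.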

public void setLenient(boolean lenient)
Change the parser leniency flag.
Parameters:
lenient - try to handle malformed PDFs.

protected void parseDictObjects(COSDictionary dict, COSName... excludeObjects) throws IOException
Will parse every object necessary to load a single page from the PDF document.
Parameters:
dict - the COSObject from the parent pages.
excludeObjects - dictionary object reference entries with these names will not be parsed
Throws:
IOException - if something went wrong

protected final COSBase parseObjectDynamically(COSObject obj, boolean requireExistingNotCompressedObj) throws IOException
This will parse the next object from the stream and add it to the local state.
Parameters:
obj - object to be parsed (only its object number and generation number are used to look up the start offset)
requireExistingNotCompressedObj - if true, the object to be parsed must not be contained within a compressed stream
Throws:
IOException - If an IO error occurs.

protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
This will parse the next object from the stream and add it to the local state.
Parameters:
objNr - object number of object to be parsed
objGenNr - object generation number of object to be parsed
requireExistingNotCompressedObj - if true, the object to be parsed must be defined in the xref (note: null objects may be missing from the xref) and it must not be a compressed object within an object stream (this is used to avoid getting stuck in a loop in a malicious PDF)
Throws:
IOException - If an IO error occurs.

protected COSStream parseCOSStream(COSDictionary dic) throws IOException
This will read a COSStream from the input stream using the length attribute within the dictionary.
Parameters:
dic - dictionary that goes with this stream.
Throws:
IOException - if an error occurred reading the stream, such as problems reading the length attribute, the stream not ending with 'endstream' after the data is read, or the stream being too short.

protected final COSDictionary rebuildTrailer() throws IOException
Rebuild the trailer dictionary if startxref can't be found.
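Throws:
IOException - if something went wrong

protected void checkPages(COSDictionary root)
Check if all entries of the pages dictionary are present.
Parameters:
root - the root dictionary of the pdf

protected boolean isCatalog(COSDictionary dictionary)
Tell if the dictionary is a PDF catalog.
Parameters:
dictionary -

protected boolean parsePDFHeader() throws IOException
Parse the header of a PDF.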
Throws:
IOException - if something went wrong

protected boolean parseFDFHeader() throws IOException
Parse the header of an FDF.
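Throws:
IOException - if something went wrong

protected boolean parseXrefTable(long startByteOffset) throws IOException
This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored.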
Parameters:
startByteOffset - the offset to start at
Throws:
IOException - If an IO error occurs.

public COSDocument getDocument() throws IOException
This will get the document that was parsed.
Throws:
IOException - If there is an error getting the document.

protected COSBase parseTrailerValuesDynamically(COSDictionary trailer) throws IOException
Parse the values of the trailer dictionary and return the root object.
Parameters:
trailer - The trailer dictionary.
Throws:
IOException - If an IO error occurs or if the root object is missing in the trailer dictionary.

Copyright © 2002–2017 The Apache Software Foundation. All rights reserved.