public class PreflightParser extends PDFParser
| Modifier and Type | Field and Description |
|---|---|
| protected PreflightContext | ctx |
| protected DataSource | dataSource |
| static Charset | encoding: Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. |
| protected PreflightDocument | preflightDocument |
| protected ValidationResult | validationResult |

Inherited fields: EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, source, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX, xrefTrailerResolver, A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, seqSource, STREAM_STRING, T

| Constructor and Description |
|---|
| PreflightParser(DataSource dataSource): Constructor. |
| PreflightParser(DataSource dataSource, ScratchFile scratch): Constructor. |
| PreflightParser(File file): Constructor. |
| PreflightParser(File file, ScratchFile scratch): Constructor. |
| PreflightParser(String filename): Constructor. |
| PreflightParser(String filename, ScratchFile scratch): Constructor. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | addValidationError(ValidationResult.ValidationError error): Add the error to the ValidationResult. |
| protected void | addValidationErrors(List<ValidationResult.ValidationError> errors) |
| protected void | checkEndstreamKeyWord(): 'endstream' must be preceded by an EOL. |
| protected void | checkPdfHeader(): Check that the PDF header matches the rules of the PDF/A specification. |
| protected void | checkStreamKeyWord(): 'stream' must be followed by <CR><LF> or only <LF>. |
| protected void | createContext(): Create a validation context. |
| protected void | createPdfADocument(Format format, PreflightConfiguration config) |
| protected static ValidationResult | createUnknownErrorResult(): Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR). |
| PDDocument | getPDDocument(): This will get the PD document that was parsed. |
| PreflightDocument | getPreflightDocument() |
| protected void | initialParse(): The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects. |
| protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches last appearance of pattern within buffer. |
| void | parse(): This will parse the stream and populate the COSDocument object. |
| void | parse(Format format): Parse the given file and check whether it conforms to the given format. |
| void | parse(Format format, PreflightConfiguration config): Parse the given file and check whether it conforms to the given format. |
| protected COSArray | parseCOSArray(): This will parse a PDF array object. |
| protected COSName | parseCOSName(): This will parse a PDF name from the stream. |
| protected COSStream | parseCOSStream(COSDictionary dic): Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on the 'stream' and 'endstream' keywords. |
| protected COSString | parseCOSString(): Check that the hexadecimal string contains only an even number of hexadecimal characters. |
| protected COSBase | parseDirObject(): Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries. |
| protected COSBase | parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj): This will parse the next object from the stream and add it to the local state. |
| protected boolean | parseXrefTable(long startByteOffset): Same method as COSParser.parseXrefTable(long) with additional controls: EOL mandatory after the 'xref' keyword; cross reference subsection header uses a single white space as separator; and so on. |
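
The two additional cross-reference controls named for parseXrefTable can be illustrated with a small standalone checker. This is a sketch only, not PDFBox code: the class `XrefSyntaxSketch`, its method names, and the regular expression are assumptions made for illustration, and it covers just the two rules listed (EOL after the 'xref' keyword, single space in the subsection header).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class XrefSyntaxSketch {

    // Subsection header: first object number, exactly ONE space, entry count.
    private static final Pattern SUBSECTION_HEADER = Pattern.compile("\\d+ \\d+");

    private XrefSyntaxSketch() {
    }

    /** True if "xref" starts at offset and the very next byte begins an EOL (CR or LF). */
    static boolean xrefKeywordFollowedByEol(byte[] data, int offset) {
        byte[] keyword = "xref".getBytes(StandardCharsets.US_ASCII);
        if (offset < 0 || offset + keyword.length >= data.length) {
            return false; // no room for the keyword plus one EOL byte
        }
        for (int i = 0; i < keyword.length; i++) {
            if (data[offset + i] != keyword[i]) {
                return false;
            }
        }
        byte next = data[offset + keyword.length];
        return next == '\r' || next == '\n';
    }

    /** True if the line looks like "0 6": two integers separated by exactly one space. */
    static boolean isStrictSubsectionHeader(String line) {
        return SUBSECTION_HEADER.matcher(line).matches();
    }
}
```

The sketch is deliberately stricter than a lenient parser: a tab, a doubled space, or a space directly after 'xref' is rejected instead of being skipped.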

Inherited methods: getDocument, getStartxrefOffset, isLenient, parseDictObjects, parseFDFHeader, parseObjectDynamically, parsePDFHeader, parseTrailerValuesDynamically, parseXref, rebuildTrailer, setEOFLookupRange, setLenient, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces

public static final Charset encoding
protected DataSource dataSource
protected ValidationResult validationResult
protected PreflightDocument preflightDocument
protected PreflightContext ctx
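
The `encoding` field is described as a one-byte encoding. The sketch below shows why such a charset is convenient when raw PDF bytes have to pass through a `String`: every byte value maps to exactly one character and back. It assumes ISO-8859-1 as the one-byte charset, which is an assumption for illustration (the value assigned to the field is not stated here), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo class; not part of PDFBox. */
final class OneByteEncodingSketch {

    // Assumption: ISO-8859-1 stands in for the one-byte charset.
    static final Charset ONE_BYTE = StandardCharsets.ISO_8859_1;

    private OneByteEncodingSketch() {
    }

    /** Decodes the bytes to a String and encodes them again with the same charset. */
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    /** Returns every possible byte value, 0x00 through 0xFF. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1 lossless: "
                + Arrays.equals(all, roundTrip(all, ONE_BYTE)));
        System.out.println("UTF-8 lossless: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));
    }
}
```

With a multi-byte charset such as UTF-8, bytes above 0x7F that do not form valid sequences are replaced during decoding, so the round trip is not lossless.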

public PreflightParser(File file) throws IOException
Parameters:
  file
Throws:
  IOException - if there is a reading error.

public PreflightParser(File file, ScratchFile scratch) throws IOException
Parameters:
  file
  scratch
Throws:
  IOException - if there is a reading error.

public PreflightParser(String filename) throws IOException
Parameters:
  filename
Throws:
  IOException - if there is a reading error.

public PreflightParser(String filename, ScratchFile scratch) throws IOException
Parameters:
  filename
  scratch
Throws:
  IOException - if there is a reading error.

public PreflightParser(DataSource dataSource) throws IOException
Parameters:
  dataSource - the datasource
Throws:
  IOException - if there is a reading error.

public PreflightParser(DataSource dataSource, ScratchFile scratch) throws IOException
Parameters:
  dataSource - the datasource
  scratch
Throws:
  IOException - if there is a reading error.

protected static ValidationResult createUnknownErrorResult()

protected void addValidationError(ValidationResult.ValidationError error)
Parameters:
  error

protected void addValidationErrors(List<ValidationResult.ValidationError> errors)

public void parse() throws IOException
Description copied from class: PDFParser
This will parse the stream and populate the COSDocument object.
Overrides:
  parse in class PDFParser
Throws:
  InvalidPasswordException - If the password is incorrect.
  IOException - If there is an error reading from the stream or corrupt data is found.

public void parse(Format format) throws IOException
Parameters:
  format - format that the document should follow (default Format.PDF_A1B)
Throws:
  IOException

public void parse(Format format, PreflightConfiguration config) throws IOException
Parameters:
  format - format that the document should follow (default Format.PDF_A1B)
  config - Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
Throws:
  IOException

protected void createPdfADocument(Format format, PreflightConfiguration config) throws IOException
Throws:
  IOException

protected void createContext()
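
public PDDocument getPDDocument() throws IOException
Description copied from class: PDFParser
This will get the PD document that was parsed.
Overrides:
  getPDDocument in class PDFParser
Throws:
  IOException - If there is an error getting the document.

public PreflightDocument getPreflightDocument() throws IOException
Throws:
  IOException

protected void initialParse() throws IOException
Description copied from class: PDFParser
The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects.
Overrides:
  initialParse in class PDFParser
Throws:
  InvalidPasswordException - If the password is incorrect.
  IOException - If something went wrong.

protected void checkPdfHeader()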
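
checkPdfHeader() is described only as checking the header against the rules of the PDF/A specification. As an illustration, the sketch below encodes the file-header rules as they are commonly stated for PDF/A-1: a `%PDF-1.n` line at byte offset 0, immediately followed by a comment line made of `%` and at least four bytes with values above 127. Those rules, the class `PdfAHeaderSketch`, and its method names are assumptions made for this example, not the logic of PreflightParser.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class PdfAHeaderSketch {

    private PdfAHeaderSketch() {
    }

    /** True if the data starts with "%PDF-1.n" and a binary comment line follows. */
    static boolean hasPdfAStyleHeader(byte[] data) {
        int firstEol = indexOfEol(data, 0);
        if (firstEol < 0) {
            return false; // no complete first line
        }
        String firstLine = new String(data, 0, firstEol, StandardCharsets.ISO_8859_1);
        // Assumption: a single-digit minor version; nothing may precede '%'.
        if (!firstLine.matches("%PDF-1\\.\\d")) {
            return false;
        }
        int second = skipEol(data, firstEol);
        // Second line: '%' followed by at least four bytes above 127.
        if (second + 5 > data.length || data[second] != '%') {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            if ((data[second + i] & 0xFF) <= 127) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first CR or LF at or after from, or -1 if there is none. */
    private static int indexOfEol(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == '\r' || data[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /** Index of the first byte after the EOL that starts at eol (CRLF counts as one EOL). */
    private static int skipEol(byte[] data, int eol) {
        if (data[eol] == '\r' && eol + 1 < data.length && data[eol + 1] == '\n') {
            return eol + 2;
        }
        return eol + 1;
    }
}
```

The bytes are decoded with a one-byte charset so that the first line can be matched as text without altering any byte value.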

protected boolean parseXrefTable(long startByteOffset) throws IOException
Overrides:
  parseXrefTable in class COSParser
Parameters:
  startByteOffset - the offset to start at
Throws:
  IOException - If an IO error occurs.

protected COSStream parseCOSStream(COSDictionary dic) throws IOException
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on the 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord().
Overrides:
  parseCOSStream in class COSParser
Parameters:
  dic - dictionary that goes with this stream.
Throws:
  IOException - if an error occurred reading the stream, like problems with reading the length attribute, stream does not end with 'endstream' after data read, stream too short etc.

protected void checkStreamKeyWord() throws IOException
Throws:
  IOException

protected void checkEndstreamKeyWord() throws IOException
Throws:
  IOException

protected COSArray parseCOSArray() throws IOException
Description copied from class: BaseParser
This will parse a PDF array object.
Overrides:
  parseCOSArray in class BaseParser
Throws:
  IOException - If there is an error parsing the stream.

protected COSName parseCOSName() throws IOException
Description copied from class: BaseParser
This will parse a PDF name from the stream.
Overrides:
  parseCOSName in class BaseParser
Throws:
  IOException - If there is an error reading from the stream.

protected COSString parseCOSString() throws IOException
Check that the hexadecimal string contains only an even number of hexadecimal characters. See BaseParser.parseCOSString().
Overrides:
  parseCOSString in class BaseParser
Throws:
  IOException - If there is an error reading from the stream.

protected COSBase parseDirObject() throws IOException
Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
Overrides:
  parseDirObject in class BaseParser
Throws:
  IOException - if there is an error during parsing.

protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
Description copied from class: COSParser
This will parse the next object from the stream and add it to the local state.
Overrides:
  parseObjectDynamically in class COSParser
Parameters:
  objNr - object number of object to be parsed
  objGenNr - object generation number of object to be parsed
  requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
Throws:
  IOException - If an IO error occurs.

protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches last appearance of pattern within buffer.
Overrides:
  lastIndexOf in class COSParser
Parameters:
  pattern - pattern to search for
  buf - buffer to search pattern in
  endOff - offset (exclusive) where lookup starts at
Returns:
  -1 if pattern could not be found

Copyright © 2002–2017 The Apache Software Foundation. All rights reserved.