public class PreflightParser extends PDFParser
Modifier and Type | Field and Description |
---|---|
protected PreflightContext | ctx |
protected DataSource | dataSource |
static Charset | encoding: Defines a single-byte encoding, so that no byte sequence receives special treatment as it would in the UTF-8 charset. |
protected PreflightDocument | preflightDocument |
protected ValidationResult | validationResult |
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, source, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX, xrefTrailerResolver
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, seqSource, STREAM_STRING, T
Constructor and Description |
---|
PreflightParser(DataSource dataSource): Constructor. |
PreflightParser(File file): Constructor. |
PreflightParser(String filename): Constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | addValidationError(ValidationResult.ValidationError error): Add the error to the ValidationResult. |
protected void | addValidationErrors(List<ValidationResult.ValidationError> errors) |
protected void | checkEndstreamKeyWord(): 'endstream' must be preceded by an EOL. |
protected void | checkPdfHeader(): Check that the PDF header matches the rules of the PDF/A specification. |
protected void | checkStreamKeyWord(): 'stream' must be followed by <CR><LF> or only <LF>. |
protected void | createContext(): Create a validation context. |
protected void | createPdfADocument(Format format, PreflightConfiguration config) |
protected static ValidationResult | createUnknownErrorResult(): Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR). |
PDDocument | getPDDocument(): This will get the PD document that was parsed. |
PreflightDocument | getPreflightDocument() |
protected void | initialParse(): The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects. |
protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches for the last appearance of the pattern within the buffer. |
void | parse(): This will parse the stream and populate the COSDocument object. |
void | parse(Format format): Parse the given file and check whether it is a conforming file according to the given format. |
void | parse(Format format, PreflightConfiguration config): Parse the given file and check whether it is a conforming file according to the given format. |
protected COSArray | parseCOSArray(): This will parse a PDF array object. |
protected COSName | parseCOSName(): This will parse a PDF name from the stream. |
protected COSStream | parseCOSStream(COSDictionary dic): Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on 'stream' and 'endstream' keywords. |
protected COSString | parseCOSString(): Check that the hexadecimal string contains only an even number of hexadecimal characters. |
protected COSBase | parseDirObject(): Calls BaseParser.parseDirObject() and checks the limit ranges for Float, Integer and the number of Dictionary entries. |
protected COSBase | parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj): This will parse the next object from the stream and add it to the local state. |
protected boolean | parseXrefTable(long startByteOffset): Same method as COSParser.parseXrefTable(long), with additional controls: EOL is mandatory after the 'xref' keyword, the cross-reference subsection header uses a single white space as separator, and so on. |
getDocument, getStartxrefOffset, isLenient, parseDictObjects, parseFDFHeader, parseObjectDynamically, parsePDFHeader, parseTrailerValuesDynamically, parseXref, rebuildTrailer, setEOFLookupRange, setLenient
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces
public static final Charset encoding
protected DataSource dataSource
protected ValidationResult validationResult
protected PreflightDocument preflightDocument
protected PreflightContext ctx
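The encoding field is described above as a one-byte charset. As a rough illustration of why that matters when raw PDF bytes are turned into strings, the sketch below assumes ISO-8859-1 as a stand-in (the value actually assigned to encoding is not stated on this page) and compares it with UTF-8. The class and method names are hypothetical and are not part of PDFBox.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class OneByteEncodingSketch {
    // Assumption: ISO-8859-1 stands in for the parser's one-byte charset.
    static final Charset ONE_BYTE = StandardCharsets.ISO_8859_1;

    /** Returns an array holding every possible byte value once. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** Decodes and re-encodes the bytes; true if the bytes come back unchanged. */
    static boolean roundTrips(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return Arrays.equals(raw, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // A one-byte charset maps each byte to exactly one character and back.
        System.out.println("ISO-8859-1 lossless: " + roundTrips(all, ONE_BYTE));
        // UTF-8 replaces malformed byte sequences, so arbitrary bytes do not survive.
        System.out.println("UTF-8 lossless: " + roundTrips(all, StandardCharsets.UTF_8));
    }
}
```

With a one-byte charset, arbitrary binary content read by a parser can be held in a String and written back without loss, which is not the case for UTF-8.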
public PreflightParser(File file) throws IOException
Parameters:
file -
Throws:
IOException - if there is a reading error.
public PreflightParser(String filename) throws IOException
Parameters:
filename -
Throws:
IOException - if there is a reading error.
public PreflightParser(DataSource dataSource) throws IOException
Parameters:
dataSource - the datasource
Throws:
IOException - if there is a reading error.
protected static ValidationResult createUnknownErrorResult()
protected void addValidationError(ValidationResult.ValidationError error)
Parameters:
error -
protected void addValidationErrors(List<ValidationResult.ValidationError> errors)
public void parse() throws IOException
Description copied from class: PDFParser
This will parse the stream and populate the COSDocument object.
Overrides:
parse in class PDFParser
Throws:
IOException - If there is an error reading from the stream or corrupt data is found.
public void parse(Format format) throws IOException
Parameters:
format - format that the document should follow (default Format.PDF_A1B)
Throws:
IOException
public void parse(Format format, PreflightConfiguration config) throws IOException
Parameters:
format - format that the document should follow (default Format.PDF_A1B)
config - Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
Throws:
IOException
protected void createPdfADocument(Format format, PreflightConfiguration config) throws IOException
Throws:
IOException
protected void createContext()
public PDDocument getPDDocument() throws IOException
Description copied from class: PDFParser
This will get the PD document that was parsed.
Overrides:
getPDDocument in class PDFParser
Throws:
IOException - If there is an error getting the document.
public PreflightDocument getPreflightDocument() throws IOException
Throws:
IOException
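protected void initialParse() throws IOException
Description copied from class: PDFParser
The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects.
Overrides:
initialParse in class PDFParser
Throws:
IOException - If something went wrong.
protected void checkPdfHeader()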
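checkPdfHeader() is documented only as checking the header against the rules of the PDF/A specification. The sketch below is not PDFBox code. It is a minimal standalone illustration of the PDF/A-1 header requirement (ISO 19005-1, section 6.1.2): a %PDF-1.n line at byte offset 0, immediately followed by a comment line made of % and at least four bytes with values above 127. The class name, the helper names and the exact version pattern are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

final class PdfAHeaderSketch {
    private static final byte CR = 0x0D;
    private static final byte LF = 0x0A;

    /** Index of the first CR or LF at or after 'from', or -1 if there is none. */
    static int indexOfEol(byte[] buf, int from) {
        for (int i = from; i < buf.length; i++) {
            if (buf[i] == CR || buf[i] == LF) {
                return i;
            }
        }
        return -1;
    }

    /** Index just past the EOL marker (CR, LF or CR LF) that starts at 'pos'. */
    static int skipEol(byte[] buf, int pos) {
        if (buf[pos] == CR && pos + 1 < buf.length && buf[pos + 1] == LF) {
            return pos + 2;
        }
        return pos + 1;
    }

    /** True if the file starts with a header that satisfies the two rules above. */
    static boolean hasPdfAHeader(byte[] file) {
        int eol = indexOfEol(file, 0);
        if (eol < 0) {
            return false;
        }
        // Rule 1: the first line is "%PDF-1.n" starting at byte offset 0
        // (assumed pattern: any single digit as minor version).
        String firstLine = new String(file, 0, eol, StandardCharsets.ISO_8859_1);
        if (!firstLine.matches("%PDF-1\\.[0-9]")) {
            return false;
        }
        // Rule 2: the next line is '%' followed by at least four bytes > 127.
        int next = skipEol(file, eol);
        if (next + 5 > file.length || file[next] != '%') {
            return false;
        }
        for (int i = next + 1; i <= next + 4; i++) {
            if ((file[i] & 0xFF) <= 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] header = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        System.out.println(hasPdfAHeader(header));
    }
}
```

A real validator would report a ValidationError rather than return a boolean; the boolean keeps the sketch self-contained.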
protected boolean parseXrefTable(long startByteOffset) throws IOException
Overrides:
parseXrefTable in class COSParser
Parameters:
startByteOffset - the offset to start at
Throws:
IOException - If an IO error occurs.
protected COSStream parseCOSStream(COSDictionary dic) throws IOException
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord().
Overrides:
parseCOSStream in class COSParser
Parameters:
dic - dictionary that goes with this stream.
Throws:
IOException - if an error occurred reading the stream, such as problems reading the length attribute, the stream not ending with 'endstream' after the data is read, the stream being too short, etc.
protected void checkStreamKeyWord() throws IOException
Throws:
IOException
protected void checkEndstreamKeyWord() throws IOException
Throws:
IOException
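The two keyword rules stated for checkStreamKeyWord() and checkEndstreamKeyWord() ('stream' must be followed by CR LF or only LF, and 'endstream' must be preceded by an EOL) can be sketched on a plain byte buffer. This is an illustrative standalone example, not the PDFBox implementation: the class and method names are hypothetical, and treating either CR or LF as an acceptable EOL byte before 'endstream' is an assumption.

```java
import java.nio.charset.StandardCharsets;

final class StreamKeywordSketch {
    private static final byte CR = 0x0D;
    private static final byte LF = 0x0A;

    /**
     * 'stream' rule: the byte(s) right after the keyword must be CR LF or a single LF.
     * keywordEnd is the index just past the last letter of 'stream'.
     */
    static boolean streamFollowedByEol(byte[] buf, int keywordEnd) {
        if (keywordEnd < 0 || keywordEnd >= buf.length) {
            return false;
        }
        if (buf[keywordEnd] == LF) {
            return true;
        }
        // A lone CR is not accepted; it must be followed by LF.
        return buf[keywordEnd] == CR
                && keywordEnd + 1 < buf.length
                && buf[keywordEnd + 1] == LF;
    }

    /**
     * 'endstream' rule: the byte right before the keyword must be an EOL byte.
     * keywordStart is the index of the first letter of 'endstream'.
     * Assumption: CR or LF both count as EOL here.
     */
    static boolean endstreamPrecededByEol(byte[] buf, int keywordStart) {
        if (keywordStart <= 0 || keywordStart > buf.length) {
            return false;
        }
        byte previous = buf[keywordStart - 1];
        return previous == CR || previous == LF;
    }

    public static void main(String[] args) {
        byte[] data = "stream\r\nabc\nendstream".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(streamFollowedByEol(data, 6));     // index just past 'stream'
        System.out.println(endstreamPrecededByEol(data, 12)); // index of 'endstream'
    }
}
```

Both helpers take explicit offsets so that they stay independent of how a parser tracks its position in the source.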
protected COSArray parseCOSArray() throws IOException
Description copied from class: BaseParser
This will parse a PDF array object.
Overrides:
parseCOSArray in class BaseParser
Throws:
IOException - If there is an error parsing the stream.
protected COSName parseCOSName() throws IOException
Description copied from class: BaseParser
This will parse a PDF name from the stream.
Overrides:
parseCOSName in class BaseParser
Throws:
IOException - If there is an error reading from the stream.
protected COSString parseCOSString() throws IOException
Check that the hexadecimal string contains only an even number of hexadecimal characters. See BaseParser.parseCOSString().
Overrides:
parseCOSString in class BaseParser
Throws:
IOException - If there is an error reading from the stream.
protected COSBase parseDirObject() throws IOException
Calls BaseParser.parseDirObject() and checks the limit ranges for Float, Integer and the number of Dictionary entries.
Overrides:
parseDirObject in class BaseParser
Throws:
IOException - if there is an error during parsing.
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
Description copied from class: COSParser
This will parse the next object from the stream and add it to the local state.
Overrides:
parseObjectDynamically in class COSParser
Parameters:
objNr - object number of object to be parsed
objGenNr - object generation number of object to be parsed
requireExistingNotCompressedObj - if true, the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within an object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
Throws:
IOException - If an IO error occurs.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches for the last appearance of the pattern within the buffer.
Overrides:
lastIndexOf in class COSParser
Parameters:
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
Returns:
-1 if pattern could not be found
Copyright © 2002–2016 The Apache Software Foundation. All rights reserved.