org.apache.pdfbox.util
Class PDFStreamEngine

java.lang.Object
  extended by org.apache.pdfbox.util.PDFStreamEngine
Direct Known Subclasses:
PageDrawer, PDFImageWriter, PDFMarkedContentExtractor, PDFTextStripper, PrintImageLocations, Type3StreamParser

public class PDFStreamEngine
extends Object

This class runs through a PDF content stream, executes the operators it recognizes, and provides a callback interface for clients that want to act on the contents of the stream. See the PDFTextStripper class for an example of how to use this class.
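
The callback idea described above can be illustrated with a minimal sketch that does not depend on PDFBox. The names MiniStreamEngine, TextCollector, onText and processTokens are hypothetical stand-ins, and the token list is a simplification: real content streams are parsed from bytes and string operands are written in parentheses. The sketch only relies on the fact that content streams use postfix notation, so operands precede their operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the engine; it does not use any PDFBox classes.
class MiniStreamEngine {
    // Walks a pre-tokenized stream. Operands accumulate until an operator
    // token arrives, then the operator is handled with those operands.
    void processTokens(List<String> tokens) {
        List<String> operands = new ArrayList<String>();
        for (String token : tokens) {
            if (isOperator(token)) {
                handleOperator(token, operands);
                operands = new ArrayList<String>();
            } else {
                operands.add(token);
            }
        }
    }

    // Simplification: only the three operators used in this sketch are recognized.
    private boolean isOperator(String token) {
        return token.equals("BT") || token.equals("ET") || token.equals("Tj");
    }

    private void handleOperator(String operator, List<String> operands) {
        if (operator.equals("Tj") && !operands.isEmpty()) {
            onText(operands.get(0));
        }
    }

    // Callback hook for subclasses; does nothing by default.
    protected void onText(String text) {
    }
}

// A client that overrides the hook to collect the text it is shown.
class TextCollector extends MiniStreamEngine {
    final StringBuilder collected = new StringBuilder();

    @Override
    protected void onText(String text) {
        collected.append(text);
    }

    static String collect(String... tokens) {
        TextCollector collector = new TextCollector();
        collector.processTokens(Arrays.asList(tokens));
        return collector.collected.toString();
    }
}
```

In this sketch, TextCollector.collect("BT", "Hello", "Tj", "ET") hands "Hello" to the overridden hook. PDFTextStripper remains the reference for how the real callbacks are used.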

Version:
$Revision: 1.38 $
Author:
Ben Litchfield

Constructor Summary
PDFStreamEngine()
          Constructor.
PDFStreamEngine(Properties properties)
          Constructor with engine properties.
 
Method Summary
 Map getColorSpaces()
          Get the colorSpaces map.
 PDPage getCurrentPage()
          Get the current page that is being processed.
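 Map getFonts()
          Get the fonts map.
 Stack getGraphicsStack()
          Get the graphicsStack.
 PDGraphicsState getGraphicsState()
          Get the graphicsState.
 Map getGraphicsStates()
          Get the graphicsStates map.
 PDResources getResources()
          Get the resources.
 Matrix getTextLineMatrix()
          Get the textLineMatrix.
 Matrix getTextMatrix()
          Get the textMatrix.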
 int getTotalCharCnt()
          Get the total number of characters in the document (including ones that could not be mapped).
 int getValidCharCnt()
          Get the total number of valid characters in the document that could be decoded in processEncodedText().
 Map getXObjects()
          Get the XObjects map.
protected  String inspectFontEncoding(String str)
          Event hook that lets a subclass act on the string encoded by a glyph.
 void processEncodedText(byte[] string)
          Process encoded text from the PDF Stream.
protected  void processOperator(PDFOperator operator, List arguments)
          Handle a single operator, given as a PDFOperator, with its arguments.
 void processOperator(String operation, List arguments)
          Handle a single operator, given as a String, with its arguments.
 void processStream(PDPage aPage, PDResources resources, COSStream cosStream)
          Process the contents of the stream.
 void processSubStream(PDPage aPage, PDResources resources, COSStream cosStream)
          Process a sub stream of the current stream.
protected  void processTextPosition(TextPosition text)
          Event hook that lets a subclass act whenever text needs to be processed.
 void registerOperatorProcessor(String operator, OperatorProcessor op)
          Register a custom operator processor with the engine.
 void resetEngine()
          Call this method between documents to release cached document information.
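 void setColorSpaces(Map value)
          Set the colorSpaces map.
 void setFonts(Map value)
          Set the fonts map.
 void setGraphicsStack(Stack value)
          Set the graphicsStack.
 void setGraphicsState(PDGraphicsState value)
          Set the graphicsState.
 void setGraphicsStates(Map value)
          Set the graphicsStates map.
 void setTextLineMatrix(Matrix value)
          Set the textLineMatrix.
 void setTextMatrix(Matrix value)
          Set the textMatrix.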
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFStreamEngine

public PDFStreamEngine()
Constructor.


PDFStreamEngine

public PDFStreamEngine(Properties properties)
                throws IOException
Constructor with engine properties. Each property key is a PDF operator, and each value is the name of the class used to execute that operator. An empty value means that the operator will be silently ignored.

Parameters:
properties - The engine properties.
Throws:
IOException - If there is an error setting the engine properties.
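
The key and value convention described for the engine properties can be sketched with JDK classes alone. This is an illustration under stated assumptions, not the engine's actual loading code: Handler, BeginTextHandler, load, dispatch and demo are hypothetical names, the sample operators are chosen only for the example, and the "unknown" result for operators missing from the properties is a choice made by this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class PropertiesRegistrySketch {
    // Hypothetical handler type standing in for an operator processor.
    interface Handler {
        String handle();
    }

    static class BeginTextHandler implements Handler {
        public String handle() {
            return "begin text";
        }
    }

    // Builds an operator-to-handler map from the properties.
    // A null map value marks an operator that is silently ignored.
    static Map<String, Handler> load(Properties properties) {
        Map<String, Handler> handlers = new HashMap<String, Handler>();
        for (String operator : properties.stringPropertyNames()) {
            String className = properties.getProperty(operator).trim();
            if (className.isEmpty()) {
                handlers.put(operator, null);
                continue;
            }
            try {
                Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
                handlers.put(operator, (Handler) instance);
            } catch (ReflectiveOperationException e) {
                // Assumption: the sketch reports a bad class name as an unchecked error.
                throw new IllegalStateException("Cannot create handler for " + operator, e);
            }
        }
        return handlers;
    }

    static String dispatch(Map<String, Handler> handlers, String operator) {
        if (!handlers.containsKey(operator)) {
            return "unknown";
        }
        Handler handler = handlers.get(operator);
        return handler == null ? "ignored" : handler.handle();
    }

    // Loads two sample entries: one with a class name, one with an empty value.
    static String demo(String operator) {
        Properties properties = new Properties();
        properties.setProperty("BT", BeginTextHandler.class.getName());
        properties.setProperty("ri", "");
        return dispatch(load(properties), operator);
    }
}
```

Keeping the mapping in properties means the set of executed operators can change without recompiling the engine, which is presumably why the constructor accepts it in this form.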
Method Detail

registerOperatorProcessor

public void registerOperatorProcessor(String operator,
                                      OperatorProcessor op)
Register a custom operator processor with the engine.

Parameters:
operator - The operator as a string.
op - Processor instance.
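
As a rough illustration of registering a processor under an operator string, the sketch below keeps a map from operator to processor and consults it when an operator is handled. Processor, register and the log list are hypothetical names that do not mirror the PDFBox OperatorProcessor class, and skipping unregistered operators is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OperatorRegistrySketch {
    // Hypothetical processor type; not the PDFBox OperatorProcessor class.
    interface Processor {
        void process(List<String> arguments, List<String> log);
    }

    private final Map<String, Processor> processors = new HashMap<String, Processor>();
    final List<String> log = new ArrayList<String>();

    // Registering under an operator that already has a processor replaces it.
    void register(String operator, Processor processor) {
        processors.put(operator, processor);
    }

    // Looks up the processor for the operator; unregistered operators are skipped here.
    void processOperator(String operator, List<String> arguments) {
        Processor processor = processors.get(operator);
        if (processor != null) {
            processor.process(arguments, log);
        }
    }

    static List<String> demo() {
        OperatorRegistrySketch engine = new OperatorRegistrySketch();
        engine.register("Tj", new Processor() {
            public void process(List<String> arguments, List<String> log) {
                log.add("show:" + arguments.get(0));
            }
        });
        engine.processOperator("Tj", Arrays.asList("Hi"));
        engine.processOperator("Do", Arrays.asList("Im1")); // nothing registered for this operator
        return engine.log;
    }
}
```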

resetEngine

public void resetEngine()
Call this method between documents. The PDFStreamEngine caches document information across pages, and this method releases that cached information. It is only needed when starting a new document, not between pages of the same document.
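
The reset contract can be pictured with a small stand-alone sketch: a cache that is reused across pages and cleared before the next document. What the real engine caches is not stated here, so the font-style keys, the lookup method and the load counter are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class DocumentCacheSketch {
    // Hypothetical per-document cache, keyed by a resource name.
    private final Map<String, String> cache = new HashMap<String, String>();
    private int loads = 0;

    // Returns the cached value, loading it on the first request only.
    String lookup(String name) {
        String value = cache.get(name);
        if (value == null) {
            loads++;
            value = "loaded:" + name;
            cache.put(name, value);
        }
        return value;
    }

    // Releases everything cached for the current document.
    void reset() {
        cache.clear();
    }

    int getLoads() {
        return loads;
    }

    static int demo() {
        DocumentCacheSketch engine = new DocumentCacheSketch();
        engine.lookup("F1"); // page 1 of document A: loads
        engine.lookup("F1"); // page 2 of document A: cache hit
        engine.reset();      // switching to document B
        engine.lookup("F1"); // document B: loads again
        return engine.getLoads();
    }
}
```

Without the reset in this sketch, the second document would be served the first document's cached entry, which is the kind of stale state the method exists to prevent.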


processStream

public void processStream(PDPage aPage,
                          PDResources resources,
                          COSStream cosStream)
                   throws IOException
Process the contents of the stream.

Parameters:
aPage - The page.
resources - The location to retrieve resources from.
cosStream - The stream to execute.
Throws:
IOException - If there is an error accessing the stream.

processSubStream

public void processSubStream(PDPage aPage,
                             PDResources resources,
                             COSStream cosStream)
                      throws IOException
Process a sub stream of the current stream.

Parameters:
aPage - The page used for drawing.
resources - The resources used when processing the stream.
cosStream - The stream to process.
Throws:
IOException - If there is an exception while processing the stream.

processTextPosition

protected void processTextPosition(TextPosition text)
Event hook that lets a subclass act whenever text needs to be processed.

Parameters:
text - The text to be processed.

inspectFontEncoding

protected String inspectFontEncoding(String str)
Event hook that lets a subclass act on the string encoded by a glyph.

Parameters:
str - The string to be processed.

processEncodedText

public void processEncodedText(byte[] string)
                        throws IOException
Process encoded text from the PDF Stream. Override this method to perform an action when encoded text is being processed.

Parameters:
string - The encoded text.
Throws:
IOException - If there is an error processing the string.

processOperator

public void processOperator(String operation,
                            List arguments)
                     throws IOException
Handle a single operator, given as a String, with its arguments.

Parameters:
operation - The operation to perform.
arguments - The list of arguments.
Throws:
IOException - If there is an error processing the operation.

processOperator

protected void processOperator(PDFOperator operator,
                               List arguments)
                        throws IOException
Handle a single operator, given as a PDFOperator, with its arguments.

Parameters:
operator - The operation to perform.
arguments - The list of arguments.
Throws:
IOException - If there is an error processing the operation.

getColorSpaces

public Map getColorSpaces()
Returns:
The colorSpaces map.

getXObjects

public Map getXObjects()
Returns:
The XObjects map.

setColorSpaces

public void setColorSpaces(Map value)
Parameters:
value - The colorSpaces to set.

getFonts

public Map getFonts()
Returns:
The fonts map.

setFonts

public void setFonts(Map value)
Parameters:
value - The fonts to set.

getGraphicsStack

public Stack getGraphicsStack()
Returns:
The graphicsStack.

setGraphicsStack

public void setGraphicsStack(Stack value)
Parameters:
value - The graphicsStack to set.

getGraphicsState

public PDGraphicsState getGraphicsState()
Returns:
The graphicsState.

setGraphicsState

public void setGraphicsState(PDGraphicsState value)
Parameters:
value - The graphicsState to set.

getGraphicsStates

public Map getGraphicsStates()
Returns:
The graphicsStates map.

setGraphicsStates

public void setGraphicsStates(Map value)
Parameters:
value - The graphicsStates to set.

getTextLineMatrix

public Matrix getTextLineMatrix()
Returns:
The textLineMatrix.

setTextLineMatrix

public void setTextLineMatrix(Matrix value)
Parameters:
value - The textLineMatrix to set.

getTextMatrix

public Matrix getTextMatrix()
Returns:
The textMatrix.

setTextMatrix

public void setTextMatrix(Matrix value)
Parameters:
value - The textMatrix to set.

getResources

public PDResources getResources()
Returns:
The resources.

getCurrentPage

public PDPage getCurrentPage()
Get the current page that is being processed.

Returns:
The page being processed.

getValidCharCnt

public int getValidCharCnt()
Get the total number of valid characters in the document that could be decoded in processEncodedText().

Returns:
The number of valid characters.

getTotalCharCnt

public int getTotalCharCnt()
Get the total number of characters in the document (including ones that could not be mapped).

Returns:
The number of characters.
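
The relationship between the valid and total character counts can be shown with a stand-alone sketch. It assumes one byte per character code and a tiny hypothetical encoding table, whereas real fonts may use multi-byte codes. CharCountSketch and its table are illustrative names, and only the two counter names mirror the getters documented above.

```java
import java.util.HashMap;
import java.util.Map;

class CharCountSketch {
    // Hypothetical encoding table: character code -> decoded string.
    private final Map<Integer, String> encoding = new HashMap<Integer, String>();
    private int validCharCnt = 0;
    private int totalCharCnt = 0;

    CharCountSketch() {
        encoding.put(72, "H");
        encoding.put(105, "i");
    }

    // Counts every code, and separately counts the codes that can be decoded.
    void processEncodedText(byte[] codes) {
        for (byte code : codes) {
            totalCharCnt++;
            if (encoding.containsKey(code & 0xFF)) {
                validCharCnt++;
            }
        }
    }

    int getValidCharCnt() {
        return validCharCnt;
    }

    int getTotalCharCnt() {
        return totalCharCnt;
    }

    // Processes three codes, one of which has no mapping in the table.
    static int[] demo() {
        CharCountSketch sketch = new CharCountSketch();
        sketch.processEncodedText(new byte[] {72, 105, 1});
        return new int[] {sketch.getValidCharCnt(), sketch.getTotalCharCnt()};
    }
}
```

A caller could compare the two counts after processing to judge how much of the text was decodable; a large gap suggests that many characters could not be mapped.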


Copyright © 2002-2010 The Apache Software Foundation. All Rights Reserved.