PDFBox consists of three related components and depends on a few external libraries. This page describes what these libraries are and how to include them in your application.
Which of these components you need at runtime, during development or for testing depends on the details below.
The three PDFBox components are named pdfbox, fontbox and jempbox. The Maven groupId of all PDFBox components is org.apache.pdfbox.
The main PDFBox component, pdfbox, has a hard dependency on the commons-logging library. Commons Logging is a generic wrapper around different logging frameworks, so you’ll either need to also use a logging library like log4j or let commons-logging fall back to the standard java.util.logging API included in the Java platform.
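As a rough sketch of the first option: commons-logging looks for log4j on the classpath before it falls back to java.util.logging, so declaring a log4j dependency is usually enough to route PDFBox log output through it. The coordinates and version below are an assumption chosen for illustration and are not prescribed by PDFBox; use whatever logging library your application already relies on.

```xml
<!-- Assumption: log4j 1.2.x as the logging backend; not required by PDFBox itself -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.17</version>
</dependency>
```

If no such library is present, nothing needs to be declared and commons-logging uses java.util.logging.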
For PDFBox Preflight only commons-io 1.4 is needed.
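For a Maven build, the commons-io requirement could be declared as sketched below. The coordinates are the standard ones for Apache Commons IO. Whether you need to declare it explicitly depends on how you include Preflight, so treat this as an illustration rather than a required step.

```xml
<!-- commons-io 1.4, the version named above for PDFBox Preflight -->
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>1.4</version>
</dependency>
```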
For font handling the fontbox component is needed.
To support XMP metadata the jempbox component is needed.
To add the pdfbox, fontbox, jempbox and commons-logging jars to your application, the easiest thing is to declare the Maven dependency shown below. This gives you the main pdfbox library directly and the other required jars as transitive dependencies.
<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>...</version>
</dependency>
Set the version field to the latest stable PDFBox version.
Some features in PDFBox depend on optional external libraries. You can enable these features simply by including the required libraries in the classpath of your application.
To support JBIG2 and to write TIFF images, additional libraries are needed.
The image plugins described below are not part of the PDFBox distribution because of incompatible licensing terms. Please check that their licensing terms are compatible with your usage.
To write TIFF images a JAI ImageIO Core library will be needed.
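One way to pull in a JAI ImageIO Core library with Maven is sketched below. The coordinates and version are an assumption (the community-maintained jai-imageio-core artifact), not something this page specifies; verify both the artifact and its license against your own requirements before relying on it.

```xml
<!-- Assumption: community-maintained JAI ImageIO Core build; check its license terms first -->
<dependency>
  <groupId>com.github.jai-imageio</groupId>
  <artifactId>jai-imageio-core</artifactId>
  <version>1.4.0</version>
</dependency>
```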
A notable optional feature is support for PDF encryption. Instead of implementing its own encryption algorithms, PDFBox uses libraries from the Legion of the Bouncy Castle. Both the bcprov and bcmail libraries are needed and can be included using the Maven dependencies shown below.
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15</artifactId>
  <version>1.44</version>
</dependency>
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcmail-jdk15</artifactId>
  <version>1.44</version>
</dependency>
Another important optional feature is support for bidirectional languages like Arabic. PDFBox uses the ICU4J library from the International Components for Unicode (ICU) project to support such languages in PDF documents. To add the ICU4J jar to your project, use the following Maven dependency.
<dependency>
  <groupId>com.ibm.icu</groupId>
  <artifactId>icu4j</artifactId>
  <version>3.8</version>
</dependency>
PDFBox also contains extra support for use with the Lucene and Ant projects. Since in these cases PDFBox is just an add-on feature to these projects, you should first set up your application to use Lucene or Ant and then add PDFBox support as described on this page.
The above instructions expect that you’re using Maven or another build tool like Ivy that supports Maven dependencies. If you instead use tools like Ant where you need to explicitly include all the required library jars in your application, you’ll need to collect those jars yourself.
The easiest approach is to run
mvn dependency:copy-dependencies
inside the pdfbox directory of the latest PDFBox source release. This will copy all the required and optional libraries discussed above into the pdfbox/target/dependencies directory. You can then simply copy all the libraries you need from this directory to your application.
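Once the jars have been copied, an Ant build can put them on the compile classpath. The build file below is a minimal sketch under stated assumptions: the project name, the lib directory holding the copied jars, and the src and build/classes directories are all hypothetical and should be adapted to your own layout.

```xml
<project name="myapp" default="compile" basedir=".">
  <!-- Hypothetical layout: the jars copied from the PDFBox build live in lib/ -->
  <path id="pdfbox.classpath">
    <fileset dir="lib" includes="*.jar"/>
  </path>

  <target name="compile">
    <mkdir dir="build/classes"/>
    <!-- Compile application sources against the PDFBox jars and their dependencies -->
    <javac srcdir="src" destdir="build/classes"
           classpathref="pdfbox.classpath" includeantruntime="false"/>
  </target>
</project>
```

The same path reference can be reused for the runtime classpath of a java task, so the list of jars is maintained in one place.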