PDFBox consists of three related components and depends on a few external libraries. This page describes what these libraries are and how to include them in your application.
The three PDFBox components are named pdfbox, fontbox and jempbox. The Maven groupId of all PDFBox components is org.apache.pdfbox.
The fontbox and jempbox components are standalone libraries for handling font information and XMP metadata. These components have no external dependencies and can be used simply by adding the respective jar files to the classpath of your application.
The main PDFBox component, pdfbox, has hard dependencies on the fontbox and jempbox components and on the commons-logging library. Commons Logging is a generic wrapper around different logging frameworks, so you'll need to either add a logging library such as log4j or let commons-logging fall back to the standard java.util.logging API included in the Java platform.
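A minimal sketch of the java.util.logging fallback, assuming no log4j jar is on the classpath. The class below asks commons-logging to use its Jdk14Logger adapter through the org.apache.commons.logging.Log system property (optional when log4j is absent, but it makes the choice explicit) and then raises the threshold for PDFBox log messages. The class name LoggingSetup and the WARNING level are illustrative choices, and the sketch assumes PDFBox classes obtain their loggers by class name, so they inherit the level set on the org.apache.pdfbox logger.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingSetup {

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a configured level could otherwise be lost to garbage collection.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    public static void configure() {
        // Ask commons-logging to use its java.util.logging adapter.
        // This must run before any commons-logging Log instance is created.
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.Jdk14Logger");

        // Loggers named after classes in org.apache.pdfbox.* inherit this level
        // (assumption: PDFBox creates its loggers via LogFactory.getLog(SomeClass.class)).
        PDFBOX_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        configure();
        System.out.println("PDFBox log level: " + PDFBOX_LOGGER.getLevel());
    }
}
```

Call LoggingSetup.configure() early in your application's startup, before the first PDFBox class is used.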
To add the pdfbox, fontbox, jempbox and commons-logging jars to your application, the easiest approach is to declare the Maven dependency shown below. This gives you the main pdfbox library directly and the other required jars as transitive dependencies.
    <dependency>
      <groupId>org.apache.pdfbox</groupId>
      <artifactId>pdfbox</artifactId>
      <version>...</version>
    </dependency>
Set the version field to the latest stable PDFBox version.
Some features in PDFBox depend on optional external libraries. You can enable these features simply by including the required libraries in the classpath of your application.
The most notable such optional feature is support for PDF encryption. Instead of implementing its own encryption algorithms, PDFBox uses libraries from the Legion of the Bouncy Castle. Both the bcprov and bcmail libraries are needed and can be included using the Maven dependencies shown below.
    <dependency>
      <groupId>org.bouncycastle</groupId>
      <artifactId>bcprov-jdk15</artifactId>
      <version>1.44</version>
    </dependency>
    <dependency>
      <groupId>org.bouncycastle</groupId>
      <artifactId>bcmail-jdk15</artifactId>
      <version>1.44</version>
    </dependency>
Another important optional feature is support for bidirectional languages like Arabic. PDFBox uses the ICU4J library from the International Components for Unicode (ICU) project to support such languages in PDF documents. To add the ICU4J jar to your project, use the following Maven dependency.
    <dependency>
      <groupId>com.ibm.icu</groupId>
      <artifactId>icu4j</artifactId>
      <version>3.8</version>
    </dependency>
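Because these features are switched on simply by the presence of jars on the classpath, it can help to check which ones your application actually sees at runtime. The following is a minimal Java sketch of such a check; the class name DependencyCheck is hypothetical, and the marker class chosen for each jar is our own pick of a class believed to ship in that library at the versions listed above, not something PDFBox itself defines or uses for detection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DependencyCheck {

    // One representative (marker) class per jar discussed on this page.
    static final Map<String, String> MARKER_CLASSES = new LinkedHashMap<>();
    static {
        MARKER_CLASSES.put("pdfbox", "org.apache.pdfbox.pdmodel.PDDocument");
        MARKER_CLASSES.put("fontbox", "org.apache.fontbox.ttf.TrueTypeFont");
        MARKER_CLASSES.put("jempbox", "org.apache.jempbox.xmp.XMPMetadata");
        MARKER_CLASSES.put("commons-logging", "org.apache.commons.logging.LogFactory");
        MARKER_CLASSES.put("bcprov (optional)", "org.bouncycastle.jce.provider.BouncyCastleProvider");
        MARKER_CLASSES.put("bcmail (optional)", "org.bouncycastle.cms.CMSEnvelopedData");
        MARKER_CLASSES.put("icu4j (optional)", "com.ibm.icu.text.Bidi");
    }

    // Returns true if the class can be located, without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : MARKER_CLASSES.entrySet()) {
            String status = isOnClasspath(entry.getValue()) ? "found" : "missing";
            System.out.println(entry.getKey() + ": " + status);
        }
    }
}
```

Passing false as the second argument to Class.forName only locates and loads the class, so the check has no side effects such as registering a security provider.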
PDFBox also contains extra support for use with the Lucene and Ant projects. Since PDFBox acts only as an add-on to these projects in such cases, you should first set up your application to use Lucene or Ant and then add PDFBox support as described on this page.
The above instructions assume that you're using Maven or another build tool, such as Ivy, that supports Maven dependencies. If you instead use a tool like Ant, where you need to explicitly include all the required library jars in your application, you'll need to obtain those jars yourself.
The easiest approach is to run the following command inside the pdfbox directory of the latest PDFBox source release:

    mvn dependency:copy-dependencies

This will copy all the required and optional libraries discussed above into the pdfbox/target/dependencies directory. You can then copy the libraries you need from this directory into your application.
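To illustrate that last step, here is a minimal Java sketch that copies selected jars from the dependency directory into an application's library folder. The class CopyJars, the myapp/lib target directory and the file-name prefixes are hypothetical; adjust them to the jars your application actually needs. Note that copy-dependencies copies the dependencies of the pdfbox module rather than the module's own jar, so the pdfbox jar itself has to be added separately.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CopyJars {

    // Copies each *.jar in sourceDir whose file name starts with one of the
    // given prefixes into targetDir, and returns the names of the copied files.
    static List<String> copyJars(Path sourceDir, Path targetDir, List<String> prefixes)
            throws IOException {
        Files.createDirectories(targetDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                for (String prefix : prefixes) {
                    if (name.startsWith(prefix)) {
                        Files.copy(jar, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                        copied.add(name);
                        break; // each jar is copied at most once
                    }
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: pass your own source and target directories as arguments.
        Path source = Paths.get(args.length > 0 ? args[0] : "pdfbox/target/dependencies");
        Path target = Paths.get(args.length > 1 ? args[1] : "myapp/lib");
        List<String> copied = copyJars(source, target,
                Arrays.asList("fontbox", "jempbox", "commons-logging", "bcprov", "bcmail", "icu4j"));
        for (String name : copied) {
            System.out.println("copied " + name);
        }
    }
}
```

Selecting by prefix lets you leave out the optional Bouncy Castle or ICU4J jars when you don't need encryption or bidirectional text support.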