public class LucenePDFDocument extends Object
| Lucene Field Name | Description |
|---|---|
| path | File system path if loaded from a file |
| url | URL to PDF document |
| contents | Entire contents of PDF document, indexed but not stored |
| summary | First 500 characters of content |
| modified | The modified date/time according to the url or path |
| uid | A unique identifier for the Lucene document. |
| CreationDate | From PDF meta-data if available |
| Creator | From PDF meta-data if available |
| Keywords | From PDF meta-data if available |
| ModificationDate | From PDF meta-data if available |
| Producer | From PDF meta-data if available |
| Subject | From PDF meta-data if available |
| Trapped | From PDF meta-data if available |
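
As a rough illustration of the `summary` row above, the following sketch truncates extracted text to its first 500 characters. The class name `SummarySketch`, the constant `SUMMARY_LENGTH`, and the method `summarize` are hypothetical names chosen for this example. Whether the library trims whitespace or handles surrogate pairs differently is not stated here, so treat this as an assumption rather than the actual implementation.

```java
// Hypothetical sketch: not part of LucenePDFDocument.
final class SummarySketch {
    // Assumed from the field table: the summary holds the first 500 characters.
    static final int SUMMARY_LENGTH = 500;

    // Returns at most SUMMARY_LENGTH leading characters of the given text.
    static String summarize(String contents) {
        int end = Math.min(SUMMARY_LENGTH, contents.length());
        return contents.substring(0, end);
    }

    public static void main(String[] args) {
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 1200; i++) {
            longText.append('x');
        }
        // Long input is cut down; short input is returned unchanged.
        System.out.println(summarize(longText.toString()).length());
        System.out.println(summarize("short text"));
    }
}
```

Text shorter than the limit passes through unchanged, so a short document's `summary` would equal its full extracted text under this assumption.
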

| Modifier and Type | Field | Description |
|---|---|---|
| static org.apache.lucene.document.FieldType | TYPE_STORED_NOT_INDEXED | Not indexed, tokenized, stored. |

| Constructor | Description |
|---|---|
| LucenePDFDocument() | Constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| org.apache.lucene.document.Document | convertDocument(File file) | Takes a reference to a PDF document and creates a Lucene document. |
| org.apache.lucene.document.Document | convertDocument(InputStream is) | Converts the PDF stream to a Lucene document. |
| org.apache.lucene.document.Document | convertDocument(URL url) | Converts the document from a PDF to a Lucene document. |
| static String | createUID(File file) | Creates a UID for the given file. |
| static String | createUID(URL url, long time) | Creates a UID for the given file using the given time. |
| static org.apache.lucene.document.Document | getDocument(File file) | Gets a Lucene document from a PDF file. |
| static org.apache.lucene.document.Document | getDocument(InputStream is) | Gets a Lucene document from a PDF file. |
| static org.apache.lucene.document.Document | getDocument(URL url) | Gets a Lucene document from a PDF file. |
| void | setTextStripper(PDFTextStripper aStripper) | Sets the text stripper that will be used during extraction. |
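
The `createUID` methods combine a document location with a time. A minimal sketch of that idea follows, assuming the identifier is simply the location joined to a timestamp. The class `UidSketch`, the method `uidFor`, and the `@` separator are hypothetical. The actual format produced by `createUID` is not specified in this page and may differ.

```java
import java.io.File;

// Hypothetical sketch: illustrates a location-plus-time identifier only.
final class UidSketch {
    // Joins a location (path or URL string) and a timestamp into one identifier.
    // The '@' separator is an assumption made for readability.
    static String uidFor(String location, long time) {
        return location + "@" + time;
    }

    // Uses the file's path and last-modified time, so the identifier changes
    // whenever the file is modified. lastModified() returns 0 for a missing file.
    static String uidFor(File file) {
        return uidFor(file.getPath(), file.lastModified());
    }

    public static void main(String[] args) {
        System.out.println(uidFor("docs/report.pdf", 1000L));
        System.out.println(uidFor("docs/report.pdf", 2000L));
    }
}
```

Including the time in the identifier means an index can tell a re-saved file apart from the version it already holds, which is one plausible reason for the two-argument overload.
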

public static final org.apache.lucene.document.FieldType TYPE_STORED_NOT_INDEXED

| Signature | Parameters | Throws |
|---|---|---|
| public void setTextStripper(PDFTextStripper aStripper) | aStripper - The new PDF text stripper. | |
| public org.apache.lucene.document.Document convertDocument(InputStream is) throws IOException | is - The input stream. | IOException - If there is an error converting the PDF. |
| public org.apache.lucene.document.Document convertDocument(File file) throws IOException | file - A reference to a PDF document. | IOException - If there is an exception while converting the document. |
| public org.apache.lucene.document.Document convertDocument(URL url) throws IOException | url - A URL to a PDF document. | IOException - If there is an error while converting the document. |
| public static org.apache.lucene.document.Document getDocument(InputStream is) throws IOException | is - The stream to read the PDF from. | IOException - If there is an error parsing or indexing the document. |
| public static org.apache.lucene.document.Document getDocument(File file) throws IOException | file - The file to get the document for. | IOException - If there is an error parsing or indexing the document. |
| public static org.apache.lucene.document.Document getDocument(URL url) throws IOException | url - The file to get the document for. | IOException - If there is an error parsing or indexing the document. |
| public static String createUID(URL url, long time) | url - The file to create a UID for; time - The time to use for the UID. | |

Copyright © 2002–2017 The Apache Software Foundation. All rights reserved.