LucenePDFDocument (PDFBox reactor 2.0.8 API)

java.lang.Object
- org.apache.pdfbox.examples.lucene.LucenePDFDocument

public class LucenePDFDocument
extends Object

This class is used to create a document for the Lucene search engine. It should plug easily into the IndexPDFFiles example that comes with the Lucene project. This class populates the following fields:


Lucene Field Name	Description
`path`	File system path, if loaded from a file
`url`	URL of the PDF document
`contents`	Full text extracted from the PDF document; indexed but not stored
`summary`	First 500 characters of the content
`modified`	Last-modified date/time according to the URL or path
`uid`	A unique identifier for the Lucene document
`CreationDate`	From PDF metadata, if available
`Creator`	From PDF metadata, if available
`Keywords`	From PDF metadata, if available
`ModificationDate`	From PDF metadata, if available
`Producer`	From PDF metadata, if available
`Subject`	From PDF metadata, if available
`Trapped`	From PDF metadata, if available
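The rows marked "From PDF metadata, if available" correspond to entries in the PDF document information dictionary. The sketch below shows how such values can be read with the PDFBox 2.0 `PDDocumentInformation` getters, skipping entries that are absent. The class name `MetadataFieldsSketch`, the returned `Map`, and the ISO-8601 date formatting are assumptions for illustration; this is not the class's actual implementation, which may format dates differently.

```java
import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

/** Hypothetical helper: collects the metadata-backed fields from the table above. */
public final class MetadataFieldsSketch {

    private MetadataFieldsSketch() {
    }

    /** Returns field name -> value for every metadata entry present in the document. */
    public static Map<String, String> metadataFields(PDDocument pdf) {
        PDDocumentInformation info = pdf.getDocumentInformation();
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the table's order
        putIfPresent(fields, "CreationDate", isoOrNull(info.getCreationDate()));
        putIfPresent(fields, "Creator", info.getCreator());
        putIfPresent(fields, "Keywords", info.getKeywords());
        putIfPresent(fields, "ModificationDate", isoOrNull(info.getModificationDate()));
        putIfPresent(fields, "Producer", info.getProducer());
        putIfPresent(fields, "Subject", info.getSubject());
        putIfPresent(fields, "Trapped", info.getTrapped());
        return fields;
    }

    // Assumption: dates rendered as ISO-8601 instants; the real class may use another format.
    private static String isoOrNull(Calendar date) {
        return date == null ? null : date.toInstant().toString();
    }

    // "If available": entries missing from the information dictionary are skipped.
    private static void putIfPresent(Map<String, String> fields, String name, String value) {
        if (value != null) {
            fields.put(name, value);
        }
    }
}
```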

Author: Ben Litchfield

Field Summary

Fields
Modifier and Type	Field and Description
`static org.apache.lucene.document.FieldType`	`TYPE_STORED_NOT_INDEXED` Not indexed, tokenized, stored.
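A field type that is stored but not indexed keeps a value retrievable from search hits (for example, for display) without making it searchable. The sketch below builds an equivalent `FieldType` with the Lucene API and uses it for a summary-style field. It assumes Lucene 5.0 or later, where `IndexOptions` lives in `org.apache.lucene.index`; the class and method names are hypothetical, and PDFBox's own constant may be configured differently.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

/** Hypothetical re-creation of a "stored, tokenized, not indexed" field type. */
public final class StoredOnlyFieldSketch {

    public static final FieldType STORED_NOT_INDEXED = new FieldType();

    static {
        STORED_NOT_INDEXED.setIndexOptions(IndexOptions.NONE); // not searchable
        STORED_NOT_INDEXED.setTokenized(true);                  // only matters once a field is indexed
        STORED_NOT_INDEXED.setStored(true);                     // value is kept and returned with hits
        STORED_NOT_INDEXED.freeze();                            // make the shared instance immutable
    }

    private StoredOnlyFieldSketch() {
    }

    /** Builds a document whose "summary" can be displayed but not searched. */
    public static Document withSummary(String summary) {
        Document doc = new Document();
        doc.add(new Field("summary", summary, STORED_NOT_INDEXED));
        return doc;
    }
}
```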

Constructor Summary

Constructors
Constructor and Description
`LucenePDFDocument()` Constructor.

Method Summary

Methods
Modifier and Type	Method and Description
`org.apache.lucene.document.Document`	`convertDocument(File file)` This will take a reference to a PDF document and create a Lucene document.
`org.apache.lucene.document.Document`	`convertDocument(InputStream is)` Convert the PDF stream to a Lucene document.
`org.apache.lucene.document.Document`	`convertDocument(URL url)` Convert the PDF document at the given URL to a Lucene document.
`static String`	`createUID(File file)` Create a UID for the given file.
`static String`	`createUID(URL url, long time)` Create a UID for the given URL using the given time.
`static org.apache.lucene.document.Document`	`getDocument(File file)` This will get a Lucene document from a PDF file.
`static org.apache.lucene.document.Document`	`getDocument(InputStream is)` This will get a Lucene document from a PDF stream.
`static org.apache.lucene.document.Document`	`getDocument(URL url)` This will get a Lucene document from a PDF at the given URL.
`void`	`setTextStripper(PDFTextStripper aStripper)` Set the text stripper that will be used during extraction.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - TYPE_STORED_NOT_INDEXED
```
public static final org.apache.lucene.document.FieldType TYPE_STORED_NOT_INDEXED
```
    Not indexed, tokenized, stored.
- Constructor Detail
  - LucenePDFDocument
```
public LucenePDFDocument()
```
    Constructor.
- Method Detail
  - setTextStripper
```
public void setTextStripper(PDFTextStripper aStripper)
```
    Set the text stripper that will be used during extraction.
    
    Parameters:
    aStripper - The new PDF text stripper.
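Supplying a custom stripper lets a caller control extraction, for example limiting it to the first pages of long documents or ordering text by its position on the page. The sketch assumes PDFBox 2.0's `org.apache.pdfbox.text.PDFTextStripper`; the helper names and the page limit are illustrative.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.pdfbox.examples.lucene.LucenePDFDocument;
import org.apache.pdfbox.text.PDFTextStripper;

/** Hypothetical helpers showing how a custom stripper changes what gets extracted. */
public final class StripperConfigSketch {

    private StripperConfigSketch() {
    }

    /** A stripper that extracts only pages 1..maxPages, ordering text by position. */
    public static PDFTextStripper firstPagesStripper(int maxPages) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setStartPage(1);        // page numbers are 1-based
        stripper.setEndPage(maxPages);   // pages after this one are ignored
        stripper.setSortByPosition(true);
        return stripper;
    }

    /** Converts a PDF using only the text of its first maxPages pages. */
    public static Document convertFirstPages(File pdf, int maxPages) throws IOException {
        LucenePDFDocument converter = new LucenePDFDocument();
        converter.setTextStripper(firstPagesStripper(maxPages));
        return converter.convertDocument(pdf);
    }
}
```

A configured stripper holds per-extraction state, so it is safer to give each thread its own converter rather than share one instance across threads.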
  - convertDocument
```
public org.apache.lucene.document.Document convertDocument(InputStream is)
                                                    throws IOException
```
    Convert the PDF stream to a Lucene document.
    
    Parameters:
    is - The input stream.
    
    Returns:
    The Lucene document created from the input stream.
    
    Throws:
    IOException - If there is an error converting the PDF.
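When the PDF arrives as a stream (for example, from a database or an upload), the stream overload can be used. This sketch reads from a `Path` and closes the stream itself with try-with-resources instead of assuming the converter closes it; `StreamConvertSketch` is a hypothetical name. Because only bytes are passed in, the converter has no file path or URL to record, so location-based fields such as `path` should not be expected in the result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.document.Document;
import org.apache.pdfbox.examples.lucene.LucenePDFDocument;

/** Hypothetical helper: converts a PDF supplied as a byte stream. */
public final class StreamConvertSketch {

    private StreamConvertSketch() {
    }

    public static Document convert(Path pdf) throws IOException {
        // The caller opens the stream, so the caller closes it.
        try (InputStream in = Files.newInputStream(pdf)) {
            return new LucenePDFDocument().convertDocument(in);
        }
    }
}
```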
  - convertDocument
```
public org.apache.lucene.document.Document convertDocument(File file)
                                                    throws IOException
```
    This will take a reference to a PDF document and create a Lucene document.
    
    Parameters:
    file - A reference to a PDF document.
    
    Returns:
    The converted Lucene document.
    
    Throws:
    IOException - If there is an exception while converting the document.
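A common use of the `File` overload is converting every PDF under a directory. The sketch below walks a directory tree with `java.nio.file.Files.walk`, reuses one converter sequentially, and skips files that fail to parse instead of aborting the batch. The class and method names, and the choice to skip rather than rethrow, are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.lucene.document.Document;
import org.apache.pdfbox.examples.lucene.LucenePDFDocument;

/** Hypothetical batch converter for a directory of PDFs. */
public final class ConvertDirectorySketch {

    private ConvertDirectorySketch() {
    }

    /** Lists regular files under root whose name ends in ".pdf", ignoring case. */
    public static List<Path> listPdfFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Converts every PDF under root; files that fail to convert are reported and skipped. */
    public static List<Document> convertAll(Path root) throws IOException {
        LucenePDFDocument converter = new LucenePDFDocument();
        List<Document> docs = new ArrayList<>();
        for (Path pdf : listPdfFiles(root)) {
            try {
                docs.add(converter.convertDocument(pdf.toFile()));
            } catch (IOException e) {
                System.err.println("skipping " + pdf + ": " + e.getMessage());
            }
        }
        return docs;
    }
}
```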
  - convertDocument
```
public org.apache.lucene.document.Document convertDocument(URL url)
                                                    throws IOException
```
    Convert the PDF document at the given URL to a Lucene document.
    
    Parameters:
    url - A URL pointing to a PDF document.
    
    Returns:
    The PDF converted to a Lucene document.
    
    Throws:
    IOException - If there is an error while converting the document.
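The URL overload suits documents that are not on the local file system, but a `file:` URL derived from a `File` works for local files as well. The helper names below are hypothetical; converting remote URLs depends on network access and on the protocol handlers available to the JVM.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URL;

import org.apache.lucene.document.Document;
import org.apache.pdfbox.examples.lucene.LucenePDFDocument;

/** Hypothetical helpers for the URL-based conversion. */
public final class UrlConvertSketch {

    private UrlConvertSketch() {
    }

    /** Converts a local PDF through its file: URL, e.g. file:/tmp/report.pdf. */
    public static Document convertLocal(File pdf) throws IOException {
        URL url = pdf.toURI().toURL();
        return new LucenePDFDocument().convertDocument(url);
    }

    /** Converts a PDF at any URL the JVM can open (file:, http:, https:, ...). */
    public static Document convert(String spec) throws IOException {
        // URI.create rejects malformed syntax; toURL rejects unknown schemes.
        return new LucenePDFDocument().convertDocument(URI.create(spec).toURL());
    }
}
```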
  - getDocument
```
public static org.apache.lucene.document.Document getDocument(InputStream is)
                                                       throws IOException
```
    This will get a Lucene document from a PDF stream.
    
    Parameters:
    is - The stream to read the PDF from.
    
    Returns:
    The Lucene document.
    
    Throws:
    IOException - If there is an error parsing or indexing the document.
  - getDocument
```
public static org.apache.lucene.document.Document getDocument(File file)
                                                       throws IOException
```
    This will get a Lucene document from a PDF file.
    
    Parameters:
    file - The PDF file to get the document for.
    
    Returns:
    The Lucene document.
    
    Throws:
    IOException - If there is an error parsing or indexing the document.
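The static `getDocument` methods return a ready-made Lucene `Document`, so indexing a PDF reduces to handing that document to an `IndexWriter`. This sketch assumes Lucene 5.x or later (where `FSDirectory.open` takes a `java.nio.file.Path` and `IndexWriterConfig` takes only an `Analyzer`) with `StandardAnalyzer` on the classpath; `IndexOnePdf` is a hypothetical name, not the IndexPDFFiles program itself.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.pdfbox.examples.lucene.LucenePDFDocument;

/** Hypothetical minimal indexer: adds one PDF to a file-system index. */
public final class IndexOnePdf {

    private IndexOnePdf() {
    }

    public static void index(Path indexDir, File pdf) throws IOException {
        try (Directory dir = FSDirectory.open(indexDir);
             StandardAnalyzer analyzer = new StandardAnalyzer();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = LucenePDFDocument.getDocument(pdf); // fields as listed in the class table
            writer.addDocument(doc);
            writer.commit(); // make the document visible to newly opened readers
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexOnePdf <index-dir> <file.pdf>");
            return;
        }
        index(Paths.get(args[0]), new File(args[1]));
    }
}
```

Running it twice on the same file adds a duplicate entry, because `addDocument` never replaces existing documents.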
  - getDocument
```
public static org.apache.lucene.document.Document getDocument(URL url)
                                                       throws IOException
```
    This will get a Lucene document from a PDF at the given URL.
    
    Parameters:
    url - The URL of the PDF to get the document for.
    
    Returns:
    The Lucene document.
    
    Throws:
    IOException - If there is an error parsing or indexing the document.
  - createUID
```
public static String createUID(URL url,
               long time)
```
    Create a UID for the given URL using the given time.
    
    Parameters:
    url - the URL to create a UID for
    time - the time to use in the UID
    
    Returns:
    the created UID
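Combining a location with a timestamp yields an identifier that changes whenever the document is modified, which lets an indexer tell a stale index entry from a current one. The sketch below illustrates that idea with a hypothetical encoding (location, a NUL separator, then a UTC timestamp at one-second resolution); it is not guaranteed to match the strings `createUID` actually produces.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical UID scheme illustrating location + modification time. */
public final class UidSketch {

    // Second-resolution UTC timestamp such as 19700101000000 (assumed format).
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    private UidSketch() {
    }

    /** Location, a NUL separator (which cannot occur in a URL), then the timestamp. */
    public static String uid(String location, long timeMillis) {
        return location + '\u0000' + STAMP.format(Instant.ofEpochMilli(timeMillis));
    }
}
```

Sub-second differences are truncated, so two modifications within the same second produce the same UID. When Lucene is already on the classpath, `DateTools.timeToString(time, DateTools.Resolution.SECOND)` yields the same `yyyyMMddHHmmss` UTC form.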
  - createUID
```
public static String createUID(File file)
```
    Create a UID for the given file.
    
    Parameters:
    file - the file to create a UID for
    
    Returns:
    the created UID
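A UID is useful for incremental indexing: keep the UIDs already present in the index and skip files whose current UID is among them. This sketch uses the documented `createUID(File)` signature; the `Set`-based bookkeeping and the helper names are hypothetical. This documentation does not state whether the file-based UID changes when the file is modified (the URL overload's `time` parameter suggests a timestamp is involved), so verify that before relying on it to detect edits.

```java
import java.io.File;
import java.util.Set;

import org.apache.pdfbox.examples.lucene.LucenePDFDocument;

/** Hypothetical helper for incremental indexing based on document UIDs. */
public final class UidCheckSketch {

    private UidCheckSketch() {
    }

    /** True when no document with this file's current UID has been indexed yet. */
    public static boolean needsIndexing(File pdf, Set<String> indexedUids) {
        return !indexedUids.contains(LucenePDFDocument.createUID(pdf));
    }

    /** Records the file's UID after it has been indexed. */
    public static void markIndexed(File pdf, Set<String> indexedUids) {
        indexedUids.add(LucenePDFDocument.createUID(pdf));
    }
}
```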
