java.lang.Object
  org.pdfbox.searchengine.lucene.LucenePDFDocument
This class creates a document for the Lucene search engine from a PDF. It should plug easily into the IndexHTML or IndexFiles programs that come with the Lucene project. The class populates the following fields.
Lucene Field Name | Description |
---|---|
path | File system path if loaded from a file |
url | URL to PDF document |
contents | Entire contents of PDF document, indexed but not stored |
summary | First 500 characters of content |
modified | The modified date/time according to the url or path |
uid | A unique identifier for the Lucene document |
CreationDate | From PDF meta-data if available |
Creator | From PDF meta-data if available |
Keywords | From PDF meta-data if available |
ModificationDate | From PDF meta-data if available |
Producer | From PDF meta-data if available |
Subject | From PDF meta-data if available |
Trapped | From PDF meta-data if available |
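
The `summary` rule in the table above can be illustrated with a small stand-alone sketch. This is not the PDFBox source: `SummaryFieldSketch`, `summarize`, and `describe` are hypothetical names, and a plain `Map` stands in for a Lucene document. Only the field names and the 500-character limit come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only; not part of PDFBox or Lucene. */
class SummaryFieldSketch {
    /** Limit taken from the field table: "First 500 characters of content". */
    static final int SUMMARY_LENGTH = 500;

    /** Returns at most the first SUMMARY_LENGTH characters of the extracted text. */
    static String summarize(String contents) {
        return contents.substring(0, Math.min(contents.length(), SUMMARY_LENGTH));
    }

    /** Hypothetical helper: collects a few of the documented fields in table order. */
    static Map<String, String> describe(String path, String contents) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", path);
        fields.put("contents", contents);
        fields.put("summary", summarize(contents));
        return fields;
    }

    public static void main(String[] args) {
        // Build 1000 characters of placeholder text (200 repetitions of "word ").
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            text.append("word ");
        }
        Map<String, String> fields = describe("/tmp/example.pdf", text.toString());
        System.out.println(fields.get("summary").length()); // prints 500
    }
}
```

Text shorter than the limit is returned unchanged, so a very short PDF would have identical `contents` and `summary` values under this rule.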

Constructor Summary

Constructor | Description |
---|---|
LucenePDFDocument() | Constructor. |

Method Summary

Modifier and Type | Method | Description |
---|---|---|
Document | convertDocument(File file) | Create a Lucene document from a reference to a PDF file. |
Document | convertDocument(InputStream is) | Convert a PDF stream to a Lucene document. |
Document | convertDocument(URL url) | Convert the PDF at a URL to a Lucene document. |
DateTools.Resolution | getDateTimeResolution() | Get the Lucene date/time resolution. |
static Document | getDocument(File file) | Get a Lucene document from a PDF file. |
static Document | getDocument(InputStream is) | Get a Lucene document from a PDF stream. |
static Document | getDocument(URL url) | Get a Lucene document from the PDF at a URL. |
static void | main(String[] args) | Test creating a document. |
void | setDateTimeResolution(DateTools.Resolution resolution) | Set the Lucene date/time resolution. |
void | setTextStripper(PDFTextStripper aStripper) | Set the text stripper that will be used during extraction. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
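
The methods summarized above might be used roughly as follows. This is a sketch, not an official example: it assumes the PDFBox and Lucene jars that match this Javadoc are on the classpath, and `PdfIndexingSketch` and `indexPdf` are hypothetical names. Opening the `IndexWriter` is left to the caller because its constructors differ between Lucene versions. Reading `path` and `summary` back with `Document.get` assumes those fields hold plain string values.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;

/** Illustration only; requires PDFBox and Lucene on the classpath. */
class PdfIndexingSketch {
    /** Adds one PDF to an index that the caller has already opened. */
    static void indexPdf(IndexWriter writer, File pdf) throws IOException {
        // Static convenience method: uses the class's default settings.
        Document doc = LucenePDFDocument.getDocument(pdf);
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfIndexingSketch <file.pdf>");
            return;
        }
        File pdf = new File(args[0]);

        // Instance methods allow configuration before conversion.
        LucenePDFDocument converter = new LucenePDFDocument();
        converter.setDateTimeResolution(DateTools.Resolution.DAY);
        Document doc = converter.convertDocument(pdf);

        System.out.println("path:    " + doc.get("path"));
        System.out.println("summary: " + doc.get("summary"));
    }
}
```

The static `getDocument` overloads suit one-off conversions, while an instance with `convertDocument` is the natural choice when a custom text stripper or date/time resolution is needed.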
Constructor Detail

public LucenePDFDocument()

Method Detail

public void setTextStripper(PDFTextStripper aStripper)
Parameters:
aStripper - The new PDF text stripper.

public DateTools.Resolution getDateTimeResolution()

public void setDateTimeResolution(DateTools.Resolution resolution)
Parameters:
resolution - The new date/time resolution.

public Document convertDocument(InputStream is) throws IOException
Parameters:
is - The input stream.
Throws:
IOException - If there is an error converting the PDF.

public Document convertDocument(File file) throws IOException
Parameters:
file - A reference to a PDF document.
Throws:
IOException - If there is an exception while converting the document.

public Document convertDocument(URL url) throws IOException
Parameters:
url - A URL to a PDF document.
Throws:
IOException - If there is an error while converting the document.

public static Document getDocument(InputStream is) throws IOException
Parameters:
is - The stream to read the PDF from.
Throws:
IOException - If there is an error parsing or indexing the document.

public static Document getDocument(File file) throws IOException
Parameters:
file - The file to get the document for.
Throws:
IOException - If there is an error parsing or indexing the document.

public static Document getDocument(URL url) throws IOException
Parameters:
url - The URL of the PDF to get the document for.
Throws:
IOException - If there is an error parsing or indexing the document.

public static void main(String[] args) throws IOException
Parameters:
args - Command line arguments.
Throws:
IOException - If there is an error.
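
To give a feel for what a date/time resolution controls, the sketch below truncates a timestamp to different granularities using only the JDK. It is a stand-in, not Lucene code: `DateResolutionSketch` and its `Resolution` enum are hypothetical, the patterns are modeled on the compact GMT strings that Lucene's `DateTools` is generally understood to produce, and the idea that the resolution applies to the `modified` field is an assumption that this page does not state.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Illustration only; a JDK stand-in for the idea behind DateTools.Resolution. */
class DateResolutionSketch {
    /** Hypothetical granularities; the patterns are an assumption, not Lucene source. */
    enum Resolution {
        YEAR("yyyy"),
        MONTH("yyyyMM"),
        DAY("yyyyMMdd"),
        HOUR("yyyyMMddHH"),
        MINUTE("yyyyMMddHHmm"),
        SECOND("yyyyMMddHHmmss");

        final String pattern;

        Resolution(String pattern) {
            this.pattern = pattern;
        }
    }

    /** Formats a timestamp in GMT, dropping everything finer than the resolution. */
    static String format(long millis, Resolution resolution) {
        SimpleDateFormat formatter = new SimpleDateFormat(resolution.pattern, Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone("GMT"));
        return formatter.format(new Date(millis));
    }

    public static void main(String[] args) {
        // One day plus 1 h 2 min 3 s after the epoch: 1970-01-02 01:02:03 GMT.
        long modified = 86_400_000L + 3_723_000L;
        System.out.println(format(modified, Resolution.DAY));    // prints 19700102
        System.out.println(format(modified, Resolution.SECOND)); // prints 19700102010203
    }
}
```

A coarser resolution yields fewer distinct values, which is the usual reason to choose one for a date field that will be searched by range.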