Class PDFText2HTML

java.lang.Object
  org.pdfbox.util.PDFStreamEngine
    org.pdfbox.util.PDFTextStripper
      org.pdfbox.util.PDFText2HTML
Wraps stripped text in simple HTML, trying to form HTML paragraphs. Paragraphs that are broken by pages, columns, or figures are not mended, so each fragment is output as a separate paragraph.
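The following hypothetical sketch makes the "not mended" limitation concrete. The class `PageParagraphSketch` and its input shape (a list of pages, each holding the paragraphs found on that page) are assumptions for illustration, not PDFBox API, and PDFBox detects paragraphs from text positions rather than from ready-made strings. Because each page's paragraphs are wrapped independently, a sentence that runs across a page break ends up split over two `<p>` elements.

```java
import java.util.List;

/**
 * Hypothetical sketch, not PDFBox code: paragraphs are wrapped page by page,
 * so a paragraph continuing onto the next page is never rejoined.
 */
class PageParagraphSketch {
    static String toHtmlBody(List<List<String>> pages) {
        StringBuilder sb = new StringBuilder();
        for (List<String> page : pages) {
            // Paragraph boundaries are only considered within a single page.
            for (String paragraph : page) {
                sb.append("<p>").append(paragraph).append("</p>\n");
            }
        }
        return sb.toString();
    }
}
```

With the pages `["Intro text.", "A sentence that continues"]` and `["on the next page."]`, the sketch emits three paragraphs, the last two being the halves of one logical paragraph.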

Field Summary
Fields inherited from class org.pdfbox.util.PDFTextStripper:
    charactersByArticle, output

Constructor Summary
PDFText2HTML()
    Constructor.

Method Summary
void endDocument(PDDocument pdf)
    This method is available for subclasses of this class. It is called after processing of the document finishes.
protected void endParagraph()
    Write out the paragraph separator that closes a paragraph.
protected void flushText()
    Print the text to the output stream.
protected String getTitleGuess()
    Return the guess at the document title.
protected TextPosition guessTitle(Iterator textIter)
    Attempt to guess the title of the document.
boolean isSuppressParagraphs()
    Return the current value of the suppressParagraphs flag.
void setSuppressParagraphs(boolean shouldSuppressParagraphs)
    Set the suppressParagraphs flag.
protected void startParagraph()
    Write out the paragraph separator that opens a paragraph.
protected void writeCharacters(TextPosition position)
    Write the given text to the output stream.
protected void writeHeader()
    Write the header to the output document.

Methods inherited from class org.pdfbox.util.PDFStreamEngine:
    getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString

Methods inherited from class java.lang.Object:
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

public PDFText2HTML() throws IOException
    Throws:
        IOException - If there is an error during initialization.

Method Detail

protected void writeHeader() throws IOException
    Throws:
        IOException - If there is a problem writing out the header to the document.

protected String getTitleGuess()
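writeHeader() starts the HTML document, and getTitleGuess() supplies the guessed title. A minimal sketch of how the two could combine, assuming a bare `<html><head><title>…</title></head><body>` preamble; the class `HeaderSketch` is hypothetical, the exact markup PDFBox writes may differ, and the escaping step is a safety measure added here rather than something this page documents.

```java
/**
 * Hypothetical sketch of a writeHeader()-style preamble built from a title
 * guess. Not the PDFBox implementation.
 */
class HeaderSketch {
    // Escape characters that are significant inside HTML text content.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumed header shape: head with a title, then the opening body tag.
    static String header(String titleGuess) {
        return "<html><head><title>" + escape(titleGuess) + "</title></head><body>\n";
    }
}
```

Escaping matters because a guessed title is raw text from the PDF and may contain `&` or `<`.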

protected void flushText() throws IOException
    Overrides:
        flushText in class PDFTextStripper
    Throws:
        IOException - If there is an error writing the text.

public void endDocument(PDDocument pdf) throws IOException
    Overrides:
        endDocument in class PDFTextStripper
    Parameters:
        pdf - The PDF document that is being processed.
    Throws:
        IOException - If an IO error occurs.

protected TextPosition guessTitle(Iterator textIter)
    Parameters:
        textIter - The characters on the first page.
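guessTitle() works from the characters on the first page, but this page does not describe its heuristic. The sketch below is a hypothetical stand-in, not PDFBox's method: it assumes the title is the first contiguous block of text set in the largest font on the first page. The names `TitleGuessSketch` and `Run` are illustrative only.

```java
import java.util.List;

/**
 * Hypothetical title heuristic (not PDFBox's): take the first contiguous
 * block of runs drawn in the largest font size on the first page.
 */
class TitleGuessSketch {
    // A run of first-page text and the font size it was drawn in.
    static final class Run {
        final String text;
        final float fontSize;

        Run(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    static String guessTitle(List<Run> firstPage) {
        // Find the largest font size used on the page.
        float max = Float.NEGATIVE_INFINITY;
        for (Run r : firstPage) {
            max = Math.max(max, r.fontSize);
        }
        // Join consecutive runs in that size; stop at the first run after them.
        StringBuilder title = new StringBuilder();
        boolean started = false;
        for (Run r : firstPage) {
            if (r.fontSize == max) {
                if (title.length() > 0) title.append(' ');
                title.append(r.text);
                started = true;
            } else if (started) {
                break;
            }
        }
        return title.toString();
    }
}
```

A largest-font rule is easy to fool (for example by a large page number or a drop cap), which is one reason the library's result is only a guess.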

protected void startParagraph() throws IOException
    Overrides:
        startParagraph in class PDFTextStripper
    Throws:
        IOException - If there is an error writing to the stream.

protected void endParagraph() throws IOException
    Overrides:
        endParagraph in class PDFTextStripper
    Throws:
        IOException - If there is an error writing to the stream.

protected void writeCharacters(TextPosition position) throws IOException
    Overrides:
        writeCharacters in class PDFTextStripper
    Parameters:
        position - The text to write to the stream.
    Throws:
        IOException - If there is an error when writing the text.

public boolean isSuppressParagraphs()

public void setSuppressParagraphs(boolean shouldSuppressParagraphs)
    Parameters:
        shouldSuppressParagraphs - The new value of the suppressParagraphs flag.
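The paragraph hooks and the suppressParagraphs flag can be pictured with the hypothetical sketch below. It is not the PDFBox source: `ParagraphWriterSketch` takes a plain `String` instead of a `TextPosition`, and it assumes that setting the flag simply skips the `<p>`/`</p>` separators while the text is still written. The HTML escaping in `writeCharacters` is likewise an assumption.

```java
import java.io.IOException;
import java.io.Writer;

/**
 * Hypothetical sketch of startParagraph()/endParagraph()/writeCharacters()
 * with a suppressParagraphs flag. Not the PDFBox implementation.
 */
class ParagraphWriterSketch {
    private final Writer output;
    private boolean suppressParagraphs;

    ParagraphWriterSketch(Writer output) {
        this.output = output;
    }

    boolean isSuppressParagraphs() {
        return suppressParagraphs;
    }

    void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    // Opening separator, skipped when paragraphs are suppressed (assumption).
    void startParagraph() throws IOException {
        if (!suppressParagraphs) output.write("<p>");
    }

    // Closing separator, skipped when paragraphs are suppressed (assumption).
    void endParagraph() throws IOException {
        if (!suppressParagraphs) output.write("</p>\n");
    }

    // Write text, escaping HTML-significant characters.
    void writeCharacters(String text) throws IOException {
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': output.write("&amp;"); break;
                case '<': output.write("&lt;"); break;
                case '>': output.write("&gt;"); break;
                default: output.write(c);
            }
        }
    }
}
```

Writing one paragraph normally and a second after `setSuppressParagraphs(true)` to a `java.io.StringWriter` yields `<p>a &lt; b</p>` followed by bare text for the second paragraph.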