java.lang.Object
  org.pdfbox.util.PDFStreamEngine
    org.pdfbox.util.PDFTextStripper
      org.pdfbox.util.PDFTextStripperByArea
This will extract text from a specified region in the PDF.
Field Summary |
Fields inherited from class org.pdfbox.util.PDFTextStripper |
charactersByArticle, output |
Constructor Summary | |
PDFTextStripperByArea()
Constructor. |
Method Summary | |
void |
addRegion(String regionName, Rectangle2D rect)
Add a new region to group text by. |
void |
extractRegions(PDPage page)
Process the page to extract the region text. |
protected void |
flushText()
This will print the text to the output stream. |
List |
getRegions()
Get the list of regions that have been set up. |
String |
getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions(). |
protected void |
showCharacter(TextPosition text)
This will add a character to the list of characters to be printed to the text file. |
Methods inherited from class org.pdfbox.util.PDFStreamEngine |
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
public PDFTextStripperByArea() throws IOException
Throws:
IOException - If there is an error loading properties.
Method Detail |
public void addRegion(String regionName, Rectangle2D rect)
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
public List getRegions()
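The rect argument of addRegion is shown here only by its simple name; it is presumably java.awt.geom.Rectangle2D. As a minimal sketch of how such a rectangle can be built and queried, the following uses only the JDK. The helper name region and the numbers are illustrative assumptions, and this page does not state which coordinate origin or units addRegion expects.

```java
import java.awt.geom.Rectangle2D;

class RegionRectangleSketch {
    // Hypothetical helper: builds a rectangle from an (x, y) corner plus width and height.
    static Rectangle2D region(double x, double y, double width, double height) {
        return new Rectangle2D.Double(x, y, width, height);
    }

    public static void main(String[] args) {
        // Arbitrary example values, not taken from the PDFBox documentation.
        Rectangle2D rect = region(10, 280, 275, 60);

        // contains(x, y) reports whether a point lies inside the rectangle.
        System.out.println(rect.contains(100, 300)); // prints "true"
        System.out.println(rect.contains(100, 400)); // prints "false"
    }
}
```

A rectangle built this way could then be passed as the second argument of addRegion.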
public String getTextForRegion(String regionName)
Parameters:
regionName - The name of the region to get the text from.
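The call order implied by this page (addRegion, then extractRegions, then getTextForRegion) can be illustrated with a small stand-in written against the JDK only. This is not PDFBox code and not its implementation: the class RegionWorkflowSketch and the type Glyph are hypothetical, extractRegions takes a list of glyphs instead of a PDPage, getRegions is assumed to return region names, and throwing when text is requested before extraction is a choice made for this sketch.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RegionWorkflowSketch {
    // Hypothetical positioned character; stands in for PDFBox's TextPosition.
    static final class Glyph {
        final double x;
        final double y;
        final String character;

        Glyph(double x, double y, String character) {
            this.x = x;
            this.y = y;
            this.character = character;
        }
    }

    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    // Assumption: the list holds region names in insertion order.
    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    // Routes every glyph to each region whose rectangle contains its position.
    void extractRegions(List<Glyph> page) {
        regionText.clear();
        for (String name : regions.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph glyph : page) {
            for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
                if (entry.getValue().contains(glyph.x, glyph.y)) {
                    regionText.get(entry.getKey()).append(glyph.character);
                }
            }
        }
    }

    // Sketch-only behavior: fails if extractRegions has not populated this region.
    String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        if (text == null) {
            throw new IllegalStateException("No extracted text for region: " + regionName);
        }
        return text.toString();
    }

    static String demo() {
        RegionWorkflowSketch stripper = new RegionWorkflowSketch();
        stripper.addRegion("header", new Rectangle2D.Double(0, 0, 100, 20));
        stripper.addRegion("body", new Rectangle2D.Double(0, 20, 100, 80));

        List<Glyph> page = new ArrayList<>();
        page.add(new Glyph(5, 5, "H"));
        page.add(new Glyph(10, 5, "i"));
        page.add(new Glyph(5, 50, "x"));

        stripper.extractRegions(page);
        return stripper.getTextForRegion("header") + "|" + stripper.getTextForRegion("body");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hi|x"
    }
}
```

With the real class, the same three steps would apply, with a PDPage passed to extractRegions in place of the glyph list.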
public void extractRegions(PDPage page) throws IOException
Parameters:
page - The page to extract the regions from.
Throws:
IOException - If there is an error while extracting text.
protected void showCharacter(TextPosition text)
Overrides:
showCharacter in class PDFTextStripper
Parameters:
text - The description of the character to display.
protected void flushText() throws IOException
Overrides:
flushText in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.