org.pdfbox.util
Class PDFTextStripperByArea

java.lang.Object
  extended by org.pdfbox.util.PDFStreamEngine
      extended by org.pdfbox.util.PDFTextStripper
          extended by org.pdfbox.util.PDFTextStripperByArea

public class PDFTextStripperByArea
extends PDFTextStripper

This will extract text from one or more named rectangular regions of a PDF page.

Version:
$Revision: 1.5 $
Author:
Ben Litchfield

Field Summary
 
Fields inherited from class org.pdfbox.util.PDFTextStripper
charactersByArticle, output
 
Constructor Summary
PDFTextStripperByArea()
          Constructor.
 
Method Summary
 void addRegion(String regionName, Rectangle2D rect)
          Add a new region to group text by.
 void extractRegions(PDPage page)
          Process the page to extract the region text.
protected  void flushText()
          This will print the text to the output stream.
 List getRegions()
          Get the list of regions that have been set up.
 String getTextForRegion(String regionName)
          Get the text for the region. This should be called after extractRegions().
protected  void showCharacter(TextPosition text)
          This will add a character to the list of characters to be printed to the text file.
 
Methods inherited from class org.pdfbox.util.PDFTextStripper
endDocument, endPage, endParagraph, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText
 
Methods inherited from class org.pdfbox.util.PDFStreamEngine
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFTextStripperByArea

public PDFTextStripperByArea()
                      throws IOException
Constructor.

Throws:
IOException - If there is an error loading properties.

Method Detail

addRegion

public void addRegion(String regionName,
                      Rectangle2D rect)
Add a new region to group text by.

Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
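
As a rough illustration of what a named region is, the sketch below registers a java.awt.geom.Rectangle2D under a name and tests points against it, using only the JDK. The class RegionBoundsSketch, its regionsContaining method, the region name, and the coordinates are all hypothetical and are not part of PDFBox. This page does not state the coordinate system or units that addRegion expects, so the numbers are placeholders.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of PDFBox: keeps named rectangles and
// reports which of them contain a given point.
class RegionBoundsSketch {
    // Insertion-ordered so region names come back in the order they were added.
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();

    // Mirrors the shape of addRegion(String, Rectangle2D) described above.
    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    // Hypothetical helper: names of all regions whose rectangle contains (x, y).
    List<String> regionsContaining(double x, double y) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
            if (entry.getValue().contains(x, y)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        RegionBoundsSketch sketch = new RegionBoundsSketch();
        // Rectangle2D.Double takes (x, y, width, height); placeholder values.
        sketch.addRegion("header", new Rectangle2D.Double(10, 20, 100, 50));
        System.out.println(sketch.regionsContaining(50, 40));  // point inside the rectangle
        System.out.println(sketch.regionsContaining(110, 20)); // point on the right edge
    }
}
```

Rectangle2D.contains(x, y) treats the left and top edges as inside and the right and bottom edges as outside, so adjacent rectangles that share an edge do not both claim a point on that edge in this sketch.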

getRegions

public List getRegions()
Get the list of regions that have been set up.

Returns:
A list of java.lang.String objects that identify the regions by name.

getTextForRegion

public String getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions().

Parameters:
regionName - The name of the region to get the text from.
Returns:
The text that was identified in that region.

extractRegions

public void extractRegions(PDPage page)
                    throws IOException
Process the page to extract the region text.

Parameters:
page - The page to extract the regions from.
Throws:
IOException - If there is an error while extracting text.
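
The call order implied above (add regions, extract a page, then read each region's text) can be sketched with a JDK-only stand-in. The class ByAreaWorkflowSketch and its nested Glyph type are hypothetical and do not parse PDF content; Glyph is only a simplified substitute for TextPosition. How the real class assigns characters to regions is not documented on this page, so the point-in-rectangle test here is an assumption, as is returning an empty string before extraction.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mimics the documented call order.
// It is not PDFBox code and does not read PDF files.
class ByAreaWorkflowSketch {
    // Hypothetical positioned character (simplified substitute for TextPosition).
    static final class Glyph {
        final String text;
        final double x;
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    // Step 1: register a named rectangle.
    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    // Step 2: route each glyph to every region whose rectangle contains it
    // (assumed rule; the real grouping logic is not described on this page).
    void extractRegions(List<Glyph> pageGlyphs) {
        regionText.clear();
        for (String name : regions.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph glyph : pageGlyphs) {
            for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
                if (entry.getValue().contains(glyph.x, glyph.y)) {
                    regionText.get(entry.getKey()).append(glyph.text);
                }
            }
        }
    }

    // Step 3: read the collected text; empty until extractRegions has run
    // (a choice made for this sketch only).
    String getTextForRegion(String regionName) {
        StringBuilder text = regionText.get(regionName);
        return text == null ? "" : text.toString();
    }

    public static void main(String[] args) {
        ByAreaWorkflowSketch sketch = new ByAreaWorkflowSketch();
        sketch.addRegion("left", new Rectangle2D.Double(0, 0, 50, 50));
        sketch.addRegion("right", new Rectangle2D.Double(50, 0, 50, 50));
        sketch.extractRegions(Arrays.asList(
                new Glyph("H", 5, 10), new Glyph("i", 12, 10), new Glyph("!", 60, 10)));
        System.out.println(sketch.getTextForRegion("left"));
        System.out.println(sketch.getTextForRegion("right"));
    }
}
```

Calling extractRegions again in this sketch discards the text collected from the previous call, so each region's text should be read before the next page is processed. Whether the real class behaves the same way is not stated here.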

showCharacter

protected void showCharacter(TextPosition text)
This will add a character to the list of characters to be printed to the text file.

Overrides:
showCharacter in class PDFTextStripper
Parameters:
text - The description of the character to display.

flushText

protected void flushText()
                  throws IOException
This will print the text to the output stream.

Overrides:
flushText in class PDFTextStripper
Throws:
IOException - If there is an error writing the text.