PDFParser (PDFBox-0.7.3 API)

org.pdfbox.pdfparser
Class PDFParser

java.lang.Object
  org.pdfbox.pdfparser.BaseParser
      org.pdfbox.pdfparser.PDFParser

public class PDFParser
extends BaseParser

This class will handle the parsing of the PDF document.

Version:: $Revision: 1.53 $
Author:: Ben Litchfield

Field Summary

Fields inherited from class org.pdfbox.pdfparser.BaseParser

DEF, ENDSTREAM, pdfSource

Constructor Summary
`PDFParser(InputStream input)` Creates a parser that reads the PDF document from the given input stream.
`PDFParser(InputStream input, RandomAccess rafi)` Constructor to allow control over RandomAccessFile.

Method Summary
`COSDocument`	`getDocument()` This will get the document that was parsed.
`FDFDocument`	`getFDFDocument()` This will get the FDF document that was parsed.
`PDDocument`	`getPDDocument()` This will get the PD document that was parsed.
`void`	`parse()` This will parse the stream and create the PDF document.
`protected PDFXref`	`parseXrefSection()` This will parse the xref table and trailers from the stream.
`protected void`	`parseXrefTable(int[] params)` This will parse the xref table from the stream.
`void`	`setTempDirectory(File tmpDir)` Sets the directory where PDFBox will create the temporary file used to store PDF document streams.
`protected void`	`skipHeaderFillBytes()` This will skip a header's binary fill bytes.

Methods inherited from class org.pdfbox.pdfparser.BaseParser

addXref, getXrefs, isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

PDFParser

public PDFParser(InputStream input)
          throws IOException

Creates a parser that reads the PDF document from the given input stream.
Parameters:: input - The input stream that contains the PDF document.
Throws:: IOException - If there is an error initializing the stream.

PDFParser

public PDFParser(InputStream input,
                 RandomAccess rafi)
          throws IOException

Constructor to allow control over RandomAccessFile.
Parameters:: input - The input stream that contains the PDF document.; rafi - The RandomAccessFile to be used in internal COSDocument
Throws:: IOException - If there is an error initializing the stream.

Method Detail

setTempDirectory

public void setTempDirectory(File tmpDir)

Sets the directory where PDFBox will create the temporary file used to store PDF document streams. By default this directory is the value of the system property java.io.tmpdir.

Parameters:: tmpDir - The directory to create scratch files needed to store pdf document streams.
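The default described above can be illustrated with a small standard-library sketch. `ScratchDirSketch` and `resolveScratchDir` are hypothetical names introduced only for this illustration; they are not part of PDFBox and do not claim to mirror its internals. The sketch simply shows a caller-supplied directory taking precedence over the java.io.tmpdir system property.

```java
import java.io.File;

class ScratchDirSketch {
    // Hypothetical helper, not part of PDFBox: returns the directory the
    // caller configured, or the java.io.tmpdir default when none was set.
    static File resolveScratchDir(File tmpDir) {
        if (tmpDir != null) {
            return tmpDir;
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }
}
```

Passing `null` stands in for "setTempDirectory was never called"; any other value is returned unchanged.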

parse

public void parse()
           throws IOException

This will parse the stream and create the PDF document. The stream is closed when parsing is done.

Throws:: IOException - If there is an error reading from the stream.

skipHeaderFillBytes

protected void skipHeaderFillBytes()
                            throws IOException

This will skip a header's binary fill bytes. This is in accordance with PDF Specification 1.5, pg 68, section 3.4.1, "Syntax.File Structure.File Header".

Throws:: IOException - If there is an error reading from the stream.
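For context, section 3.4.1 of the PDF specification recommends that a file containing binary data follow its `%PDF-` header line with a comment line holding at least four bytes whose values are 128 or greater. The sketch below shows one way to step over such a line using only the standard library. `HeaderFillSketch` and its methods are hypothetical names for illustration, and the logic is an assumption about how the skipping could be done, not PDFBox's actual implementation.

```java
class HeaderFillSketch {
    // Returns the index of the first byte after the header line and, if
    // present, after the binary comment line that follows it.
    static int skipHeader(byte[] data) {
        int pos = skipLine(data, 0); // the "%PDF-1.x" header line
        if (pos < data.length && data[pos] == '%' && hasHighBytes(data, pos)) {
            pos = skipLine(data, pos); // the binary fill comment line
        }
        return pos;
    }

    // Advances past the current line and its end-of-line bytes.
    static int skipLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n' && data[pos] != '\r') {
            pos++;
        }
        while (pos < data.length && (data[pos] == '\n' || data[pos] == '\r')) {
            pos++;
        }
        return pos;
    }

    // True if the comment line starting at pos contains a byte >= 128.
    static boolean hasHighBytes(byte[] data, int pos) {
        for (int i = pos + 1; i < data.length && data[i] != '\n' && data[i] != '\r'; i++) {
            if ((data[i] & 0xFF) >= 128) {
                return true;
            }
        }
        return false;
    }
}
```

The sketch works on an in-memory byte array for simplicity; the real method reads from the parser's stream.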

getDocument

public COSDocument getDocument()
                        throws IOException

This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.

Returns:: The document that was parsed.
Throws:: IOException - If there is an error getting the document.

getPDDocument

public PDDocument getPDDocument()
                         throws IOException

This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.

Returns:: The document at the PD layer.
Throws:: IOException - If there is an error getting the document.

getFDFDocument

public FDFDocument getFDFDocument()
                           throws IOException

This will get the FDF document that was parsed. When you are done with this document you must call close() on it to release resources.

Returns:: The FDF document that was parsed.
Throws:: IOException - If there is an error getting the document.

parseXrefSection

protected PDFXref parseXrefSection()
                            throws IOException

This will parse the xref table and trailers from the stream.

Returns:: a new PDFXref
Throws:: IOException - If an IO error occurs.

parseXrefTable

protected void parseXrefTable(int[] params)
                       throws IOException

This will parse the xref table from the stream. It stores the starting object number and the count.

Parameters:: params - The start and count parameters
Throws:: IOException - If an IO error occurs.
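In the PDF file format, each cross-reference subsection opens with a line of two integers: the number of the first object in the subsection and the number of entries that follow. The sketch below parses such a line into a two-element array. `XrefHeaderSketch` and `readSubsectionHeader` are hypothetical names, and the idea that the values are written back into the `params` argument is an assumption made for this illustration rather than a statement about PDFBox's code.

```java
class XrefHeaderSketch {
    // Hypothetical helper, not part of PDFBox: parses a subsection header
    // such as "0 6" and stores the start number and count in params.
    static void readSubsectionHeader(String line, int[] params) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<start> <count>': " + line);
        }
        params[0] = Integer.parseInt(parts[0]); // starting object number
        params[1] = Integer.parseInt(parts[1]); // number of entries
    }
}
```

A caller would allocate `new int[2]`, pass it in, and then read the start from index 0 and the count from index 1.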
