History of Changes

Version 0.7.3 (10/12/2006)
- Changes to the Code Base
- Changes to Documentation
Version 0.7.2 (09/11/2005)
- Changes to the Code Base
Version 0.7.1 (04/10/2005)
- Changes to the Code Base
Version 0.7.0 (1/22/2005)
- Changes to the Code Base
Version 0.6.7 (10/09/2004)
- Changes to the Code Base
Version 0.6.6 (07/20/2004)
- Changes to the Code Base
Version 0.6.5 (03/08/2004)
- Changes to the Code Base
Version 0.6.4 (11/02/2003)
- Changes to the Code Base
Version 0.6.3 (09/13/2003)
- Changes to the Code Base
Version 0.6.2 (4/18/2003)
- Changes to the Code Base
Version 0.6.1 (3/9/2003)
- Changes to the Code Base
Version 0.6.0 (3/5/2003)
- Changes to the Code Base
Version 0.5.6 (11/28/2002)
- Changes to the Code Base
Version 0.5.5 (10/03/2002)
- Changes to the Code Base
Version 0.5.4 (09/17/2002)
- Changes to the Code Base
Version 0.5.3 (09/13/2002)
- Changes to the Code Base
Version 0.5.2 (09/06/2002)
- Changes to the Code Base
Version 0.5.1 (09/04/2002)
- Changes to the Code Base
Version 0.5.0 (08/31/2002)
- Changes to the Code Base
Version 0.4.1 (07/25/2002)
- Changes to the Code Base
Version 0.4.0 (07/23/2002)
- Changes to the Code Base
Version 0.3.0 (07/09/2002)
- Changes to the Code Base
Version 0.2.0 (06/03/2002)
- Changes to the Code Base
Version 0.1.0 (05/25/2002)
- Changes to the Code Base

Version 0.7.3 (10/12/2006)

Changes to the Code Base

Upgraded to Checkstyle 4.2(BJL)
Upgraded to IKVM 0.30.0.0(BJL)
[ 1546399 ] Use get/set functions for separators in PDFTextStripper(BJL)
PDDocument.silentPrint() to print without prompting for a printer(BJL)
[ 1544118 ] Bug in PDFont.getCodeFromArray(BJL)
[ 1529835 ] Add COSFloat.setValue()(BJL)
[ 1492555 ] PDChoiceField dead loop(BJL)
[ 1499521 ] NPE PDAppearance.convertToMultiLine(BJL)
[ 1522007 ] Error converting date(BJL)
Upgraded to Lucene 2.0.0(BJL)
[ 1451164 ] Problems filling combo and radio form fields(BJL)
Upgraded to lucene 1.9.1(BJL)
[ 1023133 ] Support PDF Functions(BJL)
Added command line org.pdfbox.PDFMerger(BJL)
***API Change*** Promoted AppendDoc from example to util package, renamed to PDFMergerUtility.(BJL)
Upgraded to IKVM-0.24.0.1(BJL)
[ 1391952 ] Problem extracting embedded attachments(BJL)
[ 1249607 ] Fixed issue with broken PDFs that contain multiple endobj(BJL)
[ 1153174 ] Added documentation for PDFHighlighter(BJL)
Removed log4j dependency(BJL)
[ 974661 ] getKids() Null Pointer Exception when parsing pdf(BJL)
Added better support for CJK encoding(BJL)
Changed signature of PDFPageContentStream.drawImage to take float arguments instead of int(BJL)
Fixed issue where form xobjects were not being drawn in the viewer(BJL)
Changed signature of PDDocumentCatalog.OpenAction to be a PDDestinationOrAction instead of just an action.(BJL)
Added tolerance to text extraction sorting where text on a line was not at the same exact y coordinate but very close(BJL)
[ 1327133 ] Printing with form data(BJL)
Fixed issue with DateConverter that was trying to parse an empty string(BJL)
[ 1324846 ] appending text to PDPageContentStream messes up fonts(BJL)
Added new example ReplaceURLs to show how to replace a clickable URL in a PDF(BJL)
Implemented annotation drawing(BJL)
Implemented EndPath and StrokeAndClosePath operators(BJL)
Moved text extraction permission checking from PDFTextStripper to ExtractText(BJL)
Added support for more annotations, thanks to a contribution from Paul King(BJL)
Created new FontBox project to hold all font library code(BJL)
Fixed issue where only the first page was sent to the printer(BJL)
Now automatically sets the page orientation when printing(BJL)

Changes to Documentation

Upgraded to Apache Forrest 0.8-dev(BJL)

Version 0.7.2 (09/11/2005)

Changes to the Code Base

Upgraded to IKVM-0.20.0.0(BJL)
Added support to get annotations from a page and to create a RubberStamp annotation(BJL)
Added PDDocument.print() to send the PDF to a printer.(BJL)
[ 1276623 ] NullPointerException in PageDrawer:241 when extracting images(BJL)
Allow creation of PDJpeg from a BufferedImage, thanks to contribution from Paul King(BJL)
Removed PDTiff in favor of PDCcitt(BJL)
PDFBox no longer requires log4j!!(BJL)
New class to allow you to specify 'named' regions where text is to be extracted.(BJL)
[ 1261555 ] Unexpected end of ZLIB input stream when stream has a zero length(BJL)
[ 1226665 ] ImportXFDF giving NPE error(BJL)
renamed COSDictionary.setItem( String, boolean ) to COSDictionary.setBoolean( String, boolean )(BJL)
Added sorting parameter to PDFTextStripper(BJL)
Fixed issues with PDF encryption(BJL)
Better date support, added support for PDFs that use non standard dates, support for time zone offsets(BJL)
FlateFilter-class now supports PNG-Predictors for decoding the imagedata, thanks to a contribution from Marcel Kammer(BJL)
Added support for extracting tiff images, thanks to a contribution from Marcel Kammer(BJL)
Added PDDocument.removePage to remove PDF pages(BJL)
Fixed issue when creating a COSString with a UTF 16 string(BJL)
Committed patch for type 1 PFB font parser(special thanks to Michael Niedermair)(BJL)
Committed patch for PNG predictors (special thanks to Erik Martino)(BJL)
[ 1227428 ] failure of getMediaBox(BJL)
[ 1227426 ] null pointer in PDFToImage(ColorModel is null)(BJL)
[ 1207113 ] Enhancement: runtime accessible version(BJL)
[ 1213320 ] setFfFlag() of PDField not working correctly(BJL)
[ 1215945 ] Error in COSString.writePDF() - fixed escape sequences(BJL)
[ 1198912 ] COSName with escaped characters not parsed correctly(BJL)
Fixed issue where resources were not being cleared in PDFStreamEngine(BJL)
[ 1165686 ] Expected int type parse error(BJL)
[ 1182825 ] Wrong handling of signed/unsigned byte/int in TTF parsing(BJL)
[ 1182892 ] PDFHighlight.setHighlightColor was removed because it is not implemented by adobe(BJL)

Version 0.7.1 (04/10/2005)

Changes to the Code Base

[ 1170068 ] text field is not found(BJL)
fixed NPE issue where an image did not have any applied filters(BJL)
Fixed issue where extra spaces were being added during text extraction for type3 fonts(BJL)
[ 1119420 ] Extract and Update the Meta-Information as XML(BJL)
[ 1119410 ] Extract text in/between bookmarks(BJL)
[ 1164476 ] XFDFImport should fail with non XFDF document(BJL)
[ 1119408 ] Support named target for Bookmark extraction.(BJL)
Created Resources/PDFBox_External_Fonts.properties to create a mapping for non-embedded fonts(BJL)
**API Change** Renamed PDField.getName() to PDField.getPartialName(), added method getFullyQualifiedName() (BJL)
**API Change** Renamed PDWidget to PDAnnotationWidget for naming consistency(BJL)
Text is now extracted from embedded form xobjects.(BJL)
Deployed site to new hosting vendor.(BJL)
committed code for PDFHighlighter to highlight words in a PDF document.(BJL)
Added command line application org.pdfbox.PDFToImage(BJL)
Implemented runlength decoding(BJL)
Added patch from Jorge Hernández Sellés to append content streams to existing page.(BJL)
**API Change** Renamed package from pdmodel.graphics.image to pdmodel.graphics.xobject(BJL)
**API Change** Removed PDRadioButton, should use PDCheckbox instead(BJL)
**API Change** COSStream now extends COSDictionary instead of containing a dictionary(BJL)
[ 1021241 ] Text extraction should follow PDF article divisions(BJL)
Added implementation for PDF page articles(BJL)
Created TextToPDF command line application(BJL)
Created ImageToPDF example(BJL)
fixed parsing of header where a trailing % exists(BJL)
[ 1110029 ] Character ">" not quoted in COSName::writePDF(BJL)

Version 0.7.0 (1/22/2005)

Changes to the Code Base

committed [ 1097913 ] Enhance LucenePDFDocument streams(thanks to Olivier Parent)(BJL)
Added implementation for PDF Bookmarks(BJL)
Added implementation for PDF Destinations(BJL)
Updated website for better format for documentation(BJL)
Now ExportFDF and ExportXFDF will default output files to pdfname.fdf and pdfname.xfdf(BJL)
[ 1046278 ] ClassCastException when doing FDF/XFDF(BJL)
ExtractText now allows you to extract text if you decrypt with the owner password(BJL)
Added PDF 1.5 Object Stream support(BJL)
Added pdmodel.common.PDStream to represent COSStream(BJL)
changed PDPage.getContents to use PDStream instead of COSStream(BJL)
Updated LucenePDFDocument Javadoc to tell which Lucene fields it populates(BJL)
moved HelloWorld example from persistence to pdmodel and updated to use new PD Model features(BJL)
Refactored PDFStreamEngine based on contributions from Christophe Huault(BJL)
This class no longer uses a gigantic if/else statement for all of the operators; they are now defined as properties when instantiating the class(BJL)
Updated AFM resources to be ones released on Adobe's site, include AFM license as well(BJL)
Added ability to embed TTF fonts, only WinAnsiEncoding is supported at this time(BJL)
Added ability to extract images, thanks to contributions by Brigitte Mathiak(BJL)
COSWriter now generates the document id if it does not already exist(BJL)
improved performance for text extraction(BJL)
[ 1058693 ] TextPosition does not take account of tz operator(BJL)
upgraded to log4j-1.2.9(BJL)
include package-list for javadocs(BJL)
[ 1037145 ] Infinite loop in PDFParser.parseObject(BJL)
fixed error where spaces before integers were causing parse errors(BJL)

Version 0.6.7 (10/09/2004)

Changes to the Code Base

Revamped the way character spacing and font information is obtained(BJL)
Improved location information about a character drawn on the screen.(BJL)
Changed the PDFStreamEngine.showString to showCharacter to support the newly improved location information. This will now only show one character at a time.(BJL)
Fixed bug in PDDocument.isOwnerPassword and isUserPassword that was using the wrong length for the encryption key(BJL)
Upgraded to ant 1.6.2(BJL)
Upgraded to checkstyle-3.4(BJL)
Upgraded to JUnit-3.8.1(BJL)
Upgraded to lucene-1.4.2(BJL)
Integrated patch(1016603) for issue 943319 to fix parsing of open office documents(BJL)
Patch:985347 No longer throw exception for "No 'ToUnicode' and no 'Encoding' for Font"(BJL)
Patch:996191 Fixed case statement with missing break(BJL)
Patch:996781 Fixed null pointer exception in acroform fields(BJL)
Renamed DecryptDocument to DocumentEncryption to support encryption and decryption(BJL)
Added load/save/encrypt/decrypt convenience methods on the PDDocument class(BJL)
COSWriter now attempts to keep object numbers from parsed documents and writes 'free' entries in the xref if necessary(BJL)
Added the ability to set the word separator on the PDFTextStripper(BJL)
Fixed issue where PDFBox would throw an IOException if a PDF was incorrectly missing an endobj tag(BJL)
Fixed 918220 where PDFBox would freeze when parsing certain cmap files(BJL)
Added initial colorspace support(BJL)
Fixed issue where AppendDoc was throwing ClassCastException(BJL)
Fixed 1013163 Can't parse filters that use filter abbreviation(BJL)
Fixed 1011244 Where encrypting then decrypting was causing a problem(BJL)
renamed TextPosition.getWidth to TextPosition.getCombinedHorizontalDisplacement to better reflect its actual value(BJL)
Fixed 919215 PDFBox now supports stream replacement(BJL)
Fixed 955043 Added support for 'ETenms-B5-H' encoding(BJL)
Fixed 996050 Class Cast exception when importing(BJL)
Added support for Font descriptors(BJL)
Fixed spacing issues when doing textfield FDF import(BJL)
Fixed 1017175 Large number converted when re-written(BJL)
Fixed 1029873 PDFBox now allows for multiple xref sections(BJL)
Added support for document Viewer Preferences(BJL)
Made currentDocument and pdfDocument protected in util.Splitter to allow easier subclassing(BJL)
Fixed 1034427 After Splitting page orientation is lost(BJL)
Added the following command line applications (BJL)

Version 0.6.6 (07/20/2004)

Changes to the Code Base

Improved support for setting of checkbox fields(FDF import)(BJL)
Added the org.pdfbox.PDFSplit utility to split a single document into many documents(BJL)
PDFBox now ignores the Length field that is associated with a stream because it has been found to be wrong in some documents(BJL)
Fixed bug when writing out PDF documents where the document contained a non-alphabetic character such as ( or )(BJL)
Fixed bug in PDFont where dictionary encodings were not being processed correctly(BJL)
Fixed bug in COSDocument.isEncrypted which was comparing COSNull to the wrong object(BJL)
Integrated patch for supporting multiple lines in the appearance stream(BJL)
Upgraded to lucene-1.4-final(BJL)
org.pdfbox.ExtractText now uses the system encoding as the default encoding instead of ISO-8859-1(BJL)

Version 0.6.5 (03/08/2004)

Changes to the Code Base

Fixed bug in revision 3 encryption algorithm(BJL)
added support for CIDFontType0 glyph widths, which fixed issue with spaces being during text extraction(BJL)
Fixed infinite loop when parsing a corrupt content stream(BJL)
Add characterspacing + wordspacing when determining the width of a space character(BJL)
Added support for more font types(BJL)
refactored the pdmodel.interactive package, form fields use object delegation instead of inheritance for the widget, see PDField.getWidget and PDField.getKids(BJL)
Fixed bug where an inheritable cropbox would cause stackoverflow exception(BJL)
Changed usage of PDField/PDWidget to look like object delegation instead of inheritance by adding a PDField.getWidget instead of extending PDWidget(BJL)
refactored interactive package, this will break any existing code that uses the PDField/PDAnnotation classes. You will need to adjust your package names!!(BJL)
Now uses StandardEncoding as the default encoding(BJL)
Bug in AppendDoc example that did not take into account groups of pages(BJL)
PDFont now also tries the bootstrap classloader when loading AFM resources(BJL)
added -startPage and -endPage command line options to org.pdfbox.ExtractText(BJL)
Added support for corrupt PDFs with garbage before the header(BJL)
Fixed bug where there was whitespace instead of garbage characters in front of the first object(BJL)
performance improvements for the Matrix implementation(BJL)
upgraded to lucene 1.3(BJL)
fixed bug in cmap parser for cmap files that all ended in 'def'(BJL)
Removed createObject method from COSDocument, COSWriter will handle all object references for you(BJL)
Updated AppendDoc to use PDDocument instead of COSDocument and a couple bug fixes(BJL)
PDFParser now closes the document if there were parse errors(BJL)
TextPosition now has the PDFont that is associated with the piece of text(BJL)
Added initial version of org.pdfbox.PDFViewer, a GUI application to view the internal structure of a PDF document. This can be used for debugging purposes at this time but may end up being an Adobe Reader-like application if there is enough interest(BJL)
Changed COSNumber/COSInteger/COSFloat interface to have both intValue and longValue(BJL)
Added methods isUserPassword & isOwnerPassword to PDDocument(BJL)
Added cmap files for CJK languages, please give me some feedback(BJL)

Version 0.6.4 (11/02/2003)

Changes to the Code Base

Fixed bug which caused infinite loop(BJL)
Fixed bug in encoding where DictionaryEncoding kept a reference instead of making a copy leading to encoding problems(BJL)
Added PDFTextStripper.(get|set)PageSeparator, which will allow the user to output a string after every page(BJL)
refactored text stripping code to separate the logic processing of PDF operators and the logic of extracting text(BJL)
ran findbugs on source code and fixed a couple minor issues(BJL)
Refactored font functionality to PDFont, some API methods are no longer available in COSObject(BJL)
changed name of org.pdfbox.Main to org.pdfbox.ExtractText(BJL)
added contribution of org.pdfbox.Overlay from Mario Ivankovits(BJL)
added log.isDebugEnabled checks to log4j calls(BJL)
added better escaping when writing COSNames(BJL)
fixed bug where encryption dictionary is sometimes set to COSNull instead of not being present(BJL)

Version 0.6.3 (09/13/2003)

Changes to the Code Base

Now contains the ability to import/set FDF data thanks to a contribution from Stefan Uldum Grinsted(BJL)
No longer throw an error when stream is not followed by 0A or 0D0A to allow more PDFs to be parsed(BJL)
Added -encoding argument to org.pdfbox.Main to control the encoding of the output(BJL)
Remove Prev entry from trailer if it exists because PDFBox automatically clears all old entries, only an issue when modifying/saving an existing PDF document(BJL)
Fixed bug in master password encryption algorithm for Revision 3 encrypted documents(BJL)
COSString no longer uses UTF-8 when encoding the byte array(BJL)
Added PDDocument.getPageCount()(BJL)
Fixed bug in PDFEncryption where(BJL)
Now enforces text extraction permissions(BJL)

Version 0.6.2 (4/18/2003)

Changes to the Code Base

Modified build so that build.properties settings are no longer required(BJL)
Added required libraries to CVS(BJL)
Added log4j logging(BJL)
Significant text extraction work(BJL)
Added automatic handling of files encrypted with the empty password(BJL)
Added automated tests and test data for text extraction(BJL)
Removed unimplemented decoders from filters test(BJL)
Fixed several LZW decode bugs introduced after 0.5.6(BJL)
Fixed bugs relating to processing out of spec PDFs with bad # escaping in the name ("java.io.IOException: Error: expected hex number" bug)(BJL)
Fixed Lucene UID generation bug(BJL)
Fixed GetFontWidths null pointer exception bug(BJL)

Version 0.6.1 (3/9/2003)

Changes to the Code Base

Fixed bug in parsing stream objects which led to "Unexpected end of ZLIB input stream"(BJL)
Changed license from LGPL to BSD to allow pdfbox to be used easily in Apache projects(BJL)

Version 0.6.0 (3/5/2003)

Changes to the Code Base

Massive improvements to memory footprint(BJL)
Must call close() on the COSDocument(LucenePDFDocument does this for you)(BJL)
Really fixed the bug where small documents were not being indexed(BJL)
Fixed bug where no whitespace existed between obj and start of object. Exception in thread "main" java.io.IOException: expected='obj' actual='obj<</Pro(BJL)
Fixed issue with spacing where textLineMatrix was not being copied properly(BJL)
Fixed 'bug' where parsing would fail with some pdfs with double endobj definitions(BJL)
Added PDF document summary fields to the lucene document(BJL)

Version 0.5.6 (11/28/2002)

Changes to the Code Base

Fixed bug in LucenePDFDocument where stream was not being closed and small documents were not being indexed (BJL)
Fixed a spacing issue for some PDF documents (BJL)
Fixed error while parsing the version number (BJL)
Fixed NullPointer in persistence example (BJL)
Create example lucene IndexFiles class which models the demo from lucene (BJL)
Fixed bug where garbage at the end of file caused an infinite loop (BJL)
Fixed bug in parsing boolean values with stuff at the end like "true>>" (BJL)

Version 0.5.5 (10/03/2002)

Changes to the Code Base

Added example of printing document signature(BJL)
Added example to print out form fields values(BJL)
Fixed bug when appending documents(BJL)
Various other bug fixes(BJL)

Version 0.5.4 (09/17/2002)

Changes to the Code Base

Fixed bug in text output where '?' was output instead of the proper character(BJL)
Fixed bug where sections of text were not being output at all(BJL)

Version 0.5.3 (09/13/2002)

Changes to the Code Base

Fixed bug in 128 bit encryption(BJL)

Version 0.5.2 (09/06/2002)

Changes to the Code Base

Fixed bug where FDF documents could not be appended to PDF Documents(BJL)
Catch all NumberFormatExceptions and wrap them with IOExceptions(BJL)

Version 0.5.1 (09/04/2002)

Changes to the Code Base

Now supports unicode for the document summary(BJL)
Better support for Type0 fonts(BJL)
Fixed bug with an empty LZW stream(BJL)
Fixed parsing error for ID operator(BJL)

Version 0.5.0 (08/31/2002)

Changes to the Code Base

Now supports unicode for the document summary(BJL)
Better support for Type0 fonts(BJL)
Fixed bug with an empty LZW stream(BJL)
Fixed parsing error for ID operator(BJL)

Version 0.4.1 (07/25/2002)

Changes to the Code Base

Fixed bug where .notdef was being output as document text(BJL)

Version 0.4.0 (07/23/2002)

Changes to the Code Base

Added extract text ant task(BJL)
Implemented AFM(Adobe Font Metrics) resource loading(BJL)
Fixed numerous bugs submitted by users(BJL)
Changed project from pdfparser to pdfbox to better reflect future needs(BJL)

Version 0.3.0 (07/09/2002)

Changes to the Code Base

Added indexer for the lucene project(BJL)
Initial implementation of PDF encryption(not working yet)(BJL)

Version 0.2.0 (06/03/2002)

Changes to the Code Base

Added support for the various encodings(BJL)
Improved the accuracy of the text output(BJL)

Version 0.1.0 (05/25/2002)

Changes to the Code Base

Initial Version(BJL)