History of Changes
Version 0.7.3 (10/12/2006)
Changes to the Code Base
-
Upgraded to Checkstyle 4.2(BJL)
-
Upgraded to IKVM 0.30.0.0(BJL)
-
[ 1546399 ] Use get/set functions for separators in PDFTextStripper(BJL)
-
PDDocument.silentPrint() to print without prompting for a printer(BJL)
-
[ 1544118 ] Bug in PDFont.getCodeFromArray(BJL)
-
[ 1529835 ] Add COSFloat.setValue()(BJL)
-
[ 1492555 ] PDChoiceField dead loop(BJL)
-
[ 1499521 ] NPE PDAppearance.convertToMultiLine(BJL)
-
[ 1522007 ] Error converting date(BJL)
-
Upgraded to Lucene 2.0.0(BJL)
-
[ 1451164 ] Problems filling combo and radio form fields(BJL)
-
Upgraded to Lucene 1.9.1(BJL)
-
[ 1023133 ] Support PDF Functions(BJL)
-
Added command line org.pdfbox.PDFMerger(BJL)
-
***API Change*** Promoted AppendDoc from example to util package, renamed to PDFMergerUtility.(BJL)
-
Upgraded to IKVM-0.24.0.1(BJL)
-
[ 1391952 ] Problem extracting embedded attachments(BJL)
-
[ 1249607 ] Fixed issue with broken PDFs that contain multiple endobj(BJL)
-
[ 1153174 ] Added documentation for PDFHighlighter(BJL)
-
Removed log4j dependency(BJL)
-
[ 974661 ] getKids() Null Pointer Exception when parsing pdf(BJL)
-
Added better support for CJK encoding(BJL)
-
Changed signature of PDFPageContentStream.drawImage to take float arguments instead of int(BJL)
-
Fixed issue where form xobjects were not being drawn in the viewer(BJL)
-
Changed signature of PDDocumentCatalog.OpenAction to be a PDDestinationOrAction instead of just an action.(BJL)
-
Added tolerance to text extraction sorting where text on a line was not at the same exact y coordinate but very close(BJL)
-
[ 1327133 ] Printing with form data(BJL)
-
Fixed issue with DateConverter that was trying to parse an empty string(BJL)
-
[ 1324846 ] appending text to PDPageContentStream messes up fonts(BJL)
-
Added new example ReplaceURLs to show how to replace a clickable URL in a PDF(BJL)
-
Implemented annotation drawing(BJL)
-
Implemented EndPath and StrokeAndClosePath operators(BJL)
-
Moved text extraction permission checking from PDFTextStripper to ExtractText(BJL)
-
Added support for more annotations, thanks to a contribution from Paul King(BJL)
-
Created new FontBox project to hold all font library code(BJL)
-
Fixed issue where only the first page was sent to the printer(BJL)
-
Now automatically sets the page orientation when printing(BJL)
Changes to Documentation
-
Upgraded to Apache Forrest 0.8-dev(BJL)
Version 0.7.2 (09/11/2005)
Changes to the Code Base
-
Upgraded to IKVM-0.20.0.0(BJL)
-
Added support to get annotations from a page and to create a RubberStamp annotation(BJL)
-
Added PDDocument.print() to send the PDF to a printer.(BJL)
-
[ 1276623 ] NullPointerException in PageDrawer:241 when extracting images(BJL)
-
Allow creation of PDJpeg from a BufferedImage, thanks to contribution from Paul King(BJL)
-
Removed PDTiff in favor of PDCcitt(BJL)
-
PDFBox no longer requires log4j!!(BJL)
-
New class to allow you to specify 'named' regions where text is to be extracted.(BJL)
-
[ 1261555 ] Unexpected end of ZLIB input stream when stream has a zero length(BJL)
-
[ 1226665 ] ImportXFDF giving NPE error(BJL)
-
renamed COSDictionary.setItem( String, boolean ) to COSDictionary.setBoolean( String, boolean )(BJL)
-
Added sorting parameter to PDFTextStripper(BJL)
-
Fixed issues with PDF encryption(BJL)
-
Better date support: added support for PDFs that use non-standard dates and for time zone offsets(BJL)
-
The FlateFilter class now supports PNG predictors for decoding the image data, thanks to a contribution from Marcel Kammer(BJL)
-
Added support for extracting tiff images, thanks to a contribution from Marcel Kammer(BJL)
-
Added PDDocument.removePage to remove PDF pages(BJL)
-
Fixed issue when creating a COSString with a UTF-16 string(BJL)
-
Committed patch for type 1 PFB font parser(special thanks to Michael Niedermair)(BJL)
-
Committed patch for PNG predictors (special thanks to Erik Martino)(BJL)
-
[ 1227428 ] failure of getMediaBox(BJL)
-
[ 1227426 ] null pointer in PDFToImage(ColorModel is null)(BJL)
-
[ 1207113 ] Enhancement: runtime accessible version(BJL)
-
[ 1213320 ] setFfFlag() of PDField not working correctly(BJL)
-
[ 1215945 ] Error in COSString.writePDF() - fixed escape sequences(BJL)
-
[ 1198912 ] COSName with escaped characters not parsed correctly(BJL)
-
Fixed issue where resources were not being cleared in PDFStreamEngine(BJL)
-
[ 1165686 ] Expected int type parse error(BJL)
-
[ 1182825 ] Wrong handling of signed/unsigned byte/int in TTF parsing(BJL)
-
[ 1182892 ] PDFHighlight.setHighlightColor was removed because it is not implemented by Adobe(BJL)
Version 0.7.1 (04/10/2005)
Changes to the Code Base
-
[ 1170068 ] text field is not found(BJL)
-
fixed NPE issue where an image did not have any applied filters(BJL)
-
Fixed issue where extra spaces were being added during text extraction for type3 fonts(BJL)
-
[ 1119420 ] Extract and Update the Meta-Information as XML(BJL)
-
[ 1119410 ] Extract text in/between bookmarks(BJL)
-
[ 1164476 ] XFDFImport should fail with non XFDF document(BJL)
-
[ 1119408 ] Support named target for Bookmark extraction.(BJL)
-
Created Resources/PDFBox_External_Fonts.properties to create a mapping for non-embedded fonts(BJL)
-
**API Change** Renamed PDField.getName() to PDField.getPartialName(), added method getFullyQualifiedName() (BJL)
-
**API Change** Renamed PDWidget to PDAnnotationWidget for naming consistency(BJL)
-
Text is now extracted from embedded form xobjects.(BJL)
-
Deployed site to new hosting vendor.(BJL)
-
committed code for PDFHighlighter to highlight words in a PDF document.(BJL)
-
Added command line application org.pdfbox.PDFToImage(BJL)
-
Implemented runlength decoding(BJL)
-
Added patch from Jorge Hernández Sellés to append content streams to an existing page.(BJL)
-
**API Change** Renamed package from pdmodel.graphics.image to pdmodel.graphics.xobject(BJL)
-
**API Change** Removed PDRadioButton, should use PDCheckbox instead(BJL)
-
**API Change** COSStream now extends COSDictionary instead of containing a dictionary(BJL)
-
[ 1021241 ] Text extraction should follow PDF article divisions(BJL)
-
Added implementation for PDF page articles(BJL)
-
Created TextToPDF command line application(BJL)
-
Created ImageToPDF example(BJL)
-
fixed parsing of header where a trailing % exists(BJL)
-
[ 1110029 ] Character ">" not quoted in COSName::writePDF(BJL)
Version 0.7.0 (1/22/2005)
Changes to the Code Base
-
committed [ 1097913 ] Enhance LucenePDFDocument streams(thanks to Olivier Parent)(BJL)
-
Added implementation for PDF Bookmarks(BJL)
-
Added implementation for PDF Destinations(BJL)
-
Updated website for better format for documentation(BJL)
-
Now ExportFDF and ExportXFDF will default output files to pdfname.fdf and pdfname.xfdf(BJL)
-
[ 1046278 ] ClassCastException when doing FDF/XFDF(BJL)
-
ExtractText now allows you to extract text if you decrypt with the owner password(BJL)
-
Added PDF 1.5 Object Stream support(BJL)
-
Added pdmodel.common.PDStream to represent COSStream(BJL)
-
changed PDPage.getContents to use PDStream instead of COSStream(BJL)
-
Updated LucenePDFDocument Javadoc to tell which Lucene fields it populates(BJL)
-
moved HelloWorld example from persistence to pdmodel and updated to use new PD Model features(BJL)
-
Refactored PDFStreamEngine based on contributions from Christophe Huault(BJL)
-
PDFStreamEngine no longer uses a gigantic if/else statement for all of the operators; they are defined as properties when instantiating the class(BJL)
-
Updated AFM resources to the ones released on Adobe's site; the AFM license is included as well(BJL)
-
Added ability to embed TTF fonts; only WinAnsiEncoding is supported at this time(BJL)
-
Added ability to extract images, thanks to contributions by Brigitte Mathiak(BJL)
-
COSWriter now generates the document id if it does not already exist(BJL)
-
improved performance for text extraction(BJL)
-
[ 1058693 ] TextPosition does not take account of tz operator(BJL)
-
upgraded to log4j-1.2.9(BJL)
-
include package-list for javadocs(BJL)
-
[ 1037145 ] Infinite loop in PDFParser.parseObject(BJL)
-
Fixed error where spaces before integers were causing parse errors(BJL)
Version 0.6.7 (10/09/2004)
Changes to the Code Base
-
Revamped the way character spacing and font information is obtained(BJL)
-
Improved location information about a character drawn on the screen.(BJL)
-
Changed the PDFStreamEngine.showString to showCharacter to support the newly improved location information. This will now only show one character at a time.(BJL)
-
Fixed bug in PDDocument.isOwnerPassword and isUserPassword that was using the wrong length for the encryption key(BJL)
-
Upgraded to ant 1.6.2(BJL)
-
Upgraded to checkstyle-3.4(BJL)
-
Upgraded to JUnit-3.8.1(BJL)
-
Upgraded to lucene-1.4.2(BJL)
-
Integrated patch(1016603) for issue 943319 to fix parsing of open office documents(BJL)
-
Patch:985347 No longer throw exception for "No 'ToUnicode' and no 'Encoding' for Font"(BJL)
-
Patch:996191 Fixed case statement with missing break(BJL)
-
Patch:996781 Fixed null pointer exception in acroform fields(BJL)
-
Renamed DecryptDocument to DocumentEncryption to support encryption and decryption(BJL)
-
Added load/save/encrypt/decrypt convenience methods on the PDDocument class(BJL)
-
COSWriter now attempts to keep object numbers from parsed documents and writes 'free' entries in the xref if necessary(BJL)
-
Added the ability to set the word separator on the PDFTextStripper(BJL)
-
Fixed issue where PDFBox would throw an IOException if a PDF was incorrectly missing an endobj tag(BJL)
-
Fixed 918220 where PDFBox would freeze when parsing certain cmap files(BJL)
-
Added initial colorspace support(BJL)
-
Fixed issue where AppendDoc was throwing ClassCastException(BJL)
-
Fixed 1013163 Can't parse filters that use filter abbreviation(BJL)
-
Fixed 1011244 Where encrypting then decrypting was causing a problem(BJL)
-
renamed TextPosition.getWidth to TextPosition.getCombinedHorizontalDisplacement to better reflect its actual value(BJL)
-
Fixed 919215 PDFBox now supports stream replacement(BJL)
-
Fixed 955043 Added support for 'ETenms-B5-H' encoding(BJL)
-
Fixed 996050 Class Cast exception when importing(BJL)
-
Added support for Font descriptors(BJL)
-
Fixed spacing issues when doing textfield FDF import(BJL)
-
Fixed 1017175 Large number converted when re-written(BJL)
-
Fixed 1029873 PDFBox now allows for multiple xref sections(BJL)
-
Added support for document Viewer Preferences(BJL)
-
Made currentDocument and pdfDocument protected in util.Splitter to allow easier subclassing(BJL)
-
Fixed 1034427 After Splitting page orientation is lost(BJL)
-
Added the following command line applications
(BJL)
Version 0.6.6 (07/20/2004)
Changes to the Code Base
-
Improved support for setting of checkbox fields(FDF import)(BJL)
-
Added the org.pdfbox.PDFSplit utility to split a single document into many documents(BJL)
-
PDFBox now ignores the Length field that is associated with a stream, because it has been found to be wrong in some documents(BJL)
-
Fixed bug when writing out PDF documents where the document contained a non-alphabetic character such as ( or )(BJL)
-
Fixed bug in PDFont where dictionary encodings were not being processed correctly(BJL)
-
Fixed bug in COSDocument.isEncrypted which was comparing COSNull to the wrong object(BJL)
-
Integrated patch for supporting multiple lines in the appearance stream(BJL)
-
Upgraded to lucene-1.4-final(BJL)
-
org.pdfbox.ExtractText now uses the system encoding as the default encoding instead of ISO-8859-1(BJL)
Version 0.6.5 (03/08/2004)
Changes to the Code Base
-
Fixed bug in revision 3 encryption algorithm(BJL)
-
Added support for CIDFontType0 glyph widths, which fixed an issue with spaces during text extraction(BJL)
-
Fixed infinite loop when parsing a corrupt content stream(BJL)
-
Added character spacing + word spacing when determining the width of a space character(BJL)
-
Added support for more font types(BJL)
-
Refactored the pdmodel.interactive package; form fields use object delegation instead of inheritance for the widget (see PDField.getWidget and PDField.getKids)(BJL)
-
Fixed bug where an inheritable cropbox would cause stackoverflow exception(BJL)
-
Changed usage of PDField/PDWidget to look like object delegation instead of inheritance by adding a PDField.getWidget instead of extending PDWidget(BJL)
-
refactored interactive package, this will break any existing code that uses the PDField/PDAnnotation classes. You will need to adjust your package names!!(BJL)
-
Now uses StandardEncoding as the default encoding(BJL)
-
Bug in AppendDoc example that did not take into account groups of pages(BJL)
-
PDFont now also tries the bootstrap classloader when loading AFM resources(BJL)
-
added -startPage and -endPage command line options to org.pdfbox.ExtractText(BJL)
-
Added support for corrupt PDFs with garbage before the header(BJL)
-
Fixed bug where there was whitespace instead of garbage characters in front of the first object(BJL)
-
performance improvements for the Matrix implementation(BJL)
-
upgraded to lucene 1.3(BJL)
-
fixed bug in cmap parser for cmap files that all ended in 'def'(BJL)
-
Removed createObject method from COSDocument, COSWriter will handle all object references for you(BJL)
-
Updated AppendDoc to use PDDocument instead of COSDocument and a couple bug fixes(BJL)
-
PDFParser now closes the document if there were parse errors(BJL)
-
TextPosition now has the PDFont that is associated with the piece of text(BJL)
-
Added initial version of org.pdfbox.PDFViewer, a GUI application to view the internal structure of a PDF document. This can be used for debugging purposes at this time but may end up being an Adobe Reader-like application if there is enough interest(BJL)
-
Changed COSNumber/COSInteger/COSFloat interface to have both intValue and longValue(BJL)
-
Added methods isUserPassword & isOwnerPassword to PDDocument(BJL)
-
Added cmap files for CJK languages, please give me some feedback(BJL)
Version 0.6.4 (11/02/2003)
Changes to the Code Base
-
Fixed bug which caused infinite loop(BJL)
-
Fixed bug in encoding where DictionaryEncoding kept a reference instead of making a copy leading to encoding problems(BJL)
-
Added PDFTextStripper.(get|set)PageSeparator, which will allow the user to output a string after every page(BJL)
-
Refactored text stripping code to separate the logic for processing PDF operators from the logic for extracting text(BJL)
-
ran findbugs on source code and fixed a couple minor issues(BJL)
-
Refactored font functionality to PDFont, some API methods are no longer available in COSObject(BJL)
-
changed name of org.pdfbox.Main to org.pdfbox.ExtractText(BJL)
-
added contribution of org.pdfbox.Overlay from Mario Ivankovits(BJL)
-
added log.isDebugEnabled checks to log4j calls(BJL)
-
added better escaping when writing COSNames(BJL)
-
fixed bug where encryption dictionary is sometimes set to COSNull instead of not being present(BJL)
Version 0.6.3 (09/13/2003)
Changes to the Code Base
-
Now contains the ability to import/set FDF data thanks to a contribution from Stefan Uldum Grinsted(BJL)
-
No longer throw an error when stream is not followed by 0A or 0D0A to allow more PDFs to be parsed(BJL)
-
Added -encoding argument to org.pdfbox.Main to control the encoding of the output(BJL)
-
Remove Prev entry from trailer if it exists because PDFBox automatically clears all old entries, only an issue when modifying/saving an existing PDF document(BJL)
-
Fixed bug in master password encryption algorithm for Revision 3 encrypted documents(BJL)
-
COSString no longer uses UTF-8 when encoding the byte array(BJL)
-
Added PDDocument.getPageCount()(BJL)
-
Fixed bug in PDFEncryption where(BJL)
-
Now enforces text extraction permissions(BJL)
Version 0.6.2 (4/18/2003)
Changes to the Code Base
-
Modified build so that build.properties settings are no longer required(BJL)
-
Added required libraries to CVS(BJL)
-
Added log4j logging(BJL)
-
Significant text extraction work(BJL)
-
Added automatic handling of files encrypted with the empty password(BJL)
-
Added automated tests and test data for text extraction(BJL)
-
Removed unimplemented decoders from filters test(BJL)
-
Fixed several LZW decode bugs introduced after 0.5.6(BJL)
-
Fixed bugs relating to processing out-of-spec PDFs with bad # escaping in the name ("java.io.IOException: Error: expected hex number" bug)(BJL)
-
Fixed Lucene UID generation bug(BJL)
-
Fixed GetFontWidths null pointer exception bug(BJL)
Version 0.6.1 (3/9/2003)
Changes to the Code Base
-
Fixed bug in parsing stream objects which led to "Unexpected end of ZLIB input stream"(BJL)
-
Changed license from LGPL to BSD to allow pdfbox to be used easily in Apache projects(BJL)
Version 0.6.0 (3/5/2003)
Changes to the Code Base
-
Massive improvements to memory footprint(BJL)
-
Must call close() on the COSDocument(LucenePDFDocument does this for you)(BJL)
-
Really fixed the bug where small documents were not being indexed(BJL)
-
Fixed bug where no whitespace existed between obj and start of object. Exception in thread "main" java.io.IOException: expected='obj' actual='obj<</Pro(BJL)
-
Fixed issue with spacing where textLineMatrix was not being copied properly(BJL)
-
Fixed 'bug' where parsing would fail with some pdfs with double endobj definitions(BJL)
-
Added PDF document summary fields to the lucene document(BJL)
Version 0.5.6 (11/28/2002)
Changes to the Code Base
-
Fixed bug in LucenePDFDocument where stream was not being closed and small documents were not being indexed (BJL)
-
Fixed a spacing issue for some PDF documents (BJL)
-
Fixed error while parsing the version number (BJL)
-
Fixed NullPointer in persistence example (BJL)
-
Created example Lucene IndexFiles class, which is modeled on the demo from Lucene (BJL)
-
Fixed bug where garbage at the end of file caused an infinite loop (BJL)
-
Fixed bug in parsing boolean values with stuff at the end like "true>>" (BJL)
Version 0.5.5 (10/03/2002)
Changes to the Code Base
-
Added example of printing document signature(BJL)
-
Added example to print out form field values(BJL)
-
Fixed bug when appending documents(BJL)
-
Various other bug fixes(BJL)
Version 0.5.4 (09/17/2002)
Changes to the Code Base
-
Fixed bug in text output where '?' was output instead of the proper character(BJL)
-
Fixed bug where sections of text were not being output at all(BJL)
Version 0.5.3 (09/13/2002)
Changes to the Code Base
-
Fixed bug in 128 bit encryption(BJL)
Version 0.5.2 (09/06/2002)
Changes to the Code Base
-
Fixed bug where FDF documents could not be appended to PDF Documents(BJL)
-
Catch all NumberFormatExceptions and wrap them with IOExceptions(BJL)
Version 0.5.1 (09/04/2002)
Changes to the Code Base
-
Now supports unicode for the document summary(BJL)
-
Better support for Type0 fonts(BJL)
-
Fixed bug with an empty LZW stream(BJL)
-
Fixed parsing error for ID operator(BJL)
Version 0.5.0 (08/31/2002)
Changes to the Code Base
-
Now supports unicode for the document summary(BJL)
-
Better support for Type0 fonts(BJL)
-
Fixed bug with an empty LZW stream(BJL)
-
Fixed parsing error for ID operator(BJL)
Version 0.4.1 (07/25/2002)
Changes to the Code Base
-
Fixed bug where .notdef was being output as document text(BJL)
Version 0.4.0 (07/23/2002)
Changes to the Code Base
-
Added extract text ant task(BJL)
-
Implemented AFM(Adobe Font Metrics) resource loading(BJL)
-
Fixed numerous bugs submitted by users(BJL)
-
Changed project from pdfparser to pdfbox to better reflect future needs(BJL)
Version 0.3.0 (07/09/2002)
Changes to the Code Base
-
Added indexer for the lucene project(BJL)
-
Initial implementation of PDF encryption(not working yet)(BJL)
Version 0.2.0 (06/03/2002)
Changes to the Code Base
-
Added support for the various encodings(BJL)
-
Improved the accuracy of the text output(BJL)
Version 0.1.0 (05/25/2002)
Changes to the Code Base
-
Initial Version(BJL)