History of Changes

Version 0.7.3 (10/12/2006)

Changes to the Code Base

  • update: Upgraded to Checkstyle 4.2 (BJL)
  • update: Upgraded to IKVM 0.30.0.0 (BJL)
  • update: [ 1546399 ] Use get/set functions for separators in PDFTextStripper (BJL)
  • add: PDDocument.silentPrint() to print without prompting for a printer (BJL)
  • fix: [ 1544118 ] Bug in PDFont.getCodeFromArray (BJL)
  • update: [ 1529835 ] Add COSFloat.setValue() (BJL)
  • fix: [ 1492555 ] PDChoiceField dead loop (BJL)
  • fix: [ 1499521 ] NPE PDAppearance.convertToMultiLine (BJL)
  • fix: [ 1522007 ] Error converting date (BJL)
  • update: Upgraded to Lucene 2.0.0 (BJL)
  • fix: [ 1451164 ] Problems filling combo and radio form fields (BJL)
  • update: Upgraded to Lucene 1.9.1 (BJL)
  • update: [ 1023133 ] Support PDF Functions (BJL)
  • update: Added command line application org.pdfbox.PDFMerger (BJL)
  • update: ***API Change*** Promoted AppendDoc from example to util package, renamed to PDFMergerUtility. (BJL)
  • update: Upgraded to IKVM-0.24.0.1 (BJL)
  • fix: [ 1391952 ] Problem extracting embedded attachments (BJL)
  • fix: [ 1249607 ] Fixed issue with broken PDFs that contain multiple endobj (BJL)
  • add: [ 1153174 ] Added documentation for PDFHighlighter (BJL)
  • update: Removed log4j dependency (BJL)
  • fix: [ 974661 ] getKids() Null Pointer Exception when parsing pdf (BJL)
  • add: Added better support for CJK encoding (BJL)
  • update: Changed signature of PDFPageContentStream.drawImage to take float arguments instead of int (BJL)
  • fix: Fixed issue where form xobjects were not being drawn in the viewer (BJL)
  • update: Changed signature of PDDocumentCatalog.OpenAction to be a PDDestinationOrAction instead of just an action. (BJL)
  • fix: Added tolerance to text extraction sorting where text on a line was not at the exact same y coordinate but very close (BJL)
  • add: [ 1327133 ] Printing with form data (BJL)
  • fix: Fixed issue with DateConverter that was trying to parse an empty string (BJL)
  • fix: [ 1324846 ] appending text to PDPageContentStream messes up fonts (BJL)
  • add: Added new example ReplaceURLs to show how to replace a clickable URL in a PDF (BJL)
  • add: Implemented annotation drawing (BJL)
  • add: Implemented EndPath and StrokeAndClosePath operators (BJL)
  • update: Moved text extraction permission checking from PDFTextStripper to ExtractText (BJL)
  • add: Added support for more annotations, thanks to a contribution from Paul King (BJL)
  • update: Created new FontBox project to hold all font library code (BJL)
  • fix: Fixed issue where only the first page was sent to the printer (BJL)
  • fix: Now automatically sets the page orientation when printing (BJL)

Changes to Documentation

  • update: Upgraded to Apache Forrest 0.8-dev (BJL)

Version 0.7.2 (09/11/2005)

Changes to the Code Base

  • update: Upgraded to IKVM-0.20.0.0 (BJL)
  • add: Added support to get annotations from a page and to create a RubberStamp annotation (BJL)
  • add: Added PDDocument.print() to send the PDF to a printer. (BJL)
  • fix: [ 1276623 ] NullPointerException in PageDrawer:241 when extracting images (BJL)
  • update: Allow creation of PDJpeg from a BufferedImage, thanks to a contribution from Paul King (BJL)
  • add: Removed PDTiff in favor of PDCcitt (BJL)
  • add: PDFBox no longer requires log4j!! (BJL)
  • add: New class to allow you to specify 'named' regions where text is to be extracted. (BJL)
  • fix: [ 1261555 ] Unexpected end of ZLIB input stream when stream has a zero length (BJL)
  • fix: [ 1226665 ] ImportXFDF giving NPE error (BJL)
  • update: Renamed COSDictionary.setItem( String, boolean ) to COSDictionary.setBoolean( String, boolean ) (BJL)
  • update: Added sorting parameter to PDFTextStripper (BJL)
  • fix: Fixed issues with PDF encryption (BJL)
  • update: Better date support: added support for PDFs that use non-standard dates and for time zone offsets (BJL)
  • update: FlateFilter class now supports PNG predictors for decoding the image data, thanks to a contribution from Marcel Kammer (BJL)
  • add: Added support for extracting tiff images, thanks to a contribution from Marcel Kammer (BJL)
  • add: Added PDDocument.removePage to remove PDF pages (BJL)
  • add: Fixed issue when creating a COSString with a UTF-16 string (BJL)
  • add: Committed patch for type 1 PFB font parser (special thanks to Michael Niedermair) (BJL)
  • add: Committed patch for PNG predictors (special thanks to Erik Martino) (BJL)
  • fix: [ 1227428 ] failure of getMediaBox (BJL)
  • fix: [ 1227426 ] null pointer in PDFToImage (ColorModel is null) (BJL)
  • update: [ 1207113 ] Enhancement: runtime accessible version (BJL)
  • fix: [ 1213320 ] setFfFlag() of PDField not working correctly (BJL)
  • fix: [ 1215945 ] Error in COSString.writePDF() - fixed escape sequences (BJL)
  • fix: [ 1198912 ] COSName with escaped characters not parsed correctly (BJL)
  • fix: Fixed issue where resources were not being cleared in PDFStreamEngine (BJL)
  • fix: [ 1165686 ] Expected int type parse error (BJL)
  • fix: [ 1182825 ] Wrong handling of signed/unsigned byte/int in TTF parsing (BJL)
  • remove: [ 1182892 ] PDFHighlight.setHighlightColor was removed because it is not implemented by Adobe (BJL)

Version 0.7.1 (04/10/2005)

Changes to the Code Base

  • fix: [ 1170068 ] text field is not found (BJL)
  • fix: Fixed NPE issue where an image did not have any applied filters (BJL)
  • fix: Fixed issue where extra spaces were being added during text extraction for type3 fonts (BJL)
  • update: [ 1119420 ] Extract and Update the Meta-Information as XML (BJL)
  • update: [ 1119410 ] Extract text in/between bookmarks (BJL)
  • update: [ 1164476 ] XFDFImport should fail with non XFDF document (BJL)
  • add: [ 1119408 ] Support named target for Bookmark extraction. (BJL)
  • add: Created Resources/PDFBox_External_Fonts.properties to create a mapping for non-embedded fonts (BJL)
  • update: **API Change** Renamed PDField.getName() to PDField.getPartialName(), added method getFullyQualifiedName() (BJL)
  • update: **API Change** Renamed PDWidget to PDAnnotationWidget for naming consistency (BJL)
  • update: Text is now extracted from embedded form xobjects. (BJL)
  • update: Deployed site to new hosting vendor. (BJL)
  • update: Committed code for PDFHighlighter to highlight words in a PDF document. (BJL)
  • update: Added command line application org.pdfbox.PDFToImage (BJL)
  • update: Implemented runlength decoding (BJL)
  • update: Added patch from Jorge Hernández Sellés to append content streams to an existing page. (BJL)
  • update: **API Change** Renamed package from pdmodel.graphics.image to pdmodel.graphics.xobject (BJL)
  • update: **API Change** Removed PDRadioButton, should use PDCheckbox instead (BJL)
  • update: **API Change** COSStream now extends COSDictionary instead of containing a dictionary (BJL)
  • update: [ 1021241 ] Text extraction should follow PDF article divisions (BJL)
  • add: Added implementation for PDF page articles (BJL)
  • add: Created TextToPDF command line application (BJL)
  • add: Created ImageToPDF example (BJL)
  • fix: Fixed parsing of header where a trailing % exists (BJL)
  • fix: [ 1110029 ] Character ">" not quoted in COSName::writePDF (BJL)

Version 0.7.0 (1/22/2005)

Changes to the Code Base

  • update: Committed [ 1097913 ] Enhance LucenePDFDocument streams (thanks to Olivier Parent) (BJL)
  • add: Added implementation for PDF Bookmarks (BJL)
  • add: Added implementation for PDF Destinations (BJL)
  • update: Updated website for better format for documentation (BJL)
  • fix: Now ExportFDF and ExportXFDF will default output files to pdfname.fdf and pdfname.xfdf (BJL)
  • fix: [ 1046278 ] ClassCastException when doing FDF/XFDF (BJL)
  • fix: ExtractText now allows you to extract text if you decrypt with the owner password (BJL)
  • fix: Added PDF 1.5 Object Stream support (BJL)
  • fix: Added pdmodel.common.PDStream to represent COSStream (BJL)
  • fix: Changed PDPage.getContents to use PDStream instead of COSStream (BJL)
  • fix: Updated LucenePDFDocument Javadoc to tell which Lucene fields it populates (BJL)
  • fix: Moved HelloWorld example from persistence to pdmodel and updated to use new PD Model features (BJL)
  • fix: Refactored PDFStreamEngine based on contributions from Christophe Huault (BJL)
  • fix: This class no longer uses a gigantic if/else statement for all of the operators; they are defined as properties when instantiating the class (BJL)
  • fix: Updated AFM resources to be the ones released on Adobe's site, included AFM license as well (BJL)
  • fix: Added ability to embed TTF fonts, only WinAnsiEncoding is supported at this time (BJL)
  • fix: Added ability to extract images, thanks to contributions by Brigitte Mathiak (BJL)
  • fix: COSWriter now generates the document id if it does not already exist (BJL)
  • fix: Improved performance for text extraction (BJL)
  • fix: [ 1058693 ] TextPosition does not take account of tz operator (BJL)
  • fix: Upgraded to log4j-1.2.9 (BJL)
  • fix: Include package-list for javadocs (BJL)
  • fix: [ 1037145 ] Infinite loop in PDFParser.parseObject (BJL)
  • fix: Fixed error where spaces before integers were causing parse errors (BJL)

Version 0.6.7 (10/09/2004)

Changes to the Code Base

  • fix: Revamped the way character spacing and font information is obtained (BJL)
  • fix: Improved location information about a character drawn on the screen. (BJL)
  • fix: Changed the PDFStreamEngine.showString to showCharacter to support the newly improved location information. This will now only show one character at a time. (BJL)
  • fix: Fixed bug in PDDocument.isOwnerPassword and isUserPassword that was using the wrong length for the encryption key (BJL)
  • fix: Upgraded to ant 1.6.2 (BJL)
  • fix: Upgraded to checkstyle-3.4 (BJL)
  • fix: Upgraded to JUnit-3.8.1 (BJL)
  • fix: Upgraded to lucene-1.4.2 (BJL)
  • fix: Integrated patch (1016603) for issue 943319 to fix parsing of OpenOffice documents (BJL)
  • fix: Patch:985347 No longer throw exception for "No 'ToUnicode' and no 'Encoding' for Font" (BJL)
  • fix: Patch:996191 Fixed case statement with missing break (BJL)
  • fix: Patch:996781 Fixed null pointer exception in acroform fields (BJL)
  • fix: Renamed DecryptDocument to DocumentEncryption to support encryption and decryption (BJL)
  • fix: Added load/save/encrypt/decrypt convenience methods on the PDDocument class (BJL)
  • fix: COSWriter now attempts to keep object numbers from parsed documents and writes 'free' entries in the xref if necessary (BJL)
  • fix: Added the ability to set the word separator on the PDFTextStripper (BJL)
  • fix: Fixed issue where PDFBox would throw an IOException if a PDF was incorrectly missing an endobj tag (BJL)
  • fix: Fixed 918220 where PDFBox would freeze when parsing certain cmap files (BJL)
  • fix: Added initial colorspace support (BJL)
  • fix: Fixed issue where AppendDoc was throwing ClassCastException (BJL)
  • fix: Fixed 1013163 Can't parse filters that use filter abbreviation (BJL)
  • fix: Fixed 1011244 Where encrypting then decrypting was causing a problem (BJL)
  • fix: Renamed TextPosition.getWidth to TextPosition.getCombinedHorizontalDisplacement to better reflect its actual value (BJL)
  • fix: Fixed 919215 PDFBox now supports stream replacement (BJL)
  • fix: Fixed 955043 Added support for 'ETenms-B5-H' encoding (BJL)
  • fix: Fixed 996050 Class Cast exception when importing (BJL)
  • fix: Added support for Font descriptors (BJL)
  • fix: Fixed spacing issues when doing textfield FDF import (BJL)
  • fix: Fixed 1017175 Large number converted when re-written (BJL)
  • fix: Fixed 1029873 PDFBox now allows for multiple xref sections (BJL)
  • fix: Added support for document Viewer Preferences (BJL)
  • fix: Made currentDocument and pdfDocument protected in util.Splitter to allow easier subclassing (BJL)
  • fix: Fixed 1034427 After Splitting page orientation is lost (BJL)
  • add: Added the following command line applications (BJL)

Version 0.6.6 (07/20/2004)

Changes to the Code Base

  • fix: Improved support for setting of checkbox fields (FDF import) (BJL)
  • fix: Added the org.pdfbox.PDFSplit utility to split a single document into many documents (BJL)
  • fix: PDFBox now ignores the Length field that is associated with a stream; it has been found to be wrong in some documents (BJL)
  • fix: Fixed bug when writing out PDF documents and the document contained a non-alphabetic character such as ( or ) (BJL)
  • fix: Fixed bug in PDFont where dictionary encodings were not being processed correctly (BJL)
  • fix: Fixed bug in COSDocument.isEncrypted which was comparing COSNull to the wrong object (BJL)
  • fix: Integrated patch for supporting multiple lines in the appearance stream (BJL)
  • fix: Upgraded to lucene-1.4-final (BJL)
  • fix: org.pdfbox.ExtractText now uses the system encoding as the default encoding instead of ISO-8859-1 (BJL)

Version 0.6.5 (03/08/2004)

Changes to the Code Base

  • fix: Fixed bug in revision 3 encryption algorithm (BJL)
  • fix: Added support for CIDFontType0 glyph widths, which fixed issue with spaces being during text extraction (BJL)
  • fix: Fixed infinite loop when parsing a corrupt content stream (BJL)
  • fix: Add character spacing + word spacing when determining the width of a space character (BJL)
  • fix: Added support for more font types (BJL)
  • fix: Refactored the pdmodel.interactive package; form fields use object delegation instead of inheritance for the widget, see PDField.getWidget and PDField.getKids (BJL)
  • fix: Fixed bug where an inheritable cropbox would cause a stack overflow exception (BJL)
  • fix: Changed usage of PDField/PDWidget to look like object delegation instead of inheritance by adding a PDField.getWidget instead of extending PDWidget (BJL)
  • fix: Refactored interactive package; this will break any existing code that uses the PDField/PDAnnotation classes. You will need to adjust your package names!! (BJL)
  • fix: Now uses StandardEncoding as the default encoding (BJL)
  • fix: Fixed bug in AppendDoc example that did not take into account groups of pages (BJL)
  • fix: PDFont now also tries the bootstrap classloader when loading AFM resources (BJL)
  • fix: Added -startPage and -endPage command line options to org.pdfbox.ExtractText (BJL)
  • fix: Added support for corrupt PDFs with garbage before the header (BJL)
  • fix: Fixed bug where there was whitespace instead of garbage characters in front of the first object (BJL)
  • fix: Performance improvements for the Matrix implementation (BJL)
  • fix: Upgraded to lucene 1.3 (BJL)
  • fix: Fixed bug in cmap parser for cmap files that all ended in 'def' (BJL)
  • fix: Removed createObject method from COSDocument, COSWriter will handle all object references for you (BJL)
  • fix: Updated AppendDoc to use PDDocument instead of COSDocument and made a couple of bug fixes (BJL)
  • fix: PDFParser now closes the document if there were parse errors (BJL)
  • fix: TextPosition now has the PDFont that is associated with the piece of text (BJL)
  • fix: Added initial version of org.pdfbox.PDFViewer, a GUI application to view the internal structure of a PDF document. This can be used for debugging purposes at this time but may end up being an Adobe Reader-like application if there is enough interest (BJL)
  • fix: Changed COSNumber/COSInteger/COSFloat interface to have both intValue and longValue (BJL)
  • fix: Added methods isUserPassword & isOwnerPassword to PDDocument (BJL)
  • fix: Added cmap files for CJK languages, please give me some feedback (BJL)

Version 0.6.4 (11/02/2003)

Changes to the Code Base

  • fix: Fixed bug which caused infinite loop (BJL)
  • fix: Fixed bug in encoding where DictionaryEncoding kept a reference instead of making a copy, leading to encoding problems (BJL)
  • fix: Added PDFTextStripper.(get|set)PageSeparator, which will allow the user to output a string after every page (BJL)
  • fix: Refactored text stripping code to separate the logic of processing PDF operators from the logic of extracting text (BJL)
  • fix: Ran findbugs on source code and fixed a couple of minor issues (BJL)
  • fix: Refactored font functionality to PDFont, some API methods are no longer available in COSObject (BJL)
  • fix: Changed name of org.pdfbox.Main to org.pdfbox.ExtractText (BJL)
  • fix: Added contribution of org.pdfbox.Overlay from Mario Ivankovits (BJL)
  • fix: Added log.isDebugEnabled checks to log4j calls (BJL)
  • fix: Added better escaping when writing COSNames (BJL)
  • fix: Fixed bug where encryption dictionary is sometimes set to COSNull instead of not being present (BJL)

Version 0.6.3 (09/13/2003)

Changes to the Code Base

  • fix: Now contains the ability to import/set FDF data thanks to a contribution from Stefan Uldum Grinsted (BJL)
  • fix: No longer throw an error when stream is not followed by 0A or 0D0A to allow more PDFs to be parsed (BJL)
  • fix: Added -encoding argument to org.pdfbox.Main to control the encoding of the output (BJL)
  • fix: Remove Prev entry from trailer if it exists because PDFBox automatically clears all old entries, only an issue when modifying/saving an existing PDF document (BJL)
  • fix: Fixed bug in master password encryption algorithm for Revision 3 encrypted documents (BJL)
  • fix: COSString no longer uses UTF-8 when encoding the byte array (BJL)
  • fix: Added PDDocument.getPageCount() (BJL)
  • fix: Fixed bug in PDFEncryption where (BJL)
  • fix: Now enforces text extraction permissions (BJL)

Version 0.6.2 (4/18/2003)

Changes to the Code Base

  • fix: Modified build so that build.properties settings are no longer required (BJL)
  • add: Added required libraries to CVS (BJL)
  • add: Added log4j logging (BJL)
  • update: Significant text extraction work (BJL)
  • fix: Added automatic handling of files encrypted with the empty password (BJL)
  • add: Added automated tests and test data for text extraction (BJL)
  • fix: Removed unimplemented decoders from filters test (BJL)
  • fix: Fixed several LZW decode bugs introduced after 0.5.6 (BJL)
  • fix: Fixed bugs relating to processing out-of-spec PDFs with bad # escaping in the name ("java.io.IOException: Error: expected hex number" bug) (BJL)
  • fix: Fixed Lucene UID generation bug (BJL)
  • fix: Fixed GetFontWidths null pointer exception bug (BJL)

Version 0.6.1 (3/9/2003)

Changes to the Code Base

  • fix: Fixed bug in parsing stream objects which led to "Unexpected end of ZLIB input stream" (BJL)
  • fix: Changed license from LGPL to BSD to allow PDFBox to be used easily in Apache projects (BJL)

Version 0.6.0 (3/5/2003)

Changes to the Code Base

  • fix: Massive improvements to memory footprint (BJL)
  • fix: Must call close() on the COSDocument (LucenePDFDocument does this for you) (BJL)
  • fix: Really fixed the bug where small documents were not being indexed (BJL)
  • fix: Fixed bug where no whitespace existed between obj and start of object. Exception in thread "main" java.io.IOException: expected='obj' actual='obj<</Pro (BJL)
  • fix: Fixed issue with spacing where textLineMatrix was not being copied properly (BJL)
  • fix: Fixed 'bug' where parsing would fail with some PDFs with double endobj definitions (BJL)
  • add: Added PDF document summary fields to the Lucene document (BJL)

Version 0.5.6 (11/28/2002)

Changes to the Code Base

  • add: Fixed bug in LucenePDFDocument where stream was not being closed and small documents were not being indexed (BJL)
  • add: Fixed a spacing issue for some PDF documents (BJL)
  • add: Fixed error while parsing the version number (BJL)
  • add: Fixed NullPointer in persistence example (BJL)
  • add: Created example Lucene IndexFiles class, which models the demo from Lucene (BJL)
  • add: Fixed bug where garbage at the end of file caused an infinite loop (BJL)
  • add: Fixed bug in parsing boolean values with stuff at the end like "true>>" (BJL)

Version 0.5.5 (10/03/2002)

Changes to the Code Base

  • add: Added example of printing document signature (BJL)
  • add: Added example to print out form field values (BJL)
  • fix: Fixed bug when appending documents (BJL)
  • fix: Various other bug fixes (BJL)

Version 0.5.4 (09/17/2002)

Changes to the Code Base

  • fix: Fixed bug in text output where '?' was output instead of the proper character (BJL)
  • fix: Fixed bug where sections of text were not being output at all (BJL)

Version 0.5.3 (09/13/2002)

Changes to the Code Base

  • fix: Fixed bug in 128 bit encryption (BJL)

Version 0.5.2 (09/06/2002)

Changes to the Code Base

  • fix: Fixed bug where FDF documents could not be appended to PDF Documents (BJL)
  • update: Catch all NumberFormatExceptions and wrap them with IOExceptions (BJL)

Version 0.5.1 (09/04/2002)

Changes to the Code Base

  • add: Now supports Unicode for the document summary (BJL)
  • update: Better support for Type0 fonts (BJL)
  • fix: Fixed bug with an empty LZW stream (BJL)
  • fix: Fixed parsing error for ID operator (BJL)

Version 0.5.0 (08/31/2002)

Changes to the Code Base

  • add: Now supports Unicode for the document summary (BJL)
  • update: Better support for Type0 fonts (BJL)
  • fix: Fixed bug with an empty LZW stream (BJL)
  • fix: Fixed parsing error for ID operator (BJL)

Version 0.4.1 (07/25/2002)

Changes to the Code Base

  • fix: Fixed bug where .notdef was being output as document text (BJL)

Version 0.4.0 (07/23/2002)

Changes to the Code Base

  • add: Added extract text Ant task (BJL)
  • add: Implemented AFM (Adobe Font Metrics) resource loading (BJL)
  • fix: Fixed numerous bugs submitted by users (BJL)
  • update: Changed project name from pdfparser to pdfbox to better reflect future needs (BJL)

Version 0.3.0 (07/09/2002)

Changes to the Code Base

  • add: Added indexer for the Lucene project (BJL)
  • fix: Initial implementation of PDF encryption (not working yet) (BJL)

Version 0.2.0 (06/03/2002)

Changes to the Code Base

  • add: Added support for the various encodings (BJL)
  • fix: Improved the accuracy of the text output (BJL)

Version 0.1.0 (05/25/2002)

Changes to the Code Base

  • add: Initial Version (BJL)