Core:
* error reporting facilities
* add support for different character sets (to 'all' extractors)
* add support for passing options to extractors
'Unclean' code:
* QT
* ASF
* RPM
Incomplete code (missing features):
* RIFF (idx1 attribute)
* ID3v2.{3,4} (some attributes)
* StarOffice sdw (some attributes, see doc/)
* man pages (interpret sections for authors, brief description)
* pdf: full-text extraction!
Desirable missing formats:
* mbox / various e-mail formats
* info pages (scan for 'Node: %s^?ID' - see end of .info files!)
* sources (Java, C, C++, see doxygen!)
* EXIF (www.exif.org)
* a.out (== ar?)
* rtf
* EXE
* APEv2 (MPC file format, www.personal.uni-jena.de/~pfk/mpp/sv8/apetag.html)
* PRC (Palm module, http://web.mit.edu/tytso/www/pilot/prc-format.html)
* KOffice
* TGA
==============
UTF-8 conversion (only listing what is left to do):
* DVI: special headers are in what format? (rest is ASCII)
* SDW: needs to be done (need info about charsets)
* JPEG: presumably ASCII (or not specified)
* PS?
* WAV?
* ZIP?
* TAR?
* RIFF?
* MAN: presumably ASCII/UTF-8
* DEB: to be done
* ASF: ?
* HTML: to be done
* OLE2: to be done
* OO: to be done