File: TODO

libextractor 0.4.2-2sarge6
Core:
* error reporting facilities
* add support for different character sets (to 'all' extractors)
* add support for passing options to extractors

'Unclean' code:
* QT
* ASF
* RPM

Incomplete code (missing features):
* RIFF (idx1 attribute)
* ID3v2.{3,4} (some attributes)
* StarOffice sdw (some attributes, see doc/)
* man pages (interpret sections for authors, brief description)
* PDF: full-text extraction!

Desirable missing formats:
* mbox / various e-mail formats
* info pages (scan for 'Node: %s^?ID' - see end of .info files!)
* sources (Java, C, C++, see doxygen!)
* EXIF (www.exif.org)
* a.out (== ar?)
* RTF
* EXE
* APEv2 (MPC file format, www.personal.uni-jena.de/~pfk/mpp/sv8/apetag.html)
* PRC (Palm module, http://web.mit.edu/tytso/www/pilot/prc-format.html)
* KOffice
* TGA
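
A possible starting point for the info pages item above, as a minimal
sketch only. It assumes the tag table near the end of a .info file holds
one "Node: <name>" entry per line, with the name terminated by a DEL
byte (0x7f, written ^? above) and followed by a byte offset. The helper
name info_node_name and its signature are hypothetical and not part of
libextractor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libextractor API): copy the name of the
 * index-th "Node: <name>\x7f<offset>" line found in buf into out.
 * Returns 1 on success, 0 if there is no such entry. */
static int info_node_name(const char *buf, size_t len, size_t index,
                          char *out, size_t outsize)
{
    static const char prefix[] = "Node: ";
    const size_t plen = sizeof(prefix) - 1;
    size_t pos = 0;

    while (pos < len) {
        /* find the end of the current line */
        const char *line = buf + pos;
        const char *nl = memchr(line, '\n', len - pos);
        size_t linelen = nl ? (size_t)(nl - line) : len - pos;

        if (linelen > plen && memcmp(line, prefix, plen) == 0) {
            /* the node name ends at the DEL byte */
            const char *del = memchr(line + plen, 0x7f, linelen - plen);
            if (del != NULL) {
                if (index == 0) {
                    size_t n = (size_t)(del - (line + plen));
                    if (outsize == 0)
                        return 0;
                    if (n >= outsize)
                        n = outsize - 1;    /* truncate long names */
                    memcpy(out, line + plen, n);
                    out[n] = '\0';
                    return 1;
                }
                index--;
            }
        }
        pos += linelen + 1;                 /* skip past the newline */
    }
    return 0;
}
```

An extractor could call this with index 0, 1, 2, ... until it returns 0
and report each name as a keyword. Scanning only the tail of the file
would avoid reading the node bodies.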

==============

UTF-8 conversion (only listing what is left to do):
* DVI: special headers are in what format? (rest is ASCII)
* SDW: needs to be done (need info about charsets)
* JPEG: presumably ASCII (or not specified)
* PS?
* WAV?
* ZIP?
* TAR?
* RIFF?
* MAN: presumably ASCII/UTF-8
* DEB: to be done
* ASF: ?
* HTML: to be done
* OLE2: to be done
* OO: to be done
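
For formats whose strings turn out to be ISO-8859-1, the conversion
itself is small. The following is a sketch under that assumption only:
which charset each format above really uses is still the open question,
and the helper name latin1_to_utf8 is hypothetical, not part of
libextractor. Pure ASCII input passes through unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not libextractor API): convert a NUL-terminated
 * ISO-8859-1 string into a newly allocated UTF-8 string.
 * The caller frees the result. Returns NULL if allocation fails. */
static char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    /* each Latin-1 byte needs at most two UTF-8 bytes */
    char *out = malloc(2 * len + 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = (char)c;                     /* ASCII: copy as-is */
        } else {
            *p++ = (char)(0xC0 | (c >> 6));     /* lead byte */
            *p++ = (char)(0x80 | (c & 0x3F));   /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}
```

Other source charsets would need a general converter such as iconv
rather than a hand-written loop like this one.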