File: ToDo

docx2txt 1.4-2

1. Heuristics-based cleanup of damaged document content. [Looking for more test samples.]

2. Extract images. A user has now requested this as well. [target: pre-v2.0]
3. Handle footnotes.
4. Improve table and short-line justification handling. Ideally, the columns
   of a table row should be separated by a pipe character. Short-line
   justification needs to be adjusted for lines that contain tabs. A quick look
   at these issues suggests that the logic/code will need to be reorganised to
   handle them.

5. Create a simple manpage, hopefully after resolving footnote and list issues.
6. Implement a simple state machine for speedup [partially worked towards it].
7. XML parsing(?) and general efficiency improvements. Once the tool has
   matured enough, a C/C++ version may be worth looking into.
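
Illustration for items 4 and 7: a minimal sketch of what pipe-separated table
rows could look like if the document XML were read with a real XML parser.
This is not docx2txt code (docx2txt is written in Perl); it is a Python sketch,
and the names table_rows_as_text and SAMPLE_XML are hypothetical. It assumes
the standard WordprocessingML elements w:tr (row), w:tc (cell) and w:t (text
run), and it does not treat nested tables specially.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def table_rows_as_text(document_xml):
    """Return one text line per table row, cells joined by ' | '.

    Hypothetical helper: rows of nested tables are emitted as extra lines,
    and their text also appears inside the enclosing cell.
    """
    root = ET.fromstring(document_xml)
    lines = []
    for row in root.iter("{%s}tr" % W_NS):
        cells = []
        # Direct child cells of this row only.
        for cell in row.findall("w:tc", NS):
            # Concatenate all text runs found anywhere inside the cell.
            text = "".join(t.text or "" for t in cell.iter("{%s}t" % W_NS))
            cells.append(text.strip())
        lines.append(" | ".join(cells))
    return lines


# Tiny hand-written sample, not taken from a real document.
SAMPLE_XML = (
    '<w:document xmlns:w="%s"><w:body><w:tbl>'
    "<w:tr><w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>Qty</w:t></w:r></w:p></w:tc></w:tr>"
    "<w:tr><w:tc><w:p><w:r><w:t>Pen</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>2</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl></w:body></w:document>" % W_NS
)

if __name__ == "__main__":
    for line in table_rows_as_text(SAMPLE_XML):
        print(line)
```

Run on the sample, the sketch prints one line per row, e.g. "Name | Qty".
Short-line justification and tab handling are left out on purpose; they would
need the reorganisation mentioned in item 4.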