File: ChangeLog

v1.4 : 15/05/2014

New feature:
- Added configuration variable config_unzip_opts. This removes the dependency
  on the unzip program and allows users to use other unzipping programs such as
  7z, pkzipc and winzip.

Updates:
- Fixed list numbering.
- Improved list/paragraph indentation and corresponding code.
- Updated README with brief guidance on how this utility can be used to recover
  text from a corrupted docx file.


v1.3 : 07/04/2014

New feature:
- Added support for handling lists (bullet, decimal, letter, roman) along
  with (attempt at) indentation.
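
A rough illustration of what labelling the four list styles involves. This is
a sketch in Python, not the Perl code shipped with the release; the function
names, the two-space indent per level, the "*" bullet and the "aa, bb, ..."
continuation after "z" are all assumptions made for the example.

```python
def to_roman(n):
    """Lower-case roman numeral for a positive integer below 4000."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letter(n):
    """a..z, then aa, bb, ... (repeating the letter is an assumed convention)."""
    return chr(ord("a") + (n - 1) % 26) * ((n - 1) // 26 + 1)


def list_label(style, n):
    """Label for the n-th item (1-based) of a list in the given style."""
    if style == "bullet":
        return "*"
    if style == "decimal":
        return f"{n}."
    if style == "letter":
        return f"{to_letter(n)}."
    if style == "roman":
        return f"{to_roman(n)}."
    raise ValueError(f"unknown list style: {style}")


def render_item(style, n, level, text, indent=2):
    """Indent by nesting level (0-based) and prefix the label."""
    return " " * (indent * level) + f"{list_label(style, n)} {text}"
```

For instance, render_item("roman", 4, 1, "Scope") yields the text "iv. Scope"
indented by two spaces.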

Updates:
- Added configuration variable config_twipsPerChar.
- Removed configuration variables: config_listIndent, config_exp_extra_deEscape.
- Text output omits deleted text. This matters when changes are being tracked
  in the docx document.
- Text output omits non-document_text content marked by wp/wp14 tags.
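
To illustrate the idea behind config_twipsPerChar: docx indentation is stored
in twips (1/1440 of an inch), so a plain-text converter has to map twips to
character columns. The Python sketch below assumes a default of 120 twips per
character and rounds down; both choices, and the function name, are
assumptions for the example rather than the script's documented behaviour.

```python
def indent_columns(twips, twips_per_char=120):
    """Convert an indentation in twips to a whole number of text columns.

    twips_per_char=120 is an assumed default (12 characters per inch).
    Negative or tiny indents collapse to zero columns.
    """
    return max(0, int(twips // twips_per_char))
```

With these assumptions a half-inch indent (720 twips) becomes 6 columns.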


v1.2 : 15/01/2012

New features:
- Perl script usage is extended to accept a docx file from standard input. It
  also works with input/output redirection now. Please refer to the
  documentation for more information.
- Script files and configuration file can be installed in separate directories
  on (non-Windows) systems using Makefile for installation.
- Linux Makefile also attempts to update the system configuration directory to
  the desired directory in the installed Perl script.
- User specific and system wide configuration files can be maintained separately
  even on Windows.

Updates:
- "-h" has to be given as the first argument to Perl script to get usage help.
- Added new configuration variable "config_tempDir".
- Configuration file is uniformly looked for in current directory, user
  configuration directory (APPDATA on Windows and HOME on non-Windows), system
  configuration directory (same location as script files on Windows, /etc or as
  set during installation on non-Windows systems) in the specified order.
- Documentation has been updated with usage examples and information on how
  .docx file text content can directly be viewed using Vim and Emacs editors.
- Improved handling of special (non-text) characters, along with support for
  more non-text characters like fractions.
- Fixed Bug #3463033: added ' and " to docx specific escape character
  conversions.
- Fixed the incorrect code that was committed during the earlier fix of
  nullDevice for Cygwin.
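
The configuration file lookup order described in the updates above (current
directory, then user directory, then system directory) can be sketched as
follows. This is an illustrative Python model, not the Perl implementation;
the function names, the injected "exists" check and the file name
docx2txt.config are assumptions for the example.

```python
import os


def config_search_dirs(is_windows, env, script_dir, system_dir="/etc"):
    """Directories to search, in the order given in the changelog entry."""
    # User directory: APPDATA on Windows, HOME elsewhere.
    user_dir = env.get("APPDATA" if is_windows else "HOME")
    # System directory: next to the scripts on Windows, else /etc or the
    # directory chosen at installation time (passed in as system_dir).
    sys_dir = script_dir if is_windows else system_dir
    # Skip entries that are unset (for example, no HOME in the environment).
    return [d for d in (os.curdir, user_dir, sys_dir) if d]


def find_config(dirs, name="docx2txt.config", exists=os.path.isfile):
    """Return the first existing config path, or None if there is none."""
    for directory in dirs:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None
```

The first match wins, so a file in the current directory shadows the user
and system files.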


v1.1 : 11/12/2011

New features:
- Added a check for existence of unzip command.
- Configuration file is looked for in HOME directory as well.

Updates:
- Configuration variables now begin with config_ .
- Fixed bugs #3003903, #3082018 and #3082035.
- Fixed nullDevice for Cygwin.
- Superscripted cross-references are placed within [...] now.


v1.0 : 04/10/2009

New features:
- Input argument can also be a directory holding the unzipped content of .docx
  file.
- Windows wrapper script, and support for using CakeCmd command line unzipper.
- Configuration file support for easy control over settings.
- Windows installation script.

Updates:
- Hyperlink is not displayed if the hyperlink and the hyperlinked text are the
  same, even when the user has enabled hyperlink display.
- Improved handling of short line justification, capturing many cases that were
  missed by the earlier approach.
- Path names containing spaces are now handled.
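
The hyperlink rule in the updates above amounts to a small decision, sketched
here in Python. The function name, the whitespace trimming and the
"[HYPERLINK: ...]" output format are assumptions for the example and are not
taken from this changelog.

```python
def render_link(text, url, show_hyperlink=True):
    """Return the linked text, appending the URL only when it adds information."""
    # Suppress the URL when display is disabled or when the visible text
    # already is the URL (ignoring surrounding whitespace).
    if not show_hyperlink or text.strip() == url.strip():
        return text
    return f"{text} [HYPERLINK: {url}]"
```

So a link whose visible text is its own address is printed once, while a link
with a descriptive label keeps both the label and the address.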

Please refer to the updated documentation for more details.


v0.4 : 06/09/2009

New features: [suggestions from "Sergei Kulakov (sergei>AT<dewia>DOT<com)"].
- user can control display of hyperlink along with linked text.
- TOC related cleanup. TOC was not addressed so far.

Updates:
- many new character conversions (check the script code for details).
- character conversion mappings are now organised in a tabular form.
- currency characters are converted to their respective full currency names.
- code tweaks to speed up the conversion process.


v0.3 : 23/09/2008

New features:
- center and right justification of text fitting in a line of (adjustable) 80
  columns.
- indicating hyperlinked text along with the hyperlink.
- BSD makefile [Thanks to "Rene Maroufi" (info>AT<maroufi>DOT<net) for giving
  guest access on an OpenBSD host for it].
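
As an illustration of the justification feature listed above, the Python
sketch below pads short lines on the left so they sit centred or flush right
within the line width. The function name, the rounding for odd padding and
the choice to leave over-long lines untouched are assumptions for the example.

```python
def justify(line, mode, width=80):
    """Left-pad a line so it is centred or right-aligned within width columns."""
    if mode not in ("left", "center", "right"):
        raise ValueError(f"unknown justification: {mode}")
    pad = width - len(line)
    # Lines that already fill the width, and left-justified lines, are unchanged.
    if pad <= 0 or mode == "left":
        return line
    if mode == "center":
        pad //= 2  # any odd leftover column stays on the right
    return " " * pad + line
```

No trailing spaces are added, so the output stays clean in a text file.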

Please refer to the release documentation for details.
- docx2txt.pl invocation has been changed a little.
- user involvement during installation is reduced.
- some suggestions on how Windows users can use this tool.


v0.2 : 15/08/2008

Docx text extraction can now be done in two ways (check version README for
further details).
- docx2txt.sh file.docx
- docx2txt.pl infile.docx outfile.txt 


v0.1 : 10/08/2008

Initial SourceForge release with attempts to handle the following features
during text extraction.
- horizontal ruler, line breaks, paragraph separation, tabs
- naive nested list formatting: 8 levels of nesting are assumed; to deal with
  deeper nesting, comment/uncomment the relevant lines in the Perl script. :)
- capitalisation of text blocks, i.e. text that document.xml stores in
  lowercase or mixed case but that appears in all caps in the corresponding
  text files generated by MSOffice.
- character conversions (" ' < & > - ... etc.). Euro character is converted to
  E; this behaviour can be changed by commenting/uncommenting the relevant
  lines in the Perl script.