Changes to 0.5.43
* an improved wvLaTeX.xml by email@example.com
* added some of the older 0x08 word 6 stuff to it.
* marvellous set of patches from firstname.lastname@example.org to do
a load of speedups. Including some chpx and papx page caching,
some replacement of unneeded byte by byte reads, and some
element by element copies. Plus a very spiffy token table lookup
scheme which speeds things up a lot.
* some fixes to parse the old word graphic file format, I cannot
use very much of it, but at least I don't crash on it anymore.
* added --dir option to wvHtml so that pictures can be placed in
a separate directory
* removed some more unnecessary element by element copies
* found the lengths for word7 sprms 111, 112 and 113, but I
don't know what they do, nonetheless they are now defused and
* make configure.in test for memcpy as well and use bcopy if
* define ssize_t in config.h if unistd.h is not available
* mem leaks removed
* made expat use the byteordering results for faster working,
will default correctly to nothing if this cannot be determined
due to cross compiling.
* implemented TIME field
* remove unnecessary expat subdirs
* title done in the correct charset
* implemented HYPERLINK field
* implemented PAGEREF field
* added bookmarks to wvParseStruct
* wmf files are decompressed and extracted correctly
* unicode in FIELDS.
* allow latex or html or raw text conversion options.
* more graphic checks, more escher records etc etc.
* promote old graphics into escher format.
* decompress wmf etc.
* get libwmf back into action and get picts together as well.
* attempt to remove completely empty para's with the IsEmpty code
* Got to think of some way to keep the list stuff in sync with the margins
of the enclosed paragraphs in html output.
* need to add a color Auto thing that figures out what color goes against
what fore or back ground.
* put the default text and char stuff into one single "style", and then
try and figure out a way to save arbitrary no of styles into some nice
structure for later searching and stylesheet implementation.
* style overrides for each word style, example idea in wvHtml.xml
* comment/footnote and endnote support.
* fully implement tablelooks, the flags in particular, and maybe the
colors for the bg should be farmed out to separate config options rather
than piggy backing on the other colors, also why did one of the text
foregrounds not work on its own, while the others did, and is there
one or two cases where our grid will not handle every case ?
* add another option for table width, i.e. tableabswidth.
* we need character formatting for the <li> itself, we can do this I
suppose, but I'll hold off on that for a while.
* fontface and size for char runs..
* what about a collision between underline and revision added ? and also
the strike and revision deleted ?, what if a mad user created his own
collisions. In the future there will be problems with links being broken
by references, this is a similar problem.
* install cygwin thingy at home and test configure mechanism for searching
and installing strcasecmp/rint.
* I figured out what the story is with ole embedded files, I'll have to
modify my ole code so as to be able in the future to parse embedded docs
and splice them together, which could be a wee bit of a challenge.
* modify configure script so it's possible to link against a different
expat lib, or to disable it or something.
* test that continuous sections and endnotes at the end of section, and
other things like that do what i think they should do.
* placement of footnotes, what does "treat like endnotes" really mean ?
* make sure captions are alright, especially formatting.
* bits for anld are wrong
* bookmarks embedded in html tags break them, constructs such as e.g
<A href="stuff">stuf<a name="here">f</a></A> are being output even though
that's well wrong in html.
* convert the cross-referenced "above/below", into hyperlinked above and
* optional support for specifying special fonts, not recommended for use
on publishing for internet sites, but useful for internal use for those
of you who have done the funky chicken dance with unix netscape to work
with ms winding etc fonts or are using ie/netscape on windows.
* all the fields, document background colour.
* gnome canvas wysiwyg viewer, output to ps from this
* use incremental zlib functions to do decompressing rather than use mmap,
someone who doesn't have mmap on their system can send me a patch for this one
* doesn't compile under NeXT & needs to use gcc for hpux 10.20 ?
* put code that's in both simple and complex together.
* do an autoconf check for mman.h and don't do compression if not there.
* maybe someday we should use #pragma pack(1) if we are being compiled
with gcc under a little endian platform. That might gain a speed up ?
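The pack(1) idea in the last item above can be sketched roughly as follows. This is an illustration only: demo_rec and read_demo_rec are hypothetical names, not wv structures, and whether it gains any speed is untested. With 1-byte packing the in-memory struct has the same size as the on-disk record, so on a little endian host it could be filled with one memcpy instead of field by field reads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record: a 1 byte flag followed by a 4 byte
   little endian offset. Not a real wv structure. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  flag;
    uint32_t offset;
} demo_rec;
#pragma pack(pop)

/* Fill the struct straight from the file buffer. The multi-byte field
   is only correct when the host is little endian, like the file. */
static demo_rec read_demo_rec(const unsigned char *buf)
{
    demo_rec r;
    assert(buf != NULL);
    memcpy(&r, buf, sizeof r);
    return r;
}
```

Without the pragma most compilers would pad demo_rec to 8 bytes, and the single memcpy would no longer line up with the file layout.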
Changes to 0.5.42
* temporary bmp for older word6/7 document and legacy
structures appears to be working.
* sprmCPicLocation was one too short for word6/7, strange.
* picf modified to use older word6/7 version as well.
* some modifications so that it can handle documents with
incomplete bte tables. This is only in fullsave, because I
doubt the logic behind what ever program is creating them!
It's bloody insane, but I'm going to support it coz word can
* only put in the paraborders if we need them, makes the
html output smaller, and more importantly works around
a netscape bug where if the para indent to the right is x, and
the first line indent to the left is x, and there is a border
(even if of type none!) then the para is indented too far to
the left. This is bug 1524.43 Rules.doc
* supported 0x01 graphics ala broken/001-TETMEI.doc
0x01 graphics are making their way back in, and are looking
better than the old code already.
* fields can be embedded in each other, so the field
ignorer is now capable of realizing this.
* all 0x01 bitmap formats are looking good.
* some 0x08 bitmap formats are coming through correctly as well
* bug in Huge handling
Changes to 0.5.41
* attempting to support 8 bit russian cp1251 docs as well.
* there is an extra argument to the character handler, this
is the lid of the character, the language identifier.
* made some changes to the build so that it will build
correctly outside the source tree.
* added a small iconv implementation which follows the same syntax
as the ordinary iconv. We *must* be able to convert from windows
codepages into unicode, it doesn't matter about the reverse direction
at all. If the native iconv can do this then we use that, if
the native iconv cannot, or does not exist we use our own iconv
which can only handle a conversion from windows codepages into unicode,
* So currently we can always output in utf-8 from just about whatever
input charset word hits us with.
* removed the unnecessary symbolfont dir
* made some more mods so that we convert into 16bit unicode from
all the codepages, we also must convert from 16bit unicode into
all the current outputs such as tis and koi and iso-5589 and also
* I have had the wrong name for my own charset all along :-), a
bit dyslexic of me, iso-8859-15, NOT iso-5589-15 !
* change the charset all the way through the system to a string
so that we can use everything that works with a system's iconv.
* removed unnecessary parameter to wvOutputText
* hooked up all the output system through output, i.e. the
title gets printed the same way as the body text.
* changes to Makefiles to make it build outside of its own
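The one-way conversion described in this release (windows codepage in, unicode out, then utf-8) can be sketched for a slice of cp1251. This is not the bundled iconv: cp1251_to_unicode and utf8_encode are hypothetical helpers, and only ASCII plus the main Cyrillic block (0xC0-0xFF, which maps onto U+0410-U+044F) is handled. Everything else in the upper half is left out of this sketch and becomes U+FFFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one cp1251 byte to a 16 bit unicode value (partial table). */
static uint16_t cp1251_to_unicode(unsigned char c)
{
    if (c < 0x80)
        return c;                               /* plain ASCII */
    if (c >= 0xC0)
        return (uint16_t)(0x0410 + (c - 0xC0)); /* main Cyrillic block */
    return 0xFFFD;                              /* not covered by this sketch */
}

/* Encode one 16 bit value as UTF-8 and return the number of bytes
   written (1 to 3). out must have room for 3 bytes. Surrogate values
   are not rejected here. */
static size_t utf8_encode(uint16_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Going through 16 bit unicode in the middle means each input codepage only needs a table in one direction, which matches the point made above that the reverse direction does not matter.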
Changes to 0.5.40
* took a patch from Mitch Davis <email@example.com> to change PAGESIZE to
WV_PAGESIZE, this define already exists under HPUX (oops), and modify
-I./ to -I. which supposedly makes a difference.
* output title in the same output charset as the rest of the document.
* inserted a hack to force lists to end before </td>, rather than
after the </td>
* made a fix to setting the chp istd correctly after an initialization
* the style 10 (Normal) is Generated first if possible, as other styles
(illegally I think) depend on it in the style generation code.
* tables and lists were interacting badly with each other to create invalid
html and incorrect numbering, fixed this.
* doubled up the alignment tag with div align as well as the style
assignment as netscape is having problems with short paragraph alignment.
* made some changes so that the first list start no is always 1 rather
than programmer 0 :-).
* add a <br> as a section break to wvHtml.xml, sometimes a heading
starts after a section break, but because of no <p> it ends up in a
* hacked in some sanity checks to swap between unicode and 8bit in the
stylesheet names, some mac docs are using 8bit names in word8 files.
* hacked in a mechanism to fake a section the size of the document if
there are no sections in the section listing, like there always is
except for some strange mac word8 docs that I received.
* an attempt to make nfc's more like liststartnos so that sublists that
start > 1 levels below the last list entry have the correct nfc code.
* forced a paraend in html mode to close off any open lists
* I wasted a *lot* of time getting multilevel lists to do exactly
the right thing, and to get them html compliant. I now submit that
the problem is really actually quite a toughy without scanning the
entire list before printing it (which I do do with tables). The
interpretation of html lists doesn't help the matter, it's *close*
to what I want but just far enough away to be useless, i.e.
and this gives
I reckon it should be
What we are currently using is the incorrect
Which is not optimum but the best we can do without scanning through the entire list
before printing a single entry. Attempting to see if a list entry will ever be
used, and if not then bumping up the start value by 1. No one will notice the
incorrect values for the most part. I may at a later date sidestep the issue by allowing
the list entries to be output as ordinary text and be damned with html list
* It became necessary to duplicate the paraending code for the end of a piece
in the simple mode as well as complex. The simple code is now almost exactly
the same as the complex, ah well.
* I believe I have correctly worked out how to determine when word 6 and 7 files use
Changes to 0.5.39
* made a new wvHtml conversion page, looks nice to me, online bug listing,
it's hardly a bugzilla but it serves better for my needs.
* added placeholder.png and wvOnline.xml to cvs, neither of which are of
any real importance except for the interim.
* added <filename/> variable, handy for the online converter.
* added three sprms of (now) known length and unknown purpose to word7
* NONE of the word documents that I have (4747 of them, 556Megs) now crash
with the current version, this is not to say that there are not serious
crashable bugs, or that the output is sane, just that it is now quite
* versioning enum extended and renumbered to handle all word formats in
the future, hardcoded 0 and 1 changed to WORD8 and WORD6.
* finally hacked in preliminary stylesheet code to get the dependencies
in the correct order, it's a bit crufty (!), but it does the trick for
Changes to 0.5.38
* added the symbol mapping to unicode as best as I could, I made one or two
mods from the proper unicode so as to get a few more to work with the
current generation of web browsers. very bad behaviour I know and the sort
of stuff that got the world into this mess, but at least you can recompile
wv at a later date to fix it, replace the commented out bits of symbol.c
to do it.
* added messages for conversion table request for special fonts (the spawn
of the devil as far as I am concerned).
* added a character property end and start at the beginning of a new
paragraph, this is necessary in many cases, funny I never noticed it before
* figured out some rules to handle placement of graphics, abandoned
stylesheet placement as netscape is too much of a mess to be of any use
there, and that's the target audience.
* the CHP code didn't work for word 7 and 8 sprms, this ironically means
that rather than falling through the default case and being ignored, each
chp sprm is now parsed leading to certainly more crashes and bugs as we
find differences one by one between word 8 and the previous versions
character property sprms.
* fixed sprmCSymbol for pre word 8, there might
be problems with fonts not named "Symbol", like wingdings.
* due to serious oddities I have added a TABLEOVERRIDES option in
wvHtml.xml which allows the margins before and after a paragraph, and the
first line indent to be turned off inside tables, as having them on creates
a real mess in netscape, in the future when this ability is supported by
browsers you can just remove tableoverrides and ta-da all will work fine.
* fixed table row scanner bug
* fix for last para scan in complex mode
* make mods to table.c to allow cells within 3 units of each other to be
considered the same.
* hmm, added a workaround for missing the beginning of a para in complex
mode under certain conditions.
* some incredible hackery to differentiate between 16 and 8 bit character
modes in word 95 and 6, real dodgy stuff, but it's working so far. Though
it's certainly a point of failure in the future.
* fix for table colspan mistakes
* modified sprmTTableBorder to work with the smaller word 6 BRC's, also
fixed bug where I thought the sprm was variable
* had to fix sprmTTableBorder again, because it *is* variable under word 8
despite the docs to the contrary !!, gagh.
* aaaargh!, wvGetLFOLVL and that wvInvalidLFOLVL has struck again, this
time I think I have it sorted out once and for all (but I bet not), this
new layout fixes quite a number of crashes.
* incredibly hard to find overflow in U8 in wvGetPAPX, silly me, must
really pay more attention to these things, you tend to forget that U8 are
a really small type, left to my own devices I'd use int, but for this
program I slavishly follow the types in the spec, and overlook the
workarounds that are obvious in the struct definitions for PAPX.
* fixed the rather ugly empty paragraph skipping code to only go to the
next cell when a para level check is done.
* I'm having terrible problems with sprmDefTableShd, it always follows
sprmDefTable, and there is something wrong with Shd, maybe it's me, maybe
it's word. Either way I'm working around the problem.
* I had broken the word97 decryption, fixed again.
* cleaned things up to create a version enum and associated obvious names
with the versions so that it's more obvious to read and more extendable,
encryption is marked in the version by the base version ored with 0x8000.
* some mods to the old list conversion to new list format, removes at
least one crash and might solve others, possibly not a full solution.
* added html names for umlaut characters.
* found a sprm 0x6646 which appears to be 0x6645 HugePAPX where the papx
is stored in the data stream, it only occurs for PAP's and only for FKP
papx's. Nonetheless it has required the addition of a data file stream
argument to many sprm related functions, nearly always NULL except for
fkp PAP papx's.
* sprmPHugePapx implemented, another nasty bug fixed because of this
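The version enum with the encryption bit mentioned in this release (base version ored with 0x8000) can be sketched like this. The names and the base values are hypothetical, chosen for illustration only. The one detail taken from the entry above is that 0x8000 marks an encrypted document.

```c
#include <assert.h>

/* Hypothetical base values; only the 0x8000 flag comes from the notes. */
enum {
    DEMO_WORD6     = 1,
    DEMO_WORD7     = 2,
    DEMO_WORD8     = 3,
    DEMO_ENCRYPTED = 0x8000
};

/* Mark a base version as encrypted. */
static int demo_mark_encrypted(int base)
{
    assert((base & DEMO_ENCRYPTED) == 0);
    return base | DEMO_ENCRYPTED;
}

/* Non-zero if the encryption bit is set. */
static int demo_is_encrypted(int ver)
{
    return (ver & DEMO_ENCRYPTED) != 0;
}

/* Strip the encryption bit to recover the base version. */
static int demo_base_version(int ver)
{
    return ver & ~DEMO_ENCRYPTED;
}
```

Code that only cares about the file layout can switch on demo_base_version(ver), while the decryption path checks demo_is_encrypted(ver) first.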
Changes to 0.5.37
* para indentation, first line indentation, top bottom left and right
* border code started, mountain of tags included.
* border color added for paragraphs
* we can handle individual sides to the border rather than just taking the
top for all sides.
* supporting brcBetween required that we repeat the table style lookahead
for brc's as well, this is very annoying, and seeing as netscape doesn't even allow
margin support correctly I hate putting it in as no one can use it, makes me feel
more complete to support it myself though, maybe mozilla will sort this out.
Changes to 0.5.36
* mem leaks plugged, word 6 and 7 section sprms added correctly.
* crushed a few out by ones, twos and threes :-), flattened a few
more pesky buglets and leaks.
* purify now reports no problems of any kind with any of the examples
* modified simple and complex saves to only do the *main* body text,
no comment and footnotes etc being put in when they shouldn't.
* made some headway into understanding undocumented version
information. There are 22 not yet understood bytes per version.
* added wvSetSpecialCharHandler, special chars now have their own
separate callback which feeds you the char and the associated CHP,
this might require some more work.
* doh !, made silly mistake with ending the doc at the ccpText limit.
* added wvGetGrpXst converts strings groups to nice STTBF's
* added wvGetBKL_PLCF
* implemented the COMMENT BEGIN and END, and applied it to the "simple code",
the actual comment itself and so on is part of the subdocuments which are
not implemented yet, but will be soon.
* overlooked complex sep properties, added them in.
* added dirty tag to the elehandler, 1 means that the property might (more
than likely it is) be modified from the original style as indicated by the
istd, this is implemented in simple and complex for PAP,CHP and SEP.
Changes to 0.5.35
* wvToggle was still in todo, it was fixed a while ago
* made decode_complex stsh into a ps->stsh, I obviously missed
it before, this is the problem with having both decode_complex and
decode_simple with unshared components, gagh!
* complex mode tables might even work now.
* table relative widths now in as well, percentage of screen, uses
the sep, so sep has been put into the expand_data struct, which needs
to be cleaned up, I propose putting chp pap and sep into wvParseStruct
rather than props, and making expand_data have a pointer to wvParseStruct
* fixed section code begin and end crash.
* complex mode colspan and rowspan, and various support functions.
* wvHtml can find the config file on its own, and has a command line
option (-x) to find it.
* some fixes to wvConvert which I haven't looked at in a while, so as
to get it to work, and to include the password and other changes that
made their way into wvHtml.
Changes to 0.5.34
* added cellwidth percentage as well to wvHtml.xml to make the cells the
same ratio width in html as in the original word.
* changed Dk Colors to Dark Colors in wvHtml.xml to get the right colors
* tested rowspan lots and lots on examples.
* tested colspan.
* wow my god !, there can be either no tc's in sprmTDefTable, word 6 ones,
or word 8 ones, you have to work it out depending on the length of the
* small doc addition by Karl F. Larsen <firstname.lastname@example.org>
* tweaked wvCharBegin to ignore empty rowend paragraphs, squeaks us past
the html validation service :-)
* basic tablelooks implemented, and basic background color for cells
* changed a few more colours in wvHtml.xml to get ones that work in
netscape, must change them all to #?????? values.
* removed signal and wait stuff from configure
* added searching for wvHtml.xml, I also install it, so you can wrap
this in an rpm and it will work fine for the average user.
Changes to 0.5.33
* tested row and col span with fullsave and fixed many
many bugs, sprmTDefTable is not as simple as it looks.
Changes to 0.5.32
* multilevel word6/95 lists appear to work fine, needs verification
* use new cellwidth thing in wvHtml, wvConvert and wvConfig
* colspan probably works in general at least
* AHA!!!!, sprmTDefTable contains some TC structs, *but* only 10 bytes
are allocated for each one, a word 8 TC is 20 bytes long, a word 6 TC
is 10 bytes long, so we can point out another location where word 8
is disconnected from its own spec completely, gagh!!!!
* completed rowspan support, now wvParseStruct and expand_data are
exceedingly messy, and there's a stack of static in wvConfig, it
might be a good idea to move them into one location and stuff this
pointers from the parsestruct to the expand_data struct nonsense
Changes to 0.5.31
* wvSummary bug fix.
* word 95 decrypting from the password added as well !, there's no
stopping me some days :-). Though I have to verify that, as it's a
bit messy and some bits of it might be unnecessary, and I have no
idea how non-English languages might affect it. And maybe it's
based on the peculiarities of one particular word95 program that
I have. Also it would be reasonably easy to make a password cracker
for word95 instead of requiring a password to be added in.
Changes to 0.5.30
* removed crypt dir and references to it.
* removed crypt from configure script
* made check so as not to close NULL FILE * in decrypt.c
* modified decrypt.c to be big endian safe, in this vein
and in an attempt to make it more readable I have used the
standard md5 code snarfed from the rfc instead of the
original md5 code, it's all the same in function, just endian
* modified the SetPassword and password string promotion to
be a utf-8 to unicode conversion, this of course will only
work if the input, like an xterm, supports utf-8, in any
other case it's exactly the same as an ascii to unicode, so
it's the same as ever, except I feel a lot better at least
in theory supporting the full unicode suite.
Changes to 0.5.29
* NUMBER 1 CHANGE: we now have the ability to decrypt word 97 documents, yippee!!
* more koi8.c changes from Sergey V. Udaltsov <email@example.com>
* removed all the lex rubbish, and took mswordview itself out of the default
* some changes to semisupport word 95/6 lists, it does appear that word 95
lists are the exact same as word 6 lists.
* word 6 and 95 lists were different, and there are supposedly cases where
word 97 can use word 6 style lists, though it's supposedly unlikely.
* We have a problem with word6/95 lists, while we have the information about
each list entry, I cannot figure out how to tell if one particular entry
belongs to a particular list, i.e. I can quite happily pump out lists where
every entry is a separate list consisting of a single entry, this is very
annoying. As a temporary measure I have done a checksum on the list information
and if the checksum is the same as another entry, then I assume that it is a
member of the same list, it works so far on very very simple lists, and I
imagine that it will explode when I investigate more complex word6/95 lists.
* now lists...
lists come with number information and also with character formatting
which applies to the number text itself, and paragraph information
that applies to the paragraph that is the list entry itself. Every
list entry is a paragraph.
So if we are not interested in the character properties of the
number text itself we can quite happily convert the list into
html with numbering correct and so on.
If we want the char formatting of the number text we have to
lose the html correctness of list handling.
The other final case is those weird windows symbols that might be
used, we cannot do them in correct html, they must either use
the three symbols available to use, or just become bullets.
We can apply the para stuff to the actual paragraph and some
checking shows that a div is a valid element to put in a list
so that's what I have done
* with the word6 list problem, I have been unable in word95 to create
a list underneath another list with the exact same formatting without
putting a space between, I have also been unable to create a list
that continues from another list. In short I cannot create a list that
can break the admittedly insanely hacked mechanism I have devised to
leverage word6 lists into the word97 model used internally by the wv
* some mods to make multilevel word6/95 lists work correctly, completely mad
stuff entirely, dragons be here and so on.
* minor change to summary.c to allow slightly dodgy but ok docs through the
system, happens with msword version 6.0.1 ( a mac version ?)
* explicit ul end ala ol end, if the para is the last para of the doc.
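The checksum trick for grouping word6/95 list entries described in this release can be sketched as below. The notes do not say which checksum is used or which bytes go into it, so the simple multiplicative sum and the names demo_list_checksum and demo_same_list are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum over the raw list information of one entry.
   The algorithm here is an assumption, any stable hash would do. */
static uint32_t demo_list_checksum(const unsigned char *info, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    assert(info != NULL || len == 0);
    for (i = 0; i < len; i++)
        sum = sum * 31u + info[i];
    return sum;
}

/* Two entries are assumed to belong to the same list when the
   checksums of their list information match. */
static int demo_same_list(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen)
{
    return demo_list_checksum(a, alen) == demo_list_checksum(b, blen);
}
```

As the entry itself admits, this is a heuristic: two different lists with identical formatting look the same, and a checksum collision would also merge unrelated entries.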
Changes to 0.5.28
* added sprmPNLvlAnm into sprm.c for compatibility with word6/95
* sorted out where there are two lists under each other at the same
level but of different types.
* Now the list code has become very tied down to being html output, I
have been keeping things reasonably flexible with the config file, ah
well, it's not a serious problem at all.
* well now interesting, supported-list-features.doc is now a very
bloody awkward set of lists, and it's encouraging to note that word97
makes a real mess out of it. While an argument can be made that there
should not be a separate para for each <li> element, compare the
word97 output against the wvHtml output. word97 restarts each of the
lists from scratch, hur hur.
* removed lex dependencies from the Makefile, and split some of the
olderstuff into temporary old* files, which will all be removed one
* make does not make mswordview by default, time to wean everyone off
* mswordview itself probably doesn't work anymore, use the stable
version if you want this program
Changes to 0.5.27
* expanded the list info wvParseStruct to include all of the structures.
* made the stylesheet code safe, but it's a fix until I do the out of
sequence istd initializing correctly.
* removed blank line from expat Makefile, Keith Wear <firstname.lastname@example.org>
* get list info extracted, make ul vs ol decision, get entry begin
* continued with lists, maybe change struct to include chp and pap simultaneously
as I might need it for the lists, extract start value for html, and number nfc
to use as well, for the case of symbols (nfc tells me I think ?) swap to ul
rather than ol, thus we need a reciprocal mechanism in the config file.
* lists look good, releasing to the world
Changes to 0.5.26
* some checking showed that I had the wrong name for the koi encoding,
koi8-r is the correct name, and I've changed it to that.
* wvHtml dumps graphics, and wvGraphicConvert is a standalone little app
for hacking purposes to open up graphics to external hackers.
Changes to 0.5.25
* added date and author id to revisions, found bug in DTTM. added
wvDTTMtoUnix to dttm.c
* added animations to config file as blink (hur hur)
* I added (even though I have no idea what it is) DispFldRMark to everywhere
* that basically completes everything in the chp that makes any kind of sense
in html except for font face and size.
* well seeing as the output passes the w3c validator test, the html output
by default announces this fact.
* added charset option to wvHtml, documented in new wvHtml.1 manpage
* added koi-8 from Sergey V. Udaltsov <email@example.com> and added
a howto in notes/internationalization/Charsets-HOWTO
* changed lists to be html 4 compliant.
Changes to 0.5.24
* righteo, I made some (hopefully) final changes to fast saved handling, and
it looks a lot better now. Char attributes are correct, and the issue
of para begins and ends being missing from paras that begin in fastsave section
appears to be cleared up. There are still spurious character runs being created
in this location, but they appear within paragraph blocks, not outside them,
and they have no contents so they only create redundant tags in the html output,
or in the case of the lib make it more inefficient. So it's not 100% but it's close
enough that it'll make absolutely no difference in the case of an abiword-like
app, and only someone looking at the source of the html output will make rude
noises about how crap and inefficient wv is because it outputs empty tags. So the
bottom line is that it is a known misfeature that in the case of fastsaved files
there is duplication of empty char attributes in a small limited number
of cases. If you really dislike this, then set options in msword to only create
fullsaved files, which you should do anyway, because that's the major reason your
word documents are so huge if you ever wondered about that, and it's also a huge
security hole, e.g. if you edited a confidential document to remove the confidential
bits, then you can edit the doc with a hex editor and read all the deleted confidential
material ! At some stage I believe I might add a feature to show the original
document that a fastsaved document was based on, it can sometimes scare you to
* my resetting char properties at a new para was slightly out, I wasn't fully
regenerating the exception run limits.
Changes to 0.5.23
* added RMarkDel & strike & outline to wvHtml support, handle empty tags
correctly now as well.
* added lowercase, shadow,vanish, rmark,caps, outline and smallcaps to
wvHtml, though many are empty and caps,smallcaps and lowercase need
further code to actually do the deed
* added includedir to mkinstalldir list, coz of (Marko Rauhamaa)
* the toggle (cases 128 and 129 for fBold and loads of others), works by
taking a look at the original style that the current one is based on. It
was until now not actually looking at the original one at all, but the
current one, thats fixed now.
* another one was that if we were based upon a char style we weren't
getting initialized correctly at all, this too is fixed.
* changes have also been made to sprmCMajority and sprmCMajority50 along
a similar line. These three or 4 changes together make a huge difference
to the output. So this should clear up a *mountain* of mismatched output,
I'm so proud, the best way to track down these differences is to take a
fastsaved file and save it as fullsave and compare wv output for the
* colour in html output.
* hmm, real real stupid thing in fastsaved mode where I was completely
fecking up the fcLim by changing it in a subfunc and then thinking that
it was the original and using it as that again!
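The toggle fix in this release (cases 128 and 129 consulting the style the current one is based on) can be sketched as follows. demo_apply_toggle is a hypothetical helper. Reading 128 as "same as the base style" and 129 as "opposite of the base style", with 0 and 1 as plain off and on, is the usual convention for Word toggle properties and is assumed here, not taken from the notes.

```c
#include <assert.h>

/* Resolve a toggle operand such as the one for fBold.
   base_style_value: the property value in the style this one is based on.
   current:          the value so far, kept for unknown operands. */
static int demo_apply_toggle(unsigned char operand, int base_style_value,
                             int current)
{
    assert(base_style_value == 0 || base_style_value == 1);
    switch (operand) {
    case 0:   return 0;                  /* explicitly off */
    case 1:   return 1;                  /* explicitly on */
    case 128: return base_style_value;   /* same as the base style */
    case 129: return !base_style_value;  /* opposite of the base style */
    default:  return current;            /* unknown operand, leave alone */
    }
}
```

The bug described above corresponds to passing the current value where base_style_value is expected, which makes 129 flip the wrong thing whenever the two differ.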
Changes to 0.5.22
* new development release
Changes to 0.5.21
* fix for bad sprm handlers so font changes now occur.
* fix for having no summary stream in wvSummary.
* added protection support for istd out of sequence, we should in the
future handle them correctly
* added simple word95 file support, gets all text correctly and at least
pretends to get the paragraph properties, needs much much checking, I
treated them exactly the same as word6 and that appears to work reasonably
* I have added a sample import filter for abiword in the abi dir, basically
it's up to the abi folk to integrate that in at their leisure.
* added contents to sep.c anlv.c & olst.c
* fixed the length of sprmTDefTable, solves some word6 crashes.
* finally noticed that the BRC is of a different len and layout with word 6
* note to self, the EatSprm only works for true word97 features, ones that d
in word6 and 95 have to be implemented or things will crash, this is not a real
problem as all these sprms should be implemented one by one.
* found two TAP sprm's that differ from 6 to 8 and have updated.
* implemented sprmCLid which doesn't exist in word97 but does on older vers
* added ole code to viewer.
* the program named mswordview is deprecated, it still does far more than
wvHtml but this is a warning that wvHtml is the new html converter for
msword docs. wvConvert is a generic converter that currently defaults
to abiword xml so that I can examine a richer set of properties, I wonder
how generic I have actually made it, a tex converter would be nice
* wvHtml now uses html output so < & > will work now, I had overlooked
that aspect (whoops), my focus was on other types of properties,
wvHtmlOutputChar might need more work, keep an eye on it.
* stuck a stack of structs that I haven't used yet into the header files,
and some implementations of readers that I might need someday :-)
* added char properties (Justin did all of this one, and good stuff it is too)
* merged together two vers
* finish SEP, and friends, added a mountain of structs, the remainder of
what was not already in the header file, and added some stub files for
* added simple file support for Section begins and ends, moved the
char handling code around a slight bit so as to be in a nice looking order
* continued sections in complex mode, brought my standalone abiword converter up
to speed with sections.
* implemented all of the SEP sprms, word 6 conflicts not fully checked yet.
* Jeff@abisource made it more portable by modifying the wvError/wvTrace macros
and putting in defs for rint and strcasecmp.
* purified sep code.
* fixed fastsaved chp init from pap istd (i think)
* fixed finding first para bounds with complex mode if the first para is a
new fast saved chunk (i think)
* ffn sttbf was wrong for word95 & word6, is fixed now.
* Squashed one of the bugs that was causing one of my annoying problems with
complex files and incorrect para fcLims. This one was driving me completely mad,
i don't know if i have fixed it fully correct though, but i think so..
* changed laolareplace.old.c to put isprint test at the end.
* added bold and italic char prop handling to simple mode wvHtml
* added bold and italic char prop handling to complex mode wvHtml
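The `< & >` fix noted for wvHtml in this release comes down to escaping three characters before they reach the html output. A minimal sketch of that idea, assuming a hypothetical helper called `demo_html_escape`; this is not the wv code, and wvHtmlOutputChar may well do it differently:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of wv: copies src into dst, replacing
 * '<', '>' and '&' with html entities. Returns the length written, or
 * (size_t)-1 if dst (dstsize bytes, including the NUL) is too small. */
size_t demo_html_escape(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        const char *rep;
        char single[2];
        size_t len;

        switch (*src) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:
            single[0] = *src;   /* ordinary character, copied as-is */
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);
        if (used + len + 1 > dstsize)
            return (size_t)-1;  /* no room for the text plus the NUL */
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dstsize)
        return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

Escaping `&` along with the two angle brackets matters, otherwise text that already contains something like `&lt;` would be shown as `<` by the browser.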
Changes to 0.5.20
* the checking for end of a piece was all wrong, i was looking
at the beginning of the next piece for that information which while
always correct failed horribly in the case of the last piece.
* fixed some more bugs
* fixed wvConvertCPToFC ala end of piece.
* fixed text *after* the final para in simple mode related to above.
* fixed oversight in len of UPX stuff in stylesheet
* fixed some style eating problems.
* cleaned up some bits and pieces with pointers and styles.
* added strcasecmp check and inclusion route.
* more bugfixes throughout chp and friends.
* added a simple fib6 reader that reads into a fib8 struct.
* word 6 doesnt appear to have a sep table stream so we'll have to
look closely at that sort of thing.
* modified STSHI handler to allow word6, modified STD to allow word6
* put in a word6 to word8 sprm converter, might even work. we won't
know for quite a while, implemented for pap and chp.
* reran purify, reworked the binary tree code section for that real
complex chp sprm.
* made the complex pap search start with the current piece, rather
than the next one. Seems to be the right approach.
* fixed a small offset problem in word 6 sprm translations.
* clx now can load in a word 6 complex piecetable (in theory anyway)
* identify word 7 files.
* word 6 thing appears shafted.
* prm complex option was the wrong way around !
* fixed all bugs that cause crashes on doc collection.
* word 6 had to have a separate BX and fkp and so on for itself, but
now i believe fullsaved word6 files are as supported as word97 files.
* can extract raw text of fastsaved word 6 files..
* and now we can get the para properties of word 6 fast saved files
* basically brought fastsaved up to fullsave quality, though im not 100%
happy with them.
* some more purify found problems.
* implemented chpx in stylesheet for word 6.
* did some nasty hackery to munge word 6 chp sprms in word 8 ones, appears
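The end-of-piece fix at the top of this release is about not depending on a following piece that may not exist. One way to get that property, sketched here with hypothetical names (`demo_piece`, `demo_piece_fc_end`, `demo_find_piece`) and not taken from the wv source, is to compute each piece's end from its own start and length:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical piece record, an assumption for illustration only. */
typedef struct {
    unsigned long fc_start;    /* file offset of the first character */
    unsigned long nchars;      /* number of characters in the piece */
    unsigned bytes_per_char;   /* 1 for 8-bit text, 2 for unicode */
} demo_piece;

/* End offset (exclusive) derived from the piece itself, so it is valid
 * for the last piece as well as for every other one. */
unsigned long demo_piece_fc_end(const demo_piece *p)
{
    assert(p != NULL);
    assert(p->bytes_per_char == 1 || p->bytes_per_char == 2);
    return p->fc_start + p->nchars * p->bytes_per_char;
}

/* Index of the piece holding file offset fc, or -1 if none does. */
long demo_find_piece(const demo_piece *pieces, size_t n, unsigned long fc)
{
    size_t i;

    assert(pieces != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (fc >= pieces[i].fc_start && fc < demo_piece_fc_end(&pieces[i]))
            return (long)i;
    return -1;
}
```

Nothing here reads `pieces[i + 1]`, so the final piece needs no special case.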
Changes to 0.5.19
* renamed libwv, and stuck in abiword cvs
* this version probably doesnt work, and almost certainly doesnt do
what it says on the tin, dont use this until i get to at least the next
version, this is basically a cvs test.
./configure --without-zlib --without-ttf --without-xpm --without-wmf --without-x
change gcc to g++ in Makefile
and make a libwv.a suitable for abiword, (yeah i know i know, but im working on it)
to get a simple -lwv
* whoppee, nearly working fine as an abiword filter..
* moved fib into the parsestruct, changed over existing programs to use
wvInitParse rather than handcode for each one.
* mad mods to make it compile cleanly under c++
* changed over the simple decoding to use the parsestruct and
propagated the changes throughout the system
* right, use wvSetCharHandler to set what function will be called with
each character of document text.
* found my word 5 spec, which is a bit of a relief, coz i don't think i could
replace it if i lost it. Made a few copies of it, i need some good ocr software
though as i got it sent to me in scanned in tiff files !, and the original docs
were obviously a bit crumpled.
* we can now read the text of simple word files in abiword
* finalized paragraph element handling
* made wvConvert and wvHtml use new paragraph element handling
* got the plugin to do the same
* compiles fine with g++ as well, which is a bonus.
* created hook into the charcode in wvOutputText for abiword, and
other lib users.
* created an abiword filter with what we have already, need the ability to
register handlers for events and so on.
* got rid of most of the compile warnings
* we can now do para props of complex files, though we have to
confirm this as its always a bit flaky (also in old mswordview btw)
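The wvSetCharHandler entry in this release describes a callback registration: the caller hands over a function, and the parser calls it with each character of document text. A minimal sketch of that pattern follows. Every name in it (`demo_parser`, `demo_set_char_handler`, `demo_feed_text`) is hypothetical, and the real wvParseStruct and handler signature are not reproduced here:

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_parser demo_parser;

/* Handler type: return nonzero to stop the parse early. */
typedef int (*demo_char_handler)(demo_parser *ps, unsigned short ch);

struct demo_parser {
    demo_char_handler on_char;  /* registered callback, may be NULL */
    void *user_data;            /* opaque pointer for the callback */
};

/* Registers the function to be called for each character. */
void demo_set_char_handler(demo_parser *ps, demo_char_handler h)
{
    assert(ps != NULL);
    ps->on_char = h;
}

/* Feeds text to the handler; returns how many times it was called. */
size_t demo_feed_text(demo_parser *ps, const unsigned short *text, size_t n)
{
    size_t i;

    assert(ps != NULL && text != NULL);
    if (ps->on_char == NULL)
        return 0;
    for (i = 0; i < n; i++)
        if (ps->on_char(ps, text[i]) != 0)
            return i + 1;
    return n;
}

/* Example handler: counts characters through user_data. */
int demo_count_handler(demo_parser *ps, unsigned short ch)
{
    size_t *count = (size_t *)ps->user_data;

    (void)ch;
    (*count)++;
    return 0;
}
```

A library user such as an importer would register its own handler in place of `demo_count_handler` and append each character to its document there.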
Changes to 0.5.18
* made a release to show off the devel version to the abiword folk.
* modified xml code to unexpand < etc etc, so that i can defer
processing of some of the tags until later, im probably making a
complete arse of the whole thing, but at least it gives me something
to do, and keeps me out of trouble neh ?
* created a variable expansion mechanism using xml parser, seems ok.
* make wvHtml load up wvHtml.xml and confirm that document begin works
completely fine, and that the title is being expanded.
* do end as well
* attempt the paragraph stuff, and call wvHtml a basic wrap
* so now we can output simple files in very basic html with para noted
correctly, and the title supported, we can do the same for abiword with
document begin/end and para begin/end
* charset supported as well.
* variables (?!) are now <charset/> & <title/>
* right aligned some #defines
* finish adding version var, use purify to find problems with adding entries
to TT table (debug only i believe)
* modify justification so as to call wvExpand again to get the full string
* create an abiword config, got document start and finish and paragraph start
and finish working as well.
* we can now output good html and abiword format docs with basic paragraph
* converted most of the U8 name:s to U32 name:s (non critical), i never knew that
using anything less than an int was not technically correct, well what d'ya
know, some other minor stylistic changes.
* wrote tiny stub of an abiword importer.
* modify OLEdecode to take a FILE * rather than a filename,
* standardized ret codes from OLEdecode.
* added an error explanation table.
Changes to 0.5.17
* added clx.c, pcd.c, prm.c
* clx.c is the successor to piecetable.c,
* debugged clx
* added GetPHE,fkp.c,bte.c,bx.c
* debugged decode.c, all ok now.
* paragraph begin and end marks now found for full saved files.
* added codepage-1252.c, iso-5589-15.c & text.c
if you want to add your own fontencoding conversion do...
1) add the language name to the charsets enum in wv.h
2) create a function like wvConvert1252Toiso8859_15 which converts
cp1252 into your language
3) add to text.c in wvOutputFromCP1252 an extra case statement to
call wvConvert1252To[YourEncoding] if outputtype == YourEncoding
4) create a function like U16 wvConvertUnicodeToiso8859_15 which
converts unicode into your language.
5) add to text.c in wvOutputFromUnicode an extra case statement to
call wvConvertUnicodeTo[YourEncoding] if outputtype == YourEncoding
Be warned that converting from unicode to your language, which is the most
likely scenario will only work out correctly if the unicode actually maps
to your charset, so obviously converting unicode that was japanese characters
into russian koi-8 is only going to give a page of ?, so watch out for that. Later
on i'll add in some ability to check the language.
* added wvSimpleCLX program which determines if a file is complex (fast-save)
or simple (full-save)
* basic character handling, converted windows "compressed unicode" into
html as far as possible.
* fixed size mistake in PCD PLCF.
* tested wvSimpleCLX on all word docs, made a mod or two to the ole code to
avoid segfaults identified by the test.
* moved decode to decode_simple
* added decode_complex
* debugged the decode_complex para begin code, and extended to find the para end,
though this might be a little wrong, but we'll see.
* added the wvText program, primarily for testing the new mechanisms, but it can
be a useful program in its own right to get the main document text from a word
document in its raw form, obviously its not going to handle tables and any kind
of complex word artifact, only the text in the correct order. Which considering
the whole complex file format question still makes it a very sophisticated
* wvSummary bugfix.
* debugged wvText so that it doesn't crash on any of the 3735 sample files.
* added ability to text code to remove field codes, and just output the previous
results of the fields.
* added some changes to the error output code, now use wvTrace to output debugging
messages, its a macro that will disappear when compiled normally, unlike the old
* changed the FKP code to pull in the total data
* created wvAssembleSimplePAP
* release the FKP on each cycle in the decode_simple
* fixed a few sprms from doc investigation that were wrong or dodgy in the
* stupid bug in EatSprm.
* debugged wvAssembleSimplePAP and FKP code for crashes.
* fixed bugs in sprm.c and numrm.c, changed a few constants to the cb equivalents.
* applied the PAPX to the PAP correctly (simple mode, i havent even tried complex yet).
* confirmed that code does the right thing, and gets the right properties for
the simple pap.
* reran checks.
* create a test with wvHtml to output some of the interesting paragraph properties
in the correct place.
* added expat the xml parser to the tree, im going to use xml for my config file, which
may or may not be a good idea, but seeing as my lex code created *such* problems on
different implementations i'm well and truly sick of it, so im going to try xml instead.
* reran autoconf with the latest version
* wvConfig changes...
1) created a release for the config list table
2) malloced correctly
3) created an append for <title/>
4) pass the userData into wvConfig.c
5) convert main into ordinary call
6) moved wvText to wvConvert, and make wvText a
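The font-encoding steps listed earlier in this release (add an enum entry, write a cp1252 conversion function, add a case that calls it) can be sketched as below. The names are hypothetical stand-ins, since this log does not show the real signatures of wvConvert1252Toiso8859_15 or wvOutputFromCP1252; only the byte mappings between cp1252 and ISO-8859-15 are taken from the published code tables.

```c
#include <assert.h>

/* Step 1 (assumed shape): one enum entry per output charset. */
typedef enum { DEMO_RAW_1252, DEMO_ISO8859_15 } demo_charset;

/* Step 2: map one cp1252 byte to ISO-8859-15, '?' if no equivalent. */
unsigned char demo_cp1252_to_iso8859_15(unsigned char c)
{
    switch (c) {
    /* cp1252 characters that live in different slots in ISO-8859-15 */
    case 0x80: return 0xA4;  /* euro sign */
    case 0x8A: return 0xA6;  /* S with caron */
    case 0x9A: return 0xA8;  /* s with caron */
    case 0x8E: return 0xB4;  /* Z with caron */
    case 0x9E: return 0xB8;  /* z with caron */
    case 0x8C: return 0xBC;  /* OE ligature */
    case 0x9C: return 0xBD;  /* oe ligature */
    case 0x9F: return 0xBE;  /* Y with diaeresis */
    /* cp1252 characters whose slots ISO-8859-15 reused, so they are lost */
    case 0xA4: case 0xA6: case 0xA8: case 0xB4:
    case 0xB8: case 0xBC: case 0xBD: case 0xBE:
        return '?';
    default:
        break;
    }
    if (c >= 0x80 && c <= 0x9F)
        return '?';          /* other cp1252 extras have no slot */
    return c;                /* ASCII and the shared Latin-1 range */
}

/* Step 3: the dispatch that gains one case per supported charset. */
unsigned char demo_output_from_cp1252(unsigned char c, demo_charset out)
{
    switch (out) {
    case DEMO_ISO8859_15: return demo_cp1252_to_iso8859_15(c);
    case DEMO_RAW_1252:   return c;
    }
    assert(!"unknown charset");
    return '?';
}
```

The `'?'` fallback is the same behaviour the warning about unmappable unicode describes: characters with no slot in the target charset come out as question marks.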
Changes to 0.5.16
* added anld.c, changed over from old ANLD to new ANLD. added wvGetANLD and
* cleaned up bad chp entries. allowedfont removed, may cause problems in
* added some stylesheet definitions.
* trivially added version.c,and modified it to become wv rather than
* added wvGetSTSHI,wvGetSTD,wvReleaseSTD,wvGetSTSH,wvReleaseSTSH
* short tests show that the new stylesheet code appears stable.
* added dcs.c, shd.c , numrm.c, asumy.c
* defined TAP, TLP, and TC and PAP
* added lspd.c,phe.c,tlp.c,tc.c,tap.c
* added InitPAP, and all dependencies, for istdNIL stylesheet.
* added ANLV,OLST,SEP
* ive completed the new set of PAP sprm handlers and support, this
consists of wvGetSprmFromU16,wvEatSprm,wvApplySprmFromBucket,and a myriad
of wvApplysprm* functions, with the exception of one or two old sprms that
have no documentation, and the hugesprm, which ive left until i get an
example of it.
* added wvCopyCHP, & wvAddCHPXFromBucket, and most of CHP in sprm handling.
* added wvApplysprmCMajority + wvApplysprmCMajority50, but i really don't
like the look of them, im very unsure as to whether or not they are right.
* finished CHP in sprm code
* confirmed correct para style basics, started into char style code.
* complex merged CHPX done, only found one trivial example so far, so uncertain
as to if it works.
* modified wvEatSprm to ret the len.
* modified wvEatSprm to handle the three special len cases in it as well.
* got wvReleaseSTSH to release its grupe's and sub components as well.
* temporarily nailed new stylesheet struct in as part of the old one, so that
i can experiment with the new one in conjunction with the old one.
Changes to 0.5.15
* made yet more changes to the configure script, maybe itll all be
in the right order now (hah i doubt it!)
* added wvWideStrToMB,wvGetFontnameFromCode
* added small patch from Barry D Benowitz <firstname.lastname@example.org>
who noted an uninitialized pointer.
* fixed a bug where a $ showing up in a title would shaft the whole thing.
* fixed the default value for the html font string, unlikely to have ever
* a parser.lex and man page fix from email@example.com
* removed references to the ffn struct, and replaced with the appropriate FFN
* added fld.c, wvGetFLD, wvGetFLD_PLCF, wvWarning, wvFree.
* added wvGetDOP, wvGetDTTM , wvCreateDTTM,wvGetCOPTS,wvGetDOPTYPOGRAPHY,
wvGetDOGRID, wvGetASUMYI & dttm.c.
* modified dop.c with new interface.
* added wvGetSTTBF, wvGetBKF_PLCF,wvGetBKF, bkf.c, sttbf.c
* added xst.c,fspa.c. Modified wvWhichTableStream, added wvGetFSPA,
* correct STTBF handling, and sorted out decode_bookmarks ala new form.
* added lex problems to the install file/faq.
* added lfo.c, lst.c, lvl.c,wvGetLSTF,wvGetLSTF_PLCF,wvGetLVLF,wvGetLVL,
wvGetLFO_records & wvReleaseLFO_records. Which are all to do with parsing
lists, which is possibly the second most complex part of word documents
to understand. (the first being fastsaved of course).
* added wvSearchLST, began converting list code over to new cleaner "by
the spec" code.
* wvGetListInfo will probably be the workhorse function which will sort out
lists given a correct pap.
* added the slightly silly ordinal.c file along with nfc.c.
* changed references to mswordview.h to wv.h, to get the changeover moving.
* ok, i can currently get a lot of the simple list stuff correct the new
* most of the list string is now done, as is the nfc and starting position.
* added another entry to the list stuff, to keep track of the current no
for the list entry, would work for at least simple lists.
* figured out how to correlate the appropriate lfolvl with the correct
* i now use the linked character and paragraph properties linked to the list
* the new list code is now integrated into the code, but it still is new and
probably flaky. I'll do bug testing and so on and work that out in a short
Changes to 0.5.14
* i have to make changes to the configure script to link -lXpm in the
* scream, i had to put back in part of the signal configure script, bear
with me, why does *everything* work on my machine but nowhere else :-),
Changes to 0.5.13
* a mad person reports that it can be compiled under vms !, im awaiting
* changed doc version testing to the knowledge base article on the
* removed duplicate fib code from mswordview.c
* added wvGetEmpty_PLCF,wvGetFRD,wvGetFRD_PLCF.
* added wvGetFFN,wvGetFFN_STTBF,wvReleaseFFN_STTBF,wvGetFONTSIGNATURE &
* removed the reinstall handlers from the configure script, that should
sort out the configure problems on some systems, irix in point.
Changes to 0.5.12
* patch from Cliff Miller <firstname.lastname@example.org> to
fix TTF_CFLAGS in configure and Makefile.
* small bug with ending tables. Seeing as you cant place text tags
like bold and italic between cell elements in html and expect them to
do the right thing, you have to do a little dance where character properties
are stopped and restarted for each character cell. I had forgotten to
reenable the ordinary nontable mechanism immediately after the end of the
Changes to 0.5.11
* we now extract the document title and display it
in the title field, using the default config.
* add bold and italic element handling, you can change these
html tags to your heart's content now.
* I confirmed that $title works fine.
* I ported over Somar Software's summaryinfo stream stuff, so
now wvSummary can print the title and last saved date of an
ole document according to the summaryinfo stream.
* added bit shifting to awk script.
* added warning for duplicate offset in script.
* i have a spiffy logo.
* added more stuff to the summary info thing, it might very
well be complete, the previews of summary info are stored as
a wmf file, so in conjunction with libwmf you can get all
* added a wv-incconfig and wv-libconfig and installed the
appropriate include and lib files, so as to start making the
process of using mswordview as a lib more possible. this
still needs quite a bit of work.
* allowed optional sections in element string, use  for them.
* worked font config into the main code.
* bw wanted and got ...
1 $title fix
2 element support (bold&italic&font)
3 --configfile switch
* fixed an amazingly stupid bug that crept in with the introduction
* noticed that new doc start code wasnt occurring in fastsaved files.
* aaaaaagh!!!, i had forgotten to munge the weird long offsets into
their correct halved form, no wonder so much weirdness crept into
fast saved files, its amazing how well it worked nonetheless, this
should at the least make parsing fastsaved files with tables much
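The halved-offsets entry in this release does not say exactly which offsets were involved. One place where the Word 97 layout stores a doubled offset is the piece descriptor's fc field: when bit 30 is set the text is 8-bit, and the real file offset is the remaining value divided by two. The sketch below assumes that is the case being described, and its names (`demo_fc`, `demo_decode_fc`) are hypothetical:

```c
#include <assert.h>

/* Flag bit marking an fc that points at 8-bit (compressed) text. */
#define DEMO_FC_COMPRESSED 0x40000000UL

typedef struct {
    unsigned long offset;  /* real byte offset into the stream */
    int is_8bit;           /* nonzero when text is one byte per char */
} demo_fc;

/* Turns a raw 32-bit fc value into a usable offset. */
demo_fc demo_decode_fc(unsigned long raw)
{
    demo_fc out;

    assert(raw <= 0xFFFFFFFFUL);
    if (raw & DEMO_FC_COMPRESSED) {
        /* clear the flag, then halve to get the true position */
        out.offset = (raw & ~DEMO_FC_COMPRESSED) / 2;
        out.is_8bit = 1;
    } else {
        out.offset = raw;
        out.is_8bit = 0;
    }
    return out;
}
```

Skipping the halving sends every read to twice the intended position, which would fit the kind of weirdness the entry describes.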
Changes to 0.5.10
* added document header and footers to the config file.
* added pixels per twip to the config file.
* allowed " as part of a string if escaped.
* added code to use the beginning and ending tags.
* allowed multiline strings in config file.
* use the two twip values.
Changes to 0.5.9
* i never reran autoconf !
* added a patch i got ages ago and forgot to add
dos/windows support for .exe extension to the configure
* added some deep magic to blip handling.
* added check for wmf record sizes < 3 in libwmf.
* fixed BSE record to eat empty space, and resync.
* fixed Makefile.in in oledecod dir.
* much purify related thingies found.
* remove last bug to fix last buggy file of current run.
Changes to 0.5.8
* blip code changed, new one looks much better.
* would you believe that i was always one out when decoding
styles, great bullet proof code though :-), it kept on trucking
and resynced itself with the data again for the most part, that
bug must be in there for months at this stage !
* new blip code now in operation, appears to do at least the
old blip codes functionality for 0x08 blips, how did i get 0x01
* made configure script get heroic when searching for components,
checks for includes and libs both below a --with-stuff dir, and
also inside it as well.
* finished 0x01, checked offsets.
* had to add guessing code to figure out whether to use a delay_stream
* allow resized images (well let netscape do it) for 0x01 graphics.
* tested wmf's with text with readonly font dir, no problem there.
Changes to 0.5.7
* fixed bug that causes crashes on tables.
Changes to 0.5.6
* variable handling, add a subst function that substitutes
real things for variables in the config file.
* updated my homepage, god i love the gimp. All i have to
do to change the graphics on my page is to load a different
set of text files to the scheme interpreter in the gimp and
ta-da out pops my new pages, in the bad old days i'd have
been at it for days.
* have a mechanism to expand variables in place, only recognized
variable is patterndir, will have more later of course :-).
* some magic dohickying to get the libz in /usr/lib to be tested
before ending up with the possibly crap one that some systems
stick in /usr/X11R6/lib.
* do a for loop to install the graphics now, should sort out
some people's broken install scripts, gagh!
* cleaned up config file with purify, all systems are go for
first public release with basic config file support.
* remembered to add ttf support to mswordview as well.
* added support for variables in the lex code.
* fixed zlib configure script again.
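The subst function and in-place variable expansion mentioned in this release can be pictured roughly as follows. This is a sketch under assumptions: it takes variables to be written as `$name` (the form `$title` has elsewhere in this log), and `demo_subst` is a hypothetical name, not the function in the mswordview source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replaces every "$name" in `in` with `value`, writing the result to
 * out. Returns 0 on success, -1 if out (outsize bytes) is too small.
 * Simplification: "$name" is matched as a prefix, so "$namefoo" would
 * also be expanded. */
int demo_subst(const char *in, const char *name, const char *value,
               char *out, size_t outsize)
{
    size_t nlen, vlen, used = 0;

    assert(in != NULL && name != NULL && value != NULL && out != NULL);
    nlen = strlen(name);
    vlen = strlen(value);
    assert(nlen > 0);

    while (*in != '\0') {
        if (in[0] == '$' && strncmp(in + 1, name, nlen) == 0) {
            if (used + vlen + 1 > outsize)
                return -1;
            memcpy(out + used, value, vlen);  /* expand the variable */
            used += vlen;
            in += 1 + nlen;                   /* skip "$" and the name */
        } else {
            if (used + 2 > outsize)
                return -1;
            out[used++] = *in++;              /* ordinary character */
        }
    }
    if (used + 1 > outsize)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With `patterndir` as the only variable, a config string such as `$patterndir/1.gif` would expand to a full path once the value is known.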
Changes to 0.5.5
* added in support for an external config file. The external
file allows a start and end to a style to be user defined, i.e
h1 for the start of a heading 1 style. Its possible to disable
or enable handling of bold, italic and font size/face changes
inside of a style, this is only started now, so its far from
finished. Please *dont* use this file for the moment, im working
* this is an interim release to fix the configure script problem
that i had, and to add to the documentation as to the libwmf stuff.
Changes up to 0.5.4
* well now, ive been away for a while working on libwmf, which is now
complete enough to use. download it from
http://www.csn.ul.ie/~caolan/docs/libwmf.html, and install it and
run mswordview's configure and compile and ta-da, mswordview can now
handle wmf files.
* added a fallback from a failure to find -lz to -lgz, a problem on
SuSE linux im told.
* found that old redhat's appear to have a libz in the X lib dir, that
is old and crappy and doesnt link to my thing, didnt put in a
workaround, but mentioned it in the documentation.
* created file with h1 to h9, verified that the lex code and so on
works together fine with mswordview.
Changes up to 0.5.3
* begun adding all fields to structures, and marking them implemented or
* strikethrough and underline for revision text
* found the bounds of the comment in the main document, i put a name tags
on them, and place comment begin and end graphics around them, at this
stage remember that the -a option to remove comments exists, as even one
comment in a doc can make the whole thing pretty unreadable :-), but the
support is in there if you need it.
* revisions are given underline for added text, and the strikethrough color
for deleted text the same as word does it.
* begin and end for deleted and added revision text is shown with graphic
tags, added a -r --norevisions option to ignore that stuff.
* names for revision text
* put revisions authors names in yellow text.
* i dont even *pretend* that im outputting good html btw, just working
html under netscape. once everything is working i might go back
through and work out correctly the dependencies between all the html
outputting code, that'll be part of the overall cleanup im doing to make
this modular enough to be used with abiword as a word97 importer.
* time and date of the revisions are included as well.
* think that ive completed revision text, but i need more tests before
ill be sure.
* in comments theres always a pagenum field that word itself doesnt
show in comments, so ive stuck code in that disables this field if
its at the beginning of a comment, also verified that comments work
in fastsaved mode, though what is the story with that page number
in annotations, hmm its bothering me somewhat.
* titchy bug where i included the wrong end of comment graphic.
* put square brackets around comment links, i believe this completes
* titchy bug in the time field for revisions.
* properties of text that change during a revision are listed as well.
* found the location of what sets the footnote & endnote styles of
numbering and other settings for endnotes and footnotes in the DOP, these
were missing from the copy that www.wotsit.org has, ive sent them the
* extracted the DOP fully.
* footnotes and endnotes now get the correct formatting of the numbers,
i.e lettered, roman or arabic etc, damn missing page of the spec, i
was searching for that for ages.
* i have some old code that gives the correct starting point for endnotes
and footnotes so im leaving it in for now, but i can now use the DOP
instead for this info.
* endnotes should now be put either at the end of the doc, or at the end
of the section depending on what word does, needs testing.
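The footnote and endnote number formats mentioned in this release (lettered, roman or arabic) amount to rendering the same counter in different styles. A small sketch with hypothetical names (`demo_numfmt`, `demo_format_number`); the real format codes stored in the DOP are not reproduced here:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format selector, not the codes word stores. */
typedef enum { DEMO_ARABIC, DEMO_LOWER_ROMAN, DEMO_LOWER_LETTER } demo_numfmt;

/* Writes n in the requested style into out (at least 16 bytes).
 * Falls back to arabic digits when the style cannot express n. */
void demo_format_number(unsigned n, demo_numfmt fmt, char *out, size_t outsize)
{
    static const unsigned val[] =
        { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    static const char *sym[] =
        { "m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i" };
    size_t i;

    assert(out != NULL && outsize >= 16);
    if (fmt == DEMO_LOWER_ROMAN && n >= 1 && n < 4000) {
        out[0] = '\0';
        /* greedy conversion: take the largest symbol that still fits */
        for (i = 0; i < sizeof val / sizeof val[0]; i++)
            while (n >= val[i]) {
                strcat(out, sym[i]);
                n -= val[i];
            }
        return;
    }
    if (fmt == DEMO_LOWER_LETTER && n >= 1 && n <= 26) {
        out[0] = (char)('a' + (n - 1));   /* assumes an ASCII charset */
        out[1] = '\0';
        return;
    }
    sprintf(out, "%u", n);
}
```

The longest roman numeral below 4000 is 15 characters, which is why the sketch insists on a 16 byte buffer.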
Changes up to 0.5.2
* implemented auto text colour check for table cells, no more
black on black, or black on blue. i must look closely at what other
auto changes word makes, and where else i might have to put that code.
* some uber-simple greyscaling code when table look says no-color.
* verified it works under AIX, made a few changes that showed up due
to its stricter malloc, theres probably a few more malloc related
issues hiding in there.
* column breaks show up as well now.
* the various types of section breaks are distinguishable from the
others, and from page breaks.
* a few changes to make sure formatting and tables get on better
* sequence field supported, i.e caption numbering, i just use the last
fields that msword left in there.
* changed hyperlinking so that it works with bookmarks that are in
* i now support multiple bookmarks that end on the same location.
* multiple bookmarks that start on the same location should be supported,
but no examples yet.
* the comment author initials are extracted and used in the main document
when referencing comments.
* comments now end when they are supposed to, only the correct comments get
included, should work for fastsave, not tested.
* removed unused variables, sorted out a few other warnings, maybe itll
squeak by the irix compiler now ?
* names and initial info for comments is extracted as well, and stuck in a
table at the end of the document.
* fixed the <a name= for comments, should work in fast saved.
* custom graphics for annotations.
Changes up to 0.5.1
* forgot to change the version no in the source.
* damn sunsite broke connection half way through uploading.
Changes up to 0.5.0
* Martin Kalms <email@example.com>, configure fix for sunos 4.1 in
relation to strerror.
* added option where you can ignore table widths.
* custom graphics for comments.
* endnote autonumbering now works, now defaults in roman numerals.
* fast save footnote problem fixed, though i think things might be
even more complex than i thought, so keep an eye on that area.
* footnotes are in a colour of their own.
* symbols as footnotes, required a change to the 4a30 sprm that might fix
a few other char formatting issues.
* restarting footnotes on each page, and each section works, this is
encoded in the number itself it appears, a href and a name, and some
invalid html code fixed in the footnote area as well, footnotes are now in
a colour of their own *but* the location of whatever sets the footnote &
endnote styles of numbering is unknown, i havent figured it out.
* all endnotes are listed at the end of the section rather than optionally
at the end of the document, i dont know how this is done, doesnt appear
* textmarks / bookmarks and explicit hyperlinking supported, bugs in
old code removed hopefully and internal hyperlinks put in via insert
hyperlink are supported.
* support for bookmarks, i.e they are converted to <a name>[text]</a> html
* converted cross-referenced textmarks/bookmarks into hyperlinks.
* wmf files can now be decompressed thanks to firstname.lastname@example.org
now i need a wmf --> something useful converter. i see that theres a new
one available off the gimp plugin page, with some uberhacking it might
do the trick, the notes/wmf dir has a goodly chunk of info on the format if
anyone wants to do it for me.
* when bookmarks are embedded in bookmarks something odd appears to occur,
but nonetheless the ms save as html does the same, so im assuming that its
* added bookmark support to fastsaved, should work fine, not tested.
* pagebreak gifs are correctly centered if the next para is a centered etc
* author field supported.
* proper positioning of page numbers, general layout of headers appears
to be fine, except that tab stops are used in headers to center, left
and right align headers, which doesnt work so well in html mode.
* added defensive code to some sort of list bug.
* mimic strike-through and double st by setting the text color to either
#ed32ff or #ff7332
* disallow height commands inside tables, as the model of paragraph heights
doesnt fit well with the architecture for tables, so im ignoring them in
tables, hopefully noone will notice :-)
* fixed a small bug in sprm which was causing errors later in lists.
* tables and paragraph formatting were misaligned across td boundaries.
so now i clear specials and fonts on entry to a table, and on exit of each
cell, hopefully i broke nothing else on doing so.
* at least one really bad conversion with a file called RESUME.doc, but in
my defence i looked at the msword conversion of this to html, and its just
as buggered up so rasp ;-P
* added credits file
* found problem in decompress code, i didnt make it good enough for real
world usage, i now use mmapping to make my life easier, dont know if this
is fully portable, works on linux and solaris.
* oledecod had bugs on cleanup, so sent filters group wmf.doc and
Contribu.doc to demo the problems.
* i now use oledecod 0.0.4 which fixes cleanup problems, but Contribu.doc
style problems continue, they return 5 but laola can extract the streams
nonetheless while oledecode cannot, i modified the original laolareplace.c
to handle this as well.
* oledecod 0.0.4 has a bug in relation to 1812bb.doc, laolareplace.old.c
hasnt this bug, so im back to using that again.
* those ffffffff's in lists that haunted me in earlier releases are *back*
grrrrr!!, anyway ive another massive nasty workaround that im using that
hasnt crashed any docs, and appears to do the right thing, at least in
* wmf decompression code changed to use mmap, replaces the original code
that ate memory, if mmapping doesnt work try looking at the zlib docs
and change the code to fixed buffer incremental decompression.
* added a bailout to ignore encrypted documents, wonder how id decrypt
them if i had the correct password, anyone know ?
* added a bug fix for crossreference parsing.
* beginnings of tables of contents included, doesnt always work yet.
* bug where if the word file ends on a table, the table wasnt closed off is
* bug where non built in graphic types were causing hangs.
* im now often happily (if slowly) converting 90 and 100 page documents,
the only thing i really am unhappy with is table handling, which is
also one of the reasons the conversion is *soooo* slow sometimes, the
other reason is those godforsaken fastsaved files.
* fixed some other mem related bugs, converted successfully the last two
problem docs without crashes.
* table looks are somewhat supported, though theres no support for last
row and last column different from the rest of the cells as of yet, this
will have to wait until multi pass on tables is implemented.
* the foregrounds and character attributes in general for tables appear
to always set correctly in general, but i believe i have to look into
how the "auto" text color selects its final colour, as ive been assuming
that it gets set to black, which is a fairly valid assumption most
of the time, but not always, so a few docs will have black text on
black backgrounds in table cells, but the situation is much improved.
* ran purify over mswordview, removed a load of dodgy code out of it, theres
still a bug or two hiding in the list code, which i believe is the reason
that lists are sometimes missing in complex documents, e.g meeting.doc
i think i love purify, its the bees knees.
* dib's are now extracted as well, though i dont do anything with them yet,
this fixes yet more crashes.
* fixed laolareplace.old.c, which is the version im going to use for this
release, to work on 64bit platforms, a few longs had crept into the code
there which shagged the whole thing up. I havent done extensive tests on
64bit yet, but im confident that itll work.
* fixed defines to make it work if theres no zlib present.
* no crashes after running mswordview on 300 megs of uploaded files.
* good enough to upload to sunsite, version number reflects this.
changes up to 0.4.9
--This is an interim release while im in scotland until later this november--
added features are that the gateway is included, endnotes are supported,
pagebreaks that split tables are supported and some more bugs are fixed,
especially in relation to graphics.
* added -o - option to gateway, like i should have about 4 releases
* fixed graphics again, forgot to reset the extra amount that some have
before the graphic data begins, means more jpgs and pngs should work.
* endnote text done in simple saved
* cleaned up beginning whitespace from footnotes/endnotes/comments.
* endnotes in complex mode is in, needs testing.
* changed url code to match the other field code, fixes a big bug there.
* header and footer colours were wrong again, fixed.
* indent drift is fixed again, moved do_indent into decode_?_specials
* pagebreaks can occur in the middle of a table, this sort of confusion
is fixed for full saved files, and is probably fixed for fastsaved files
* pagebreaks now look like they occur after footers,footnotes and endnotes.
* custom graphics replace <hr>'s as there were too many of them at the
bottom of a page to figure out what was what.
* custom graphics for footnotes, and comments
changes up to 0.4.8
* this has a slew of bug fixes related to graphics and a new option
to put images in a certain directory
* fixed f006 code in blip handling, removing a slew of hangs.
* ignore every graphic that isnt an understood type, removes hangs.
* figured out when theres an extra 16 bytes to delete from the beginning
of a blit, and where one of my magical 17s were coming from
* got a bug fix off Harry Shamansky (email@example.com) as to why
the default make wouldnt work under irix.
* the current spid handling was mismatching spids and the graphics
* i cant handle forms, or ole data, so ive added a check to avoid
doing them, removes crashes.
* also ive added some other code to watch out for unsupported graphic
* msword can include wmf and emf files, these are stored in compressed
form, using lz encoding in a fashion supposedly compatible with the zlib
library, but i havent been able to decompress them yet and even if i
could i dont know of any source to convert wmf/emf files to anything
usable under linux
* ive changed blip handling, so that it works better, well i believe its
more crash resistant, but im still not 100% happy with 0x01 handling.
* if you insert a bmp via insert->picture->from file, it appears to
be converted to png for you, handy.
* paragraph indentation is back in, lists and table were confusing the
* fixed titchy bug so that space at beginning of lists isnt underlined.
* support paragraphs whose first lines indentation is greater than the rest
* support vertical space between paragraphs.
* sorted out end_para for the first paragraph found in complex mode, i think
i have it right now, in passing i reckon a load of those pap searches
in complex mode are unneeded, but i dont want to rock a working boat, if it
aint broke dont fix it as an uncle of mine used to say, though we did seem to
spend an awful amount of time in a panic fixing things that broke
dramatically after years of neglect.
* finally settled on dirs for left indentation, blockquotes indent from both
* added an option to put graphics in a specified dir.
* added an option to find the graphics at a specified url.
* updated man page.
* made another change to blip handling, fixes some problems.
changes up to 0.4.7
* warning !, in this release mswordview no longer outputs by default to
the screen. use -o - for this behaviour. This is an interim release to
reassure people that im still working on it, its got quite a few new
features and bug fixes since 0.4.4 read down for them all.
* implemented tabbing with trans gif, optionally use hardspaces or
dont do it at all.
* added some support for borders such that the vertical space between
paragraphs due to width of borders is retained through the use of
vertical trans gif space.
changes up to 0.4.6
* indentation of paragraphs dithered to <blockquote>'s is out again as
its doing strange things on long complicated documents.
* table cell shading done, fully supported i believe.
* drew all the available table patterns in all available colors,
made small transparent gifs out of them, if someone wants to do
better copies of the ms ones go ahead, use the convert.sh script
in the patterns dir to generate pics in all necessary colors.
* text color support is in
* word underline, which is where whitespace isnt underlined, is supported.
* courier as an alternative to courier new, times alternative to
times new roman font face, helvetica as an alternative for everything
* all caps supported, Small caps supported, though i want full tests
of those two babies in all modes. Similar to the fontfaces these two
babies are only supported in ascii languages, as i dont really know how
to convert utf-8 unicode into upper case !
* text animations supported by converting them to blink :-)
features-examples dir added, supported-font-features.doc has what i
believe is all the font features that word supports demonstrated in it.
id be happy to have omissions noted, mswordview now supports
1) font size
2) colored text, (in headers and footers as well)
3) font face in ascii based languages
4) underline, including word underline, where whitespace isnt underlined
5) super and sub script
6) All caps and small caps (ascii based languages only)
7) text animations dithered to blink tag
mswordview doesnt support due to html limitations (at least i dont think
i can do them)
strikethrough,double strikethrough,shadowed and outlined text, embossed
or engraved text.
"hidden text" is shown, coz i dont know the purpose of it yet
all caps, small caps and font face for non ascii languages.
* centralized pap initialization code
* fixed a crash causing blip bug
* fixed a crash due to sep sprms showing up in a papx !!, i ignored them
im sure that will bite me hard in the future, but ive documented it here so
i wont forget.
now we have a problem with paragraph properties which is only making
a difference now that i want to use the paragraph justification codes.
there exist pieces which have fc's greater than the maximum one listed
in the plcfbtePapx !, ive been pushing them around for the last 2 days to
no avail, im beginning to think that maybe this means that they have no
native formatting of their own, the catch is to find the paragraph that they
belong to, the spec says to find that by taking the smallest fc in fkp
tables that is bigger than the current fc, but there *is none* thats bigger.
my thought is to remember if this piece is the beginning of a paragraph
mark and if not inherit the previous piece's formatting, and keep going
backward until we get one. If it is then either im supposed to default to
a new one or go forward to find one.
+ Solution: Ah-ha i believe i have it,
+ firstly variant 1 gpprls have to be supported, and i had some offsetting
in them wrong
+ secondly i had a very subtle bug where i changed the value of the avalrgfc,
from when i didnt know why sometimes they were +400000000, of course i now
use it to determine if the end of the piece is twice the distance of its
reported character len or not, and with the val reset i occasionally had
the piece recorded as being too long, so the paragraph properties of the
wrong paragraph were being used.
* added paragraph formatting information, well supported are
1) centering, center
2) right justification , div align=right
* made a closing paragraph thing like the closing chp for the blurb at the
bottom to avoid having the version info centered or justified.
* 0x01 fSpec graphics are now supported in addition to 0x08 graphics
while both of these are draw objects, only non-vector graphics are supported, and
only partial support of those i.e png and jpg.
as with the 0x08 graphics theres a lot of magic empirically derived offsets being used
to put it together, so dont be too surprised at getting corrupt images.
though i *have* fixed a bug in png handling i believe for 0x08 graphic which was the
previous subset i supported.
changes up to 0.4.5
* i now open graphic and doc files in binary mode to support platforms where this
makes a difference.
* replaced laola, perl no longer required, thanks to the mighty
Andrew Scriven who replaced the OLE functionality i needed with C
* got a bug fix off above to handle files with more blocks
* optional support for fontface if the text is an ascii based one,
i.e if were guaranteed that this is a western european language
then we do font faces, fastsaves will probably confuse this test and
mean we wont get faces even when we can handle them correctly.
* changed indent method for outline lists to multiple hard spaces, rather
than <dir>'s, in the future ill make an optional proper html conversion,
but it wont look like the original, so its a TO-DO.
* indentation of paragraphs dithered to <blockquote>'s is in, alpha support.
* absolute width and height of tables is in as well.
* i now default to outputting to a file whose name is the same as the input
file, with .html appended. graphics are output to the files with the same
prefix as the .html file. use -o - to output to stdio
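The default naming rule above (input name with .html appended) could be sketched like this. `default_output_name` is a hypothetical helper for illustration only, not the code mswordview actually uses:

```c
#include <stdio.h>
#include <string.h>

/* Build the default output name by appending ".html" to the input
 * file name. Returns 0 on success, -1 if the result does not fit
 * in out. default_output_name is a hypothetical helper. */
int default_output_name(const char *input, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s.html", input);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this rule an input called report.doc would produce report.doc.html.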
* new ole code was broken on a few files ( 1 :-) ), fixed this.
changes up to 0.4.4
* a good few bug reports in, crashes and what not, i got the use
of purify on a sun box (thanks to martin mellody et al) and sorted
out *all* the uninitialized mem reads there, (3000 of them in the course
of a typical conversion!!), it still leaks memory like a sieve but thats
not important for mswordview, though i will sort that out. purify is
a wonderful piece of work i have to say.
* changed ffffffff handling for lists, i think it means that
the list in question isnt actually there, so to skip it.
* changed blockquotes to dir, looks neater and word itself does
it, biggest software company in the world cant be wrong, can it ?
changes up to 0.4.3
* oops, i shafted the inclusion of getopt for systems that need it.
changes up to 0.4.2
* fixed broken simple mode footnotes (doh!)
* fixed bug in blip where having drawings where none
of them was a picture caused a crash
changes up to 0.4.1
* did some tweaking to remove a crash.
changes up to 0.4.0
* and big breaking news, preliminary graphic support is now in!!
yes, gifs/pngs/jpgs added to a document through the
insert->picture->from file mechanism now convert correctly. They
are stored in the office draw format which ive just cracked the
rough layout of. (through the handy ms spec on the msdn site),
graphic support is messy for now, as the files are generated in
the cwd of mswordview and named graphic*mswv.*, ill tidy it up
later, this news is too good to not get an announcement.
changes up to 0.3.0
* added -m --mainonly option if you dont want headers and footers.
* added a few more places to look for lls-mswordview
search order is now
1 in the path.
2 the same dir as lls was run from if ran absolutely.
3 the current dir.
4 a dir called laola off the absolute path.
5 a dir called laola off the current dir.
but stuff like ../../mswordview isnt in there though, coz folk should
just put lls-mswordview into their path dammit!
* different numbering formats for pagenumbering are in, a vs i vs 1 etc.
* gpprls for sep's work now, complex sections are in.
* found some strange code in clx_headers and clx_footers so i blew it
* section support in for simple saved files.
* sections that restart pagenumbering work now.
* sections that have no footers/headers at the beginning work now.
* complex support for sections is in as well, should work hopefully
needs extensive testing.
* TO-DO text color, eventually font faces, but no sleep lost on that i have
* TO-DO shaded cells in a table, think up a better table handling method.
* i now stick a space into an empty cell so that it shows up.
* another U8 wraparound bug removed.
* i now use the piecetable for simple docs, so as to skip over sections
that arent to be processed, i.e the simple format is just as complex as
the complex format :-), i think ive done this right and it wont break
anything, ill have to wait and see though.
* changed slightly the portions of a field that dont get printed,
to make some html ones work, hope i havent shafted anything else.
* hmm, really need to cleanup character handling, unicode &
special reserved ms symbols and so on, im just plinking at
them for the moment.
* aghh, found another U8 overflow, what possessed me to put
them in in the first place ?, i should have guessed that
there would be hundreds of pieces in a file.
* received report that it compiles and runs with
Sparc solaris 2.5.1 - sparcworks compiler
Intel x86 solaris 2.5.1 - gcc compiler
* added patch from diakka <firstname.lastname@example.org> to run
create_bins on a make rather than make install
changes up to 0.2.2
* compiled it on a solaris account i got, and its fine, got
confirmation that it works from Will Renkel <email@example.com>
* changed fastsaved chpnextfc check to be >= rather than >, hope that i
dont break anything coz of it.
* foolish error, U8 used for number of pieces, extended to U16
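A small illustration of why an 8-bit counter fails here: any count above 255 wraps around modulo 256. The sketch uses the standard `uint8_t` and `uint16_t` types in place of the project's own U8 and U16 typedefs, and the function names are made up for the example:

```c
#include <stdint.h>

/* Store a count in 8 bits: values above 255 wrap modulo 256. */
uint8_t count_as_u8(unsigned int n)
{
    return (uint8_t)n;
}

/* Store the same count in 16 bits: values up to 65535 survive. */
uint16_t count_as_u16(unsigned int n)
{
    return (uint16_t)n;
}
```

A file with 300 pieces would be recorded as 44 pieces by the 8-bit version, which is the kind of silent miscount the fix removes.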
* changed embedded link handling to not end character properties in
the middle of a URL !
* changed embedded link handling so as to *not* place "" around urls,
as sometimes they are there already, and not having them doesnt hurt,
though it offends my sense as to how they should be done.
* would you *believe* these ms guys, now they are hitting me with
file offsets that are past the end of the file !!, so now i have to
watch out for that, the complex format is *such* a collection of
hacks, ah-ha ive just checked in word, this file crashes word :-)
so this is the first reported case of mswordview being better than
msword, though i have to say that in recovery mode word pulled loads
of text out of it that i didnt get, :-(, still its a corrupt file
so doing anything at all is a success.
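Guarding against offsets past the end of the file can be sketched as a range check done before any seek or read. `range_in_file` is a hypothetical helper, and the comparison is arranged so that offset + len cannot overflow:

```c
/* Return 1 if the byte range [offset, offset + len) lies entirely
 * inside a file of file_size bytes, 0 otherwise. The subtraction
 * form avoids overflow in offset + len. Hypothetical helper. */
int range_in_file(unsigned long offset, unsigned long len,
                  unsigned long file_size)
{
    if (offset > file_size)
        return 0;
    return len <= file_size - offset;
}
```

A caller would skip or truncate any structure whose range fails this test instead of reading it.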
* i forgot to reset the higher list levels when changing a lower one,
fixed now, i think ive it right.
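The list level reset can be sketched as follows: bumping the counter at one level clears the counters of every level nested below it, and a label is built from the counters of the enclosing levels. The names and the depth of 9 levels are assumptions for this sketch, not the actual mswordview code:

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEVELS 9 /* assumed nesting depth for this sketch */

/* Advance the counter at `level` and reset every more deeply
 * nested level so its numbering starts over. */
void bump_level(int counters[MAX_LEVELS], int level)
{
    int i;
    counters[level]++;
    for (i = level + 1; i < MAX_LEVELS; i++)
        counters[i] = 0;
}

/* Format the label for `level`, keeping the prefix of the levels
 * above it, e.g. "2.1." for the first sub-item of item 2. */
void format_label(const int counters[MAX_LEVELS], int level,
                  char *out, size_t outlen)
{
    size_t used = 0;
    int i;
    if (outlen == 0)
        return;
    out[0] = '\0';
    for (i = 0; i <= level && used < outlen; i++) {
        int n = snprintf(out + used, outlen - used, "%d.", counters[i]);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```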
* added a define of SA_RESTART to 0 if it isnt there. bash does it so
i should get away with it, sunos seems to need it.
* added a little patch from Zachariah Baum <firstname.lastname@example.org>,
that should help get around folk who run mswordview absolutely and dont
stick lls-mswordview in their path, ie make and then dont make install.
* fixed yet more bugs, for some reason i thought that
the order of evaluation was from right to left !!!!
i.e i was doing
if ((*p == 'a') && (p!=NULL))
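In C the && operator evaluates its left operand first and skips the right one when the left is false, so the NULL test has to come before the dereference. A corrected form of the check, wrapped in a hypothetical function for illustration:

```c
#include <stddef.h>

/* && evaluates left to right and short-circuits, so *p is only
 * read when p is known to be non-NULL.
 * starts_with_a is a hypothetical name for this sketch. */
int starts_with_a(const char *p)
{
    return (p != NULL) && (*p == 'a');
}
```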
* changed web interface so that utf-8 is always on.
* font characteristics turn off when going into tables now.
and turn back on when inside, gets rid of some off look
* checked out corel's wordperfect import functionality with
office 97 files, conversion isnt as good as mswordview i think.
missing header numbers, and one or two didnt convert at all.
though of course corel retains layout which mswordview cant
do with html, and does shading, ill check pictures at some stage.
* have a report that suns pcfileviewer similarly covers about 50%
of mswordview's functionality and vice versa.
* gzipped uploaded word file collection has just hit 120megs :-)
* i now look at this section table so i know whether its a section
break or page break. If its a section break, then the header/footers
revert to the beginning again.
TO-DO, add a space to empty cells to make them look reasonable in
TO-DO check page numbering with sections.
TO-DO, do endnotes, should be easy. make new pic to replace hr
lines, theres too many hrs now at the bottom of a page to make
sense to anyone anymore. if theres no footers, then dont do
TO-DO, continue with the sent files since 0.1.0, and the rest
changes up to 0.2.1
* removed bug that caused lists to drift further and further
1. checked out the blockquote indentation for lists, doesnt
appear to be right for srom*.doc, fixed now
took closer look at font scanning in decode_letter,
in particular special chars, the < 39 wasnt precise enough, being
in a wingding/symbol font seems to make you automatically a special
2. something not fully right with lists that take their
text as special chars (i.e sectionnumber), not done by ms in
an obvious fashion. edit doc down to just the 2 headers and then
see what happens.
3 AHA!!!, 1 and 2 are wrong, as was previous ideas to ignore lists
that appear to have nothing in them, they are there to artificially
bump lists up to a different starting number without requiring a
separate list definition for each one, ms shoves in dummy elements
to get the list up to the right number, the section id just before
one of them threw me entirely, i thought the section number should
have been the text of the list. ive got it now!
* 3 above is *rubbish*, thats not it at all, i was right originally,
ignore those 0 len lists, and the problem was with my list restarting
mechanism which didnt work if there was more than 1 list between list
section that had to continue numbering.
* numerical outline list sublevels will retain the prefix of the
above levels, this required a change of the number figuring out code,
its now rather heavy of silliness, but it works, i dont love it and
im sure lists will be back to get me again at some stage, but outline
lists now work, in particular the
* TO-DO sections, srom*.doc has them, check them out.
* TO-DO change web interface so that the utf-8 can kick in if
* fixed bug where the new piecetable check in simple saved
files fell apart after hitting a footer.
(tempcp = tempcp, rather than realcp=tempcp, doh!)
changes up to 0.2.0
* well arse again, ive revised my ideas as to what constitutes
the end of a piece, rather than the beginning of the next piece as
i was doing, i now believe that its the beginning of the piece +
the twiddled cp len. makes more sense, and removes crashes from the
latest doc i was given.
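A rough sketch of the revised rule, assuming "twiddled cp len" means the character count doubled for pieces stored as 16-bit text and left alone for 8-bit text. The function and parameter names are hypothetical:

```c
/* End file offset of a piece: its start offset plus its character
 * count, doubled when the piece holds 16-bit characters.
 * All names here are assumptions made for this sketch. */
unsigned long piece_end(unsigned long fc_start, unsigned long cp_len,
                        int is_16bit)
{
    return fc_start + (is_16bit ? cp_len * 2UL : cp_len);
}
```

The point of the rule is that the end depends only on the piece itself, not on where the next piece happens to begin.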
* distinguishes between odd & even page footers.
* TO-DO odd & even headers
* added the tm symbol as a special case, theres quite a large
range of unicode that ms is using that is part of the customizable
section, i.e theres loads of glyphs that ms can use that are not
part of the standard unicode set, the tm appears to be one of hundreds.
eventually ill have to get a table of them.
* woweee, is ms an evil designer of data formats, they have two
types of simple saved docs i thought, those in 8 bit (basically ascii)
and those in 16bit (unicode), hah bloody hah, ive been given one which is a
mixture of both, and i have to use the damn piecetable to shove it together.
and its not as if the document shifted into a different language or
anything. if this was fastsaved id not blink an eye, but simple saved,
come *on*, why bother calling it simple saved. so i have to keep an eye
on the piecetable to determine what exact offset to use after all.
* added a huge big filthy hack in for more list twiddlings, the
previously mentioned unknown 4 byte sequence now rears its head
as an optional 8 byte sequence !!, but always ffffffff, it might
be some kind of flag or summat. anyhow i now chew up any 4 bytes
consisting of this if they show up in the place that they might
appear, this removes a large crash that occurs otherwise, as all
the counters get thrown off course by them.
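Chewing up the filler could look like the sketch below: each leading run of four 0xFF bytes is consumed, which covers both the 4 byte and the 8 byte form. `skip_ff_filler` is a hypothetical helper, and where in the list data it gets called is left out:

```c
#include <stddef.h>

/* Return how many leading bytes of buf to skip. Every complete
 * group of four 0xFF bytes at the front is consumed, so both a
 * 4-byte and an 8-byte ffffffff filler are stepped over.
 * skip_ff_filler is a hypothetical helper for this sketch. */
size_t skip_ff_filler(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    while (len - pos >= 4
           && buf[pos] == 0xFF && buf[pos + 1] == 0xFF
           && buf[pos + 2] == 0xFF && buf[pos + 3] == 0xFF)
        pos += 4;
    return pos;
}
```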
changes up to 0.1.1
* added Makefile patch from Pavel.Roskin@ecsoft.co.uk (says it works
* well the good news is that the unicode utf-8 is working for
taiwanese and im sure other languages, the bad news is that everyones
telling me that no one in their language group is actually using unicode :-)
so i suppose i require a huge unicode --> JIS/EUC/KSC/Big5/GB converter.
* rudimentary support for annotations, i havent too many examples of these
but i think they'll work fairly well.
* rudimentary support for all special ascii codes for time,page no etc.
p.s by rudimentary support i mean that if asked for e.g the current date
in a particular format i output the date, maybe in the correct format
maybe not. i.e the meaning is the same, though the look might be different.
* added a supported sprm, that changes chp information totally to the
chp of a different style.
* added support for custom footnotes, had to do a bit of a hack to
get the <a name> stuff right, hopefully it'll always work, even if it
doesn't itll still be readable.
* twiddled the char formatting dependencies about again, really ill have
to redesign that a bit.
* broke the mswordview.c file down a bit into other files.
changes up to 0.1.0
* hell ive enough done to warrant a new numbering system.
so from now on
x is a stable bug free (hah) release. folk packaging for commercial
unices probably should wait for these releases (none yet, i know)
y is a new feature or enough bugs fixed that you better use this
version if you want to keep up with the jones.
z is some small bug or change that is small enough that i wont upload
it to sunsite et al automatically, itll be mostly for me.
* added a defaultfont size option, so that if you think the output is
too big or small, you can shrink or enlarge it.
* added a horizontal padding option, you have the option of 3 different
ways to handle a run of multiple line breaks, though the default is probably
* tweaked char formatting system, TO-DO overhaul all of that, theres quite
a few dependencies between the tags that are becoming a little too difficult
to do by hand, a little stack is called for methinks.
* added some support for a type of holdover list format found in docs
converted to word8 from older versions. works on the one i have so far
though theres more testing to be done with it. missing bullets and
incorrect numbering may be related to this. pass them on to me.
* battered LFO's into submission, this time they'll stay down (i hope).
found a 4 byte field that i cant figure out where it came from. *shrug*
wouldnt be the first time that happened though.
* changed footer and header handling, i now take notice if the first pages
headers and footers are different than all the others. i still dont get
section breaks, which i think impact on this, i dont have any examples of
this to work against. Theres a discrepancy between header/footer documentation
and what i see before me in the hex, maybe im missing something.
* ok theres some difficulty with tables, ive implemented this baby as a
one pass parser, later ill have to add multipass (or backpatch) to figure out
the number of pages so as to get that field right, but with ms tables you can
start off with 2 cols then go to e.g 4 in the same table, you dont know in
advance how many rows and cols there are in maximum, or which ones span which,
which is a pain in the butt, really as far as word is concerned each row
is a table into itself, so ive done it this way
- each table has the cols of the first row counted and the widths
figured out in % of the page width, if a subsequent row has a different
number of cols or different widths than the previous row a new table
will be begun. the % width will cause netscape to line them up correctly.
itll do for now. not perfect i know but hey what is. Itll do the job
for the primary task which is making word readable as close to the
original layout as possible within html.
- to get the tap that tells me all the above we have to scan forward
until we find a rowend char, and get the pap of that to get the tap.
and with fastsaved theres the usual complexity
- The problem will be that netscape and other browsers dont take the
width% as their primary factor in determining the actual width of a cell,
if the text in it cannot be broken on a space then the cell is expanded
to fit, breaking the lining up. Im considering a somewhat more sophisticated
(and questionable) technique where i stick the tables together using
dithering of the cells to a (max 64 cell (msdefined)) cell grid. using colspan
and so on to do it.
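The row by row table scheme described above can be sketched in two small pieces: turning absolute column widths into percentages of the page width, and deciding whether the next row still fits the current table. The names are hypothetical and the width units are assumed to match the page width units:

```c
/* Convert absolute column widths into whole percentages of the
 * page width. Widths and page_width are assumed to share a unit. */
void widths_to_percent(const int *widths, int ncols, int page_width,
                       int *percent)
{
    int i;
    for (i = 0; i < ncols; i++)
        percent[i] = (page_width > 0) ? (widths[i] * 100) / page_width : 0;
}

/* Return 1 if two rows have the same number of columns and the
 * same widths, 0 if a new html table has to be started. */
int same_layout(const int *a, int na, const int *b, int nb)
{
    int i;
    if (na != nb)
        return 0;
    for (i = 0; i < na; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Integer division means the percentages may not add up to exactly 100, which is acceptable for a sketch but worth knowing about.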
* TO-DO theres something called a header text box that i have to figure out
and some companion of it for the main doc. i have to implement something to
handle these beasts.
* TO-DO more testing for bugs and stuff.
* TO-DO code overhaul to simplify it.
* TO-DO support all fields, ive some supported: page no, date and time,
but not perfectly in the same format that word has them in.
* TO-DO,figure out how to extract ole embedded msoffice draw and equation
editors data, and see if i can get them converted as well.
* TO-DO provide alternative outputs, tex/rtf and friends. ive a load of
formatting information that i think i can get into those formats.
* TO-DO provide basic formatting for html, i.e centering.
* TO-DO think about writing word docs :-), now that would be a hunk of work.
so to all you asking me about it i recommend you dont even bother with it,
just write rtf files and get on with it, thats even what ms did for word 8,
saving as word 6/95 just creates a rtf file, if its good enough for them, its
good enough for us.
* TO-THINK-ABOUT i dont keep very much information in memory really, i just work
out what i need for any given instant and drag it out of the file, and then dump it
often to only get it again in a few seconds. this leads to an impressive amount
of seeking back and forth across the streams. theres a groove burnt in my hd where
im working, its not really optimum behaviour, (works though :-) )
* NEED_HELP-ON, can this compile and work under sgi ?, have success reports
from linux, solaris,hpux,aix,freebsd and one failure to compile under sgi, ive
one message that it compiles under os/2, though it needs some work to do that.
changes up to 0.0.27
* know how to do the right thing with embedded sprm list
gets rid of a few wild bugs.
* found the list documentation after all, maybe i forgot
to download it the last time (doh!), or it wasnt there
when i downloaded it. so i removed all of my rather good
but unnecessary hex determined code.
* added a special case for "*" in lists, make it a bullet
point instead, seems to be the right thing to do (?)
* changed laola commands name to append -mswordview to avoid
overwriting newer lls commands etc.
* changed the INC in perl files to reflect final install dir.
* TO-WORRY-ABOUT, quite a few ??'s displayed in netscape when
dealing with those utf-8 docs, dont know if thats my lack of
correct fonts, or a great big dirty bug. also ive a few special
cases in the decode_letter to translate letters into what *i* think
they should be, its rather questionable and very empirically based.
* added some hook code to protect lists from pagebreaks.
in doing so i notice that my complex code is a wee bit confused, but
it works, so im leaving it alone for now, the added code doesnt make
for readability but hey, neither does any of the rest of the code :-)
* fiddled list interpretation so that ilfo isnt looked at until the
last pap and chp sprms have changed it. fixes difficulties in fast
(list stuff) LFO override not implemented correctly may cause crashes.
this is surely the last major list related thing to do.
restarts are probably incorrect as are a few other minor list
related bits and pieces
changes up to 0.0.26
* changed laola lib to a subdir of mswordview and changed laola
program names to custom mswordview ones, to avoid clashing
with newer versions or original version of laola, as ive
doctored things slightly for my own needs.
* applied Martin Schultze patch to add lib path to perl include
path, though i twiddled it to make a nice tree in my lib.
* lists start on the correct number (well ones that are simple
numerals do anyway).
* understand list continuing and restarting now.
* added a defensive patch from Peter Silva <Peter.Silva@ec.gc.ca>
* lists now get the char formatting that they should get.
* yes!, sorted lists out, have bulleted lists, arabic & roman numerals,
lowercase and uppercase lettering systems done. multilevel also works
i believe, works on all examples i have anyway
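As an illustration of one of the numbering systems listed above, here is a generic roman numeral conversion. It is a textbook sketch with a hypothetical function name, not the routine mswordview uses:

```c
#include <string.h>

/* Write n (1..3999) as an uppercase roman numeral into out.
 * Returns 0 on success, -1 if n is out of range or out is too
 * small. to_roman is a hypothetical name for this sketch. */
int to_roman(int n, char *out, size_t outlen)
{
    static const int values[] =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char *symbols[] =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
    size_t used = 0;
    int i;

    if (n < 1 || n > 3999 || outlen == 0)
        return -1;
    for (i = 0; i < 13; i++) {
        while (n >= values[i]) {
            size_t slen = strlen(symbols[i]);
            if (used + slen >= outlen)
                return -1; /* no room for symbol plus terminator */
            memcpy(out + used, symbols[i], slen);
            used += slen;
            n -= values[i];
        }
    }
    out[used] = '\0';
    return 0;
}
```

Lowercase roman numbering would follow by lowercasing the result.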
* fixed bug that made mswordview fail on files without an extension
* TO-DO look at list indentation, if they are true multilevel then
i blockquote them (for now), but if they have a set indentation value
then like all the other layout constructs i dont preserve this into
* TO-DO fields, table of contents should be easier with lists
* TO-DO find out if my unicode (utf-8) support actually works
for anyone except me. What fonts do various people need, this
is a general netscape question.
* middleterm TO-DO, reorganize tags to external data files, to make extensible
to other formats, i.e raw ascii, an attempt at latex, rtf.
changes up to 0.0.25
* changed list handling slightly, removes a bug where
you get too many list levels inserted
* i believe that most lists will now be handled correctly as to
whether they are numbers or not. I have isolated the undocumented
section and have a handle on the situation so its just a matter
of comparing theory with practice again.
* removed bug where header pap gets used in the main document
following a header
* finished checking all uploaded files beginning with a, yipee.
now theres quite a few elements not addressed yet in those files, but
i understand whats involved, in short, section support, proper list
support, justification support (centering anyway) decoding of the DATE
and TIME fields, would you believe that the TIME field can encode the
DATE, despite the fact that theres a DATE field whose job this is !,
gagh what can you do with people who do this to you. but anyhow the
uploaded files all convert without crash, all text is in the right place, and
in the right language ( i think :-) ). all bold,italic,font sizes,
underline, manual page breaks, the content of footnotes,footers
and headers is all shown, albeit not always the way they appear in
word, yeah we're getting there.
* changed utf conversion code as the original code i was using wasnt
quite gpl compatible, anyhow new code is better designed for my needs.
* TO-DO, grr!! is someone reading this log, as after my weeks holidays
i note that theres a huge amount of files beginning with a to go through
again, i never did make it to b.
changes up to 0.0.24
* fixed NULL complex pap bug.
* supports underline tag now as well :-)
* footnotes supported, all the ones referenced before a
pagebreak get listed at the manual pagebreaks and document
end . (thats a <hr> in my current output, splitting word docs
into different files is a challenge id rather not accept for
now as itd just be guesswork and mess), not checked in fastsave
* TO-DO support sections, so as to know what pages get headers
and which dont, etc.
* TO-DO proper table of contents, the text is now listed
but theres no link between the table of contents and the
text it purports to describe, for the moment.
* TO-DO differentiate between different types of underline
i.e word for word etc
* EVENTUALLY-TO-DO, i have come across one case where a symbol
used in a footnote isnt working !, if i create one of my own
it works fine, but when i alter the given one it still
changes up to 0.0.23
* verified it works on linux, aix and solaris.
* fixed a very silly overflow byte vs int bug.
* overhauled unicode conversion, fixed my sprm
* changed table handling so that tables dont
* fixed img insertion dummying of wingding font
* massively changed my paragraph end detection for
complex files, i had the idea all wrong, but close
enough that it worked on fairly uniformly formatted
* works with all uploaded files beginning with A and a
theres soooo many to go through :-), im looking
forward to getting to b soon.
* TO-DO, continue checking against uploaded files,
verify header and footer support, start on list
information (dum de dum dum dummmm)
changes up to 0.0.22
* check for errno
* fix list related crash bug, found by Wayne Roberts
* TO-DO, go through the 50 megs of uploaded word
files and see do they convert fairly correctly :-)
lists need to be done better. i need to confirm
language conversion. and check out table of
changes up to 0.0.21
* for simple format i now decode to utf-8, when appropriate.
on viewing many docs with windows netscape 4 it works
fine, i dont have the X fonts to do half of the
languages under my own X, but hopefully those
in the various language blocks can figure out
fonts for themselves ?
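For reference, converting one 16-bit character to utf-8 follows the standard encoding rules and can be sketched as below. The function name is hypothetical and surrogate pairs are deliberately not handled, so this is not the conversion code shipped with mswordview:

```c
#include <stddef.h>

/* Encode one 16-bit character as utf-8 and return the number of
 * bytes written (1 to 3). Surrogate pairs are not handled.
 * ucs2_to_utf8 is a hypothetical name for this sketch. */
size_t ucs2_to_utf8(unsigned int c, unsigned char out[3])
{
    c &= 0xFFFFu; /* keep only the 16-bit character value */
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

Plain ascii stays one byte per character, which is why western european docs come out unchanged.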
* complex format non-west-european docs might
still be shagged, id love to hear from an asian
language group as to whether or not the utf8 works
* some bug fixes by Pavel Machek <pavel@Elf.ucw.cz>
changes up to 0.0.20
* headers are fairly correct now, the spec and me
are confused as to headers and footers though, so
while i *can* do headers and footers, it might require
a bit of fine tuning, so i need docs with all sorts
of header and footer types in them until im sure im right,
but its close enough.
* docs with subdocs in them should return the output of
the main doc now.
* TO-DO, from the veritable deluge of documents in languages
i cant read :-), id better handle the non-standard, well
non standard to me anyway ! russian and one or two
others that i hope fall out in the process, asian
would be wonderful.
changes up to 0.0.19
* header support added to complex format
* wingding font hack added like symbol font
* headers are still not right, footers and headers are all
appearing at the top of the document, ive more work to do on
* ive shagged up the parsing of lls output, so docs with
ole inside ole will not work even though theres no good reason
they dont, bear with me on this
* mswordview.wrapper added to allow inline viewing of word docs.
changes up to 0.0.18
* new option to not change msword headings to html headings to
support those dodgy people who dont use them correctly.
* fixed what looks like a specialized case for recognizing tables
* fixed the lack of - sign.
* have a new group of files that convert correctly.
* these are minor changes, ill add header handling to complex
changes up to 0.0.17
* lack of getopt.h on some systems taken into account now.
* sub and super scripting now in for simple format.
* laola.pl changed to continue even if it thinks the file is
the wrong length.
* added option to not attempt to dummy up formatting done with
* using gifs for symbols, this will do for html output, for
other output in the future we'll have to organize something a
little more sophisticated
* i have some alpha support for headers in at the moment,
if you have headers you "might" see them in russet text.