Need to complete adaptation of MSG to handle the -Z option.
Withdraw code for semi-external sort?
Add record parser for case of separator string rather than single character.
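A minimal sketch of a separator-string parser, assuming the input is already in a memory buffer and records are handed to a callback. SplitOnSeparatorString and RecordFn are hypothetical names, not existing msort functions. The separator is not part of any record, a trailing separator does not produce an empty final record, and two adjacent separators do produce an empty record.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once per record; rec is not NUL-terminated. */
typedef void (*RecordFn)(const char *rec, size_t len, void *ctx);

/* Split buf[0..len) into records separated by the string sep.
   Returns the number of records passed to fn. */
size_t SplitOnSeparatorString(const char *buf, size_t len,
                              const char *sep, RecordFn fn, void *ctx)
{
    size_t seplen = strlen(sep);
    size_t start = 0, i = 0, n = 0;

    if (seplen == 0) return 0;              /* an empty separator is meaningless */
    while (i + seplen <= len) {
        if (memcmp(buf + i, sep, seplen) == 0) {
            fn(buf + start, i - start, ctx); /* record ends before the separator */
            n++;
            i += seplen;                     /* skip the separator itself */
            start = i;
        } else {
            i++;
        }
    }
    if (start < len) {                       /* last record had no trailing separator */
        fn(buf + start, len - start, ctx);
        n++;
    }
    return n;
}
```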
Add record parser for case of record initiator that belongs to record (as in SDF).
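A minimal sketch of an initiator-based parser in which the initiator stays part of the record. It assumes, as is usual for SDF field markers, that the initiator is only recognized at the start of a line; each record runs up to the next initiator, and anything before the first initiator is discarded (another assumption). SplitOnInitiator, InitiatorAt and RecordFn are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once per record; rec is not NUL-terminated. */
typedef void (*RecordFn)(const char *rec, size_t len, void *ctx);

/* True if the initiator starts at offset i and i is at the start of a line. */
static int InitiatorAt(const char *buf, size_t len, size_t i,
                       const char *init, size_t initlen)
{
    if (i != 0 && buf[i - 1] != '\n') return 0;
    return i + initlen <= len && memcmp(buf + i, init, initlen) == 0;
}

/* Each record begins with init (included in the record) and extends to
   the next line-initial init or to the end of the buffer. */
size_t SplitOnInitiator(const char *buf, size_t len, const char *init,
                        RecordFn fn, void *ctx)
{
    size_t initlen = strlen(init);
    size_t i, start = 0, n = 0;
    int inrec = 0;

    if (initlen == 0) return 0;
    for (i = 0; i < len; i++) {
        if (InitiatorAt(buf, len, i, init, initlen)) {
            if (inrec) {                     /* close the previous record */
                fn(buf + start, i - start, ctx);
                n++;
            }
            start = i;                       /* new record includes the initiator */
            inrec = 1;
        }
    }
    if (inrec) {                             /* final record runs to end of buffer */
        fn(buf + start, len - start, ctx);
        n++;
    }
    return n;
}
```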
Add record parser for stanza format, where each record begins with a fixed string
and ends with another fixed string, neither of which is part of the record. Material
between a stanza terminator and a stanza initiator is ignored.
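A minimal sketch of the stanza parser described above, again over an in-memory buffer. SplitStanzas, FindStr and RecordFn are hypothetical names. Neither delimiter is included in the record, text between a terminator and the next initiator is skipped, and an unterminated final stanza is silently dropped (an assumption; msort might prefer to report it as an error).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: called once per record; rec is not NUL-terminated. */
typedef void (*RecordFn)(const char *rec, size_t len, void *ctx);

/* Offset of needle in hay[from..len), or (size_t)-1 if absent. */
static size_t FindStr(const char *hay, size_t len, size_t from,
                      const char *needle, size_t nlen)
{
    size_t i;
    for (i = from; i + nlen <= len; i++)
        if (memcmp(hay + i, needle, nlen) == 0) return i;
    return (size_t)-1;
}

size_t SplitStanzas(const char *buf, size_t len, const char *init,
                    const char *term, RecordFn fn, void *ctx)
{
    size_t ilen = strlen(init), tlen = strlen(term);
    size_t pos = 0, n = 0;

    if (ilen == 0 || tlen == 0) return 0;
    for (;;) {
        size_t b = FindStr(buf, len, pos, init, ilen);
        if (b == (size_t)-1) break;          /* no further stanzas */
        b += ilen;                           /* record starts after the initiator */
        size_t e = FindStr(buf, len, b, term, tlen);
        if (e == (size_t)-1) break;          /* unterminated stanza: dropped */
        fn(buf + b, e - b, ctx);
        n++;
        pos = e + tlen;                      /* material up to the next initiator is ignored */
    }
    return n;
}
```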
To add merge sort, need to duplicate or localize global variables like
RecordList so as to support two lists and make sure that there are no problems running
GetFields on two input files. Once we've got the data all processed, basically we
just need to pass it to the Merge routine of the existing merge sort. Probably
should do this and test it as a first step. This won't, however, solve the memory
problem. For that, we've got to arrange to buffer the inputs and have the Merge
routine request refilling of the buffers when it runs out.
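The buffered scheme in the last two sentences can be sketched as a two-way merge that pulls records through a refill callback whenever an input's buffer runs dry, so only BUFRECS records per input are resident at once. MergeInput, Peek, BufferedMerge and Refill are hypothetical names; the existing Merge routine's actual interface is not shown here and may differ.

```c
#include <stddef.h>

#define BUFRECS 4   /* deliberately tiny so that refills actually happen */

/* Hypothetical buffered input: Refill loads up to BUFRECS records into
   buf and returns how many it loaded (0 at end of input). */
typedef struct MergeInput {
    const char *buf[BUFRECS];
    size_t count;    /* records currently in buf */
    size_t next;     /* index of the next unconsumed record */
    size_t (*Refill)(struct MergeInput *in, void *src);
    void *src;
} MergeInput;

/* Next record without consuming it, refilling the buffer if it is
   empty; NULL once the input is exhausted. */
static const char *Peek(MergeInput *in)
{
    if (in->next == in->count) {
        in->count = in->Refill(in, in->src);
        in->next = 0;
        if (in->count == 0) return NULL;
    }
    return in->buf[in->next];
}

/* Merge two sorted inputs. On ties the record from a is emitted first,
   which keeps the merge stable. */
void BufferedMerge(MergeInput *a, MergeInput *b,
                   int (*cmp)(const char *, const char *),
                   void (*emit)(const char *, void *), void *ctx)
{
    const char *ra = Peek(a), *rb = Peek(b);

    while (ra != NULL || rb != NULL) {
        if (rb == NULL || (ra != NULL && cmp(ra, rb) <= 0)) {
            emit(ra, ctx);
            a->next++;
            ra = Peek(a);
        } else {
            emit(rb, ctx);
            b->next++;
            rb = Peek(b);
        }
    }
}
```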
Test compatibility normalization options.
Consider adding to documentation an explanation of how to use msort to
sort a spreadsheet by exporting it in a suitable format, sorting it with msort, and
importing the sorted file back into the spreadsheet.
Add to the documentation an explanation of how to use msort to sort a database
in Standard Dictionary Format.
Check whether we are correctly handling the case in which we use locale comparison
and the system is unable to set the locale to the one we specify.
Need to review the substitution facility in more detail. Tests using back references
produce peculiar results. Also, should add examples to the manual.
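To have a reference point when checking the back-reference results, here is a minimal sketch of the expected semantics written against the POSIX regcomp/regexec interface. TRE offers a POSIX-compatible API, but whether msort's substitution code uses it this way is an assumption. SubstituteFirst and Append are hypothetical names; in the replacement template \0 stands for the whole match and \1 to \9 for parenthesized groups, a group that did not participate contributes nothing, and a backslash not followed by a digit is copied literally.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of src to out at *o, keeping out NUL-terminated.
   Returns -1 if there is no room (including room for the NUL). */
static int Append(char *out, size_t outsize, size_t *o,
                  const char *src, size_t len)
{
    if (*o + len >= outsize) return -1;
    memcpy(out + *o, src, len);
    *o += len;
    out[*o] = '\0';
    return 0;
}

/* Replace the first match of re in subject with tmpl, expanding back
   references. Returns 1 if a substitution was made, 0 if there was no
   match (subject copied unchanged), -1 on overflow. */
int SubstituteFirst(const regex_t *re, const char *subject,
                    const char *tmpl, char *out, size_t outsize)
{
    regmatch_t m[10];   /* POSIX fills unused entries with rm_so == -1 */
    size_t o = 0;
    const char *t, *rest;

    if (outsize == 0) return -1;
    out[0] = '\0';
    if (regexec(re, subject, 10, m, 0) != 0)
        return Append(out, outsize, &o, subject, strlen(subject)) ? -1 : 0;

    /* text before the match */
    if (Append(out, outsize, &o, subject, (size_t)m[0].rm_so)) return -1;
    for (t = tmpl; *t != '\0'; t++) {
        if (t[0] == '\\' && t[1] >= '0' && t[1] <= '9') {
            regmatch_t g = m[t[1] - '0'];
            t++;                                  /* consume the digit */
            if (g.rm_so == -1) continue;          /* group did not participate */
            if (Append(out, outsize, &o, subject + g.rm_so,
                       (size_t)(g.rm_eo - g.rm_so))) return -1;
        } else if (Append(out, outsize, &o, t, 1)) {
            return -1;
        }
    }
    /* text after the match */
    rest = subject + m[0].rm_eo;
    return Append(out, outsize, &o, rest, strlen(rest)) ? -1 : 1;
}
```

For example, with the extended regex `([a-z]+)-([0-9]+)` and template `\2:\1`, the subject `id: abc-123;` should become `id: 123:abc;`.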
When doing regression tests, how should we distinguish between a true error and
a problem resulting from the lack of locale data? It may be necessary to use a
special exit code to indicate that a locale problem occurred.
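A minimal sketch covering both locale notes above, assuming the collation locale is set with the standard C setlocale, which returns NULL when the request cannot be honored. SetCollationLocale and the exit status EXIT_LOCALE_UNAVAILABLE (value 3) are hypothetical; the point is only that a dedicated status would let the regression scripts tell missing locale data apart from a genuine error.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical exit status reserved for "requested locale unavailable". */
#define EXIT_LOCALE_UNAVAILABLE 3

/* Try to set the collation locale. Returns 0 on success, otherwise
   EXIT_LOCALE_UNAVAILABLE, which the caller would pass to exit(). */
int SetCollationLocale(const char *name)
{
    if (setlocale(LC_COLLATE, name) == NULL) {  /* no data for this locale */
        fprintf(stderr, "cannot set locale \"%s\"\n", name);
        return EXIT_LOCALE_UNAVAILABLE;
    }
    return 0;
}
```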
The current "time" comparison uses ISO8601 time zone offsets but not the full ISO8601
format for the time itself. We should probably convert this to ISO8601 time.
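A minimal sketch of parsing the ISO 8601 extended time format "hh:mm[:ss]" with an optional zone designator ("Z", "+hh[:mm]" or "-hh[:mm]") into seconds relative to UTC, so that two times compare as integers. ParseIsoTime and TwoDigits are hypothetical names. Assumptions: the basic format (hhmmss), fractional seconds and 24:00 are not handled, a time with no zone designator is treated as UTC (ISO 8601 itself calls that local time), and the result is not wrapped into a single day, so 23:00-05:00 sorts after 02:00Z.

```c
#include <ctype.h>

/* Value of exactly two decimal digits at s, or -1. */
static int TwoDigits(const char *s)
{
    if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
        return -1;
    return (s[0] - '0') * 10 + (s[1] - '0');
}

/* On success store seconds relative to UTC midnight in *secs and
   return 0; return -1 on malformed input. */
int ParseIsoTime(const char *s, long *secs)
{
    int h = TwoDigits(s), m, sec = 0, oh, om = 0, sign;

    if (h < 0 || h > 23 || s[2] != ':') return -1;
    m = TwoDigits(s + 3);
    if (m < 0 || m > 59) return -1;
    s += 5;
    if (*s == ':') {                             /* optional seconds */
        sec = TwoDigits(s + 1);
        if (sec < 0 || sec > 60) return -1;      /* 60 allows a leap second */
        s += 3;
    }
    *secs = h * 3600L + m * 60L + sec;
    if (*s == '\0') return 0;                    /* no zone: treated as UTC */
    if (*s == 'Z') return s[1] == '\0' ? 0 : -1;
    if (*s != '+' && *s != '-') return -1;
    sign = (*s == '+') ? 1 : -1;
    oh = TwoDigits(s + 1);
    if (oh < 0) return -1;
    s += 3;
    if (*s == ':') {                             /* optional offset minutes */
        om = TwoDigits(s + 1);
        if (om < 0) return -1;
        s += 3;
    }
    if (*s != '\0') return -1;
    /* local time = UTC + offset, so UTC = local time - offset */
    *secs -= sign * (oh * 3600L + om * 60L);
    return 0;
}
```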
Case-folding does not preserve normalization, so we should either normalize
after case-folding or, if it is necessary to normalize earlier, renormalize
after case-folding.
Need to update transformations to Unicode 5.1.
Add to test suite:
Consider making use of TRE regexp library optional.
This would entail adding the option to configure.ac
and ifdefing the portions of code relevant to:
(a) tag matching;
(b) substitutions.
In the case of compilation without the regexp library:
For tag matching, we'd have to use exact matching instead.
For substitutions, we'd have to use simple string substitution.
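A minimal sketch of the two regexp-free fallbacks, roughly what would sit in the #else branch when the regexp library is not configured. ReplaceAll and TagMatches are hypothetical names. Replacement is non-overlapping and left to right, so replacing "aa" in "aaa" yields one substitution.

```c
#include <stddef.h>
#include <string.h>

/* Replace every non-overlapping occurrence of from in subject with to,
   writing the NUL-terminated result to out. Returns the number of
   replacements, or -1 if from is empty or out is too small. */
int ReplaceAll(const char *subject, const char *from, const char *to,
               char *out, size_t outsize)
{
    size_t flen = strlen(from), tlen = strlen(to), o = 0;
    int count = 0;
    const char *hit;

    if (flen == 0 || outsize == 0) return -1;
    while ((hit = strstr(subject, from)) != NULL) {
        size_t pre = (size_t)(hit - subject);    /* unchanged text before the hit */
        if (o + pre + tlen >= outsize) return -1;
        memcpy(out + o, subject, pre);
        memcpy(out + o + pre, to, tlen);
        o += pre + tlen;
        subject = hit + flen;                    /* resume after the replaced text */
        count++;
    }
    if (o + strlen(subject) >= outsize) return -1;
    strcpy(out + o, subject);                    /* copy the tail and the NUL */
    return count;
}

/* Exact tag matching: the whole field must equal the tag. */
int TagMatches(const char *field, const char *tag)
{
    return strcmp(field, tag) == 0;
}
```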
Add IPv6 address sorting:
full form is 8 groups of 4 hex digits separated by colons
The last two groups may be replaced by four decimal bytes using dots as separators
leading 0s within a group may be omitted
any single run of consecutive groups consisting entirely of 0s may be replaced by a double colon,
but only one double colon is permitted in order to avoid ambiguity.
algorithm: (a) locate IPv4 tail if present and convert to IPv6
(b) expand ::
(c) treat as hybrid
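Rather than coding steps (a) and (b) by hand, one option (assuming a POSIX dependency is acceptable) is inet_pton with AF_INET6, which already accepts omitted leading zeros, a single "::" and a dotted IPv4 tail, and rejects a second "::". It does not accept zone suffixes such as "%eth0". If step (c) means comparing the eight groups numerically in order, memcmp on the 16-byte network-order result gives exactly that ordering. ParseIPv6, CompareIPv6 and GroupIPv6 are hypothetical names.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <string.h>

/* Text to 16 bytes in network order. Returns 1 on success, 0 if the
   string is not a valid IPv6 address. */
int ParseIPv6(const char *text, unsigned char addr[16])
{
    return inet_pton(AF_INET6, text, addr) == 1;
}

/* Group-by-group numeric order; with big-endian bytes this is memcmp. */
int CompareIPv6(const unsigned char a[16], const unsigned char b[16])
{
    return memcmp(a, b, 16);
}

/* Group i (0..7) as an integer, e.g. to build a hybrid key. */
unsigned GroupIPv6(const unsigned char addr[16], int i)
{
    return ((unsigned)addr[2 * i] << 8) | addr[2 * i + 1];
}
```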
Consider adding parallel merge sort (http://www.mweissmann.de/downloads/libpmsort-0.3.tar.bz2)
as an additional algorithm choice. This would yield significant speed improvements
on multi-core systems. Compare must be thread safe - need to check into this.
Any global variables that might change need to be made thread-local by means
of the storage-class specifier __thread (a GCC extension; C11 spells it
_Thread_local), e.g.:
__thread int foo;
The only global variable that is written by Compare is ComparisonCnt.
KeyCount and KeyInfo are never modified once the actual sort starts.
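A minimal sketch of what making ComparisonCnt thread-local would mean for a parallel sort: each thread increments its own copy without locking, and the driver sums the per-thread counts after joining. The counter's type and the body of Compare here are hypothetical stand-ins, not msort's actual code; build with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* One copy per thread (GCC/Clang __thread; C11 _Thread_local). */
__thread unsigned long ComparisonCnt = 0;

/* Stand-in Compare: reads only its arguments and writes only the
   thread-local counter, so concurrent calls do not race. */
static int Compare(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    ComparisonCnt++;
    return (x > y) - (x < y);
}

/* Performs *arg comparisons, then stores this thread's count in *arg. */
static void *Worker(void *arg)
{
    unsigned long n = *(unsigned long *)arg;
    int p = 1, q = 2;
    for (unsigned long i = 0; i < n; i++)
        (void)Compare(&p, &q);
    *(unsigned long *)arg = ComparisonCnt;   /* only this thread's comparisons */
    return NULL;
}

/* Runs two workers and returns the summed count (0 if a thread could
   not be created). */
unsigned long CountInTwoThreads(unsigned long n1, unsigned long n2,
                                unsigned long *c1, unsigned long *c2)
{
    pthread_t t1, t2;

    *c1 = n1;
    *c2 = n2;
    if (pthread_create(&t1, NULL, Worker, c1) != 0) return 0;
    if (pthread_create(&t2, NULL, Worker, c2) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return *c1 + *c2;
}
```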