Collected release notes for the various distributed versions of mmorph
======================================================================
(See also the summary of changes in file 00CHANGES).
Release notes for mmorph version 2.3.1
--------------------------------------
Version 2.3.1 has 2 new options to handle lookup of capitalized words: -b
and -k. See manual page mmorph(1).
Together with option -B, this is a first go at the capitalization problem
and will probably change in the future. For the moment the assumption is
that converting uppercase letters to lowercase in a word will help looking
it up. For now it handles languages where there was no loss of information
during capitalization such as English or Canadian French. A more robust
mechanism should be provided for the cases where there is loss of
information, for example when a letter lost its accent when it was
capitalized, as in French (é -> E).
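The assumption described above (lowercasing a capitalized word helps to look it up) can be illustrated with a minimal Python sketch. It is not mmorph's implementation: the lexicon is a hypothetical set of word forms and `lookup_with_folding` is a name chosen for the example. The last call shows the limitation noted above, where a capital that has lost its accent no longer folds back to the lexicon form.

```python
def lookup_with_folding(word, lexicon):
    """Return the form found in the lexicon, or None.

    Tries the word as written first, then its lowercase form.
    `lexicon` is a hypothetical set of known word forms.
    """
    if word in lexicon:
        return word
    folded = word.lower()
    if folded in lexicon:
        return folded
    return None


# Hypothetical toy lexicon, for illustration only.
lexicon = {"the", "cat", "Geneva", "été"}

print(lookup_with_folding("The", lexicon))     # found after folding
print(lookup_with_folding("Geneva", lexicon))  # found as written
print(lookup_with_folding("ÉTÉ", lexicon))     # accents kept: folding works
print(lookup_with_folding("ETE", lexicon))     # accents lost: not found
```

Trying the word as written before folding keeps proper nouns such as "Geneva" from being matched against an unrelated lowercase entry.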
Release notes for mmorph version 2.3
------------------------------------
Version 2.3 has 4 new options to handle record/field mode for lookup: -C
classes, -B class, -U and -E. See manual page mmorph(1).
Here are a few examples of use:
- to look up words in record/field mode, restricted to records of class T,
Compd, Abbr, Enc, Proc, Init, Tit:
mmorph -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg out.lex
- idem, while annotating the other records and marking unknown words with
??\?? (option -U):
mtlexpunct < out.seg \
| mtlexnum \
| mmorph -U -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules > out.lex
- idem, but also looking up the folded form of capitalized words that start
sentences. Option -B specifies the record class that precedes the first word
of a sentence (e.g. Otag):
mmorph -B Otag -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg out.lex
This does not yet work with capitals that have lost their accent. Conversion
of uppercase to lowercase is done according to the character set in
effect given by the environment variable LC_CTYPE (cf. setlocale(3) and
locale(5)).
- two passes to extend annotations (option -E)
mmorph -C T,Compd,Abbr,Enc,Proc,Init,Tit -m rules out.seg \
| mmorph -E -C Abbr -m rules_abbreviations > out.lex
The number of options is starting to get ridiculous. The next version will
probably have three programs: one each for generation, simple lookup and
record/field lookup.
If you have problems with these changes, contact
Mr. Dominique Petitpierre | Internet: petitp@divsun.unige.ch
ISSCO, University of Geneva | X400: C=ch; ADMD=arcom; PRMD=switch;
54 route des Acacias | O=unige; OU=divsun; S=petitp
CH-1227 GENEVA (Switzerland) | Tel: +41/22/705 7117 | Fax: +41/22/300 1086
Release notes for mmorph version 2.2
------------------------------------
This version is faster and creates smaller files, and lets you factorize the
typed feature structures in the lexical declarations.
To allow this factorisation the syntax of the descriptions in @Lexicon
which was like this:
<LexDef> ::= LEXICALSTRING <BaseForm>? <Tfs>+
is replaced by this (cf "man 5 mmorph"):
<LexDef> ::= <Tfs> <Lexical>+
<Lexical> ::= LEXICALSTRING <BaseForm>?
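As a rough illustration of what the new syntax allows, the sketch below groups lexical strings that share the same typed feature structure, so that each structure is written only once. It is a Python sketch under assumptions, not the "factorize" utility: entries are modelled as hypothetical (string, tfs) pairs of plain text, and the TFS notation in the sample data is made up for the example.

```python
from collections import OrderedDict


def factorize_entries(entries):
    """Group lexical strings by their typed feature structure (TFS).

    `entries` is an iterable of (lexical_string, tfs) pairs, a simplified
    and hypothetical model of the old one-string-per-entry layout.
    Returns a list of (tfs, [lexical_string, ...]) in first-seen order.
    """
    grouped = OrderedDict()
    for lexical_string, tfs in entries:
        grouped.setdefault(tfs, []).append(lexical_string)
    return list(grouped.items())


# Made-up sample data; the TFS text is not real mmorph notation.
old_style = [
    ("cat", "Noun[number=singular]"),
    ("dog", "Noun[number=singular]"),
    ("walk", "Verb[form=base]"),
    ("horse", "Noun[number=singular]"),
]

for tfs, strings in factorize_entries(old_style):
    print(tfs, " ".join('"%s"' % s for s in strings))
```

In this toy data the three nouns end up under a single structure, which is the kind of sharing that makes the factorized descriptions smaller.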
In order to convert your files written for earlier versions of mmorph (2.1
and before), you can use the utilities you'll find in the directory ./util:
swap       swaps the strings and typed feature structures of lexical
           entries
factorize  factorizes the lexical entries' TFS with respect to the strings
For each file containing lexical entries (@Lexicon section only, in whole
or part) you can do the following (description file name is "rules",
lexical entries file name is "lex"):
1) swap lex >lex.new
2) mv lex lex.old
3) factorize rules lex.new >lex
If the lexical entries are at the end of the file "rules", extract them in
a separate file "lex", proceed as above and then replace the lexical
entries in "rules" with the new content of "lex".
The utility "swap" does not handle #include directives. It might also need
some adjustments (it is a sed script) or pre-processing if your lexical
entries have an unusual layout.
Tell me if you have problems with this conversion. If you send me your
mmorph description files I can do this for you.
You should get a substantial reduction in the size of the descriptions and
of the generated database. Typically the description is only one fifth of
the original size, and the generated database one third. Your mileage may
vary (measure the database sizes with "du -a" instead of "ls" to avoid
counting the holes in the files).
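The difference between the two measurements comes from holes in sparse files: "ls -l" shows the apparent size, while "du" counts the blocks actually allocated. A minimal Python sketch of the distinction, assuming a POSIX system (the file name and sizes are arbitrary, and the result depends on whether the file system supports sparse files):

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for a file.

    The apparent size is what "ls -l" shows; the allocated size is
    what "du" counts. st_blocks is in 512-byte units on POSIX systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.db")
    with open(path, "wb") as f:
        f.seek(1024 * 1024)  # skip 1 MiB without writing: leaves a hole
        f.write(b"x")        # one real byte at the end
    apparent, allocated = apparent_and_allocated(path)
    print("apparent size:", apparent)
    print("allocated size:", allocated)
```

On a file system with sparse-file support the allocated size is expected to be much smaller than the apparent size.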
The utility "factorize" can be used on its own in order to restructure
lexical entries written independently, or to merge two lexical description
files.
Version 2.2 has four new options (cf "man mmorph"):
-p to print the list of the projected tfs contained in a database:
mmorph.new -p -m morph-lexicon.fr >tfs
-q to print the list of all forms with their projected tfs:
mmorph.new -q -m morph-lexicon.fr >forms
The forms are not listed in order of generation (for that use
"mmorph -n").
If option "-d 16" is used together with option -p or -q some statistics are
displayed.
-y parse only. Do not generate anything, just check the syntax.
-z normalize, implies -y. Print on standard output the lexical entries, in
normalized form.
If you have problems with these changes, contact
Mr. Dominique Petitpierre | Internet: petitp@divsun.unige.ch
ISSCO, University of Geneva | X400: C=ch; ADMD=400net; PRMD=switch;
54 route des Acacias | O=unige; OU=divsun; S=petitp
CH-1227 GENEVA (Switzerland) | Tel: +41/22/705 7117 | Fax: +41/22/300 1086