.\" @(#)mmorph.5  October 1995 ISSCO;
.\" mmorph, MULTEXT morphology tool
.\" Version 2.3, October 1995
.\" Copyright (c) 1994,1995 ISSCO/SUISSETRA, Geneva, Switzerland
.\" Dominique Petitpierre, <petitp@divsun.unige.ch>
.TH MMORPH 1 "Version 2.3, October 1995"

.SH NAME
mmorph \- MULTEXT morphology tool

.SH SYNOPSIS
.HP
information:
.br
\fBmmorph\fP
[
.B \-vh
]
.HP
parse only:
.br
\fBmmorph\fP
.B \-y
|
.B \-z
[
.B \-a
.I addfile
]
.br
.B \-m
.I morphfile
[
.B \-d
.I debug_map
] [
.B \-l
.I logfile
] [
.I infile
[
.I outfile
]]
.HP
generate:
.br
\fBmmorph\fP
.B \-c
|
.B \-n
[
.B \-t
.I trace_level
] [
.B \-s
.I trace_level
] [
.B \-a
.I addfile
]
.br
.B \-m
.I morphfile
[
.B \-d
.I debug_map
] [
.B \-l
.I logfile
] [
.I infile
[
.I outfile
]]
.HP
simple lookup:
.br
\fBmmorph\fP
[
.B \-fi
] [
.B \-b
|
.B \-k
] [
.B \-r
.I rejectfile
]
.br
.B \-m
.I morphfile
[
.B \-d
.I debug_map
] [
.B \-l
.I logfile
] [
.I infile
[
.I outfile
]]
.HP
record/field lookup:
.br
\fBmmorph\fP
.B \-C
.I classes
[
.B \-fU
] [
.B \-E
|
.B \-O
] [
.B \-b
|
[
.B \-k
] [
.B \-B 
.I class
]]
.br
.B \-m
.I morphfile
[
.B \-d
.I debug_map
] [
.B \-l
.I logfile
] [
.I infile
[
.I outfile
]]
.HP
dump database:
.br
\fBmmorph\fP
.B \-p
|
.B \-q
.br
.B \-m
.I morphfile
[
.B \-d
.I debug_map
] [
.B \-l
.I logfile
] [
.I infile
[
.I outfile
]]

.SH DESCRIPTION
.LP
In the simplest mode of operation, with just the \fB-m\fP \fImorphfile\fP
option,
.B mmorph
operates in lookup mode:  it will open an existing database called
\fImorphfile.db\fR and look up all the string segments (usually
corresponding to words) in the input.
.LP
To create the database from the lexical entries specified in \fImorphfile\fP,
use \fB\-c \-m\fP \fImorphfile\fP.  The file \fImorphfile\fB.db\fR should not
already exist.  When the database is complete,
.B mmorph
will look up the segments in the
input.  If used interactively (input and output are both terminals), a prompt
is printed when the program expects the user to type a segment string.
No prompting occurs in record/field mode.
.LP
To test the rule applications on the lexical entries specified in
\fImorphfile\fP, without creating a database and without looking up
segments, use \fB-n -m\fP \fImorphfile\fP.  This automatically sets the
trace level to 1 if it was not specified.
.LP
To perform the same operations as above on the alternate set of
lexical entries in \fIaddfile\fP, add the option \fB\-a \fIaddfile\fR.
The lexical entries in \fImorphfile\fP will be ignored.  This is useful when
making additions to a standard morphological description.  Be aware that
entries added to the database \fImorphfile.db\fP do not replace
existing ones.
.SS "How to test a morphological description"
Use the \fB\-n\fP option.  In the Grammar section, specify goal rules that will
match the desired results.  In the Lexicon section, specify the lexical
items you want to test.  When run,
.B mmorph
applies all rules (recursively) to the lexical items; whenever an applied
rule is a goal rule, the result of the application is printed on the output.
.LP
Suggestion:
Put the two parts mentioned above (goal rules and Lexicon section)
in separate files and reference these files with an \fB#include\fP directive
where they should occur in the main input file.
.LP
If you are using an existing description and want to test only new lexical
entries, use the options \fB-n -a\fP \fIaddfile\fP, and put the lexical
entries in \fIaddfile\fP.
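.LP
As a sketch, assuming an existing description in a file named
\fImorphfile\fP and new entries in a file named \fIaddfile\fP (both names
are placeholders), a test run that also traces every rule that applies
could look like this:
.RS
.ft B
.nf

mmorph \-n \-t 2 \-a addfile \-m morphfile
.fi
.RE
.ft R
Trace level 2 is chosen here only for illustration; without \fB\-t\fP,
test mode uses trace level 1 and reports goal rules only.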
.SH OPTIONS
.TP
\fB\-a\fP \fIaddfile\fP
Ignore lexical entries in morphfile, take them from \fIaddfile\fP instead.
.TP
\fB\-B\fP \fIclass\fP
Specifies the record class that occurs before the beginning of a sentence.
Capitalized words occurring just after such records will also be looked up with
all their letters converted to lowercase (according to LC_CTYPE, see below).
.TP
\fB\-b\fP
Fold case before lookup.  Uppercase letters are converted to lowercase
letters (according to LC_CTYPE, see below) before a word is looked up.
.TP
\fB\-C\fP \fIclasses\fP
Determines record/field mode. Specifies the record classes that should
be looked up. Class names should be separated by comma ",", TAB, space, bar "|"
or backslash "\e".
.TP
\fB\-c\fP
Create a new database for lookup.  The name of the created file is the name
of \fImorphfile\fP (\fB\-m\fP option) with suffix \fB.db\fP.  It should not
exist; if it does, the user should remove it manually before running
\fBmmorph \-c\fP (this is a minimal protection against accidentally
overwriting a database that might have taken a long time to create).
.TP
\fB\-d\fP \fIdebug_map\fP
Specify which debug options are wanted. Each bit in \fIdebug_map\fP
corresponds to an option.
.nf
.ta 0.3iR 1.1iR 1.3iL 2.0iL
	bit	decimal	hexadecimal purpose
	no bits	0	0x0	no debug option (default)
	1	1	0x1	debug initialisation
	2	2	0x2	debug yacc parsing
	3	4	0x4	debug rule combination
	4	8	0x8	debug spelling application
	5	16	0x10	print statistics with -p or -q options
	all bits	-1	0xffff	all debug options whatever they are
.fi
To combine options add the decimal or hexadecimal values together.
Example: \fB\-d\fP 0x5 specifies bits (options) 1 and 3.
.TP
\fB\-E\fP
In record/field mode, extend the morphology annotations if they already
exist (the default is to leave existing annotations as is).
.TP
\fB\-O\fP
In record/field mode, overwrite the morphology annotations if they already
exist (the default is to leave existing annotations as is).
.TP
\fB\-f\fP
Flush the output after each segment lookup. This is useful only if
input and output are piped from and to a program that needs to
synchronize them.
.TP
\fB\-h\fP
Print help and exit.
.TP
\fB\-i\fP
Prepend the result of each lookup with the identifier of the input segment
it corresponds to. Currently input segments are identified by their
sequential number, starting at 0.
With this indication, the extra newline separating the solutions for
different input segments is not printed because it is not needed.
If a lookup has no solutions, only the segment identifier is printed on the
output. The segment identifier is also prepended to rejected segments.
A tab always follows the segment identifier.
.TP
\fB\-k\fP
Fallback fold case.  If a word lookup fails, convert all
uppercase letters to lowercase and try the lookup again (conversion is done
according to LC_CTYPE, see below).
.TP
\fB\-l\fP \fIlogfile\fP
Specify the file for writing trace and error messages.
Defaults to standard error.
.TP
\fB\-m\fP \fImorphfile\fP
Specify the file containing the morphology description.  See
\fBmmorph (5)\fP for a description of the formalism's syntax.
.TP
\fB\-n\fP
No database creation or lookup (test mode).
.TP
\fB\-p\fP
Dump the typed feature structure database to outfile (or standard output).
The count of distinct tfs is given in the logfile (or standard error)
if bit 5 of debug option is set.
.TP
\fB\-q\fP
Dump the forms in the database to outfile (or standard output).
Some statistics are given in the logfile (or standard error)
if bit 5 of debug option is set.
.TP
\fB\-r\fP \fIrejectfile\fP
Outside record/field mode, specifies the file to which input segments
that could not be looked up are written.  Defaults to standard error.
.TP
\fB\-s\fP \fItrace_level\fP
Trace spelling rule application:
.br
0  no tracing (default).
.br
1  trace valid surface forms.
.br
2  trace rules whose lexical part matches.
.br
3  trace surface left context match (surface word construction).
.br
4  trace surface right context mismatch and rule blocking.
.br
5  trace rule non blocking.
.br
A trace_level implies all preceding ones.
.TP
\fB\-t\fP \fItrace_level\fP
Specify the level of tracing for rule application:
.br
0  no tracing (default).
.br
1  trace goal rules that apply.
.br
2  trace all rules that apply, indentation indicates the recursion depth.
.br
10 trace also rules that were tried but did not apply.
.br
A trace_level implies all preceding ones.
.TP
\fB\-U\fP
In record/field mode, unknown words (i.e. that were unsuccessfully looked
up) are annotated with ??\\??.
.TP
\fB\-v\fP
Print version and exit.
.TP
\fB\-y\fP
Parse only: do not process the description other than for syntax checking.
While developing a morphology description, you may use this option to catch
syntax errors quickly after each modification, before running it "for real".
.TP
\fB\-z\fP
Implies \fB\-y\fP.  Parse and output the lexical descriptions in normalized form.
.TP
\fIinfile\fP
File containing the segments to look up, one per line.  Defaults to the standard
input.
.TP
\fIoutfile\fP
File in which the output of the program is written.  One line
per solution.  Solutions of different input segments are separated by an empty
line.  Defaults to the standard output.
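.LP
As a sketch of how \fIdebug_map\fP values for the \fB\-d\fP option are
added together, the sum can be left to POSIX shell arithmetic.  Here bits 1
and 3 (values 0x1 and 0x4) are combined, which gives 5; \fImorphfile\fP is
a placeholder name and the choice of bits is arbitrary:
.RS
.ft B
.nf

mmorph \-n \-m morphfile \-d $((0x1 + 0x4))
.fi
.RE
.ft R
Writing \fB\-d\fP 5 or \fB\-d\fP 0x5 directly is equivalent; the arithmetic
form only makes the individual bits visible.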
.SH "WORD GRAMMAR AND SPELLING RULES"
For a detailed account of the principles and mechanisms used in
.B mmorph,
please refer to the documents cited in the SEE ALSO section below.
.LP
Briefly sketched, morphosyntactic descriptions written for mmorph describe
how words are constructed by the concatenation of morphemes, and how this
concatenation process changes the spelling of these morphemes.  The first
part, the word structure grammar, is specified by restricted context free
rewrite rules whose formalism is inspired by unification based systems (cf.
Shieber 1986).  The second part, the spelling changes, is specified by
spelling rules in a formalism based on the two level model of morphology.
This approach to morphology is described in Ritchie, Russell et al.\& 1992
and, more concisely, in Pulman and Hepple 1993.
.SH "ENVIRONMENT VARIABLES"
To decide which characters are displayable on the output, 
.B mmorph
uses the language specific description that
.BR setlocale (3)
sets according to the environment variable
.B LC_CTYPE.
For the languages
that are dealt with in MULTEXT it is a good idea to have that variable set
to
.B iso_8859_1.
.SH EXAMPLES
Here is a summary of the common usage of mmorph options:
.RS
.ft B
.nf

mmorph -n -m morphfile
.fi
.RE
.ft R
Test mode: reads the whole of morphfile and prints results on standard error.
No database is created, no words are looked up.
.RS
.ft B
.nf

mmorph -c -m morphfile
.fi
.RE
.ft R
Database creation:  reads the whole of morphfile and stores the results in
a database (morphfile.db).  Typed feature structures are collected in a
separate file (morphfile.tfs).  Standard input is read for words to look up
in the new database.
.RS
.ft B
.nf

mmorph -m morphfile
.fi
.RE
.ft R
Lookup mode: reads only the Alphabets, Attributes and Types sections of
morphfile. Standard input is read for words to look up according to the
existing database (morphfile.db and morphfile.tfs).
.RS
.ft B
.nf

mmorph -m morphfile -a addfile
.fi
.RE
.ft R
Addition mode:  ignores the Lexicon section of morphfile, but addfile is
consulted, and the results are added to the database.  Standard input is
read for words to look up according to the augmented database
(morphfile.db and morphfile.tfs).
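.LP
A few further invocations, sketched from the SYNOPSIS above.  The file
names words.txt, results.txt and rejects.txt are placeholders:
.RS
.ft B
.nf

mmorph \-y \-m morphfile
.fi
.RE
.ft R
Parse only: checks the syntax of morphfile without processing the
description any further.
.RS
.ft B
.nf

mmorph \-i \-k \-r rejects.txt \-m morphfile words.txt results.txt
.fi
.RE
.ft R
Simple lookup from words.txt to results.txt: each result is preceded by the
sequential number of its input segment (\-i), a failed lookup is retried
with uppercase letters converted to lowercase (\-k), and segments that
still cannot be looked up are written to rejects.txt (\-r).
.RS
.ft B
.nf

mmorph \-q \-m morphfile
.fi
.RE
.ft R
Dump: writes the forms stored in the existing database to standard output.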
.SH DIAGNOSTICS
Error messages should be self explanatory.  Please refer to
.BR mmorph (5)
for a formal description of the syntax.
.SH FILES
.TP
\fImorphfile\fR.db
database file of forms generated for descriptions in file \fImorphfile\fR
given as option \-m.
.TP
\fImorphfile\fR.tfs
database file of typed feature structures associated to \fImorphfile\fR.db.
.SH SEE\ ALSO
.BR mmorph (5),
.BR setlocale (3).
.HP
G. Russell and D. Petitpierre, \fIMMORPH \- The Multext
Morphology Program\fP, Version 2.3, October 1995, MULTEXT deliverable
report for task 2.3.1.
.HP
Ritchie, G. D., G.J. Russell, A.W. Black and S.G. Pulman (1992), 
\fIComputational Morphology: Practical Mechanisms for the 
English Lexicon\fP, Cambridge Mass., MIT Press.
.HP
Pulman, S.G. and M.R. Hepple, (1993) ``A feature-based formalism
for two level phonology: a description and implementation'', \fIComputer
Speech and Language\fP 7, pp.333-358.
.HP
Shieber, S.M. (1986), \fIAn Introduction to Unification-Based
Approaches to Grammar\fP, CSLI Lecture Notes Number 4, Stanford University.
.SH AUTHOR
Dominique Petitpierre, ISSCO, <petitp@divsun.unige.ch>
.SH ACKNOWLEDGEMENTS
The parser for the morphology description formalism was written
using
.BR yacc (1) 
and
.BR flex (1).  
Flex was written by Vern Paxson, <vern@ee.lbl.gov>, and is
distributed in the framework of the GNU project under the condition of the
GNU General Public License.
.LP
The database module in the current version uses the
.B db
library package developed at the University of California, Berkeley by
Margo Seltzer, Keith Bostic <bostic@cs.berkeley.edu> and Ozan Yigit.
.LP
The crc procedures used for taking a signature of the typed feature
structure declarations are taken from the
.B fingerprint
package by Daniel J.\ Bernstein and use code written by Gary S.\ Brown.