.\" File: lookup.man (Debian package lookup 1.08b-5, contrib, woody)
.if \n1 .ll \n1n \" for page width.. . use cmd line arg -r1# to set width to #
.de Q \" puts quotes around the argument. End previous line with \c
\&「\&\\$1\&」\&\\$2\&\\c
..
.TH LOOKUP 1
.nr IN 3n
.ce 1
April 22nd, 1994

.SH NAME 
lookup \- interactive file search and display
.SH SYNOPSIS
.B lookup
[
args
]
[
.I file ...
]
.br
.SH DESCRIPTION
.I Lookup
allows the quick interactive search of text files.  It supports ASCII,
JIS-ROMAN, and Japanese EUC packed-format text, and has an
integrated romaji-to-kana converter.
.SH THIS MANUAL
.I Lookup
is flexible for a variety of applications. This manual will, however,
focus on the application of searching Jim Breen's
.I edict
(Japanese-English dictionary) and
.I kanjidic
(kanji database). Being familiar with the content and format of these
files would be helpful. See the INFO section near the end of this
manual for information on how
to obtain these files and their documentation.
.SH OVERVIEW OF MAJOR FEATURES
The following just mentions some major features to whet your appetite
to actually read the whole manual (-:
.TP
Romaji-to-Kana Converter
.I Lookup
can convert romaji to kana for you, even\c
.Q "on the fly"
as you type.
.TP
Fuzzy Searching
Searches can be a bit\c
.Q vague
or\c
.Q fuzzy ", "
so that you'll be able to
find\c
.Q 東京
even if you try to search for\c
.Q ときょ
(the proper yomikata being\c
.Q とうきょう ").  "
.TP
Regular Expressions
Uses the powerful and expressive
.I "regular expression"
for searching. One can easily specify complex searches that express\&「\&I want
lines that look like such-and-such, but not like this-and-that, but that
also have this particular characteristic....\&」
.TP
Wildcard ``Glob'' Patterns
Optionally, can use well-known filename wildcard patterns instead of
full-fledged regular expressions.
.TP
Filters
You can have
.I lookup
not list certain lines that would otherwise match your search, yet can
optionally save them for quick review. For example, you could have all
name-only entries from
.I edict
filtered from normal output.
.TP
Automatic Modifications
Similarly, you can do a standard search-and-replace on lines just before
they print, perhaps to remove information you don't care to see on
most searches. For example, if you're generally not interested in
.IR kanjidic "'s"
info on Chinese readings, you can have them removed from lines before
printing.
.TP
Smart Word-Preference Mode
You can have
.I lookup
list only entries with
.I "whole words"
that match your search (as opposed to an
.I embedded
match, such as finding\c
.Q the
inside\c
.Q them "), "
but if no whole-word
matches exist, will go ahead and list any entry that matches the
search.
.TP
Handy Features
Other handy features include a dynamically settable and
parameterized prompt, automatic highlighting of that part of the line
that matches your search, an output pager, readline-like input with
horizontal scrolling for long input lines, a\c
.Q .lookup
startup file, automated programmability, and much more. Read on!
.SH REGULAR EXPRESSIONS
.I Lookup
makes liberal use of
.I "regular expressions"
(or
.I regex
for short) in controlling various aspects of the searches. If you are
not familiar with the important concepts of regexes, read the tutorial
appendix of this manual before continuing.
.SH JAPANESE CHARACTER ENCODING METHODS
Internally,
.I lookup
works with Japanese packed-format EUC, and all files loaded must be
encoded similarly. If you have files encoded in JIS or Shift-JIS, you
must first convert them to EUC before loading (see the INFO section for
programs that can do this).

Interactive input and output encoding, however,
may be selected via the -jis, -sjis, and -euc invocation flags
(default is -euc),
or by various commands to the program (described later).

Make sure to use the encoding appropriate for your system.  If you're
using kterm under the X Window System, you can use
.IR lookup "'s"
-jis flag to match kterm's default JIS encoding. Or, you might use
kterm's\c
.Q "-km euc"
startup option (or menu selection) to put kterm into
EUC mode. Also, I have found kterm's scrollbar (\c
.Q "-sb -sl 500" ") "
to be quite useful.

With many\c
.Q English
fonts in Japan, the character that normally prints
as a backslash (half-width version of ＼\&) in the States appears as a
yen symbol (the half-width version of ￥\&). How it will appear on your
system is a function of what font you use and what output encoding
method you choose, which may be different from the font and method
that was used to print this manual (both of which may be different
from what's printed on your keyboard's appropriate key).  Make sure to
keep this in mind while reading.

.SH STARTUP
Let's assume that your copy of
.I edict
is in ~/lib/edict. You can start the program simply with
.nf

        lookup ~/lib/edict

.fi
You'll note that
.I lookup
spends some time building an index before the default\c
.Q "lookup>\ "
prompt appears.

.I Lookup
gains much of its search speed by constructing an index of the file(s)
to be searched. Since building the index can be time consuming itself,
you can have
.I lookup
write the built index to a file that can be
quickly loaded the next time you run the program.
Index files will be given a\c
.Q .jin
(Jeffrey's Index) ending.

Let's build the indices for
.I edict
and
.I kanjidic
now:
.nf

        lookup -write ~/lib/edict ~/lib/kanjidic

.fi
This will create the index files
.nf
       ~/lib/edict.jin
       ~/lib/kanjidic.jin
.fi
and exit.

You can now re-start
.IR lookup ,
automatically using the pre-computed index files as:
.nf

       lookup ~/lib/edict ~/lib/kanjidic

.fi
You should then be presented with the prompt without having to wait
for the index to be constructed (but see the section on Operating
System concerns for possible reasons of delay).
.SH INPUT
There are basically two types of input: searches and commands.
Commands do such things as tell
.I lookup
to load more files or set flags. Searches report lines of a file that
match some search specifier (where lines to search for are specified by
one or more regular expressions).

The input syntax may perhaps at first seem odd, but has been designed
to be powerful and concise. A bit of time invested to learn it
well will pay off greatly when you need it.
.SH BRIEF EXAMPLE
Assuming you've started
.I lookup
with
.I edict
and
.I kanjidic
as noted above, let's try a few searches. In these examples, the
.nf
    search [edict]> 
.fi
is the prompt.
Note that the space after the\&「\&>\&」\&is part of the prompt.

Given the input:
.nf

  search [edict]> tranquil

.fi
.I lookup
will report all lines with the string\c
.Q tranquil
in them. There are currently about
a dozen such lines, two of which look like:
.nf

  安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
  安らぎ [やすらぎ] /peace/tranquility/

.fi
Notice that lines with\c
.Q tranquil
\fIand\fP\c
.Q tranquility
matched? This is because\c
.Q tranquil
was embedded in the
word\&「\&tranquility\&」\&.
You could restrict the search to only the
\fIword\fP\c
.Q tranquil
by prepending the special\c
.Q "start of word"
symbol\&「\&<\&」\&and appending the special\c
.Q "end of word"
symbol\&「\&>\&」\&to the regex, as in:
.nf

  search [edict]> <tranquil>

.fi
This is the regular expression that says\&「\&the beginning of a word,
followed by a\&『\&t\&』\&,\&『\&r\&』\&, ...,\&『\&l\&』\&, which is at the end of a word.\&」The
current version of
.I edict
has just three matching entries.

Let's try another:
.nf

  search [edict]> fukushima

.fi
This is a search for the\c
.Q English
fukushima -- ways to search for
kana or kanji will be explored later.  Note that among the several
lines selected and printed are:
.nf

  副島 [ふくしま] /Fukushima (pn,pl)/
  木曽福島 [きそふくしま] /Kisofukushima (pl)/

.fi
By default, searches are done in a case-insensitive
manner --\&「\&F\&」\&and\&「\&f\&」\&are treated the same by
.IR lookup ,
at least so far as the matching goes.  This is called
.IR "case folding" .

Let's give a command to turn this option off,
so that\&「\&f\&」\&and\&「\&F\&」\&won't
be considered the same.  Here's an odd point about
.I "lookup's"
input syntax: the default setting is that all command lines must begin
with a space.  The space is the (default) command-introduction
character and tells the input parser to expect a command rather than a
search regular expression.
.I
It is a common mistake at first to forget the leading space when
issuing a command.  Be careful.

Try the command\c
.Q "\ fold"
to report the current status of case-folding.
Notice that as soon as you type the space, the prompt changes to
.nf
  lookup command> 
.fi
as a reminder that now you're typing a command rather than a search
specification.

.nf

  lookup command>  fold

.fi
The reply should be\c
.Q "file #0's case folding is on"
.br

You can actually turn it off with\c
.Q " fold off" ".  "
Now try the search for\c
.Q fukushima
again. Notice that this time the entries with\c
.Q Fukushima
aren't listed? Now try the search string\c
.Q Fukushima
and see that the entries with\c
.Q fukushima
aren't listed.

Case folding is usually very convenient (it also makes corresponding
katakana and hiragana match the same), so don't forget to turn it back on:
.nf

  lookup command>  fold on

.fi
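.PP
The folding described above can be pictured with a small sketch. It is
only an illustration under stated assumptions, not the program's own code:
lookup works on EUC text internally, while this hypothetical Python helper
(the names fold and matches are made up here) uses Unicode code points,
where each katakana sits 0x60 above its hiragana counterpart.
.nf

```python
def fold(text):
    """Fold ASCII case and map katakana onto hiragana."""
    out = []
    for ch in text.lower():
        code = ord(ch)
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana twins.
        if 0x30A1 <= code <= 0x30F6:
            ch = chr(code - 0x60)
        out.append(ch)
    return "".join(out)


def matches(needle, line):
    """Folded substring test, roughly like a search with folding on."""
    return fold(needle) in fold(line)
```

.fi
With folding applied to both sides, a lowercase search string finds
capitalized entries, and a hiragana string finds the same reading written
in katakana.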
.SH JAPANESE INPUT
.I Lookup
has an automatic romaji-to-kana converter. A leading\&「\&/\&」\&indicates that
romaji is to follow. Try typing\c
.Q /tokyo
and you'll see it convert to\c
.Q /\&ときょ
as you type. When you hit return,
.I lookup
will list all lines that have a\&「\&ときょ\&」\&somewhere in them. Well, sort
of.  Look carefully at the lines which match. Among them (if you had
case folding back on) you'll see:
.nf

  キリスト教 [キリストきょう] /Christianity/
  東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
  凸鏡 [とっきょう] /convex lens/

.fi
The first one has\&「\&ときょ\&」\&in it (as\&「\&トきょ\&」\&,
where the katakana\&「\&ト\&」\&matches in a case-insensitive
manner to the hiragana\&「\&と\&」\&), but you
might consider the others unexpected, since they don't
have\c
.Q ときょ
in them.
They're close (\&「\&とうきょ\&」\&and\&「\&とっきょ\&」\&),
but not exact. This is the result of
.IR lookup "'s\c"
.Q fuzzification "\&."

Try the command\c
.Q "\ fuzz"
(again, don't forget the command-introduction space).
You'll see that fuzzification is turned on.  Turn it off with\c
.Q "\ fuzz off"
and try\c
.Q /tokyo
(which will convert as you type) again.
This time you only get the lines which have\&「\&ときょ\&」\&exactly
(well, case folding is still on, so it might match katakana as well).

In a fuzzy search, length of vowels is ignored --\&「\&と\&」\&is
considered the same as\&「\&とう\&」\&, for example. Also, the
presence or absence of any\&「\&っ\&」\&character is ignored, and the
pairs じ ぢ, ず づ, え ゑ, and お を are considered identical in a
fuzzy search.

It might be convenient to consider a fuzzy search to be a\c
.Q "pronunciation search" ".  "

Special note: fuzzification will not be performed if a regular expression\c
.Q "*" ,
.Q "+" ,
or\c
.Q "?"
modifies a non-ASCII character. This is not an issue when input patterns
are filename-like wildcard patterns (discussed below).

In addition to kana fuzziness, there's one special case for kanji when
fuzziness is on. The kanji repeater mark\c
.Q 々
will be recognized such that\c
.Q 時々
and\c
.Q 時時
will match each other.


Turn fuzzification back on (\&「\&fuzz on\&」\&), and search for all
.I "whole words"
which sound like\&「\&tokyo\&」\&. That search would be specified as:
.nf

  search [edict]> /<tokyo>

.fi
(again, the\c
.Q tokyo
will be converted to\c
.Q ときょ
as you type).
My copy of
.I edict
has the three lines
.nf

  東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
  特許 [とっきょ] /special permission/patent/
  凸鏡 [とっきょう] /convex lens/

.fi
This kind of whole-word romaji-to-kana search is so common, there's a
special shortcut. Instead of typing\&「\&/<tokyo>\&」\&, you can
type\c
.Q [tokyo] ".  "
The leading\&「\&[\&」\&means\&「\&start romaji\&」\&\c
.I and\c
.Q "start of word" ".  "
Were you to type\c
.Q <tokyo>
instead (without a
leading\&「\&/\&」\&or\&「\&[\&」\&to indicate romaji-to-kana conversion), you would
get all lines with the
.I English
whole-word\c
.Q tokyo
in them.
That would be a reasonable request as well, but not what we want at the moment.

Besides the kana conversion, you can use any cut-and-paste that your
windowing system might provide to get Japanese text onto the search
line. Cut\c
.Q ときょ
from somewhere and paste onto the search line. When
hitting enter to run the search, you'll notice that it is done without
fuzzification (even if the fuzzification flag was\c
.Q on ").  "
That's because
there's no leading\&「\&/\&」\&. Not only does a leading\&「\&/\&」\&indicate that you
want the romaji-to-kana conversion, but that you want it done fuzzily.

So, if you'd like fuzzy cut-and-paste, just type a leading\&「\&/\&」\&before
pasting (or go back and prepend one after pasting).

These examples have all been pretty simple, but you can use all the
power that regexes have to offer. As a slightly more complex example,
the search\c
.Q <gr[ea]y>
would look for all lines with
the words\c
.Q grey
or\c
.Q gray
in them.  Since the\&「\&[\&」\&isn't the first character
of the line, it doesn't mean what was mentioned above (start-of-word romaji).
In this case, it's just the regular-expression\c
.Q class
indicator.

If you feel more comfortable using filename-like\c
.Q "*.txt"
wildcard patterns, you can use the\c
.Q "wildcard on"
command to have patterns be considered this way.

This has been a quick introduction to the basics of
.IR lookup .

It can be very powerful and much more complex. Below is a detailed
description of its various parts and features.
.SH READLINE INPUT
The actual keystrokes are read by a readline-ish package that is
pretty standard. In addition to just typing away, the following
keystrokes are available:
.nf

  ^B  / ^F     move left/right one character on the line
  ^A  / ^E     move to the start/end of the line
  ^H  / ^G     delete one character to the left/right of the cursor
  ^U  / ^K     delete all characters to the left/right of the cursor
  ^P  / ^N     previous/next lines on the history list
  ^L or ^R     redraw the line
  ^D           delete char under the cursor, or EOF if line is empty
  ^space       force romaji conversion (^@ on some systems)

.fi
If automatic romaji-to-kana conversion is turned on (as it is by
default), there are certain situations where the conversion will be
done, as we saw above. Lower-case romaji will be converted to
hiragana, while upper-case romaji to katakana.  This usually won't
matter, though, as case folding will treat hiragana and katakana the
same in the searches.

In exactly what situations the automatic conversion will be done is
intended to be rather intuitive once the basic idea is learned.
However, at
.IR "any time" ,
one can use control-space to convert the ASCII to the left of the
cursor to kana. This can be particularly useful when needing to enter
kana on a command line (where auto conversion is never done; see below).

.SH ROMAJI FLAVOR
Most flavors of romaji are recognized. Special or non-obvious items are
mentioned below. Lowercase letters are converted to hiragana, uppercase to katakana.

Long vowels can be entered by repeating the vowel, or with\&「\&-\&」\&or\&「\&^\&」\&.

In situations where an\&「\&n\&」\&could be vague, as
in\&「\&na\&」\&being な or んあ\&, use a single quote to force ん\&.
Therefore,\&「kenichi\&」→けにち while\&「ken'ichi\&」→けんいち\&.

The romaji has been richly extended with many non-standard
combinations such as ふぁ or ちぇ\&, which are represented in
intuitive ways:\&「fa\&」→ふぁ\&,\&「che\&」→ちぇ\&, etc.

Various other mappings of interest:
.nf

  wo→を    we→ゑ    wi→ゐ
  VA→ヴァ  VI→ヴィ  VU→ヴ    VE→ヴェ  VO→ヴォ
  di→ぢ    dzi→ぢ   dya→ぢゃ dyu→ぢゅ dyo→ぢょ
  du→づ    tzu→づ   dzu→づ

(the following kana are all smaller versions of the regular kana)

  xa→ぁ    xi→ぃ    xu→ぅ    xe→ぇ    xo→ぉ
  xu→ぅ    xtu→っ   xwa→ゎ   xka→ヵ   xke→ヶ
  xya→ゃ   xyu→ゅ   xyo→ょ

.fi
.SH INPUT SYNTAX
Any input line beginning with a space (or whichever character is set as
the command-introduction character) is processed as a command to
.I lookup
rather than a search spec.
.I Automatic
kana conversion is never done on these lines (but
.I forced
conversion with control-space may be done at any time).

Other lines are taken as search regular expressions, with the
following special cases:
.TP
?
A line consisting of a single question mark will report the current
command-introduction character (the default is a space, but can be
changed with the\c
.Q cmdchar
command).
.TP
=
If a line begins with\&「\&=\&」\&, the line (without the\&「\&=\&」\&) is taken as a
search regular expression, and no automatic (or internal -- see below)
kana conversion is done anywhere on the line (although again,
conversion can always be forced with control-space).  This can be used
to initiate a search where the beginning of the regex is the
command-introduction character, or in certain situations where automatic kana
conversion is temporarily not desired.
.TP
/
A line beginning with\&「\&/\&」\&indicates romaji input for the whole line.
If automatic kana conversion is turned on, the conversion will be done
in real-time, as the romaji is typed. Otherwise it will be done
internally once the line is entered.
.IR Regardless ,
the presence of the leading\&「\&/\&」\&indicates that any kana (either
converted or cut-and-pasted in) should be\c
.Q fuzzified
if fuzzification is turned on.

As an addition to the above, if the line doesn't begin with\&「\&=\&」\&or the
command-introduction character (and automatic conversion is turned
on),\&「\&/\&」\&
.I anywhere
on the line initiates automatic conversion for the following word.
.TP
[
A line beginning with\&「\&[\&」\&is taken to be romaji (just as a line
beginning with\&「\&/\&」\&), and the converted romaji is subject to
fuzzification (if turned on).  However, if\&「\&[\&」\&is used rather
than\&「\&/\&」\&, an implied\&「\&<\&」\&\c
.Q "beginning of word"
is prepended to the resulting
kana regex.  Also, any ending\&「\&]\&」\&on such a line is converted to the\c
.Q "ending of word"
specifier\&「\&>\&」\&in the resulting regex.
.PP
In addition to the above, lines may have certain prefixes and suffixes
to control aspects of the search or command:
.TP
!
Various flags can be toggled for the duration of a particular search
by prepending a\c
.Q !!
sequence to the input line.

Sequences are shown below, along with commands related to each:
.nf

 !F!   Filtration is toggled for this line (filter)
 !M!   Modification is toggled for this line (modify)
 !w!   Word-preference mode is toggled for this line (word)
 !c!   Case folding is toggled for this line (fold)
 !f!   Fuzzification is toggled for this line (fuzz)
 !W!   Wildcard-pattern mode is toggled for this line (wildcard)
 !r!   Raw. Force fuzzification off for this line
 !h!   Highlighting is toggled for this line (highlight)
 !t!   Tagging is toggled for this line (tag)
 !d!   Displaying is on for this line (display)

.fi
The letters can be combined, as in\c
.Q "!cf!" .


The final\&「\&!\&」\& can be omitted if the first character
after the sequence is not an ASCII letter.

If no letters are given (\c
.Q !! "), "
.Q !f!
is the default.

These last two points can be conveniently combined in the common case of\c
.Q !/romaji
which would be the same as\c
.Q !f!/romaji ".  "


The special sequence\c
.Q !?
lists the above, as well as indicates which are currently turned on.

Note that the letters accepted in a\c
.Q !!
sequence are many of the indicators shown by the\c
.Q files
command.
.TP
+
A\&「\&+\&」\&prepended to anything above will cause the final search
regex to be printed. This can be useful to see when and what kind of
fuzzification and/or internal kana conversion is happening. Consider:
.nf

  search [edict]> +/わかる
  a match isȤ[]*?[]*[]*

.fi
Due to the leading\c
.Q / ", "
the kana is fuzzified, which explains the
somewhat complex resulting regex. For comparison, note:
.nf

  search [edict]> +わかる
  a match is“わかる”
  search [edict]> +!/わかる
  a match is“わかる”

.fi
As the\&「\&+\&」\&shows, these are not fuzzified. The first one has no
leading\&「\&/\&」\&or\&「\&[\&」\&to induce fuzzification, while the second has
the\&「\&!\&」\&line prefix (which is the default version of\c
.Q !f! "), "
which toggles fuzzification mode to\c
.Q off
for that line.
.TP
\&,
The default of all searches and most commands is to work with the
first file loaded (\fIedict\fP in these examples). One can change this
default (see the\c
.Q select
command) or, by appending a comma+digit
sequence at the end of an input line, force that line to work with
another previously-loaded file. An appended\c
.Q ,1
works with the first
extra file loaded (in these examples, \fIkanjidic\fP).  An appended\c
.Q ,2
works with the 2nd extra file loaded, etc.

An appended\c
.Q ,0
works with the original first file (and can be useful
if the default file has been changed via the\c
.Q select
command).

The following sequence shows a common usage:
.nf

  search [edict]> [ときょと]
  東京都 [とうきょうと] /Tokyo Metropolitan area/

.fi
cutting and pasting the 都 from above, and adding a\c
.Q ,1
to search
.IR kanjidic :
.nf

  search [edict]> 都,1
  都 4554 N4769 S11  ..... ト ツ みやこ {metropolis} {capital}

.fi
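.PP
The comma+digit suffix can be thought of as being peeled off the end of
the input line before the rest is interpreted. The following is a minimal
sketch of that idea, not the program's actual parser: the function name
split_slot_suffix is hypothetical, and it assumes the selector is a single
digit at the very end of the line.
.nf

```python
import re


def split_slot_suffix(line, default_slot=0):
    """Return (text, slot), stripping a trailing comma+digit selector."""
    found = re.search(",([0-9])$", line)
    if found is None:
        # No suffix: the line applies to the default selected file.
        return line, default_slot
    return line[:found.start()], int(found.group(1))
```

.fi
Under these assumptions, a line ending in ",1" is searched in the first
extra file, and a line with no suffix falls back to the default slot.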

.SH FILENAME-LIKE WILDCARD MATCHING
When wildcard-pattern mode is selected, patterns are considered as
extended\c
.Q "*.txt" "-like"
patterns. This is often more convenient for users not familiar with
regular expressions. To have this mode selected by default, put
.nf

   default wildcard on

.fi
into your\c
.Q ".lookup"
file (see\c
.Q "STARTUP FILE"
below).

When wildcard mode is on, only\c
.Q "*" ,
.Q "?" ,
.Q "+" ,
and\c
.Q "."
are affected.
See the entry for the\c
.Q wildcard
command below for details.

Other features, such as multiple-pattern searches (described below)
and the other regular-expression metacharacters, remain available.

.SH MULTIPLE-PATTERN SEARCHES
You can put multiple patterns in a single search specifier.
For example consider
.nf

  search [edict]> china||japan

.fi
The first part (\&「\&china\&」\&) will select all lines that have\c
.Q china
in them. Then,
.IR "from among those lines" ,
the second part will select lines that have\c
.Q japan
in them.  The\c
.Q ||
is not part of any pattern -- it is
.IR lookup "'s\c"
.Q pipe
mechanism.

The above example is very different from the single pattern
\&「\&china|japan\&」\&which would select any line that
had either\&「\&china\&」\&\c
.I or\c
.Q japan ".  "
With\c
.Q china||japan ", "
you get lines that have\c
.Q china
.I "and then also"
have\c
.Q japan
as well.

Note that it is also different from the regular expression\c
.Q china.*japan
(or the wildcard pattern\c
.Q china*japan ")"
which would select lines having\c
.Q "china, then maybe some stuff, then japan" ".  "
It would not, however, match a line on which\c
.Q japan
comes before\c
.Q china .

Just for your comparison, the multiple-pattern
specifier\&「\&china||japan\&」\&is pretty
much the same as the single regular
expression\&「\&china.*japan|japan.*china\&」\&.

If you use\&「\&|!|\&」\&instead of\&「\&||\&」\&,
it will mean\&「\&...and then lines
.I not
matching...\&」\&.

Consider a way to find all lines of
.I kanjidic
that do have a Halpern number, but don't have a Nelson number:
.nf

    search [edict]> <H\\d+>|!|<N\\d+>

.fi
If you then wanted to restrict the listing to those that
.I also
had a\&「\&jinmeiyou\&」\&marking (\fIkanjidic\fP's\&「\&G9\&」\&field)
and had a reading of , you could make it:
.nf

    search [edict]> <H\\d+>|!|<N\\d+>||<G9>||<>

A prepended「+」would explain:

    a match is“<H\\d+>”
    and not“<N\\d+>”
    and“<G9>”
    and“<>”

.fi
The\&「\&|!|\&」\&and\&「\&||\&」\&can be used to make up to ten
separate regular expressions in any one search specification.

Again, it is important to stress that\&「\&||\&」\&does not
mean\&「\&or\&」\&(as it does in a C program,
or as\&「\&|\&」\&does within a regular expression).
You might find it convenient to read\&「\&||\&」\&as\&「\&\fIand\fP also\&」\&,
while reading\&「\&|!|\&」\&as\&「\&but \fInot\fP\&」\&.

It is also important to stress that any whitespace around the\c
.Q ||
and\c
.Q |!|
construct is
.I not
ignored, but kept as part of the regex on either side.
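.PP
The pipe mechanism described in this section amounts to running the lines
through a chain of filters. Here is a rough sketch of that reading, under
stated assumptions: Python's re module stands in for the regex engine
(it lacks the word-boundary symbols used in this manual), the names
parse_spec and search are hypothetical, and the limit of ten patterns is
not enforced.
.nf

```python
import re


def parse_spec(spec):
    """Split a search spec into (wanted, regex) steps.

    The spec is split on the two pipe tokens; whitespace around them is
    kept as part of the neighbouring regex.
    """
    parts = re.split("([|]![|]|[|][|])", spec)
    steps = [(True, parts[0])]
    for i in range(1, len(parts), 2):
        # A double pipe keeps matching lines; pipe-bang-pipe drops them.
        steps.append((parts[i] == "||", parts[i + 1]))
    return steps


def search(spec, lines):
    """Keep the lines that pass every step, in order."""
    kept = list(lines)
    for wanted, regex in parse_spec(spec):
        pattern = re.compile(regex, re.IGNORECASE)
        kept = [ln for ln in kept if bool(pattern.search(ln)) == wanted]
    return kept
```

.fi
A single pipe is left inside its regex, so it keeps its usual meaning of
alternation, while each double pipe narrows the lines that survived the
previous step.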
.SH COMBINATION SLOTS
Each file, when loaded, is assigned to a\c
.Q slot
via which subsequent references to the file are then made.
The slot may then be searched, have filters and flags set, etc.

A special kind of slot, called a\c
.Q "combination slot" ,
rather than representing a single file, can represent multiple
previously-loaded slots. Searches against a combination slot
(or\c
.Q "combo slot"
for short) search all those previously-loaded slots associated with it
(called\c
.Q "component slots" "). "

Combo slots are set up with the
.I combine
command.

A combo slot has no filter or modify spec, but can have a local prompt
and flags just like normal file slots.  The flags, however, have
special meanings with combo slots. Most combo-slot flags act as a mask
against the component-slot flags; when acted upon as a member of the
combo, a component-slot's flag will be disabled if the corresponding
combo-slot's flag is disabled.

Exceptions to this are the
.IR autokana ,
.IR fuzz ,
and
.I tag
flags.

The
.I autokana
and
.I fuzz
flags govern a combo slot exactly as they do a regular file slot.
When a slot is searched as a component of a combination slot, the
component slot's
.I fuzz
(and
.IR autokana )
flags, or lack thereof, are ignored.

The
.I tag
flag is quite different altogether; see the
.I tag
command for complete information.

Consider the following output from the
.I files
command:
.nf

  
   0F wcfh da I  2762k/usr/jfriedl/lib/edict
   1FM cf  da I   705k/usr/jfriedl/lib/kanjidic
   2F  cfh@da       1k/usr/jfriedl/lib/local.words
  *3FM cfhtda    combokotoba (#2, #0)
  

.fi
See the discussion of the
.I files
command below for basic explanation of the output.

As can be seen, slot #3 is a
.I "combination slot"
with the name\c
.Q kotoba
with
.I "component slots"
two and zero. When a search is initiated on this slot, first slot #2\c
.Q "local.words"
will be searched, then slot #0\c
.Q edict ".  "

Because the combo slot's
.I filter
flag is
.IR on ,
the component slots'
.I filter
flag will remain on during the search.
The combo slot's
.I word
flag is
.IR off ,
however, so slot #0's
.I word
flag will be forced off during the search.

See the
.I combine
command for information about creating combo slots.
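.PP
The masking rule for combo-slot flags can be summarized in a few lines.
This is only a sketch of the rule as described in this section, not the
program's implementation: the function effective_flags and the dictionary
representation of flags are hypothetical, and the tag flag is left out
because it follows separate rules.
.nf

```python
GOVERNED_BY_COMBO = {"autokana", "fuzz"}   # the combo's setting wins outright
HANDLED_ELSEWHERE = {"tag"}                # see the tag command


def effective_flags(component, combo):
    """Flags in force while a component slot is searched via a combo slot."""
    result = {}
    for name, value in component.items():
        if name in HANDLED_ELSEWHERE:
            continue
        if name in GOVERNED_BY_COMBO:
            result[name] = combo.get(name, False)
        else:
            # The combo flag acts as a mask: off in the combo means off here.
            result[name] = value and combo.get(name, False)
    return result
```

.fi
With the example above, a component whose word flag is on still searches
with it off, because the combo slot's word flag is off, while its filter
flag stays on.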
.SH PAGER
.I Lookup
has a built-in pager (a la \fImore\fP).  Upon filling a screen with
text, the string
.nf
    --MORE [space,return,c,q]--
.fi
is shown. A space will allow another screen of text; a return will allow
one more line. A\&「\&c\&」\& will allow output text to continue unpaged until
the next command. A\&「\&q\&」\& will flush output of the current command.

If supported by the OS,
.IR lookup 's
idea of the screen size is set automatically at startup and whenever
the window is resized.
.I Lookup
needs the screen width both for horizontal input-line scrolling and
for knowing when a long line wraps on the screen.

The pager parameters can be set manually with the\c
.Q pager
command.
.SH COMMANDS
Any line intended to be a command must begin with the
command-introduction character (the default is a space, but it can be set
via the
.I cmdchar
command).  However, that character is not part of
the command itself and won't be shown in the following list of
commands.

There are a number of commands that work with the
.I "selected file" 
or
.I "selected slot"
(both meaning the same thing).
The selected file is the one indicated by an appended comma+digit, as
mentioned above. If no such indication is given, the default
.I "selected file"
is used (usually the first file loaded, but can be changed with the
.I select
command).

Some commands accept a
.I boolean
argument, such as to turn a flag on or off. In all such cases,
a\c
.Q 1
or\c
.Q on
means to turn the flag on, while a\c
.Q 0
or\c
.Q off
is used to turn it off.  Some flags are per-file
.RI ( fuzz ,
.IR fold ,
etc.), and a
command to set such a flag
normally sets the flag for the selected file only. However, the
default value inherited by subsequently loaded files can be set
by prepending\c
.Q default
to the command. This is particularly useful in the startup file
before any files are loaded (see the section STARTUP FILE).

Items separated by\c
.Q |
are mutually exclusive possibilities (e.g., a
boolean argument is\c
.Q 1|on|0|off ).

Items shown in brackets ([ and ]) are optional. All commands that
accept a boolean argument to set a flag or mode do so optionally;
with no argument, the command reports the current status of the
mode or flag.
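.PP
The boolean-argument convention described above can be modeled with a
small Python function. This is a sketch rather than the program's own
parser: the function name is hypothetical, and treating the words as
case-sensitive is an assumption.
.nf
```python
def parse_boolean(arg=None):
    """Return True/False for a boolean argument, or None to report status."""
    if arg is None or arg.strip() == "":
        return None  # no argument: the command reports the current value
    word = arg.strip()
    if word in ("1", "on"):
        return True
    if word in ("0", "off"):
        return False
    raise ValueError("expected 1|on|0|off, got " + repr(word))


for text in ("on", "0", None):
    print(repr(text), "->", parse_boolean(text))
```
.fi
Returning None for a missing argument keeps the report-the-status case
separate from the two cases that set the flag.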

Any command that allows an argument in quotes (such as
.IR load )
allows the use of either single or double quotes.
.PP
The commands:
.br
.so c_autokana.so
.so c_clear.so
.so c_cmdchar.so
.so c_combine.so
.so c_cmd_debug.so
.so c_debug.so
.so c_describe.so
.so c_encoding.so
.so c_files.so
.so c_filter.so
.so c_fold.so
.so c_fuzz.so
.so c_help.so
.so c_highlight.so
.so c_if.so
.so c_in_code.so
.so c_limit.so
.so c_log.so
.so c_load.so
.so c_modify.so
.so c_msg.so
.so c_out_code.so
.so c_pager.so
.so c_prompt.so
.so c_rdebug.so
.so c_list_size.so
.so c_select.so
.so c_show.so
.so c_source.so
.so c_spinner.so
.so c_stats.so
.so c_tag.so
.so c_verbose.so
.so c_version.so
.so c_wild.so
.so c_word.so
.so c_quit.so
.SH STARTUP FILE
If the file\c
.Q ~/.lookup
is present, commands are read from it during
.I lookup
startup.

The file is read in the same way as the
.I source
command reads files (see that entry for more information on the file
format, etc.).

However, if files were loaded via command-line arguments,
commands within the startup file that load files (and their associated
commands, such as those that set per-file flags) are ignored.

Similarly, any use of the command-line flags \-euc, \-jis, or \-sjis
disables the startup-file commands that set the
input and/or output encodings.

The special treatment mentioned in the above two paragraphs only applies
to commands within the startup file itself, and does not apply to commands
in command-files that might be
.IR source d
from within the startup file.

The following is a reasonable example of a startup file:
.nf
  ## turn verbose mode off during startup file processing
  verbose off

  prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
  spinner 200
  pager on

  ## The filter for edict will hit for entries that
  ## have only one English part, where that English part
  ## has a pl or pn designation.
  load ~/lib/edict
  filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
  highlight on
  word on

  ## The filter for kanjidic will hit for entries without a
  ## frequency-of-use number.  The modify spec will remove
  ## fields with the named initial code (U,N,Q,M,E, and Y)
  load ~/lib/kanjidic
  filter "uncommon" !/<F\\d+>/
  modify /( [UNQMEY]\\S+)+//g

  ## Use the same filter for my local word file,
  ## but turn it off by default.
  load ~/lib/local.words
  filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
  filter off
  highlight on
  word on
  ## Want a tag for my local words, but only when
  ## accessed via the combo below
  tag off ""

  combine "words" 2 0
  select words

  ## turn verbosity back on for interactive use.
  verbose on

.fi
.SH "COMMAND-LINE ARGUMENTS"
With the use of a startup file, command-line arguments are rarely needed.
In practical use, they are only needed to create an index file, as in:
.nf

    lookup -write \fItextfile\fP

.fi
Any command-line arguments that aren't flags are taken to be files,
which are loaded in turn during startup.
In this case, any
.IR load ,
.IR filter ,
and similar commands in the startup file are ignored.

The following flags are supported:
.TP
\-help\ \ \ 
Reports a short help message and exits.
.TP
\-write\ \ \ \&
Creates index files for the named files and exits. No
.I "startup file"
is read.
.TP
\-euc\ \ \ 
Sets the input and output encoding method to EUC (currently the default).
Exactly the same as the\c
.Q "encoding euc"
command.
.TP
\-jis\ \ \ 
Sets the input and output encoding method to JIS.
Exactly the same as the\c
.Q "encoding jis"
command.
.TP
\-sjis\ \ \ 
Sets the input and output encoding method to Shift-JIS.
Exactly the same as the\c
.Q "encoding sjis"
command.
.TP
\-v \-version
Prints the version string and exits.
.TP
\-norc\ \ \ 
.br
Indicates that the startup file should not be read.
.TP
\-rc \fIfile\fP
The named file is used as the startup file, rather than the
default\c
.Q "~/.lookup" ".  "
It is an error for the file not to exist.
.TP
\-percent \fInum\fP
.br
When an index is built, letters that appear on more than
.I num
percent (default 50) of the lines are elided from the index.  If a
search would have to check most of the lines in a file anyway, indexing
such a letter saves little time, while representing it takes a large
amount of space in the index file; the indexing of oft-occurring
letters provides a diminishing return.

Smaller indexes can be made by using a smaller number.
.TP
\-noindex
.br
Indicates that any files loaded via the command line should
not be loaded with any precomputed index, but recalculated on the fly.
.TP
\-verbose
.br
Has metric tons of stats spewed whenever an index is created.
.TP
\-port ###
For the (undocumented) server configuration only, tells which port to
listen on.
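.PP
The elision rule of the \-percent flag described above can be sketched
in Python. This is an illustration, not the real index builder: the
function name and sample lines are assumptions, and every character of
a line is treated as a letter.
.nf
```python
from collections import Counter


def letters_to_index(lines, percent=50):
    """Return the letters kept in the index; the rest are elided."""
    counts = Counter()
    for line in lines:
        # Count a letter once per line, however often it repeats there.
        counts.update(set(line))
    total = len(lines)
    # A letter is elided when it appears on MORE than `percent` of lines.
    return {ch for ch, n in counts.items() if n * 100 <= percent * total}


sample = ["abc", "abd", "aef", "agh"]   # hypothetical four-line file
print(sorted(letters_to_index(sample)))
print(sorted(letters_to_index(sample, percent=25)))
```
.fi
In the sample, the letter a occurs on every line and is elided at the
default of 50, while b (on exactly half of the lines) is kept; lowering
the number to 25 elides b as well, giving a smaller index.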

.SH OPERATING SYSTEM CONSIDERATIONS
I/O primitives and behaviors vary with the operating system. On my
operating system, I can\c
.Q read
a file by mapping it into memory, which
is a nearly instantaneous procedure regardless of the size of the file.
When I later access that memory, the appropriate sections of the file
are automatically read into memory by the operating system as needed.

This results in
.I lookup
starting up and presenting a prompt very quickly, but causes the first
few searches that need to check a lot of lines in the file to go more
slowly (as lots of the file will need to be read in). However, once
the bulk of the file is in, searches will go very fast. The win here is
that the rather long file-load times are amortized over the first few
(or few dozen, depending upon the situation) searches rather than always
faced right at command startup time.

On the other hand, on an operating system without the mapping ability,
.I lookup
would start up very slowly as all the files and indexes are read into memory,
but would then search quickly from the beginning, the whole file
having already been read.
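.PP
The memory-mapping behavior described above can be tried with the mmap
module of Python. This is a sketch for illustration, not the C code of
the program itself; the helper name map_file and the temporary sample
file are assumptions.
.nf
```python
import mmap
import os
import tempfile


def map_file(path):
    """Map `path` read-only; the OS reads pages in as they are touched."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; the file must not be empty.
        return mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)


# Hypothetical sample data standing in for a dictionary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"alpha beta gamma")
    sample_path = tmp.name

view = map_file(sample_path)   # mapping does not copy the file contents
print(view.find(b"gamma"))     # searching touches the mapped pages
view.close()
os.remove(sample_path)
```
.fi
The cost of reading is paid when the mapped bytes are first searched,
which is why the first few searches are the slow ones.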

To get around the slow startup, particularly when many files are loaded,
.I lookup
uses
.I "lazy loading"
if it can: a file is not actually read into memory at the time the
.I load
command is given. Rather, it will be read when first actually accessed.
Furthermore, files are loaded while
.I lookup
is idle, such as when waiting for user input. See the
.I files
command for more information.
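.PP
The read-on-first-access half of lazy loading can be sketched in Python
as follows. The class name and sample file are assumptions, and loading
during idle time is not modeled.
.nf
```python
import os
import tempfile


class LazyFile:
    """Remember a path at load time; read the bytes on first access."""

    def __init__(self, path):
        self.path = path
        self._data = None   # nothing is read when the file is "loaded"

    @property
    def loaded(self):
        return self._data is not None

    def data(self):
        if self._data is None:
            with open(self.path, "rb") as handle:
                self._data = handle.read()
        return self._data


# Hypothetical sample file standing in for a dictionary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample entry")
    sample_path = tmp.name

slot = LazyFile(sample_path)
print(slot.loaded)        # nothing has been read yet
print(len(slot.data()))   # first access reads the file
print(slot.loaded)
os.remove(sample_path)
```
.fi
Creating the object is cheap however large the file is, so many files
can be set up at startup without delaying the first prompt.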
.SH REGULAR EXPRESSIONS, A BRIEF TUTORIAL
.so regex.so
.SH BUGS
Needs full support for half-width katakana and JIS X 0212-1990.
.br
Non-EUC (JIS & SJIS) items not tested well.
.br
Probably won't work on non-UNIX systems.
.br
Screen control codes (for clear and highlight commands) are hard-coded
for ANSI/VT100/kterm.
.SH AUTHOR
Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)
.SH INFO
Jim Breen's text files
.I edict
and
.I kanjidic
and their documentation can be found in\c
.Q pub/nihongo
on ftp.cc.monash.edu.au (130.194.1.106).

Information on input and output encoding and codes can be found in
Ken Lunde's
.I "Understanding Japanese Information Processing"
(\&日本語情報処理\&) published by O'Reilly and Associates.
ISBN 1-56592-043-0.  There is also a Japanese edition published
by SoftBank.

A program to convert files among the various encoding methods is
Dr. Ken Lunde's
.IR jconv ,
which can also be found on ftp.cc.monash.edu.au.
.I Jconv
is also useful for converting half-width katakana (which
.I lookup
doesn't yet support well) to full-width.