File: man.tmp

lookup 1.08b-5

LOOKUP(1)                                               LOOKUP(1)


                        April 22nd, 1994


NAME
   lookup - interactive file search and display

SYNOPSIS
   lookup [ args ] [ file ... ]

DESCRIPTION
   Lookup allows the quick interactive search of text files. It
   supports ASCII, JIS-ROMAN, and Japanese EUC packed-format
   text, and has an integrated romaji→kana converter.

THIS MANUAL
   Lookup is flexible for a variety of applications. This manual
   will, however, focus on the application of searching Jim
   Breen's edict (Japanese-English dictionary) and kanjidic
   (kanji database). Being familiar with the content and format
   of these files would be helpful. See the INFO section near the
   end of this manual for information on how to obtain these
   files and their documentation.

OVERVIEW OF MAJOR FEATURES
   The following just mentions some major features to whet your
   appetite to actually read the whole manual (-:

   Romaji-to-Kana Converter
      Lookup can convert romaji to kana for you, even "on the
      fly" as you type.

   Fuzzy Searching
      Searches can be a bit "vague" or "fuzzy", so that you'll
      be able to find "東京" even if you try to search for
      "ときょ" (the proper yomikata being "とうきょう").

   Regular Expressions
      Uses the powerful and expressive regular expression for
      searching. One can easily specify complex searches that
      express "I want lines that look like such-and-such, but
      not like this-and-that, but that also have this particular
      characteristic...."

   Wildcard "Glob" Patterns
      Optionally, can use well-known filename wildcard patterns
      instead of full-fledged regular expressions.

   Filters
      You can have lookup not list certain lines that would
      otherwise match your search, yet can optionally save them
      for quick review. For example, you could have all name-only
      entries from edict filtered from normal output.

   Automatic Modifications
      Similarly, you can do a standard search-and-replace on
      lines just before they print, perhaps to remove information
      you don't care to see on most searches. For example, if
      you're generally not interested in kanjidic's info on
      Chinese readings, you can have them removed from lines
      before printing.

   Smart Word-Preference Mode
      You can have lookup list only entries with whole words that
      match your search (as opposed to an embedded match, such as
      finding "the" inside "them"), but if no whole-word matches
      exist, it will go ahead and list any entry that matches the
      search.

   Handy Features
      Other handy features include a dynamically settable and
      parameterized prompt, automatic highlighting of that part
      of the line that matches your search, an output pager,
      readline-like input with horizontal scrolling for long
      input lines, a ".lookup" startup file, automated
      programmability, and much more. Read on!

REGULAR EXPRESSIONS
   Lookup makes liberal use of regular expressions (or regex for
   short) in controlling various aspects of the searches. If you
   are not familiar with the important concepts of regexes, read
   the tutorial appendix of this manual before continuing.

JAPANESE CHARACTER ENCODING METHODS
   Internally, lookup works with Japanese packed-format EUC, and
   all  files loaded must be encoded similarly. If you have files
   encoded in JIS or Shift-JIS, you must first  convert  them  to
   EUC before loading (see the INFO section for programs that can
   do this).

   Interactive input and output encoding, however, may be
   selected via the -jis, -sjis, and -euc invocation flags
   (default is -euc), or by various commands to the program
   (described later).

   Make sure to use the encoding appropriate for your system. If
   you're using kterm under the X Window System, you can use
   lookup's -jis flag to match kterm's default JIS encoding. Or,
   you might use kterm's "-km euc" startup option (or menu
   selection) to put kterm into EUC mode. Also, I have found
   kterm's scrollbar ("-sb -sl 500") to be quite useful.

   With many "English" fonts in Japan, the character that
   normally prints as a backslash (the half-width version of ＼)
   in the States appears as a yen symbol (the half-width version
   of ￥). How it will appear on your system is a function of
   what font you use and what output encoding method you choose,
   which may be different from the font and method that was used
   to print this manual (both of which may be different from
   what's printed on your keyboard's appropriate key). Make sure
   to keep this in mind while reading.


STARTUP
   Let's assume that your copy of edict is in ~/lib/edict. You
   can start the program simply with

           lookup ~/lib/edict

   You'll note that lookup spends some time building an index
   before the default "lookup> " prompt appears.

   Lookup gains much of its search speed by constructing an index
   of the file(s) to be searched. Since building the index can be
   time consuming itself, you can have lookup write the built
   index to a file that can be quickly loaded the next time you
   run the program. Index files will be given a ".jin"
   (Jeffrey's Index) ending.

   Let's build the indices for edict and kanjidic now:

           lookup -write ~/lib/edict ~/lib/kanjidic

   This will create the index files
          ~/lib/edict.jin
          ~/lib/kanjidic.jin
   and exit.

   You can now restart lookup, automatically using the
   precomputed index files, as:

          lookup ~/lib/edict ~/lib/kanjidic

   You should then be presented with the prompt without having to
   wait  for  the index to be constructed (but see the section on
   Operating System concerns for possible reasons of delay).

INPUT
   There are basically two types of input: searches and commands.
   Commands do such things as tell lookup to load more files or
   set flags. Searches report lines of a  file  that  match  some
   search  specifier  (where lines to search for are specified by
   one or more regular expressions).

   The input syntax may perhaps at first seem odd, but  has  been
   designed to be powerful and concise. A bit of time invested to
   learn it well will pay off greatly when you need it.

BRIEF EXAMPLE
   Assuming you've started lookup with edict and kanjidic as
   noted above, let's try a few searches. In these examples, the
       "search [edict]> "
   is the prompt. Note that the space after the '>' is part of
   the prompt.

   Given the input:

     search [edict]> tranquil

   lookup will report all lines with the string "tranquil" in
   them. There are currently about a dozen such lines, two of
   which look like:

     安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
     安らぎ [やすらぎ] /peace/tranquility/

   Notice that lines with "tranquil" and "tranquility" matched?
   This is because "tranquil" was embedded in the word
   "tranquility". You could restrict the search to only the word
   "tranquil" by prepending the special "start of word" symbol
   '<' and appending the special "end of word" symbol '>' to the
   regex, as in:

     search [edict]> <tranquil>

   This is the regular expression that says "the beginning of a
   word, followed by a 't', 'r', ..., 'l', which is at the end
   of a word." The current version of edict has just three
   matching entries.
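
   As a rough illustration of what '<' and '>' do, the following
   Python sketch rewrites them as Python's \b word-boundary
   assertion and tries both searches on the two sample entries
   shown earlier. The helper name lookup_word_regex is made up for
   this example, and Python's re module only approximates lookup's
   own regex engine.

```python
import re

def lookup_word_regex(pattern: str) -> re.Pattern:
    """Approximate lookup's '<' and '>' with Python's \\b (hypothetical helper)."""
    # Assumption: '<' and '>' only ever mean start/end of word in the pattern.
    translated = pattern.replace("<", r"\b").replace(">", r"\b")
    # Case folding is on by default in lookup, so ignore case here too.
    return re.compile(translated, re.IGNORECASE)

lines = [
    "安らか [やすらか] /peaceful (an)/tranquil/calm/restful/",
    "安らぎ [やすらぎ] /peace/tranquility/",
]

# Plain search: matches wherever the string is embedded.
embedded = [ln for ln in lines if lookup_word_regex("tranquil").search(ln)]
# Whole-word search: "tranquility" no longer qualifies.
whole_word = [ln for ln in lines if lookup_word_regex("<tranquil>").search(ln)]

print(len(embedded), len(whole_word))
```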

   Let's try another:

     search [edict]> fukushima

   This is a search for the "English" fukushima -- ways to search
   for kana or kanji will be explored later. Note that among the
   several lines selected and printed are:

     副島 [ふくしま] /Fukushima (pn,pl)/
     木曽福島 [きそふくしま] /Kisofukushima (pl)/

   By default, searches are done in a case-insensitive manner
   -- 'F' and 'f' are treated the same by lookup, at least so
   far as the matching goes. This is called case folding.

   Let's give a command to turn this option off, so that 'f' and
   'F' won't be considered the same. Here's an odd point about
   lookup's input syntax: the default setting is that all command
   lines must begin with a space. The space is the (default)
   command-introduction character and tells the input parser to
   expect a command rather than a search regular expression. It
   is a common mistake at first to forget the leading space when
   issuing a command. Be careful.

   Try the command " fold" to report the current status of case
   folding. Notice that as soon as you type the space, the
   prompt changes to
     "lookup command> "
   as a reminder that now you're typing a command rather than a
   search specification.


     lookup command>  fold

   The reply should be "file #0's case folding is on".

   You can actually turn it off with " fold off". Now try the
   search for "fukushima" again. Notice that this time the
   entries with "Fukushima" aren't listed? Now try the search
   string "Fukushima" and see that the entries with "fukushima"
   aren't listed.

   Case folding is usually very convenient (it also makes
   corresponding katakana and hiragana match the same), so don't
   forget to turn it back on:

     lookup command>  fold on
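
   To make the idea of case folding concrete, here is a small
   Python sketch that folds ASCII letters to lower case and
   katakana to hiragana before comparing. It works on Unicode
   strings, whereas lookup itself works on EUC internally, and
   the function names fold and folded_contains are invented for
   this illustration rather than taken from lookup.

```python
def fold(text: str) -> str:
    """Fold ASCII case and katakana/hiragana to one form (illustrative only)."""
    out = []
    for ch in text.lower():
        # Katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above
        # the corresponding hiragana ぁ..ゖ (U+3041..U+3096).
        if "\u30a1" <= ch <= "\u30f6":
            ch = chr(ord(ch) - 0x60)
        out.append(ch)
    return "".join(out)

def folded_contains(line: str, needle: str) -> bool:
    """True if needle occurs in line once both have been folded."""
    return fold(needle) in fold(line)

print(fold("Fukushima"))
print(folded_contains("キリスト教 [キリストきょう] /Christianity/", "ときょ"))
```

   With folding "off", the comparison would simply be done on the
   unfolded strings, which is why "fukushima" and "Fukushima" then
   select different entries.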


JAPANESE INPUT
   Lookup has an automatic romaji→kana converter. A leading '/'
   indicates that romaji is to follow. Try typing "/tokyo" and
   you'll see it convert to "/ときょ" as you type. When you hit
   return, lookup will list all lines that have a "ときょ"
   somewhere in them. Well, sort of. Look carefully at the lines
   which match. Among them (if you had case folding back on)
   you'll see:

     キリスト教 [キリストきょう] /Christianity/
     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     凸鏡 [とっきょう] /convex lens/

   The first one has "ときょ" in it (as "トきょ", where the
   katakana "ト" matches in a case-insensitive manner to the
   hiragana "と"), but you might consider the others unexpected,
   since they don't have "ときょ" in them. They're close
   ("とうきょ" and "とっきょ"), but not exact. This is the result
   of lookup's "fuzzification". Try the command " fuzz" (again,
   don't forget the command-introduction space). You'll see that
   fuzzification is turned on. Turn it off with " fuzz off" and
   try "/tokyo" (which will convert as you type) again. This
   time you only get the lines which have "ときょ" exactly (well,
   case folding is still on, so it might match katakana as well).

   In a fuzzy search, length of vowels is ignored -- "と" is
   considered the same as "とう", for example. Also, the presence
   or absence of any "っ" character is ignored, and the pairs
   じ ぢ, ず づ, え ゑ, and お を are considered identical in a
   fuzzy search.

   It might be convenient to consider a fuzzy search to be a
   "pronunciation search". Special note: fuzzification will not
   be performed if a regular expression "*", "+", or "?" modifies
   a non-ASCII character. This is not an issue when input
   patterns are filename-like wildcard patterns (discussed
   below).
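
   The fuzzy rules above can be sketched in Python by reducing
   each reading to a "pronunciation key" and comparing keys. This
   is only a crude model under my own assumptions: the tables and
   the name fuzzy_key are invented here, only hiragana is handled,
   and lookup's real implementation is not described in this
   manual and may well work differently (for instance by
   expanding the search pattern instead).

```python
# Hiragana grouped by their vowel (assumed table, hiragana only).
ROWS = {
    "a": "あかがさざただなはばぱまやらわゃぁ",
    "i": "いきぎしじちぢにひびぴみりぃ",
    "u": "うくぐすずつづぬふぶぷむゆるゅぅ",
    "e": "えけげせぜてでねへべぺめれぇ",
    "o": "おこごそぞとどのほぼぽもよろをょぉ",
}
VOWEL_OF = {kana: vowel for vowel, row in ROWS.items() for kana in row}

# Kana that merely lengthen a preceding vowel (a simplification).
LENGTHENERS = {"a": "あ", "i": "い", "u": "う", "e": "えい", "o": "おう"}

# The pairs the manual says are treated as identical.
PAIRS = str.maketrans("ぢづゑを", "じずえお")

def fuzzy_key(kana: str) -> str:
    """Drop っ, long-vowel marks and vowel lengtheners; merge the pairs."""
    out = []
    prev_vowel = None
    for ch in kana.translate(PAIRS):
        if ch in "っー":
            continue  # presence or absence of っ (or ー) is ignored
        if prev_vowel is not None and ch in LENGTHENERS[prev_vowel]:
            continue  # vowel length is ignored
        out.append(ch)
        prev_vowel = VOWEL_OF.get(ch)
    return "".join(out)

for reading in ("とうきょう", "とっきょ", "とっきょう"):
    print(reading, fuzzy_key(reading) == fuzzy_key("ときょ"))
```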

   In addition to kana fuzziness, there's one special case for
   kanji when fuzziness is on. The kanji repeater mark "々" will
   be recognized such that "時々" and "時時" will match each
   other.
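
   One way to picture the repeater-mark rule is to expand every
   "々" into a copy of the character before it and then compare.
   The Python sketch below does just that. The function names are
   hypothetical, and this is not claimed to be how lookup does it.

```python
import re

def expand_repeater(text: str) -> str:
    """Replace each 々 with a copy of the character that precedes it."""
    return re.sub(r"(.)々", r"\1\1", text)

def same_under_fuzz(a: str, b: str) -> bool:
    """Compare two strings after expanding the repeater mark."""
    return expand_repeater(a) == expand_repeater(b)

print(expand_repeater("時々"))
print(same_under_fuzz("時々", "時時"))
```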


   Turn fuzzification back on ("fuzz on"), and search for all
   whole words which sound like "tokyo". That search would be
   specified as:

     search [edict]> /<tokyo>

   (again, the "tokyo" will be converted to "ときょ" as you
   type). My copy of edict has the three lines

     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     特許 [とっきょ] /special permission/patent/
     凸鏡 [とっきょう] /convex lens/

   This kind of whole-word romaji-to-kana search is so common,
   there's a special shortcut. Instead of typing "/<tokyo>",
   you can type "[tokyo]". The leading '[' means "start romaji"
   and "start of word". Were you to type "<tokyo>" instead
   (without a leading '/' or '[' to indicate romaji-to-kana
   conversion), you would get all lines with the English
   whole-word "tokyo" in them. That would be a reasonable request
   as well, but not what we want at the moment.

   Besides the kana conversion, you can use any cut-and-paste
   that your windowing system might provide to get Japanese text
   onto the search line. Cut "ときょ" from somewhere and paste it
   onto the search line. When hitting enter to run the search,
   you'll notice that it is done without fuzzification (even if
   the fuzzification flag was "on"). That's because there's no
   leading '/'. Not only does a leading '/' indicate that you
   want the romaji-to-kana conversion, but also that you want it
   done fuzzily.

   So, if you'd like fuzzy cut-and-paste, just type a leading '/'
   before pasting (or go back and prepend one after pasting).

   These examples have all been pretty simple, but you can use
   all the power that regexes have to offer. As a slightly more
   complex example, the search "<gr[ea]y>" would look for all
   lines with the words "grey" or "gray" in them. Since the '['
   isn't the first character of the line, it doesn't mean what
   was mentioned above (start-of-word romaji). In this case, it's
   just the regular-expression "class" indicator.

   If you feel more comfortable using filename-like "*.txt"
   wildcard patterns, you can use the "wildcard on" command to
   have patterns be considered this way.
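
   For readers who know regexes better than wildcards (or the
   other way around), Python's standard fnmatch module shows how
   a filename-style pattern corresponds to a regular expression.
   This is only an analogy: fnmatch matches a whole name, and the
   exact semantics of lookup's wildcard mode are not spelled out
   in this section. The wrapper name wildcard_to_regex is mine.

```python
import fnmatch
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a filename-style wildcard pattern into a Python regex."""
    # fnmatch.translate turns '*' into '.*', '?' into '.', and so on,
    # and anchors the result so it must match the whole string.
    return re.compile(fnmatch.translate(pattern))

names = ["notes.txt", "notes.text", "readme"]
matches = [n for n in names if wildcard_to_regex("*.txt").match(n)]

print(matches)
print(bool(wildcard_to_regex("gr?y").match("grey")))
```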

   This has been a quick introduction to the basics of lookup.

   It can be very powerful and much  more  complex.  Below  is  a
   detailed description of its various parts and features.

READLINE INPUT
   The actual keystrokes are read by a readline-ish package that
   is pretty standard. In addition to just typing away, the
   following keystrokes are available:

     ^B  / ^F     move left/right one character on the line
     ^A  / ^E     move to the start/end of the line
     ^H  / ^G     delete one character to the left/right of the cursor
     ^U  / ^K     delete all characters to the left/right of the cursor
     ^P  / ^N     previous/next lines on the history list
     ^L or ^R     redraw the line
     ^D           delete char under the cursor, or EOF if line is empty
     ^space       force romaji conversion (^@ on some systems)

   If automatic romaji-to-kana conversion is turned on (as it is
   by default), there are certain situations where the conversion
   will be done, as we saw above. Lower-case romaji will be
   converted to hiragana, and upper-case romaji to katakana. This
   usually won't matter, though, as case folding will treat
   hiragana and katakana the same in the searches.

   Exactly which situations trigger the automatic conversion is
   intended to be rather intuitive once the basic idea is
   learned. However, at any time, one can use control-space to
   convert the ASCII to the left of the cursor to kana. This can
   be particularly useful when needing to enter kana on a command
   line (where auto conversion is never done; see below).


ROMAJI FLAVOR
   Most flavors of romaji are recognized. Special or non-obvious
   items are mentioned below. Lowercase is converted to hiragana,
   uppercase to katakana.

   Long vowels can be entered by repeating the vowel, or with
   '-' or '^'.

   In situations where an "n" could be vague, as in "na" being な
   or んあ, use a single quote to force ん. Therefore,
   「kenichi」→けにち while 「ken'ichi」→けんいち.

   The romaji has been richly extended with many non-standard
   combinations such as ふぁ or ちぇ, which are represented in
   intuitive ways: 「fa」→ふぁ, 「che」→ちぇ, etc.

   Various other mappings of interest:

     wo →を     we →ゑ     wi →ゐ
     VA →ヴァ   VI →ヴィ   VU →ヴ     VE →ヴェ   VO →ヴォ
     di →ぢ     dzi→ぢ     dya→ぢゃ   dyu→ぢゅ   dyo→ぢょ
     du →づ     tzu→づ     dzu→づ

   (the following kana are all smaller versions of the regular kana)

     xa →ぁ     xi →ぃ     xu →ぅ     xe →ぇ     xo →ぉ
     xu →ぅ     xtu→っ     xwa→ゎ     xka→ヵ     xke→ヶ
     xya→ゃ     xyu→ゅ     xyo→ょ
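
   The conversion rules in this section can be illustrated with a
   toy longest-match converter in Python. The table below is a
   tiny made-up subset, just large enough for the examples in this
   manual, and romaji_to_kana is a hypothetical name. lookup's own
   converter knows far more combinations than this.

```python
# A deliberately tiny romaji table (assumption: just enough for the demo).
TABLE = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "kya": "きゃ", "kyu": "きゅ", "kyo": "きょ",
    "fa": "ふぁ", "che": "ちぇ",
    "n'": "ん",  # the single quote forces ん
    "n": "ん",   # a bare n that cannot start a longer syllable
}

def romaji_to_kana(text: str) -> str:
    """Greedy longest-match conversion; upper case yields katakana."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            kana = TABLE.get(chunk.lower())
            if kana is not None:
                if chunk.isupper():
                    # Katakana sit 0x60 above hiragana in Unicode.
                    kana = "".join(chr(ord(k) + 0x60) for k in kana)
                out.append(kana)
                i += len(chunk)
                break
        else:
            out.append(text[i])  # pass unknown characters through
            i += 1
    return "".join(out)

for word in ("kenichi", "ken'ichi", "tokyo", "TOKYO"):
    print(word, romaji_to_kana(word))
```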


INPUT SYNTAX
   Any input line beginning with a space (or whichever character
   is set as the command-introduction character) is processed as
   a command to lookup rather than a search spec. Automatic kana
   conversion is never done on these lines (but forced conversion
   with control-space may be done at any time).

   Other lines are taken as search regular expressions, with  the
   following special cases:

   ?  A line consisting of a single question mark will report the
      current command-introduction character (the  default  is  a
      space, but can be changed with the "cmdchar" command).

   =  If a line begins with '=', the line (without the '=')
      is taken as a search regular expression, and  no  automatic
      (or internal -- see below) kana conversion is done anywhere
      on the line  (although  again,  conversion  can  always  be
      forced with control-space).  This can be used to initiate a
      search where the beginning of the  regex  is  the  command-
      introduction  character,  or  in  certain  situations where
      automatic kana conversion is temporarily not desired.

   /  A line beginning with ‘/’ indicates romaji input for the
      whole line.  If automatic kana conversion is turned on, the
      conversion will be done in real-time, as the romaji is
      typed.  Otherwise it will be done internally once the line
      is entered.  Regardless, the presence of the leading ‘/’
      indicates that any kana (either converted or cut-and-pasted
      in) should be “fuzzified” if fuzzification is turned on.

      As an addition to the above, if the line doesn't begin with
      ‘=’ or the command-introduction character (and automatic
      conversion is turned on), ‘/’ anywhere on the line
      initiates automatic conversion for the following word.

   [  A line beginning with ‘[’ is taken to be romaji (just as a
      line beginning with ‘/’), and the converted romaji is
      subject to fuzzification (if turned on).  However, if ‘[’ is
      used rather than ‘/’, an implied ‘<’ (“beginning of word”)
      is prepended to the resulting kana regex.  Also, any ending
      ‘]’ on such a line is converted to the “ending of word”
      specifier ‘>’ in the resulting regex.

   In addition to the above, lines may have certain prefixes  and
   suffixes to control aspects of the search or command:

   !  Various flags can be toggled for the duration of a
      particular search by prepending a “!!” sequence to the
      input line.

      Sequences are shown below, along with commands related to
      each:

       !F! …  Filtration is toggled for this line (filter)
       !M! …  Modification is toggled for this line (modify)
       !w! …  Word-preference mode is toggled for this line (word)
       !c! …  Case folding is toggled for this line (fold)
       !f! …  Fuzzification is toggled for this line (fuzz)
       !W! …  Wildcard-pattern mode is toggled for this line (wildcard)
       !r! …  Raw. Force fuzzification off for this line
       !h! …  Highlighting is toggled for this line (highlight)
       !t! …  Tagging is toggled for this line (tag)
       !d! …  Displaying is on for this line (display)

      The letters can be combined, as in “!cf!”.

      The final ‘!’ can be omitted if the first character after
      the sequence is not an ASCII letter.

      If no letters are given (“!!”), “!f!” is the default.

      These last two points can be conveniently combined in the
      common case of “!/romaji”, which would be the same as
      “!f!/romaji”.

      The special sequence “!?” lists the above, and also
      indicates which are currently turned on.

      Note that the letters accepted in a “!!” sequence are many
      of the indicators shown by the “files” command.
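
      The prefix rules above can be sketched in a few lines of
      Python.  This is only an illustration of the documented
      behaviour, not lookup's actual parser: the function name
      split_flag_prefix is hypothetical, and raising an error when a
      required closing ‘!’ is missing is an assumption.

```python
FLAG_LETTERS = "FMwcfWrhtd"  # the letters listed in the table above


def split_flag_prefix(line):
    """Split an optional leading !...! sequence from an input line.

    Returns (letters, rest); letters is "" when there is no prefix.
    """
    if not line.startswith("!"):
        return "", line
    if line.startswith("!?"):
        # The special sequence "!?" asks for a listing of the flags.
        return "?", line[2:]
    i = 1
    while i < len(line) and line[i] in FLAG_LETTERS:
        i += 1
    letters = line[1:i]
    if i < len(line) and line[i] == "!":
        rest = line[i + 1:]            # explicit closing "!"
    elif i < len(line) and line[i].isascii() and line[i].isalpha():
        # Assumption: an ASCII letter here means the closing "!" was required.
        raise ValueError("closing '!' required before an ASCII letter")
    else:
        rest = line[i:]                # closing "!" legitimately omitted
    return letters or "f", rest        # an empty sequence defaults to "f"
```

      With this sketch, split_flag_prefix("!/romaji") gives the same
      letters as split_flag_prefix("!f!/romaji"), mirroring the
      shortcut described above.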

   +  A ‘+’ prepended to anything above will cause the final
      search regex to be printed. This can be useful to see when
      and what kind of fuzzification and/or internal kana
      conversion is happening. Consider:

        search [edict]> +/わかる
        a match is “わ[ぁあー]*っ?か[ぁあー]*る[ぅうおぉー]*”

      Due to the leading ‘/’, the kana is fuzzified, which
      explains the somewhat complex resulting regex. For
      comparison, note:

        search [edict]> +わかる
        a match is “わかる”
        search [edict]> +!/わかる
        a match is “わかる”

      As the ‘+’ shows, these are not fuzzified. The first one
      has no leading ‘/’ or ‘[’ to induce fuzzification, while
      the second has the ‘!’ line prefix (which is the default
      version of “!f!”), which toggles fuzzification mode to
      “off” for that line.

   ,  The default of all searches and most commands is to work
      with the first file loaded (edict in these examples). One
      can change this default (see the “select” command) or, by
      appending a comma+digit sequence at the end of an input
      line, force that line to work with another previously-loaded
      file.  An appended “,1” works with the first extra file
      loaded (in these examples, kanjidic).  An appended “,2”
      works with the 2nd extra file loaded, etc.

      An appended “,0” works with the original first file (and
      can be useful if the default file has been changed via the
      “select” command).

      The following sequence shows a common usage:

        search [edict]> [ときょと]
        東京都 [とうきょうと] /Tokyo Metropolitan area/

      cutting and pasting the 都 from above, and adding a “,1” to
      search kanjidic:

        search [edict]> 都,1
        都 4554 N4769 S11  ..... ト ツ みやこ {metropolis} {capital}
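
      The comma+digit suffix can be peeled off an input line with a
      short regular expression.  The Python sketch below is only an
      illustration: the name split_slot_suffix is hypothetical, and
      it assumes exactly one digit follows the comma.

```python
import re

# Assumption: exactly one digit follows the comma, as in ",0" ",1" ",2".
_SLOT_SUFFIX = re.compile(r",(\d)$")


def split_slot_suffix(line, default_slot=0):
    """Return (text, slot) for an input line with an optional ",N" suffix."""
    match = _SLOT_SUFFIX.search(line)
    if match is None:
        # No suffix: fall back to the currently selected default slot.
        return line, default_slot
    return line[:match.start()], int(match.group(1))
```

      The default_slot argument stands in for whatever slot the
      “select” command last chose.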



FILENAME-LIKE WILDCARD MATCHING
   When wildcard-pattern mode is selected, patterns are
   considered as extended “*.txt”-like patterns. This is often
   more convenient for users not familiar with regular
   expressions. To have this mode selected by default, put

      default wildcard on

   into your “.lookup” file (see “STARTUP FILE” below).

   When wildcard mode is on, only “*”, “?”, “+”, and “.” are
   affected.  See the entry for the “wildcard” command below for
   details.

   Other   features,   such   as  the  multiple-pattern  searches
   (described below) and other regular-expression  metacharacters
   are available.


MULTIPLE-PATTERN SEARCHES
   You can put multiple patterns in a single search specifier.
   For example, consider

     search [edict]> china||japan

   The first part (“china”) will select all lines that have
   “china” in them. Then, from among those lines, the second
   part will select lines that have “japan” in them.  The “||” is
   not part of any pattern -- it is lookup's “pipe” mechanism.

   The above example is very different from the single pattern
   “china|japan”, which would select any line that had either
   “china” or “japan”.  With “china||japan”, you get lines that
   have “china” and then also have “japan” as well.

   Note that it is also different from the regular expression
   “china.*japan” (or the wildcard pattern “china*japan”), which
   would select lines having “china, then maybe some stuff, then
   japan”.  But consider the case when “japan” comes on the line
   before “china”. Just for your comparison, the multiple-pattern
   specifier “china||japan” is pretty much the same as the single
   regular expression “china.*japan|japan.*china”.

   If you use “|!|” instead of “||”, it will mean “...and then
   lines not matching...”.

   Consider a way to find all lines of kanjidic that do have a
   Halpern number, but don't have a Nelson number:

       search [edict]> <H\d+>|!|<N\d+>

   If you then wanted to restrict the listing to those that also
   had a “jinmeiyou” marking (kanjidic's “G9” field) and had a
   reading of あき, you could make it:

       search [edict]> <H\d+>|!|<N\d+>||<G9>||<あき>

   A prepended ‘+’ would explain:

       a match is “<H\d+>”
       and not “<N\d+>”
       and “<G9>”
       and “<あき>”

   The “|!|” and “||” can be used to make up to ten separate
   regular expressions in any one search specification.

   Again, it is important to stress that “||” does not mean “or”
   (as it does in a C program, or as ‘|’ does within a regular
   expression).  You might find it convenient to read “||” as
   “and also”, while reading “|!|” as “but not”.

   It is also important to stress that any whitespace around the
   “||” and “|!|” constructs is not ignored, but kept as part of
   the regex on either side.
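
   The pipe behaviour described in this section can be modelled in
   Python as successive filtering of a list of lines.  This is a
   sketch of the documented semantics only, not lookup's
   implementation; the names split_stages and search are
   hypothetical, and Python's own regex syntax is used (so the
   ‘<’ and ‘>’ word specifiers shown above are not available).

```python
import re

MAX_PATTERNS = 10  # "up to ten separate regular expressions"


def split_stages(spec):
    """Split 'a||b|!|c' into [(negate, pattern), ...] in pipe order."""
    stages = []
    negate = False
    # The capturing group makes re.split keep the separators in the result.
    for part in re.split(r"(\|!\||\|\|)", spec):
        if part == "||":
            negate = False
        elif part == "|!|":
            negate = True
        else:
            # Whitespace around the separators is kept as part of the pattern.
            stages.append((negate, part))
    if len(stages) > MAX_PATTERNS:
        raise ValueError("too many patterns in one search specifier")
    return stages


def search(lines, spec, ignore_case=True):
    """Apply each stage to the lines that survived the previous stage."""
    flags = re.IGNORECASE if ignore_case else 0
    selected = list(lines)
    for negate, pattern in split_stages(spec):
        regex = re.compile(pattern, flags)
        selected = [ln for ln in selected
                    if (regex.search(ln) is not None) != negate]
    return selected
```

   On a small made-up list of lines, search(lines, "china||japan")
   keeps only lines containing both words in either order, while
   search(lines, "china|japan") keeps lines containing either one.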

COMBINATION SLOTS
   Each file, when loaded, is assigned to a “slot” via which
   subsequent references to the file are then made.  The slot may
   then be searched, have filters and flags set, etc.

   A special kind of slot, called a “combination slot”, rather
   than representing a single file, can represent multiple
   previously-loaded slots.  Searches against a combination slot
   (or “combo slot” for short) search all those previously-loaded
   slots associated with it (called “component slots”).  Combo
   slots are set up with the combine command.

   A combo slot has no filter or modify spec, but can have a
   local prompt and flags just like normal file slots.  The
   flags, however, have special meanings with combo slots. Most
   combo-slot flags act as a mask against the component-slot
   flags; when acted upon as a member of the combo, a
   component-slot's flag will be disabled if the corresponding
   combo-slot's flag is disabled.

   Exceptions to this are the autokana, fuzz, and tag flags.

   The autokana and fuzz flags govern a combo slot exactly the
   same as a regular file slot.  When a slot is searched as a
   component of a combination slot, the component slot's fuzz
   (and autokana) flags, or lack thereof, are ignored.

   The tag flag is quite different altogether; see the tag
   command for complete information.

   Consider the following output from the files command:

     ┏━━┳━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     ┃ 0┃F wcfh d│a I ┃ 2762k┃/usr/jfriedl/lib/edict
     ┃ 1┃FM cf  d│a I ┃  705k┃/usr/jfriedl/lib/kanjidic
     ┃ 2┃F  cfh@d│a   ┃    1k┃/usr/jfriedl/lib/local.words
     ┃*3┃FM cfhtd│a   ┃ combo┃kotoba (#2, #0)
     ┗━━┻━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   See the discussion of the files command below for basic
   explanation of the output.

   As can be seen, slot #3 is a combination slot with the name
   “kotoba” with component slots two and zero. When a search is
   initiated on this slot, first slot #2 “local.words” will be
   searched, then slot #0 “edict”.  Because the combo slot's
   filter flag is on, the component slots' filter flag will
   remain on during the search.  The combo slot's word flag is
   off, however, so slot #0's word flag will be forced off during
   the search.

   See the combine command for information about creating combo
   slots.
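
   The masking rule can be written down as a small Python
   function.  This is a sketch under stated assumptions: flags are
   modelled as a plain dict of booleans, the function name
   effective_flags is hypothetical, and the tag flag is simply
   left out because its combo behaviour is described elsewhere.

```python
# Flags for which the combo slot's own setting is used and the
# component slot's setting is ignored.
COMBO_GOVERNED = {"autokana", "fuzz"}


def effective_flags(component, combo):
    """Flags in force while `component` is searched as part of `combo`."""
    result = {}
    for name, value in component.items():
        if name == "tag":
            continue  # tag works differently; not modelled in this sketch
        if name in COMBO_GOVERNED:
            result[name] = combo.get(name, False)
        else:
            # Mask: the flag stays on only if the combo slot allows it.
            result[name] = value and combo.get(name, False)
    return result
```

   Applied to the “kotoba” example, a component with filter and
   word both on, searched through a combo with filter on and word
   off, ends up with filter on and word off.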

PAGER
   Lookup has a built-in pager (a la more).  Upon filling a
   screen with text, the string
       --MORE [space,return,c,q]--
   is shown. A space will allow another screen of text; a return
   will allow one more line. A ‘c’ will allow output text to
   continue unpaged until the next command.  A ‘q’ will flush
   output of the current command.

   If supported by the OS, lookup's idea of the screen size is
   automatically set upon startup and window resize.  Lookup must
   know the width of the screen both for doing the horizontal
   input-line scrolling and for knowing when a long line wraps
   on the screen.

   The pager parameters can be set manually with the “pager”
   command.

COMMANDS
   Any line intended to be a command must begin with the
   command-introduction character (the default is a space, but
   can be set via the “cmdchar” command).  However, that
   character is not part of the command itself and won't be
   shown in the following list of commands.

   There are a number of commands that work with the selected
   file or selected slot (both meaning the same thing).  The
   selected file is the one indicated by an appended comma+digit,
   as mentioned above. If no such indication is given, the
   default selected file is used (usually the first file loaded,
   but can be changed with the “select” command).

   Some commands accept a boolean argument, such as to turn a
   flag on or off. In all such cases, a “1” or “on” means to turn
   the flag on, while a “0” or “off” is used to turn it off.
   Some flags are per-file (“fuzz”, “fold”, etc.), and a command
   to set such a flag normally sets the flag for the selected
   file only. However, the default value inherited by
   subsequently loaded files can be set by prepending “default”
   to the command. This is particularly useful in the startup
   file before any files are loaded (see the section STARTUP
   FILE).

   Items separated by ‘|’ are mutually exclusive possibilities
   (e.g. a boolean argument is “1|on|0|off”).

   Items shown in brackets (‘[’ and ‘]’) are optional. All
   commands that accept a boolean argument to set a flag or mode
   do so optionally -- with no argument the command will report
   the current status of the mode or flag.

   Any command that allows an argument in quotes (such as load,
   etc.) allows the use of single or double quotes.

   The commands:

   [default] autokana [boolean]
      Automatic romaji → kana conversion for the selected file
      is turned on or off (default is on).  However, if “default”
      is specified, the value to be inherited as the default by
      subsequently-loaded files is set (or reported).

      Can be temporarily disabled by a prepended ‘=’, as
      described in the INPUT SYNTAX section.

   clear|cls
      Attempts to clear the screen. If you're using a kterm it'll
      just output the appropriate tty control sequence. Otherwise
      it'll try to run the “clear” command.

   cmdchar ['one-byte-char']
      The default command-introduction character is a space, but
      it may be changed via this command. The single quotes
      surrounding the character are required. If no argument is
      given, the current value is printed.

      An input line consisting of a single question mark will
      also print the current value (useful for when you don't
      know the current value).

      Woe to the one that sets the command-introduction character
      to one of the other special input-line characters, such as
      ‘+’, ‘/’, etc.

   combine ["name"] [ num += ] slotnum ...
      Creates or adds file slots to a combination slot (see the
      COMBINATION SLOTS section for general information).  Note
      that “combo” may be used as the command as well.

      Assuming for this example that slots 0-2 are loaded with
      the files curly, moe, and larry, we can create a
      combination slot that will reference all three:

        combo "three stooges" 2, 0, 1

      The command will report

        creating combo slot #3 (three stooges): 2 0 1

      The name is optional, and will appear in the files list,
      and also may be used to specify the slot as an argument
      to the select command.

      A search via the newly created combo slot would search in
      the order specified on the combo command line: first larry,
      then curly, and finally moe.

      If you later load another file (say, jeffrey to slot #4),
      you can then add it to the previously made combo:

        combo 3 += 4

      (the “+=” wording comes from the C programming language
      where it means “add on to”).  Adding to a combination
      always adds slots to the end of the list.

      You can take the opportunity of adding the slot to also
      change the name, if you like:

        combo "four stooges" 3 += 4

      The reply would be
        adding to combo slot #3(four stooges): 4

      A file slot can be a component of any particular combo slot
      only once.  When reporting the created or added slot
      numbers, the number will appear in parentheses if it had
      already been a member of the list.

      Furthermore, only file slots can be component members of
      combo slots. Attempting to combine combo slot X to combo
      slot Y will result in having X's component file slots
      (rather than the combo slot itself) added to Y.
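
      The membership rules (append to the end, no duplicates,
      combos expanded into their component file slots) can be
      sketched in Python as follows.  The function name
      add_to_combo and the use of plain lists and a dict are
      assumptions made for the illustration; lookup's real data
      structures are not described here.

```python
def add_to_combo(components, requested, combos=None):
    """Add slots to a combo's component list, in place.

    components -- the combo's current list of file-slot numbers
    requested  -- slot numbers given on the command line, in order
    combos     -- maps a combo slot number to its own component list
    Returns the report tokens; a slot that was already a member is
    shown in parentheses.
    """
    combos = combos or {}
    report = []
    for slot in requested:
        # A combo slot contributes its component file slots, not itself.
        for file_slot in combos.get(slot, [slot]):
            if file_slot in components:
                report.append(f"({file_slot})")
            else:
                components.append(file_slot)
                report.append(str(file_slot))
    return report
```

      Starting from an empty list, adding 2, 0, 1 and later 4
      reproduces the slot numbers shown in the “three stooges”
      replies above.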

   command debug [boolean]
      Sets the internal command parser debugging flag on or off
      (default is off).

   debug [boolean]
      Sets the internal general-debugging flag on or off (default
      is off).

   describe specifier
      This command will tell you how a character (or each
      character in a string) is encoded in the various encoding
      methods:


          lookup command>  describe "気"
          “気” as  EUC  is 0xb5a4 (181 164; 265 \244)
                as  JIS  is 0x3524 ( 53  36;  65 \044 "5$")
                as KUTEN is   2104 ( 0x1504;  25 \004)
                as S-JIS is 0x8b1f (139  31; 213 \037)

      The quotes surrounding the character or string to describe
      are optional.  You can also give a regular ASCII character
      and have the double-width version of the character
      described. Indicating “A”, for example, would describe “Ａ”.
      Specifier can also be a four-digit kuten value, in which
      case the character with that kuten will be described.

      If a four-digit specifier has a hex digit in it, or if it
      is preceded by “0x”, the value is taken as a JIS code. You
      can precede the value by “jis”, “sjis”, “euc”, or “kuten” to
      force interpretation to the requested code.

      Finally, specifier can be a string of stripped JIS (JIS w/o
      the kanji-in and kanji-out codes, or with the codes but
      without the escape characters in them).  For example,
      “F|K\” would describe the two characters 日 and 本.
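
      The EUC, JIS, and kuten values reported by describe are
      related by simple byte arithmetic, which the following Python
      sketch illustrates using the standard euc_jp codec.  The
      function names are hypothetical, S-JIS is deliberately left
      out, and only two-byte JIS X 0208 characters are handled.

```python
def describe(char):
    """Return the EUC, JIS and kuten codes of one JIS X 0208 character."""
    euc = char.encode("euc_jp")
    if len(euc) != 2 or euc[0] < 0xA1:
        raise ValueError("not a two-byte JIS X 0208 character")
    jis = bytes(b & 0x7F for b in euc)        # JIS is EUC with the high bit cleared
    ku, ten = jis[0] - 0x20, jis[1] - 0x20    # row and cell numbers
    return {"euc": euc.hex(), "jis": jis.hex(), "kuten": f"{ku:02d}{ten:02d}"}


def from_stripped_jis(text):
    """Decode 'stripped JIS': 7-bit JIS byte pairs without escape sequences."""
    raw = text.encode("ascii")
    # Setting the high bit of every byte turns 7-bit JIS into EUC.
    return bytes(b | 0x80 for b in raw).decode("euc_jp")
```

      For the character used in the example above, this yields EUC
      b5a4, JIS 3524, and kuten 2104, matching the first three
      lines of the describe output.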

   encoding [euc|sjis|jis]
      The same as the -euc, -jis, and -sjis command-line options,
      sets the encoding method for interactive input and output
      (or reports the current status).  More detailed control
      over the output encoding can be achieved with the output
      encoding command. A separate encoding for input can be set
      with the input encoding command.

   files [ - | long ]
      Lists what files are loaded in what slots, and some status
      information about them, as with:

      ┃*0┃F wcfh d│a I ┃ 3749k┃/usr/jeff/lib/edict
      ┃ 1┃FM cf  d│a I ┃  754k┃/usr/jeff/lib/kanjidic

        ┏━━┳━━━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        ┃ 0┃F wcf h d │a I ┃ 2762k┃/usr/jfriedl/lib/edict
        ┃ 1┃FM cf   d │a I ┃  705k┃/usr/jfriedl/lib/kanjidic
        ┃ 2┃F  cfWh@d │a   ┃    1k┃/usr/jfriedl/lib/local.words
        ┃*3┃FM cf htd │a   ┃ combo┃kotoba (#2, #0)
        ┃ 4┃   cf   d │a   ┃  205k┃/usr/dict/words
        ┗━━┻━━━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

      The first section is the slot number, with a “*” beside the
      default slot (as set by the select command).

      The second section shows per-slot flags and status. Letters
      are  shown  if  the flag is on, omitted if off. In the list
      below, related commands are given for each item:

        F … if there is a filter {but '#' if disabled}. (filter)
        M … if there is a modify spec {but '%' if disabled}. (modify)
        w … if word-preference mode is turned on. (word)
        c … if case folding is turned on. (fold)
        f … if fuzzification is turned on. (fuzz)
        W … if wildcard-pattern mode is turned on (wildcard)
        h … if highlighting is turned on. (highlight)
        t … if there is a tag {but @ if disabled} (tag)
        d … if found lines should be displayed (display)
        ──────────────────────────────────────────────────────────
        a … if autokana is turned on (autokana)
        P … if there is a file-specific local prompt (prompt)
        I … if the file is loaded with a precomputed index (load)
        d … if the display flag is on (display)

      Note that the letters in the upper section directly
      correspond to the “!!” sequence characters described in the
      INPUT SYNTAX section.

      If there is a digit at the end of the flag section, it
      indicates that only #/10 of the file is actually loaded
      into memory (as opposed to the file having been completely
      loaded). Unloaded files will be loaded while lookup is
      idle, or when first used.

      If the slot is a combination slot (as slot #3 is in the
      example above), that is noted in the third section, and the
      combination name and component slot numbers are noted in
      the fourth. Also, for combination slots (which have no
      filter or modify specifications, only the flags), F and/or M
      are shown if the corresponding mode is allowed during
      searches via the combo slot. See the tag command for info
      about t with respect to combination slots.

      If an argument (either “-” or “long” will work) is given to
      the command, a short message about what the flags mean is
      also printed.

   filter ["label"] [!] /regex/[i]
      Sets the filter for the selected slot (which must contain a
      file and not a combination).  If a filter is set and active
      for a file, any line matching the given regex is filtered
      from the output (if the ‘!’ is put before the regex, any
      line not matching the regex is filtered).  The label,
      which isn't required, merely acts as documentation in
      various diagnostics.

      As an example, consider that edict lines often have “(pn)”
      on them to indicate that the given English is a place name.
      Often these place names can be a bother, so it would be
      nice to elide them from the output unless specifically
      requested.  Consider the example:

        lookup command>  filter "name" /(pn)/
        search [edict]> [きの]
        機能 [きのう] /function/faculty/
        帰納 [きのう] /inductive/
        昨日 [きのう] /yesterday/
        ≪3 "name" lines filtered≫

      In the example, ‘/’ characters are used to delimit the
      start and stop of the regex (as is common with many
      programs).  However, any character can be used. A final ‘i’,
      if present, indicates that the regex should be applied in a
      case-insensitive manner.

      The filter, once set, can be enabled or disabled with the
      other form of the “filter” command (described below). It
      can also be temporarily turned off (or, if disabled,
      temporarily turned on) by the “!F!” line prefix.

      Filtered lines can optionally be saved and then displayed
      if you so desire. See the “saved list size” and “show”
      commands.

      Note that if you have saving enabled and only one line
      would be filtered, it is simply printed at the end (rather
      than printing a one-line message about how one line was
      filtered).

      By the way, a better “name” filter for edict would be:

        filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#

      as it would filter all entries that had only one English
      section, that section being a name.  It is also an example
      of using something other than ‘/’ to delimit a regex, as
      it makes things a bit easier to read.
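
      The filter argument syntax (optional ‘!’, an arbitrary
      delimiter, an optional trailing ‘i’) can be illustrated with
      a short Python sketch.  The function names are hypothetical,
      the label is assumed to have been removed already, and
      Python's regex dialect is used, so the ‘<’ and ‘>’ word
      specifiers of the example above would have to be written as
      \b.

```python
import re


def parse_filter(spec):
    """Parse '[!] <delim>regex<delim>[i]' into (compiled_regex, negate)."""
    spec = spec.strip()
    negate = spec.startswith("!")
    if negate:
        spec = spec[1:].lstrip()
    delim = spec[0]                      # any character may delimit the regex
    end = spec.rindex(delim)
    if end == 0:
        raise ValueError("missing closing delimiter")
    flags = re.IGNORECASE if spec[end + 1:] == "i" else 0
    return re.compile(spec[1:end], flags), negate


def apply_filter(lines, regex, negate=False):
    """Return (shown, filtered) lists of lines."""
    shown, filtered = [], []
    for line in lines:
        hit = regex.search(line) is not None
        # Without "!", matching lines are filtered; with it, non-matching ones.
        (filtered if hit != negate else shown).append(line)
    return shown, filtered
```

      Keeping the filtered lines in a second list mirrors the idea
      that they can be saved and shown later on request.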

   filter [boolean]
      Enables or disables the filter for the selected slot.  If
      no argument is given, displays the current filter and
      status.

   [default] fold [boolean]
      The selected slot's case folding is turned on or off
      (default is on), or reported if no argument given.
      However, if “default” is specified, the value to be
      inherited as the default by subsequently-loaded files is
      set (or reported).

      Can be temporarily toggled by the “!c!” line prefix.

   [default] fuzz [boolean]
      The selected slot's fuzzification is turned on or off
      (default is on), or reported if no argument given.
      However, if “default” is specified, the value to be
      inherited as the default by subsequently-loaded files is
      set (or reported).

      Can be temporarily toggled by the “!f!” line prefix.

   help [regex]
      Without an argument gives a short help list. With an
      argument, lists only commands whose help string is picked
      up by the given regex.

   [default] highlight [boolean]
      Sets matched-string highlighting on or off for the selected
      slot (default off), or reports the current status if no
      argument is given.  However, if “default” is specified, the
      value to be inherited as the default by subsequently-loaded
      files is set (or reported).

      If on, shows in bold or reverse video (see below) that part
      of the line which was matched by the search regex.  If
      multiple regexes were given, that part matched by the first
      regex is shown.

      Note that a regex might match a portion of a line which is
      later removed by a modify parameter. In this case, no
      highlighting is done.

      Can be temporarily toggled by the “!h!” line prefix.



   highlight style [bold | inverse | standout | <___>]
      Sets the style of highlighting for when highlighting is
      done.  Inverse (inverse video) and standout are the same.
      The default is bold.  You can also give an HTML tag, such
      as “<BOLD>”, and items will be wrapped by <BOLD>...</BOLD>.
      This would be particularly useful when the output is going
      to a CGI, as when lookup has been built in a server
      configuration.
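
      The tag-wrapping style can be pictured with a few lines of
      Python.  This is a sketch only: the function name highlight
      is hypothetical, and it wraps just the first match of a
      single pattern, using Python's regex dialect.

```python
import re


def highlight(line, pattern, open_tag="<BOLD>"):
    """Wrap the part of `line` matched by `pattern` in an HTML-style tag."""
    match = re.search(pattern, line)
    if match is None:
        return line                       # nothing matched, nothing to wrap
    close_tag = "</" + open_tag[1:]       # "<BOLD>" becomes "</BOLD>"
    return (line[:match.start()] + open_tag + match.group(0)
            + close_tag + line[match.end():])
```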

      Note that the highlighting is effected by using raw
      VT100/xterm control sequences. This isn't particularly
      nice if your terminal doesn't understand them. Sorry.


   if {expression} command...

      If the evaluated expression is non-zero, the command will
      be executed.

      Note that {} rather than () surround the expression.

      Expression may be composed of numbers, operators,
      parentheses, etc.  In addition to the normal +, -, *, and /,
      there are:

         !x  … yields 0 if x is non-zero, 1 if x is zero.
         x && y …
         !x    … ‘not’ Yields 1 if x is zero, 0 if non-zero.
         x & y … ‘and’ Yields 1 if both x and y are non-zero, 0 otherwise.
         x | y … ‘or’  Yields 1 if x or y (or both) is non-zero, 0 otherwise.

      There may also be the special tokens true and false, which
      are 1 and 0 respectively.

      There are also checked, matched, printed, nonword, and
      filtered, which correspond to the values printed by the
      stats command.

      An example use might be the following kind of thing in a
      computer-generated script:


        !d!expect this line
        if {!printed} msg Oops! couldn't find "expect this line"



   input encoding [ euc | sjis ]
      Used to set (or report) what encoding to use when 8-bit
      bytes are found in the interactive input (all flavors of
      JIS are always recognized).  Also see the encoding and
      output encoding commands.

   limit [value]
      Sets the number of lines to print during any search before
      aborting (or reports the current number if no value given).
      Default is 100.

      Output limiting is disabled if set to zero.


   log [ to [+] file ]
      Begins logging the program output to file (the Japanese
      encoding method being the same as for screen output).  If
      “+” is given, the log is appended to any text that might
      have previously been in file, in which case a leading
      dashed line is inserted into the file.

      If no arguments are given, reports the current logging
      status.

   log  - | off
      If only “-” or off is given, any currently-opened log file
      is closed.



   load [-now|-whenneeded] "filename"
      Loads the named file to the next available slot.  If a
      precomputed index is found (as “filename.jin”) it is loaded
      as well.  Otherwise, an index is generated internally.

      The file to be loaded (and the index, if loaded) will be
      loaded during idle times. This allows a startup file to
      list many files to be loaded, but not have to wait for each
      of them to load in turn. Using the “-now” flag causes the
      load to happen immediately, while using the “-whenneeded”
      option (can be shortened to “-wn”) causes the load to
      happen only when the slot is first accessed.

      Invoke lookup as
         % lookup -writeindex filename
      to generate and write an index file, which will then be
      automatically used in the future.

      If the file has already been loaded, the file is not
      re-read, but the previously-read file is shared. The new
      slot will, however, have its own separate flags, prompt,
      filter, etc.

   modify /regex/replace/[ig]
      Sets the modify parameter for the selected file. If a file
      has a modify parameter associated with it, each line
      selected during a search will have that part of the line
      which matches regex (if any) replaced by the replacement
      string before being printed.

      Like the filter command, the delimiter need not be ‘/’;
      any non-space character is fine. If a final ‘i’ is given,
      the regex is applied in a case-insensitive manner. If a
      final ‘g’ is given, the replacement is done to all matches
      in the line, not just the first part that might match
      regex.

      The replacement may have embedded “\1”, etc. in it to refer
      to parts of the matched text (see the tutorial on regular
      expressions).

      The modify parameter, once set, may be enabled or  disabled
      with  the  other  form  of  the  modify  command (described
      below). It may also be temporarily toggled via the “!m!”
      line prefix.

      A silly example for the ultra-nationalist might be:
        modify /<Japan>/Dainippon Teikoku/g
      so that a line such as
        日銀 [にちぎん] /Bank of Japan/
      would come out as
        日銀 [にちぎん] /Bank of Dainippon Teikoku/

      As a real example of the modify command with kanjidic,
      consider that it is likely that one is not interested in
      all the various fields each entry has. The following can be
      used to remove the info on the U, N, Q, M, E, B, C, and Y
      fields from the output:

        modify /( [UNQMECBY]\S+)+//g,1

      It's sort of complex, but works. Note that here the
      replacement part is empty, meaning to just remove those
      parts which matched. The result of such a search of 日
      would normally print

          日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 \
          MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}

      but with the above modify spec, appears more simply as

          日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}
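      The effect of such a modify spec can be approximated outside
      of lookup. The following is only a sketch: it uses Python's
      re module as a stand-in for lookup's own regex engine (the
      two flavors are not identical), and the names
      KANJIDIC_FIELDS and strip_fields are made up for this
      illustration.

```python
import re

# Hypothetical stand-in for: modify /( [UNQMECBY]\S+)+//g
# Each unwanted field is a space, one of the listed code letters,
# and a run of non-space characters.
KANJIDIC_FIELDS = re.compile(r"( [UNQMECBY]\S+)+")


def strip_fields(line):
    """Remove every run of unwanted fields (re.sub replaces all matches, like 'g')."""
    return KANJIDIC_FIELDS.sub("", line)


entry = ("日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 "
         "MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}")

print(strip_fields(entry))
```

      Fields such as S4, G1, H3027, F1 and P3-3-1 survive because
      their leading letters are not in the character class.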


   modify [boolean]
      Enables or disables the modify parameter for the selected
      file, or reports the current status if no argument is given.

   msg string
      The given string is printed.

      Most likely used in a script as the target command of an if
      command.

   output encoding [ euc | sjis | jis...]
      Used to set exactly what kind of encoding should be used
      for program output (also see the input encoding command).
      Used when the encoding command is not detailed enough for
      one's needs.

      If no argument is given, reports the current output
      encoding. Otherwise, arguments can usually be any
      reasonable dash-separated combination of:

        euc
           Selects EUC for the output encoding.

        sjis
           Selects Shift-JIS for the output encoding.

        jis[78|83|90][-ascii|-roman]
           Selects JIS for the output encoding. If no year (78,
           83, or 90) is given, 78 is used. Can optionally specify
           that “English” should be encoded as regular ASCII (the
           default when JIS is selected) or as JIS-ROMAN.

        212
           Indicates  that  JIS  X0212-1990  should  be supported
           (ignored for Shift-JIS output).

        no212
           Indicates that JIS X0212-1990 should not be supported
           (default setting). This places JIS X0212-1990
           characters under the domain of disp, nodisp, code, or
           mark (described below).

        hwk
           Indicates that half width kana should be left as-is
           (default setting).

        nohwk
           Indicates that half width kana should be stripped from
           the output (not yet implemented).

        foldhwk
           Indicates that half width kana should be folded to
           their full-width counterparts (not yet implemented).

        disp
           Indicates that non-displayable characters (such as JIS
           X0212-1990 while the output encoding method is
           Shift-JIS) should be passed along anyway (most likely
           resulting in screen garbage).

        nodisp
           Indicates that non-displayable characters should be
           quietly stripped from the output.

        code
           Indicates that non-displayable characters should be
           printed as their octal codes (default setting).

        mark
           Indicates that non-displayable characters should be
           printed as “★”.

        Of course, not all options make sense in all
        combinations, or at all times. When the current (or new)
        output encoding is reported, a complete and exact
        specifier representing the selected output encoding is
        shown. An example might be “jis78-ascii-no212-hwk-code”.

   pager [ boolean | size ]
      Turns on or off an output pager, sets its idea of the
      screen size, or reports the current status.

      Size can be a single number indicating the number of lines
      to be printed between “MORE?” prompts (usually a few lines
      less than the total screen height, the default being 20
      lines). It can also be two numbers in the form “#x#” where
      the first number is the width (in half-width characters;
      default 80) and the second is the lines-per-page as above.

      If the pager is on, every page of output will result in a
      “MORE?” prompt, at which there are four possible responses.
      A space will allow one more full page to print. A return
      will allow one more line. A ‘c’ (for “continue”) will allow
      all the rest of the output (for the current command) to
      proceed without pause, while a ‘q’ (for “quit”) will flush
      the output for the current
      command.

      If  supported  by the OS, the pager size parameters are set
      appropriately from the window size upon startup  or  window
      resize.

      The default pager status is “off”.

   [local] prompt "string"
      Sets the prompt string. If “local” is indicated, sets the
      prompt string for the selected slot only. Otherwise, sets
      the global default prompt string.

      Prompt strings may have the special %-sequences shown
      below, with related commands given in parentheses:

         %N … the default slot's file or combo name.
         %n … like %N, but any leading path is not shown if a filename.
         %# … the default slot's number.
         %S … the “command-introduction” character (cmdchar)
         %0 … the running program's name
         %F='string' … string shown if filtering enabled (filter)
         %M='string' … string shown if modification enabled (modify)
         %w='string' … string shown if word mode on (word)
         %c='string' … string shown if case folding on (fold)
         %f='string' … string shown if fuzzification on (fuzz).
         %W='string' … string shown if wildcard-pat. mode on (wildcard).
         %d='string' … string shown if displaying on (display).
         %C='string' … string shown if currently entering a command.
         %l='string' … string shown if logging is on (log).
         %L … the name of the current output log, if any (log)

      For the tests (%f, etc), you can put ‘!’ just after the
      ‘%’ to reverse the sense of the test (e.g. %!f="no fuzz").
      The reverse of %F is if a filter is installed but disabled
      (i.e. string will never be shown if there is no filter for
      the default file). The modify %M works comparably.

      Also, you can use an alternative form for the items that
      take an argument string. Replacing the quotes with
      parentheses will treat string as a recursive prompt
      specifier. For example, the specifier

           %C='command'%!C(%f='fuzzy 'search:)

      would result in a “command” prompt if entering a command,
      while it would result in either a “fuzzy search:” or a
      “search:” prompt if not entering a command. The
      parenthesized constructs may be nested.

      Note that the letters of the test constructs are the same
      as the letters for the “!!” sequences described in INPUT
      SYNTAX.

      An example of a nice prompt command might be:

              prompt "%C(%0 command)%!C(%w'*'%!f'raw '%n)> "

      With this prompt specification, the prompt would normally
      appear as “filename> ” but when fuzzification is turned off
      as “raw filename> ”. And if word-preference mode is on,
      the whole thing has a “*” prepended. However, if a command
      is being entered, the prompt would then become “name
      command”, where name is the program's name (system
      dependent, but most likely “lookup”).

      The default prompt format string is
      “%C(%0 command)%!C(search [%n])> ”.
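      To make the test and recursion rules concrete, here is a
      rough sketch of how such a specifier could be expanded. It
      is not lookup's implementation: the function expand_prompt
      and the state dictionary are hypothetical, only the quoted
      and parenthesized forms described above are handled, and
      there is no error handling for malformed specifiers (nor
      for quotes or parentheses appearing inside a string).

```python
def expand_prompt(spec, state):
    """Expand a prompt specifier.

    `state` (an assumption of this sketch) maps test letters such as
    'C', 'f', 'w' to booleans and value letters such as '0', 'n', '#'
    to strings.
    """
    out = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch != "%" or i + 1 >= len(spec):
            out.append(ch)          # ordinary text is copied as-is
            i += 1
            continue
        i += 1
        negate = spec[i] == "!"     # %!x reverses the sense of the test
        if negate:
            i += 1
        letter = spec[i]
        i += 1
        if spec[i:i + 2] == "='":   # the '=' before a quoted string is optional here
            i += 1
        if spec[i:i + 1] == "'":
            end = spec.index("'", i + 1)
            body, i, recursive = spec[i + 1:end], end + 1, False
        elif spec[i:i + 1] == "(":
            depth, end = 1, i + 1
            while depth:            # find the matching close parenthesis
                depth += {"(": 1, ")": -1}.get(spec[end], 0)
                end += 1
            body, i, recursive = spec[i + 1:end - 1], end, True
        else:
            out.append(str(state.get(letter, "")))   # plain value such as %n or %0
            continue
        if bool(state.get(letter)) != negate:
            out.append(expand_prompt(body, state) if recursive else body)
    return "".join(out)


state = {"C": False, "f": True, "w": False, "0": "lookup", "n": "edict"}
print(expand_prompt("%C(%0 command)%!C(%w'*'%!f'raw '%n)> ", state))
```

      With the state shown, the sketch yields the “filename> ”
      form; setting "f" to False gives the “raw ” prefix, and
      setting "C" to True gives the command prompt.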

   regex debug [boolean]
      Sets the internal regex debugging flag (turn on if you want
      billions of lines of stuff spewed to your screen).

   saved list size [value]
      During a search, lines that match might be elided from the
      output due to filters or word-preference mode. This
      command sets the number of such lines to remember during
      any one search, such that they may be later displayed
      (before the next search) by the show command.

      The default is 100.

   select [ num | name | . ]
      If num is given, sets the default slot to that slot number.
      If name is given, sets the default slot to the first slot
      found with a file (or combination) loaded with that name.
      The incantation “select .” merely sets the default slot to
      itself, which can be useful in script files where you want
      to indicate that any subsequent flag changes should work
      with whatever file was the default at the time the script
      was sourced.

      If no argument is given, simply reports the current default
      slot (also see the files command).

      In command files loaded via the source command, or as the
      startup file, commands dealing with per-slot items (flags,
      local prompt, filters, etc.) work with the file or slot
      last selected. The last such selected slot remains
      selected once the load is complete.

      Interactively, the default slot will become the selected
      slot for subsequent searches and commands that aren't
      augmented with an appended “,#” (as described in the INPUT
      SYNTAX section).

   show
      Shows any lines elided from the previous search (either due
      to a filter or word-preference mode).

      Will apply any modifications (see the “modify” command) if
      modifications are enabled for the file. You can use the
      “!m!” line prefix as well with this command (in this case,
      put the “!m!” before the command-indicator character).

      The length of the list is controlled by the “saved list
      size” command.



   source "filename"
      Commands are read from filename and executed.

      In the file, all lines beginning with “#” are ignored as
      comments (note that comments must appear on a line by
      themselves, as “#” is a reasonable character to have within
      commands).

      Lines whose first non-blank character is “=”, “!”, or “+”
      are considered searches, while all other non-blank lines
      are considered lookup commands. Therefore, there is no need
      for lines to begin with the command-introduction character.
      However, leading whitespace is always OK.

      For search lines, take care that any trailing whitespace is
      deleted if undesired, as trailing whitespace (like all
      non-leading whitespace) is kept as part of the regular
      expression.
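      The line rules above can be summarized in a few lines of
      code. This is a sketch of the rules as described, not the
      program's actual parser; classify_line is a hypothetical
      name, and treating an indented “#” line as a comment is an
      assumption based on leading whitespace always being OK.

```python
def classify_line(line):
    """Classify one line of a command file as described for `source`."""
    text = line.rstrip("\n").lstrip()   # leading whitespace is always OK
    if not text:
        return "blank"
    if text.startswith("#"):
        return "comment"                # comments occupy a whole line
    if text[0] in "=!+":
        return "search"                 # trailing whitespace stays in the regex
    return "command"                    # no command-introduction character needed


for sample in ["## turn verbose mode off", "verbose off", "  =gr[ea]y ", ""]:
    print(repr(sample), "->", classify_line(sample))
```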

      Within  a command file, commands that modify per-file flags
      and such always work  with  the  most-recently  loaded  (or
      selected) file. Therefore, something along the lines of

        load "my.word.list"
        set word on

        load "my.kanji.list"
        set word off
        set local prompt "enter kanji> "

      would work as might make intuitive sense.

      Since a script file must have a load or select before any
      per-slot flag is set, one can use “select .” to facilitate
      command scripts that are to work with “the current slot”.

   spinner [value]
      Set the value of the spinner (a silly little feature). If
      set to a non-zero value, will cause a spinner to spin while
      a file is being checked, one increment per value lines in
      the file actually checked  against  the  search  specifier.
      Default is off (i.e. zero).

   stats
      Shows  information  about  how  many lines of the text file
      were checked against the last  search  specifier,  and  how
      many lines matched and were printed.

   tag [boolean] ["string"]
      Enable, disable, or set the tag for the selected slot.

      If the slot is not a combination slot, a tag string may be
      set (the quotes are required).

      If a tag string is set and enabled for a file, the string
      is prepended to each matching output line printed.

      Unlike the filter and modify commands, which automatically
      enable the function when a parameter is set, a tag is not
      automatically enabled when set. It can be enabled in the
      same tag command that sets it by also giving “on”, or it
      can be enabled subsequently via just “tag on”.

      If the selected slot is a combination slot, only the
      enable/disable status may be changed (on by default). No
      tag string may be set.

      The  reason  for  the special treatment lies in the special
      nature of how tags work  in  conjunction  with  combination
      files.

      During a search when the selected slot is a combination
      slot, each file which is a member of the combination has
      its per-file flags disabled if their corresponding flag is
      disabled in the original combination slot. This allows the
      combination slot's flags to act as a “mask” to blot out
      each component file's per-file flags.

      The tag flag, however, is special in that the component
      file's tag flag is turned on if the combination slot's tag
      flag is turned on (and, of course, the component file has a
      tag string registered).

      The intended use of this is that one might set a (disabled)
      tag to a file, yet direct searches against that file will
      have no prepended tag. However, if the file is searched as
      part of a combination slot (and the combination slot's tag
      flag is on), the tag will be prepended, allowing one to
      easily understand from which file an output line comes.
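      The masking rule and the tag exception can be expressed as
      a small sketch. The Slot class and the effective_settings
      function are invented for this illustration and say nothing
      about how lookup stores its flags internally; flags absent
      from the combination slot are assumed not to mask anything.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    flags: dict = field(default_factory=dict)   # e.g. {"filter": True, "word": False}
    tag_enabled: bool = False
    tag_string: str = ""


def effective_settings(combo, member):
    """Return (flags, tag_on) for a member file searched through a combination slot."""
    # Ordinary flags: the combination slot can only switch a member's flag off.
    flags = {name: on and combo.flags.get(name, True)
             for name, on in member.flags.items()}
    # Tag flag: turned on by the combination slot, provided a tag string exists.
    tag_on = combo.tag_enabled and bool(member.tag_string)
    return flags, tag_on


local_words = Slot(flags={"filter": True, "word": True}, tag_enabled=False, tag_string=">>")
combo = Slot(flags={"filter": False, "word": True}, tag_enabled=True)
print(effective_settings(combo, local_words))
```

      Here the member's own tag is disabled, so direct searches
      print no tag, yet searching through the combination slot
      turns it on while the filter flag is masked off.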

   verbose [boolean]
      Sets verbose mode on or off, or reports the current  status
      (default  on).   Many commands reply with a confirmation if
      verbose mode is turned on.

   version
      Reports the current version of the program.

   [default] wildcard [boolean]
      The selected slot's patterns are considered wildcard
      patterns if turned on, regular expressions if turned off.
      The current status is reported if no argument is given.
      However, if “default” is specified, the pattern-type to be
      inherited as the default by subsequently-loaded files is
      set (or reported).

      Can be temporarily toggled by the “!W!” line prefix.

      When wildcard patterns are selected, the changed
      metacharacters are: “*” means “any stuff”, “?” means “any
      one character”, while “+” and “.” become unspecial. Other
      regex items such as “|”, “(”, “[”, etc. are unchanged.

      What “*” and “?” will actually match depends upon the
      status of word-mode, as well as on the pattern itself. If
      word-mode is on, or if the pattern begins with the
      start-of-word “<” or “[”, only non-spaces will be matched.
      Otherwise, any character will be matched.

      In summary, when wildcard mode is on, the input pattern is
      affected in the following ways:

         * is changed to the regular expression .* or \S*
         ? is changed to the regular expression .  or \S
         + is changed to the regular expression \+
         . is changed to the regular expression \.



      Because filename patterns are often called “filename
      globs”, the command “glob” can be used in place of
      “wildcard”.
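      As a sketch of the translation described for wildcard mode,
      the following converts a wildcard pattern into a regex
      string. It assumes that “non-space” is spelled \S and that
      “unspecial” means backslash-escaped; wildcard_to_regex is a
      hypothetical helper, and it makes no attempt to treat
      characters inside a character class differently.

```python
def wildcard_to_regex(pattern, word_mode=False):
    """Translate a wildcard pattern into a regular expression string."""
    # Non-space matching applies in word mode or after a start-of-word prefix.
    non_space = word_mode or pattern[:1] in ("<", "[")
    any_char = r"\S" if non_space else "."
    table = {
        "*": any_char + "*",   # "any stuff"
        "?": any_char,         # "any one character"
        "+": r"\+",            # becomes unspecial
        ".": r"\.",            # becomes unspecial
    }
    # Everything else (|, (, [, etc.) passes through unchanged.
    return "".join(table.get(ch, ch) for ch in pattern)


print(wildcard_to_regex("gr?y*"))
print(wildcard_to_regex("<jap*"))
```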

   [default] word|wordpreference [boolean]
      The selected file's word-preference mode is turned on or
      off (default is off), or the current setting is reported if
      no argument is specified. However, if “default” is
      specified, the value to be inherited as the default by
      subsequently-loaded files is set (or reported).

      In word-preference mode, entries are searched for as if the
      search regex had a leading ‘<’ and a trailing ‘>’,
      resulting in a list of entries with a whole-word match of
      the regex. However, if there are none, but there are
      non-word entries, the non-word entries are shown (the “saved
      list” is used for this -- see that command). This makes it
      an “if there are whole words like this, show me, otherwise
      show me whatever you've got” mode.

      If there are both word and non-word entries, the non-word
      entries are remembered in the saved list (rather than any
      possible filtered entries being remembered there).

      One caveat: if a search matches a line in more than one
      place, and the first is not a whole word, while one of the
      others is, the line will be considered a non-whole-word
      match. For example, the search 「japan」 with word-preference
      mode on will not list an entry such as “/Japanese/language
      in Japan/”, as the first “Japan” is part of “Japanese” and
      not a whole word. If you really need just whole-word
      entries, use the ‘<’ and ‘>’ yourself.

      The mode may be temporarily toggled via the “!w!” line
      prefix.

      The rules defining what  lines  are  filtered,  remembered,
      discarded,  and  shown  for  each permutation of search are
      rather complex, but the end result is rather intuitive.
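      A simplified sketch of the behavior, including the caveat
      about the first match, might look as follows. It ignores
      filters entirely, uses Python's re module with case folding
      as a stand-in for lookup's engine, and treats “word
      character” as “alphanumeric”; all of these, and the name
      word_preference_search, are assumptions made for the
      illustration.

```python
import re


def word_preference_search(regex, lines, saved_list_size=100):
    """Return (shown, saved) line lists for a word-preference search."""
    pattern = re.compile(regex, re.IGNORECASE)
    word_hits, other_hits = [], []
    for line in lines:
        match = pattern.search(line)          # only the first match is examined
        if match is None:
            continue
        before = line[match.start() - 1] if match.start() > 0 else " "
        after = line[match.end()] if match.end() < len(line) else " "
        whole_word = not (before.isalnum() or after.isalnum())
        (word_hits if whole_word else other_hits).append(line)
    if word_hits:
        # Whole-word entries are shown; the rest go to the saved list.
        return word_hits, other_hits[:saved_list_size]
    # No whole-word entries at all: show whatever matched.
    return other_hits, []


entries = ["/Japanese/language in Japan/", "/Bank of Japan/", "/japanned ware/"]
shown, saved = word_preference_search("japan", entries)
print(shown)
print(saved)
```

      The first entry lands in the saved list rather than the
      shown list because its first match sits inside “Japanese”.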

   quit | leave | bye  | exit
      Exits the program.

STARTUP FILE
   If the file “~/.lookup” is present, commands are read from it
   during lookup startup.

   The file is read in the same way as the source command reads
   files (see that entry for more information on file format,
   etc.)

   However, if there had been files loaded via command-line
   arguments, commands within the startup file to load files (and
   their associated commands such as to set per-file flags) are
   ignored.

   Similarly, any use of the command-line flags  -euc,  -jis,  or
   -sjis  will  disable  in the startup file the commands dealing
   with setting the input and/or output encodings.

   The special treatment mentioned in the above two paragraphs
   only applies to commands within the startup file itself, and
   does not apply to commands in command-files that might be
   sourced from within the startup file.

   The following is a reasonable example of a startup file:
     ## turn verbose mode off during startup file processing
     verbose off

     prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
     spinner 200
     pager on

     ## The filter for edict will hit for entries that
     ## have only one English part, and that English part
     ## having a pl or pn designation.
     load ~/lib/edict
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     highlight on
     word on

     ## The filter for kanjidic will hit for entries without a
     ## frequency-of-use number.  The modify spec will remove
     ## fields with the named initial code (U,N,Q,M,E, and Y)
     load ~/lib/kanjidic
     filter "uncommon" !/<F\d+>/
     modify /( [UNQMEY]\S+)+//g

     ## Use the same filter for my local word file,
     ## but turn off by default.
     load ~/lib/local.words
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     filter off
     highlight on
     word on
     ## Want a tag for my local words, but only when
     ## accessed via the combo below
     tag off "》"

     combine "words" 2 0
     select words

     ## turn verbosity back on for interactive use.
     verbose on


COMMAND-LINE ARGUMENTS
   With the use of a startup file, command-line arguments are
   rarely needed. In practical use, they are only needed to
   create an index file, as in:

       lookup -write textfile

   Any command line arguments that aren't flags are taken to be
   files which are loaded in turn during startup. In this case,
   any “load”, “filter”, etc. commands in the startup file are
   ignored.

   The following flags are supported:

   -help
      Reports a short help message and exits.

   -write
      Creates index files for the named files and exits. No
      startup file is read.

   -euc
      Sets the input and output encoding method to EUC (currently
      the default). Exactly the same as the “encoding
      euc” command.

   -jis
      Sets the input and output encoding method to JIS. Exactly
      the same as the “encoding jis” command.

   -sjis
      Sets the input and output encoding method to Shift-JIS.
      Exactly the same as the “encoding sjis” command.

   -v -version
      Prints the version string and exits.

   -norc
      Indicates that the startup file should not be read.

   -rc file
      The named file is used as the startup file, rather than the
      default “~/.lookup”. It is an error for the file not to
      exist.

   -percent num
      When an index is built, letters that appear on more than
      num percent (default 50) of the lines are elided from the
      index. The thought is that if a search will have to check
      most of the lines in a file anyway, one may as well save
      the large amount of space in the index file needed to
      represent that information. The time/space tradeoff shifts
      here, as the indexing of oft-occurring letters provides a
      diminishing return.

      Smaller indexes can be made by using a smaller number.
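      The idea of eliding oft-occurring letters can be sketched
      with a toy index that maps each character to the lines
      containing it. The real index file format is not described
      here, so build_index and candidate_lines below are purely
      illustrative assumptions.

```python
from collections import defaultdict


def build_index(lines, percent=50):
    """Map each character to the line numbers it appears on, eliding common ones."""
    postings = defaultdict(set)
    for number, line in enumerate(lines):
        for ch in set(line):
            postings[ch].add(number)
    cutoff = len(lines) * percent / 100
    # Characters on more than `percent` percent of the lines are left out.
    return {ch: sorted(nums) for ch, nums in postings.items() if len(nums) <= cutoff}


def candidate_lines(index, literal, total_lines):
    """Lines that could contain `literal`; unindexed characters narrow nothing."""
    candidates = set(range(total_lines))
    for ch in set(literal):
        if ch in index:
            candidates &= set(index[ch])
    return sorted(candidates)


words = ["apple", "banana", "cherry", "date"]
index = build_index(words, percent=50)
print(sorted(index))
print(candidate_lines(index, "an", len(words)))
```

      A character missing from the index simply fails to narrow
      the candidate set, so the lines that remain still have to
      be checked against the search itself.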

   -noindex
      Indicates that any files loaded via the command line should
      not be loaded with any precomputed index, but  recalculated
      on the fly.

   -verbose
      Has metric tons of stats spewed whenever an index is
      created.

   -port ###
      For the (undocumented)  server  configuration  only,  tells
      which port to listen on.


OPERATING SYSTEM CONSIDERATIONS
   I/O primitives and behaviors vary with the operating system.
   On my operating system, I can “read” a file by mapping it into
   memory, which is a pretty much instant procedure regardless of
   the size of the file. When I later access that memory, the
   appropriate sections of the file are automatically read into
   memory by the operating system as needed.

   This results in lookup starting up and presenting a prompt
   very  quickly,  but causes the first few searches that need to
   check a lot of lines in the file to go more slowly (as lots of
   the  file  will need to be read in). However, once the bulk of
   the file is in, searches will go very fast. The  win  here  is
   that  the  rather  long file-load times are amortized over the
   first  few  (or  few  dozen,  depending  upon  the  situation)
   searches  rather  than  always  faced right at command startup
   time.
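   For readers unfamiliar with file mapping, the following sketch
   shows the general technique using Python's mmap module. It
   illustrates the operating-system facility only, not lookup's
   own code; count_lines_mapped is a made-up example function,
   and mapping an empty file is not handled.

```python
import mmap


def count_lines_mapped(path):
    """Count newlines in a file through a read-only memory mapping."""
    with open(path, "rb") as handle:
        # Creating the mapping is quick; pages are read in only when touched.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            count = 0
            position = view.find(b"\n")
            while position != -1:
                count += 1
                position = view.find(b"\n", position + 1)
            return count
```

   The first scan over a large mapped file pays the cost of
   reading it in, which is the amortized load time described
   above; later scans typically find the pages already in memory.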

   On the other hand, on an operating system without the mapping
   ability, lookup would start up very slowly as all the files
   and indexes are read into memory, but would then search
   quickly from the beginning, all the file already having been
   read.

   To get around the slow startup, particularly when many files
   are loaded, lookup uses lazy loading if it can: a file is not
   actually read into memory at the time the load command is
   given. Rather, it will be read when first actually accessed.
   Furthermore, files are loaded while lookup is idle, such as
   when waiting for user input. See the files command for more
   information.

REGULAR EXPRESSIONS, A BRIEF TUTORIAL
   Regular expressions (“regex” for short) are a “code” used to
   indicate what kind of text you're looking for. They're how
   one searches for things in the editors “vi”, “stevie”,
   “mifes”, etc., or with the grep commands. There are
   differences among the various regex flavors in use -- I'll
   describe the flavor used by lookup here. Also, in order to be
   clear for the common case, I might tell a few lies, but
   nothing too heinous.

   The regex 「a」 means “any line with an ‘a’ in it.” Simple
   enough.

   The regex 「ab」 means “any line with an ‘a’ immediately
   followed by a ‘b’”. So the line
       I am feeling flabby
   would “match” the regex 「ab」 because, indeed, there's an
   “ab” on that line. But it wouldn't match the line

       this line has no a followed _immediately_ by a b

   because, well, what the line says is true.

   In most cases, letters and numbers in a regex just  mean  that
   you're  looking  for  those  letters  and numbers in the order
   given. However, there are some special characters used  within
   a regex.

   A simple example would be a period. Rather than indicate that
   you're looking for a period, it means “any character”. So
   the silly regex 「.」 would mean “any line that has any
   character on it.” Well, maybe not so silly... you can use it
   to find non-blank lines.

   But more commonly it's used as part of a larger regex.
   Consider the regex 「gray」. It wouldn't match the line

       The sky was grey and cloudy.

   because of the different spelling (grey vs. gray). But the
   regex 「gr.y」 asks for “any line with a ‘g’, ‘r’, some
   character, and then a ‘y’”. So this would get “grey” and
   “gray”.

   A special construct somewhat similar to ‘.’ would be the
   character class. A character class starts with a ‘[’ and ends
   with a ‘]’, and will match any character given in between. An
   example might be

       gr[ea]y

   which would match lines with a ‘g’, ‘r’, an ‘e’ or an ‘a’,
   and then a ‘y’. Inside a character class you can list as many
   characters as you want to.

   For example the simple regex 「x[0123456789]y」 would match any
   line with a digit sandwiched between an ‘x’ and a ‘y’.

   The order of the characters within the character class doesn't
   really matter... 「[5134672890]」 would be the same as
   「[0123456789]」.

   But as a short cut, you could put 「[0-9]」 instead of
   「[0123456789]」. So the character class 「[a-z]」 would match
   any lower-case letter, while the character class
   「[a-zA-Z0-9]」 would match any letter or digit.

   The character ‘-’ is special within a character class, but
   only if it's not the first thing. Another character that's
   special in a character class is ‘^’, if it is the first
   thing. It “inverts” the class so that it will match any
   character not listed. The class 「[^a-zA-Z0-9]」 would match
   any line with spaces or punctuation on it.
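   If you'd like to experiment with these constructs outside of
   lookup, Python's re module is close enough for the simple
   cases covered so far (it is a different regex flavor, so the
   more exotic features will not carry over). The sample lines
   and the helper name matching are made up for this sketch.

```python
import re

lines = ["The sky was grey and cloudy.", "a gray cat", "gruyere", "x7y", "xay"]


def matching(regex):
    """Return the sample lines on which the regex matches somewhere."""
    return [line for line in lines if re.search(regex, line)]


print(matching(r"gray"))       # literal letters only
print(matching(r"gr.y"))       # '.' stands for any one character
print(matching(r"gr[ea]y"))    # character class: 'e' or 'a'
print(matching(r"x[0-9]y"))    # range shortcut for the ten digits
print(matching(r"x[^0-9]y"))   # inverted class: anything but a digit
```

   Note how “gruyere” is caught by the period but not by the
   character class.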

   There are some special short-hand sequences for some common
   character classes. The sequence 「\d」 means “digit”, and is
   the same as 「[0-9]」. 「\w」 means “word element” and is the
   same as 「[0-9a-zA-Z_]」. 「\s」 means “space-type thing” and is
   the same as 「[ \t]」 (「\t」 means tab).

   You can also use 「\D」, 「\W」, and 「\S」 to mean things not a
   digit, word element, or space-type thing.

   Another special character would be ‘?’. This means “maybe
   one of whatever was just before it, but none is fine too”. In
   the regex 「bikes? for rent」, the “whatever” would be the ‘s’,
   so this would match lines with either “bikes for rent” or
   “bike for rent”.

   Parentheses are also special, and can group  things  together.
   In the regex

       big (fat harry)? deal

   the “whatever” for the ‘?’ would be “fat harry”.  But be
   careful to pay attention to details: this regex would match
       I don't see what the big fat harry deal is!
   but not
       I don't see what the big deal is!

   That's because if you take away the “whatever” of the ‘?’,
   you end up with
       big  deal
   Notice that there are two spaces between the words.  The
   regex requires both of them, but the second line has only
   one.  The regex to get either line above would be
       big (fat harry )?deal
   or
       big( fat harry)? deal
   Do you see how they're essentially the same?
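
   The two-space pitfall is easy to check by hand.  The sketch
   below uses Python's re module as an assumed stand-in for
   lookup's engine; the helper name hits() is hypothetical.

```python
import re

with_harry = "I don't see what the big fat harry deal is!"
without_harry = "I don't see what the big deal is!"

def hits(pattern):
    """Report whether the pattern is found in each of the two lines."""
    return (bool(re.search(pattern, with_harry)),
            bool(re.search(pattern, without_harry)))

# The '?' applies only to the 's' right before it.
optional_s = [bool(re.search(r"bikes? for rent", s))
              for s in ("bikes for rent", "bike for rent", "bike rental")]

# With the group absent, this one still demands two spaces.
too_strict = hits(r"big (fat harry)? deal")

# Moving one space inside the group fixes it, either way round.
fixed_a = hits(r"big (fat harry )?deal")
fixed_b = hits(r"big( fat harry)? deal")

print(optional_s, too_strict, fixed_a, fixed_b)
```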

   Similar to ‘?’ is ‘*’, which means “any number, including
   none, of whatever's right in front”.  It more or less means
   that whatever is tagged with ‘*’ is allowed, but not
   required, so something like
       I (really )*hate peas
   would match “I hate peas”, “I really hate peas!”, “I really
   really hate peas”, etc.

   Similar to both ‘?’ and ‘*’ is ‘+’, which means “at least
   one of whatever is just in front, but more is fine too”.  The
   regex 「mis+pelling」 would match “mispelling”, “misspelling”,
   “missspelling”, etc.  Actually, it's just the same as
   「miss*pelling」 but simpler to type.  The regex 「ss*」 means
   “an ‘s’, followed by zero or more ‘s’”, while 「s+」 means
   “one or more ‘s’”.  Both are really the same.
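
   A quick check of ‘*’ and ‘+’, sketched with Python's re module
   (an assumed stand-in for lookup's own engine).  The extra sample
   strings are made up.

```python
import re

star = re.compile(r"I (really )*hate peas")
star_hits = [bool(star.search(s)) for s in (
    "I hate peas",
    "I really hate peas!",
    "I really really hate peas",
    "I love peas",
)]

# 's+' and 'ss*' should accept exactly the same strings.
plus = re.compile(r"mis+pelling")
star_form = re.compile(r"miss*pelling")
samples = ["mipelling", "mispelling", "misspelling", "missspelling"]
plus_hits = [bool(plus.search(s)) for s in samples]
star_form_hits = [bool(star_form.search(s)) for s in samples]

print(star_hits, plus_hits, star_form_hits)
```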

   The special character ‘|’ means “or”.  Unlike ‘+’, ‘*’, and
   ‘?’, which act on the thing immediately before, the ‘|’ is
   more “global”: it separates everything on its left from
   everything on its right, out to the enclosing parentheses.
       give me (this|that) one
   would match lines that had “give me this one” or “give me that
   one” in them.

   You can even combine more than two:
       give me (this|that|the other) one
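
   To see why the parentheses matter with ‘|’, here is a sketch
   using Python's re module (an assumed stand-in for lookup's
   engine; the sample lines are made up).  Without the
   parentheses, the ‘|’ splits the whole regex in two.

```python
import re

grouped = re.compile(r"give me (this|that|the other) one")
grouped_hits = [bool(grouped.search(s)) for s in (
    "please give me this one",
    "give me the other one",
    "give me those ones",
)]

# Ungrouped, the alternatives are "give me this" and "that one".
ungrouped = re.compile(r"give me this|that one")
line = "not that one"
ungrouped_hit = bool(ungrouped.search(line))
grouped_hit = bool(grouped.search(line))

print(grouped_hits, ungrouped_hit, grouped_hit)
```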

   How about:
       [Ii]t is a (nice |sunny |bright |clear )*day
   Here, the “whatever” immediately before the ‘*’ is
       (nice |sunny |bright |clear )
   So this regex would match all the following lines (the part
   matched runs from the “It” or “it” through “day”):
      It is a day.
      I think it is a nice day.
      It is a clear sunny day today.
      If it is a clear sunny nice sunny sunny sunny bright day then....
   Notice how the 「[Ii]t」 matches either “It” or “it”?

   Note that the above regex would also match
      fruit is a day
   because the “it is a day” inside it fulfills all requirements
   of the regex, even though the “it” is really part of the word
   “fruit”.  To handle concerns like this, which are common,
   there are ‘<’ and ‘>’, which mean “word break”.  The regex
   「<it」 would match any line with “it” beginning a word, while
   「it>」 would match any line with “it” ending a word.  And, of
   course, 「<it>」 would match any line with the word “it” in it.
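
   The “fruit” problem can be reproduced in a few lines.  Python's
   re module, used here as an assumed stand-in, has no ‘<’ or ‘>’
   word-break markers, so this sketch substitutes \b, which the
   summary in this manual lists as equivalent.  The sample strings
   are made up.

```python
import re

loose = re.compile(r"[Ii]t is a (nice |sunny |bright |clear )*day")
strict = re.compile(r"\b[Ii]t is a (nice |sunny |bright |clear )*day")

line = "fruit is a day"
loose_hit = bool(loose.search(line))    # finds "it is a day" inside "fruit"
strict_hit = bool(strict.search(line))  # "it" must now begin a word

samples = ("say it now", "fruit", "an item")
begins_word = [bool(re.search(r"\bit", s)) for s in samples]
ends_word = [bool(re.search(r"it\b", s)) for s in samples]
whole_word = [bool(re.search(r"\bit\b", s)) for s in samples]

print(loose_hit, strict_hit, begins_word, ends_word, whole_word)
```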

   Going back to the regex to find grey/gray, that would make
   more sense, then, as
       <gr[ae]y>
   which would match only the words “grey” and “gray”.
   Somewhat similar are ‘^’ and ‘$’, which mean “beginning of
   line” and “end of line”, respectively (but not in a character
   class, of course).  So the regex 「^fun」 would find any line
   that begins with the letters “fun”, while 「^fun>」 would find
   any line that begins with the word “fun”.  「^fun$」 would find
   any line that was exactly “fun”.

   Finally, 「^\s*fun\s*$」 would match any line that was “fun”
   exactly, but perhaps also had leading and/or trailing
   whitespace.
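
   The anchors can be compared side by side.  As before, this is
   only a sketch with Python's re module standing in for lookup's
   engine (an assumption), with \b in place of ‘>’ and made-up
   sample lines.

```python
import re

lines = ["fun", "funny business", "fun times", "  fun \t", "no fun"]

def select(pattern):
    """Return the sample lines in which the pattern is found."""
    return [s for s in lines if re.search(pattern, s)]

begins_with_letters = select(r"^fun")      # line starts with f-u-n
begins_with_word = select(r"^fun\b")       # line starts with the word
exactly_fun = select(r"^fun$")             # nothing else on the line
fun_with_padding = select(r"^\s*fun\s*$")  # only whitespace around it

print(begins_with_letters, begins_with_word, exactly_fun, fun_with_padding)
```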

   That's  pretty much it. There are more complex things, some of
   which I'll mention in the list below, but even with these  few
   simple  constructs  one  can specify very detailed and complex
   patterns.

   Let's summarize some of the special things in regular  expres-
   sions:

   Items that are basic units:
     char      Any non-special character matches itself.
     \char     Special chars, when preceded by \, become non-special.
     .         Matches any one character (except \n).
     \n        Newline.
     \t        Tab.
     \r        Carriage return.
     \f        Formfeed.
     \d        Digit. Just a short-hand for [0-9].
     \w        Word element. Just a short-hand for [0-9a-zA-Z_].
     \s        Whitespace. Just a short-hand for [\t \n\r\f].
     \## \###  Two or three digit octal number indicating a single byte.
     [chars]   Matches a character if it's one of the characters listed.
     [^chars]  Matches a character if it's not one of the ones listed.

     The \char items above can be used within a character class,
     but not the items below.

     \D        Anything not \d.
     \W        Anything not \w.
     \S        Anything not \s.
     \a        Any ASCII character.
     \A        Any multibyte character.
     \k        Any (not half-width) katakana character (including ー).
     \K        Any character not \k (except \n).
     \h        Any hiragana character.
     \H        Any character not \h (except \n).
     (regex)   Parens make the regex one unit.
     (?:regex) [from perl5] Grouping-only parens; can't be used for \# (below).
     \c        Any JISX0208 kanji (kuten rows 16-84).
     \C        Any character not \c (except \n).
     \#        Match whatever was matched by the #th paren from the left.

   With “☆” to indicate one “unit” as above, the following may be used:

     ☆?        A ☆ allowed, but not required.
     ☆+        At least one ☆ required, but more ok.
     ☆*        Any number of ☆ ok, but none required.

   There are also ways to match “situations”:

     \b        A word boundary.
     <         Same as \b.
     >         Same as \b.
     ^         Matches the beginning of the line.
     $         Matches the end of the line.

   Finally, the “or” is

     reg1|reg2 Match if either reg1 or reg2 matches.

   Note that “\k” and the like aren't allowed in character classes, so
   something such as 「[\k\h]」 to try to get all kana won't work.
   Use 「(\k|\h)」 instead.
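
   The \# and (?:regex) entries from the summary are easier to
   follow with an example.  This sketch uses Python's re module as
   an assumed stand-in; Python writes the same back-reference as
   \1, \2 and so on, and it has no \k or \h, so the kana items are
   not covered here.  The sample sentences are made up.

```python
import re

# \1 must repeat whatever the first paren matched: a doubled word.
doubled = re.compile(r"\b(\w+) \1\b")
m = doubled.search("it was the the best of times")
repeated_word = m.group(1) if m else None

# Grouping-only parens are not counted, so \1 here refers to
# the (\w+) group, not to the (?:very ) group.
skip = re.compile(r"(?:very )+(\w+) \1")
m2 = skip.search("very very big big deal")
after_very = m2.group(1) if m2 else None

no_double = doubled.search("nothing repeated here")

print(repeated_word, after_very, no_double)
```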


BUGS
   Needs full support for half-width katakana and JIS X
   0212-1990.
   Non-EUC (JIS & SJIS) items not tested well.
   Probably won't work on non-UNIX systems.
   Screen control codes (for clear and highlight commands) are
   hard-coded for ANSI/VT100/kterm.

AUTHOR
   Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)

INFO
   Jim Breen's text files edict and kanjidic and their
   documentation can be found in “pub/nihongo” on
   ftp.cc.monash.edu.au (130.194.1.106).

   Information on input and output encoding and codes can be
   found in Ken Lunde's Understanding Japanese Information
   Processing (日本語情報処理), published by O'Reilly and
   Associates.  ISBN 1-56592-043-0.  There is also a Japanese
   edition published by SoftBank.

   A program to convert files among the various encoding methods
   is Dr. Ken Lunde's jconv, which can also be found on
   ftp.cc.monash.edu.au.  Jconv is also useful for converting
   half-width katakana (which lookup doesn't yet support well) to
   full-width.