LOOKUP(1) LOOKUP(1)
April 22nd, 1994
NAME
lookup - interactive file search and display
SYNOPSIS
lookup [ args ] [ file ... ]
DESCRIPTION
Lookup allows the quick interactive search of text files. It
supports ASCII, JIS-ROMAN, and Japanese EUC Packed formatted
text, and has an integrated romaji→kana converter.
THIS MANUAL
Lookup is flexible for a variety of applications. This manual
will, however, focus on the application of searching Jim
Breen's edict (Japanese-English dictionary) and kanjidic
(kanji database). Being familiar with the content and format
of these files would be helpful. See the INFO section near the
end of this manual for information on how to obtain these
files and their documentation.
OVERVIEW OF MAJOR FEATURES
The following just mentions some major features to whet your
appetite to actually read the whole manual (-:
Romaji-to-Kana Converter
Lookup can convert romaji to kana for you, even “on the
fly” as you type.
Fuzzy Searching
Searches can be a bit “vague” or “fuzzy”, so that you'll
be able to find “東京” even if you try to search
for “ときょ” (the proper yomikata being “とうきょう”).
Regular Expressions
Uses the powerful and expressive regular expression for
searching. One can easily specify complex searches that
say, in effect, “I want lines that look like such-and-such, but not
like this-and-that, but that also have this particular
characteristic....”
Wildcard ``Glob'' Patterns
Optionally, can use well-known filename wildcard patterns
instead of full-fledged regular expressions.
Filters
You can have lookup not list certain lines that would otherwise
match your search, yet can optionally save them for
quick review. For example, you could have all name-only
entries from edict filtered from normal output.
Automatic Modifications
Similarly, you can do a standard search-and-replace on
lines just before they print, perhaps to remove information
you don't care to see on most searches. For example, if
you're generally not interested in kanjidic's info on Chinese
readings, you can have them removed from lines before
printing.
Smart Word-Preference Mode
You can have lookup list only entries with whole words that
match your search (as opposed to an embedded match, such as
finding “the” inside “them”), but if no whole-word matches
exist, it will go ahead and list any entry that matches the
search.
Handy Features
Other handy features include a dynamically settable and
parameterized prompt, automatic highlighting of that part
of the line that matches your search, an output pager,
readline-like input with horizontal scrolling for long
input lines, a “.lookup” startup file, automated programmability,
and much more. Read on!
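The word-preference behaviour mentioned above (whole-word matches first, embedded matches only as a fallback) can be sketched in a few lines of Python. This is only an illustration of the idea, not lookup's own code: the function name word_preference_search, the sample lines, and the use of Python's \b as the word boundary are all assumptions made for the example.

```python
import re

def word_preference_search(lines, word):
    """Return whole-word matches if any exist, otherwise embedded matches."""
    whole = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    embedded = re.compile(re.escape(word), re.IGNORECASE)
    whole_hits = [ln for ln in lines if whole.search(ln)]
    if whole_hits:
        return whole_hits
    # No whole-word match anywhere: fall back to any line containing the text.
    return [ln for ln in lines if embedded.search(ln)]

# Hypothetical sample data, not taken from edict.
lines = ["the cat", "give it to them", "anthem"]
print(word_preference_search(lines, "the"))  # whole-word hit only
print(word_preference_search(lines, "he"))   # no whole word "he", so embedded hits
```

Searching for “the” lists only the first line, while “he” (which never stands alone as a word here) falls back to every line that contains it.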
REGULAR EXPRESSIONS
Lookup makes liberal use of regular expressions (or regex for
short) in controlling various aspects of the searches. If you
are not familiar with the important concepts of regexes, read
the tutorial appendix of this manual before continuing.
JAPANESE CHARACTER ENCODING METHODS
Internally, lookup works with Japanese packed-format EUC, and
all files loaded must be encoded similarly. If you have files
encoded in JIS or Shift-JIS, you must first convert them to
EUC before loading (see the INFO section for programs that can
do this).
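If you happen to have Python at hand, the conversion to EUC can also be sketched with its standard codecs. This is not one of the programs the INFO section refers to, just a hedged alternative; the helper name to_euc and the sample entry are made up for the illustration, while the codec names "shift_jis", "iso2022_jp", and "euc_jp" are the ones Python itself ships with.

```python
def to_euc(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode Japanese text from source_encoding into packed EUC."""
    return raw.decode(source_encoding).encode("euc_jp")

# A made-up dictionary line, first encoded as Shift-JIS and as JIS.
entry = "東京 [とうきょう] /Tokyo/\n"
sjis_bytes = entry.encode("shift_jis")
jis_bytes = entry.encode("iso2022_jp")

euc_from_sjis = to_euc(sjis_bytes, "shift_jis")
euc_from_jis = to_euc(jis_bytes, "iso2022_jp")
print(euc_from_sjis == euc_from_jis)
```

For a whole file, one would read it in binary mode, pass the bytes through to_euc, and write the result out before handing it to lookup.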
Interactive input and output encoding, however, may be
selected via the -jis, -sjis, and -euc invocation flags
(default is -euc), or by various commands to the program
(described later).
Make sure to use the encoding appropriate for your system. If
you're using kterm under the X Window System, you can use
lookup's -jis flag to match kterm's default JIS encoding. Or,
you might use kterm's “-km euc” startup option (or menu selection)
to put kterm into EUC mode. Also, I have found kterm's
scrollbar (“-sb -sl 500”) to be quite useful.
With many “English” fonts in Japan, the character that normally
prints as a backslash (half-width version of ＼) in The
States appears as a yen symbol (the half-width version of ￥).
How it will appear on your system is a function of what font
you use and what output encoding method you choose, which may
be different from the font and method that was used to print
this manual (both of which may be different from what's
printed on your keyboard's appropriate key). Make sure to
keep this in mind while reading.
STARTUP
Let's assume that your copy of edict is in ~/lib/edict. You
can start the program simply with
lookup ~/lib/edict
You'll note that lookup spends some time building an index
before the default “lookup> ” prompt appears.
Lookup gains much of its search speed by constructing an index
of the file(s) to be searched. Since building the index can be
time consuming itself, you can have lookup write the built
index to a file that can be quickly loaded the next time you
run the program. Index files will be given a “.jin” (Jeffrey's
Index) ending.
Let's build the indices for edict and kanjidic now:
lookup -write ~/lib/edict ~/lib/kanjidic
This will create the index files
~/lib/edict.jin
~/lib/kanjidic.jin
and exit.
You can now re-start lookup, automatically using the pre-computed
index files, as:
lookup ~/lib/edict ~/lib/kanjidic
You should then be presented with the prompt without having to
wait for the index to be constructed (but see the section on
Operating System concerns for possible reasons of delay).
INPUT
There are basically two types of input: searches and commands.
Commands do such things as tell lookup to load more files or
set flags. Searches report lines of a file that match some
search specifier (where lines to search for are specified by
one or more regular expressions).
The input syntax may perhaps at first seem odd, but has been
designed to be powerful and concise. A bit of time invested to
learn it well will pay off greatly when you need it.
BRIEF EXAMPLE
Assuming you've started lookup with edict and kanjidic as
noted above, let's try a few searches. In these examples, the
“search [edict]> ”
is the prompt. Note that the space after the ‘>’ is part of
the prompt.
Given the input:
search [edict]> tranquil
lookup will report all lines with the string “tranquil” in
them. There are currently about a dozen such lines, two of
which look like:
安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
安らぎ [やすらぎ] /peace/tranquility/
Notice that lines with “tranquil” and “tranquility” matched?
This is because “tranquil” was embedded in the word “tranquility”.
You could restrict the search to only the word “tranquil”
by prepending the special “start of word” symbol ‘<’
and appending the special “end of word” symbol ‘>’ to the regex, as in:
search [edict]> <tranquil>
This is the regular expression that says “the beginning of a
word, followed by a ‘t’, ‘r’, ..., ‘l’, which is at the
end of a word.” The current version of edict has just three
matching entries.
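For readers more used to other regex dialects, the difference between the embedded and the whole-word search can be tried out in Python. Note the assumption: Python has no ‘<’ and ‘>’ word anchors, so the sketch uses \b on both sides as the nearest equivalent. The two sample lines are the edict entries quoted above.

```python
import re

lines = [
    "安らか [やすらか] /peaceful (an)/tranquil/calm/restful/",
    "安らぎ [やすらぎ] /peace/tranquility/",
]

# Plain search: matches "tranquil" anywhere, including inside "tranquility".
embedded = re.compile(r"tranquil", re.IGNORECASE)
# Whole-word search: \b plays the role of lookup's < and > here.
whole_word = re.compile(r"\btranquil\b", re.IGNORECASE)

embedded_hits = [ln for ln in lines if embedded.search(ln)]
whole_hits = [ln for ln in lines if whole_word.search(ln)]

print(len(embedded_hits), len(whole_hits))
```

Both lines match the plain pattern, but only the first survives the whole-word version.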
Let's try another:
search [edict]> fukushima
This is a search for the “English” fukushima -- ways to search
for kana or kanji will be explored later. Note that among the
several lines selected and printed are:
副島 [ふくしま] /Fukushima (pn,pl)/
木曽福島 [きそふくしま] /Kisofukushima (pl)/
By default, searches are done in a case-insensitive manner
-- ‘F’ and ‘f’ are treated the same by lookup, at least so
far as the matching goes. This is called case folding.
Let's give a command to turn this option off, so
that ‘f’ and ‘F’ won't be considered the same. Here's an
odd point about lookup's input syntax: the default setting is
that all command lines must begin with a space. The space is
the (default) command-introduction character and tells the
input parser to expect a command rather than a search regular
expression. It is a common mistake at first to forget the
leading space when issuing a command. Be careful.
Try the command “ fold” to report the current status of case
folding. Notice that as soon as you type the space, the
prompt changes to
“lookup command> ”
as a reminder that now you're typing a command rather than a
search specification.
lookup command> fold
The reply should be “file #0's case folding is on”.
You can actually turn it off with “ fold off”. Now try the
search for “fukushima” again. Notice that this time the
entries with “Fukushima” aren't listed? Now try the search
string “Fukushima” and see that the entries
with “fukushima” aren't listed.
Case folding is usually very convenient (it also makes corresponding
katakana and hiragana match the same), so don't forget
to turn it back on:
lookup command> fold on
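To see what “folding” katakana and hiragana together amounts to, here is a small Python sketch. It is an illustration only and says nothing about how lookup does it internally (lookup works on EUC bytes, not Unicode). It relies on the fact that in Unicode the katakana ァ..ヶ (U+30A1..U+30F6) sit exactly 0x60 above the corresponding hiragana; the function name fold is made up for the example.

```python
def fold(text: str) -> str:
    """Fold ASCII case and map katakana onto hiragana for comparison."""
    out = []
    for ch in text.casefold():          # 'F' and 'f' become the same
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:    # katakana small a .. small ke
            ch = chr(code - 0x60)       # same kana in the hiragana block
        out.append(ch)
    return "".join(out)

print(fold("Fukushima") == fold("fukushima"))
print(fold("トきょ") == fold("ときょ"))
```

Comparing folded strings rather than the originals gives the case-insensitive behaviour described above, for both ASCII and kana.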
JAPANESE INPUT
Lookup has an automatic romaji→kana converter. A leading ‘/’
indicates that romaji is to follow. Try typing
“/tokyo” and you'll see it convert to “/ときょ” as you
type. When you hit return, lookup will list all lines that
have a “ときょ” somewhere in them. Well, sort of. Look carefully
at the lines which match. Among them (if you had case
folding back on) you'll see:
キリスト教 [キリストきょう] /Christianity/
東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
凸鏡 [とっきょう] /convex lens/
The first one has “ときょ” in it (as “トきょ”, where the
katakana “ト” matches in a case-insensitive manner to the
hiragana “と”), but you might consider the others unexpected,
since they don't have “ときょ” in them. They're close
(“とうきょ” and “とっきょ”), but not exact. This is the
result of lookup's “fuzzification”. Try the
command “ fuzz” (again, don't forget the command-introduction
space). You'll see that fuzzification is turned on. Turn it
off with “ fuzz off” and try “/tokyo” (which will convert as
you type) again. This time you only get the lines which
have “ときょ” exactly (well, case folding is still on, so it
might match katakana as well).
In a fuzzy search, length of vowels is ignored -- “と” is considered
the same as “とう”, for example. Also, the presence
or absence of any “っ” character is ignored, and the pairs じ
ぢ, ず づ, え ゑ, and お を are considered identical in a
fuzzy search.
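The fuzzy rules just listed can be approximated in Python by reducing every reading to a “pronunciation key” and comparing keys. Be aware that this is a different technique from the one lookup uses (lookup rewrites the search regex instead), that it handles hiragana only, and that the table of which kana count as vowel lengtheners is my own guess. The names fuzzy_key, vowel_of, PAIRS, and LENGTHENERS are all made up for this sketch.

```python
import unicodedata

# The four pairs the manual says are treated as identical.
PAIRS = str.maketrans({"ぢ": "じ", "づ": "ず", "ゑ": "え", "を": "お"})

# Assumed: kana that merely lengthen a preceding vowel (keyed by that vowel).
LENGTHENERS = {"A": "あぁ", "I": "いぃ", "U": "うぅおぉ", "E": "えぇ", "O": "うぅおぉ"}

def vowel_of(ch):
    """Vowel of a hiragana, read off its Unicode name (e.g. ...LETTER TO)."""
    name = unicodedata.name(ch, "")
    if not name.startswith("HIRAGANA LETTER"):
        return None
    return name[-1] if name[-1] in "AIUEO" else None

def fuzzy_key(text):
    """Drop small tsu and vowel lengthening; merge the identical pairs."""
    out = []
    prev_vowel = None
    for ch in text.translate(PAIRS):
        if ch == "っ":
            continue                      # presence of small tsu is ignored
        if ch == "ー" and prev_vowel:
            continue                      # long-vowel mark
        if prev_vowel and ch in LENGTHENERS[prev_vowel]:
            continue                      # repeated or lengthening vowel
        out.append(ch)
        prev_vowel = vowel_of(ch)
    return "".join(out)

for reading in ["ときょ", "とうきょう", "とっきょう", "とっきょ"]:
    print(reading, fuzzy_key(reading))
```

All four readings from the Tokyo example reduce to the same key, which is why they are found by the same fuzzy search.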
It might be convenient to consider a fuzzy search to be
a “pronunciation search”. Special note: fuzzification will
not be performed if a regular expression “*”, “+”, or “?” modifies
a non-ASCII character. This is
not an issue when input patterns are filename-like wildcard
patterns (discussed below).
In addition to kana fuzziness, there's one special case for
kanji when fuzziness is on. The kanji repeater mark “々” will
be recognized such that “時々” and “時時” will match each
other.
Turn fuzzification back on (“fuzz on”), and search for all
whole words which sound like “tokyo”. That search would be
specified as:
search [edict]> /<tokyo>
(again, the “tokyo” will be converted to “ときょ” as you
type). My copy of edict has the three lines
東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
特許 [とっきょ] /special permission/patent/
凸鏡 [とっきょう] /convex lens/
This kind of whole-word romaji-to-kana search is so common,
there's a special short cut. Instead of typing “/<tokyo>”,
you can type “[tokyo]”. The leading ‘[’ means “start
romaji” and “start of word”. Were you to
type “<tokyo>” instead (without a leading ‘/’ or ‘[’ to
indicate romaji-to-kana conversion), you would get all lines
with the English whole-word “tokyo” in them. That would be a
reasonable request as well, but not what we want at the
moment.
Besides the kana conversion, you can use any cut-and-paste
that your windowing system might provide to get Japanese text
onto the search line. Cut “ときょ” from somewhere and paste
onto the search line. When hitting enter to run the search,
you'll notice that it is done without fuzzification (even if
the fuzzification flag was “on”). That's because there's no
leading ‘/’. Not only does a leading ‘/’ indicate that you
want the romaji-to-kana conversion, but that you want it done
fuzzily.
So, if you'd like fuzzy cut-and-paste, just type a leading ‘/’
before pasting (or go back and prepend one after pasting).
These examples have all been pretty simple, but you can use
all the power that regexes have to offer. As a slightly more
complex example, the search “<gr[ea]y>” would look for all
lines with the words “grey” or “gray” in them. Since
the ‘[’ isn't the first character of the line, it doesn't
mean what was mentioned above (start-of-word romaji). In this
case, it's just the regular-expression “class” indicator.
If you feel more comfortable using filename-like “*.txt” wildcard
patterns, you can use the “wildcard on” command to have
patterns be considered this way.
This has been a quick introduction to the basics of lookup.
It can be very powerful and much more complex. Below is a
detailed description of its various parts and features.
READLINE INPUT
The actual keystrokes are read by a readline-ish package that
is pretty standard. In addition to just typing away, the following
keystrokes are available:
^B / ^F move left/right one character on the line
^A / ^E move to the start/end of the line
^H / ^G delete one character to the left/right of the cursor
^U / ^K delete all characters to the left/right of the cursor
^P / ^N previous/next lines on the history list
^L or ^R redraw the line
^D delete char under the cursor, or EOF if line is empty
^space force romaji conversion (^@ on some systems)
If automatic romaji-to-kana conversion is turned on (as it is
by default), there are certain situations where the conversion
will be done, as we saw above. Lower-case romaji will be converted
to hiragana, while upper-case romaji to katakana. This
usually won't matter, though, as case folding will treat hiragana
and katakana the same in the searches.
In exactly what situations the automatic conversion will be
done is intended to be rather intuitive once the basic idea is
learned. However, at any time, one can use control-space to
convert the ASCII to the left of the cursor to kana. This can
be particularly useful when needing to enter kana on a command
line (where auto conversion is never done; see below).
ROMAJI FLAVOR
Most flavors of romaji are recognized. Special or non-obvious
items are mentioned below. Lowercase are converted to hiragana,
uppercase to katakana.
Long vowels can be entered by repeating the vowel, or
with ‘-’ or ‘^’.
In situations where an “n” could be vague, as in “na” being な
or んあ, use a single quote to force ん. Therefore,
「kenichi」→けにち while 「ken'ichi」→けんいち.
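The effect of the single quote is easy to see with a toy converter. The Python sketch below is nothing like the real conversion table: it is a greedy longest-match lookup over just the few syllables needed for the two examples above, and the names TABLE and to_kana are invented for the illustration.

```python
# Deliberately tiny table: only what "kenichi" and "ken'ichi" need.
TABLE = {"ke": "け", "ni": "に", "chi": "ち", "i": "い", "n'": "ん", "n": "ん"}

def to_kana(romaji):
    """Greedy longest-match romaji-to-hiragana conversion."""
    out = []
    i = 0
    while i < len(romaji):
        for length in (3, 2, 1):            # try the longest chunk first
            chunk = romaji[i:i + length]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            out.append(romaji[i])           # unknown character: pass through
            i += 1
    return "".join(out)

print(to_kana("kenichi"))
print(to_kana("ken'ichi"))
```

Without the quote the “n” is swallowed by the following “i” to make に; with it, the “n'” chunk is consumed on its own as ん before the “i” is looked at.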
The romaji has been richly extended with many non-standard
combinations such as ふぁ or ちぇ, which are represented in
intuitive ways: 「fa」→ふぁ, 「che」→ちぇ, etc.
Various other mappings of interest:
wo →を  we →ゑ  wi →ゐ
VA →ヴァ  VI →ヴィ  VU →ヴ  VE →ヴェ  VO →ヴォ
di →ぢ  dzi→ぢ  dya→ぢゃ  dyu→ぢゅ  dyo→ぢょ
du →づ  tzu→づ  dzu→づ
(the following kana are all smaller versions of the regular kana)
xa →ぁ  xi →ぃ  xu →ぅ  xe →ぇ  xo →ぉ
xu →ぅ  xtu→っ  xwa→ゎ  xka→ヵ  xke→ヶ
xya→ゃ  xyu→ゅ  xyo→ょ
INPUT SYNTAX
Any input line beginning with a space (or whichever character
is set as the command-introduction character) is processed as
a command to lookup rather than a search spec. Automatic kana
conversion is never done on these lines (but forced conversion
with control-space may be done at any time).
Other lines are taken as search regular expressions, with the
following special cases:
? A line consisting of a single question mark will report the
current command-introduction character (the default is a
space, but can be changed with the “cmdchar” command).
= If a line begins with ‘=’, the line (without the ‘=’)
is taken as a search regular expression, and no automatic
(or internal -- see below) kana conversion is done anywhere
on the line (although again, conversion can always be
forced with control-space). This can be used to initiate a
search where the beginning of the regex is the command-
introduction character, or in certain situations where
automatic kana conversion is temporarily not desired.
/ A line beginning with ‘/’ indicates romaji input for the
whole line. If automatic kana conversion is turned on, the
conversion will be done in real-time, as the romaji is
typed. Otherwise it will be done internally once the line
is entered. Regardless, the presence of the leading ‘/’
indicates that any kana (either converted or cut-and-pasted
in) should be “fuzzified” if fuzzification is
turned on.
As an addition to the above, if the line doesn't begin
with ‘=’ or the command-introduction character (and automatic
conversion is turned on), ‘/’ anywhere on the line
initiates automatic conversion for the following word.
[ A line beginning with ‘[’ is taken to be romaji (just as a
line beginning with ‘/’), and the converted romaji is subject
to fuzzification (if turned on). However, if ‘[’ is
used rather than ‘/’, an implied ‘<’ “beginning of
word” is prepended to the resulting kana regex. Also, any
ending ‘]’ on such a line is converted to the “ending of
word” specifier ‘>’ in the resulting regex.
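The special cases above boil down to a dispatch on the first character of the line. The Python sketch below restates them as code for clarity. It is an assumption-laden paraphrase, not lookup's parser: the function name classify and the category labels are invented, no romaji conversion is actually performed, and the ‘/’-in-mid-line rule and the prefixes and suffixes described next are ignored.

```python
def classify(line, cmdchar=" "):
    """Return (kind, text) for one input line, following the rules above."""
    if line.startswith(cmdchar):
        return ("command", line[len(cmdchar):])
    if line == "?":
        return ("show-cmdchar", "")
    if line.startswith("="):
        # Plain regex, with automatic kana conversion suppressed.
        return ("regex-no-conversion", line[1:])
    if line.startswith("/"):
        # Romaji for the whole line, subject to fuzzification.
        return ("romaji-fuzzy", line[1:])
    if line.startswith("["):
        # Like '/', plus an implied '<'; a trailing ']' becomes '>'.
        body = line[1:]
        if body.endswith("]"):
            body = body[:-1] + ">"
        return ("romaji-fuzzy", "<" + body)
    return ("regex", line)

for sample in [" fold", "?", "= fold", "/<tokyo>", "[tokyo]", "<gr[ea]y>"]:
    print(repr(sample), classify(sample))
```

Note how “[tokyo]” and “/<tokyo>” come out identical, which is exactly the short cut described earlier, while “<gr[ea]y>” stays an ordinary regex because its ‘[’ is not the first character.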
In addition to the above, lines may have certain prefixes and
suffixes to control aspects of the search or command:
! Various flags can be toggled for the duration of a particular
search by prepending a “!!” sequence to the input line.
Sequences are shown below, along with commands related to
each:
!F! … Filtration is toggled for this line (filter)
!M! … Modification is toggled for this line (modify)
!w! … Word-preference mode is toggled for this line (word)
!c! … Case folding is toggled for this line (fold)
!f! … Fuzzification is toggled for this line (fuzz)
!W! … Wildcard-pattern mode is toggled for this line (wildcard)
!r! … Raw. Force fuzzification off for this line
!h! … Highlighting is toggled for this line (highlight)
!t! … Tagging is toggled for this line (tag)
!d! … Displaying is on for this line (display)
The letters can be combined, as in “!cf!”.
The final ‘!’ can be omitted if the first character after
the sequence is not an ASCII letter.
If no letters are given (“!!”), “!f!” is the default.
These last two points can be conveniently combined in the
common case of “!/romaji” which would be the same
as “!f!/romaji”.
The special sequence “!?” lists the above, as well as indicates
which are currently turned on.
Note that the letters accepted in a “!!” sequence are many
of the indicators shown by the “files” command.
+ A ‘+’ prepended to anything above will cause the final
search regex to be printed. This can be useful to see when
and what kind of fuzzification and/or internal kana conversion
is happening. Consider:
search [edict]> +/わかる
a match is “わ[ぁあー]*っ?か[ぁあー]*る[ぅうおぉー]*”
Due to the leading ‘/’, the kana is fuzzified, which
explains the somewhat complex resulting regex. For comparison,
note:
search [edict]> +わかる
a match is “わかる”
search [edict]> +!/わかる
a match is “わかる”
As the ‘+’ shows, these are not fuzzified. The first one
has no leading ‘/’ or ‘[’ to induce fuzzification, while
the second has the ‘!’ line prefix (which is the default
version of “!f!”), which toggles fuzzification mode
to “off” for that line.
, The default of all searches and most commands is to work
with the first file loaded (edict in these examples). One
can change this default (see the “select” command) or, by
appending a comma+digit sequence at the end of an input
line, force that line to work with another previously-loaded
file. An appended “,1” works with the first extra file
loaded (in these examples, kanjidic). An
appended “,2” works with the 2nd extra file loaded, etc.
An appended “,0” works with the original first file (and
can be useful if the default file has been changed via
the “select” command).
The following sequence shows a common usage:
search [edict]> [ときょと]
東京都 [とうきょうと] /Tokyo Metropolitan area/
cutting and pasting the 都 from above, and adding a “,1” to
search kanjidic:
search [edict]> 都,1
都 4554 N4769 S11 ..... ト ツ みやこ {metropolis} {capital}
FILENAME-LIKE WILDCARD MATCHING
When wildcard-pattern mode is selected, patterns are considered
as extended “*.txt”-like patterns. This is often more
convenient for users not familiar with regular expressions.
To have this mode selected by default, put
default wildcard on
into your “.lookup” file (see “STARTUP FILE” below).
When wildcard mode is on, only “*”, “?”, “+”, and “.” are
affected. See the entry for the “wildcard” command below for
details.
Other features, such as the multiple-pattern searches
(described below) and other regular-expression metacharacters,
remain available.
MULTIPLE-PATTERN SEARCHES
You can put multiple patterns in a single search specifier.
For example, consider
search [edict]> china||japan
The first part (“china”) will select all lines that have
“china” in them. Then, from among those lines, the second
part will select lines that have “japan” in them. The “||” is
not part of any pattern -- it is lookup's “pipe” mechanism.
The above example is very different from the single pattern
“china|japan”, which would select any line that had
either “china” or “japan”. With “china||japan”, you get
lines that have “china” and then also have “japan” as well.
Note that it is also different from the regular expression
“china.*japan” (or the wildcard pattern “china*japan”), which
would select lines having “china, then maybe some stuff, then
japan”. But consider the case when “japan” comes on the line
before “china”. Just for your comparison, the multiple-pattern
specifier “china||japan” is pretty much the same as the single
regular expression “china.*japan|japan.*china”.
If you use “|!|” instead of “||”, it will mean “...and then
lines not matching...”.
Consider a way to find all lines of kanjidic that do have a
Halpern number, but don't have a Nelson number:
search [edict]> <H\d+>|!|<N\d+>
If you then wanted to restrict the listing to those that also
had a “jinmeiyou” marking (kanjidic's “G9” field) and had a
reading of あき, you could make it:
search [edict]> <H\d+>|!|<N\d+>||<G9>||<あき>
A prepended ‘+’ would explain:
a match is “<H\d+>”
and not “<N\d+>”
and “<G9>”
and “<あき>”
The “|!|” and “||” can be used to make up to ten separate
regular expressions in any one search specification.
Again, it is important to stress that “||” does not
mean “or” (as it does in a C program, or as ‘|’ does within a
regular expression). You might find it convenient to
read “||” as “and also”, while reading “|!|” as “but not”.
It is also important to stress that any whitespace around
the “||” and “|!|” constructs is not ignored, but kept as part
of the regex on either side.
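The pipe semantics described in this section can be modelled with a
short Python sketch. It is an approximation under stated assumptions:
the function names parse_specifier and search are hypothetical,
Python's re syntax stands in for lookup's own regex dialect, and the
ten-pattern limit is taken from the text above.

```python
import re

MAX_PATTERNS = 10  # "up to ten separate regular expressions"


def parse_specifier(spec):
    """Split "a||b|!|c" into [(negated, pattern), ...].

    Whitespace around the separators is kept as part of the patterns.
    """
    parts = re.split(r"(\|!\||\|\|)", spec)
    stages = [(False, parts[0])]
    for separator, pattern in zip(parts[1::2], parts[2::2]):
        stages.append((separator == "|!|", pattern))
    if len(stages) > MAX_PATTERNS:
        raise ValueError("too many patterns in one search specifier")
    return stages


def search(lines, spec):
    """Keep lines that pass every stage: a match for "||", no match for "|!|"."""
    stages = [(neg, re.compile(pat)) for neg, pat in parse_specifier(spec)]
    return [ln for ln in lines
            if all((rx.search(ln) is None) == neg for neg, rx in stages)]
```

For example, search(lines, "china||japan") keeps a line whether
“japan” comes before or after “china”, while
search(lines, "china|!|japan") keeps only lines with “china” and
without “japan”.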
COMBINATION SLOTS
Each file, when loaded, is assigned to a “slot” via which
subsequent references to the file are then made. The slot may
then be searched, have filters and flags set, etc.
A special kind of slot, called a “combination slot”, rather
than representing a single file, can represent multiple
previously-loaded slots. Searches against a combination slot
(or “combo slot” for short) search all those previously-loaded
slots associated with it (called “component slots”). Combo
slots are set up with the combine command.
A combo slot has no filter or modify spec, but can have a
local prompt and flags just like normal file slots. The
flags, however, have special meanings with combo slots. Most
combo-slot flags act as a mask against the component-slot
flags; when acted upon as a member of the combo, a
component-slot's flag will be disabled if the corresponding
combo-slot's flag is disabled.
Exceptions to this are the autokana, fuzz, and tag flags.
The autokana and fuzz flags govern a combo slot exactly the
same as a regular file slot. When a slot is searched as a
component of a combination slot, the component slot's fuzz
(and autokana) flags, or lack thereof, are ignored.
The tag flag is quite different altogether; see the tag
command for complete information.
Consider the following output from the files command:
┏━┳━━━━┯━━┳━━━┳━━━━━━━━━━━━━━
┃ 0┃F wcfh d│a I ┃ 2762k┃/usr/jfriedl/lib/edict
┃ 1┃FM cf  d│a I ┃  705k┃/usr/jfriedl/lib/kanjidic
┃ 2┃F  cfh@d│a   ┃    1k┃/usr/jfriedl/lib/local.words
┃*3┃FM cfhtd│a   ┃ combo┃kotoba (#2, #0)
┗━┻━━━━┷━━┻━━━┻━━━━━━━━━━━━━━
See the discussion of the files command below for a basic
explanation of the output.
As can be seen, slot #3 is a combination slot with the
name “kotoba” with component slots two and zero. When a search
is initiated on this slot, first slot #2 “local.words” will be
searched, then slot #0 “edict”. Because the combo slot's
filter flag is on, the component slots' filter flag will
remain on during the search. The combo slot's word flag is
off, however, so slot #0's word flag will be forced off during
the search.
See the combine command for information about creating combo
slots.
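The flag rules for combo slots can be illustrated with a small Python
model. Everything here is an assumption made for illustration: the
function name effective_flags, the dictionary representation, and the
flag names (borrowed from the related commands). The tag flag is left
out because its rules are described under the tag command.

```python
# Flags where the combo slot's setting is used outright and the
# component slot's own setting is ignored.
GOVERNED_BY_COMBO = {"autokana", "fuzz"}


def effective_flags(component, combo):
    """Flags in force while `component` is searched as part of `combo`."""
    effective = {}
    for name, enabled in component.items():
        if name == "tag":
            continue  # tag follows its own rules; not modelled here
        if name in GOVERNED_BY_COMBO:
            effective[name] = combo.get(name, False)
        else:
            # The combo flag acts as a mask on the component flag.
            effective[name] = enabled and combo.get(name, False)
    return effective


# Slot #0 (edict) and combo slot #3 (kotoba) from the files output above.
edict = dict(filter=True, modify=False, word=True, fold=True, fuzz=True,
             highlight=True, tag=False, display=True, autokana=True)
kotoba = dict(filter=True, modify=True, word=False, fold=True, fuzz=True,
              highlight=True, tag=True, display=True, autokana=True)
in_combo = effective_flags(edict, kotoba)
```

Here in_combo["filter"] stays on while in_combo["word"] is forced
off, matching the walk-through above.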
PAGER
Lookup has a built-in pager (à la more). Upon filling a
screen with text, the string
--MORE [space,return,c,q]--
is shown. A space will allow another screen of text; a return
will allow one more line. A ‘c’ will allow output text to
continue unpaged until the next command. A ‘q’ will flush
output of the current command.
If supported by the OS, lookup's idea of the screen size is
automatically set upon startup and window resize. Lookup must
know the width of the screen both for horizontal input-line
scrolling and for knowing when a long line wraps on the
screen.
The pager parameters can be set manually with the “pager”
command.
COMMANDS
Any line intended to be a command must begin with the
command-introduction character (the default is a space, but it
can be set via the “cmdchar” command). However, that character
is not part of the command itself and won't be shown in the
following list of commands.
There are a number of commands that work with the selected
file or selected slot (both meaning the same thing). The
selected file is the one indicated by an appended comma+digit,
as mentioned above. If no such indication is given, the
default selected file is used (usually the first file loaded,
but can be changed with the “select” command).
Some commands accept a boolean argument, such as to turn a
flag on or off. In all such cases, a “1” or “on” means to turn
the flag on, while a “0” or “off” is used to turn it off.
Some flags are per-file (“fuzz”, “fold”, etc.), and a command
to set such a flag normally sets the flag for the
selected file only. However, the default value inherited by
subsequently loaded files can be set by prepending “default”
to the command. This is particularly useful in
the startup file before any files are loaded (see the section
STARTUP FILE).
Items separated by ‘|’ are mutually exclusive possibilities
(i.e. a boolean argument is “1|on|0|off”).
Items shown in brackets (‘[’ and ‘]’) are optional. All
commands that accept a boolean argument to set a flag or mode
do so optionally -- with no argument the command will report
the current status of the mode or flag.
Any command that allows an argument in quotes (such as load,
etc.) allows the use of either single or double quotes.
The commands:
[default] autokana [boolean]
Automatic romaji → kana conversion for the selected file
is turned on or off (default is on). However,
if “default” is specified, the value to be inherited as the
default by subsequently-loaded files is set (or reported).
Can be temporarily disabled by a prepended ‘=’, as
described in the INPUT SYNTAX section.
clear|cls
Attempts to clear the screen. If you're using a kterm it'll
just output the appropriate tty control sequence. Otherwise
it'll try to run the “clear” command.
cmdchar ['one-byte-char']
The default command-introduction character is a space, but
it may be changed via this command. The single quotes
surrounding the character are required. If no argument is
given, the current value is printed.
An input line consisting of a single question mark will
also print the current value (useful for when you don't
know the current value).
Woe to the one that sets the command-introduction character
to one of the other special input-line characters, such
as ‘+’, ‘/’, etc.
combine ["name"] [ num += ] slotnum ...
Creates a combination slot or adds file slots to one (see the
COMBINATION SLOTS section for general information). Note
that “combo” may be used as the command as well.
Assuming for this example that slots 0-2 are loaded with
the files curly, moe, and larry, we can create a combination
slot that will reference all three:
combo "three stooges" 2, 0, 1
The command will report
creating combo slot #3 (three stooges): 2 0 1
The name is optional, and will appear in the files list,
and may also be used to specify the slot as an argument
to the select command.
A search via the newly created combo slot would search in
the order specified on the combo command line: first larry,
then curly, and finally moe.
If you later load another file (say, jeffrey to slot #4),
you can then add it to the previously made combo:
combo 3 += 4
(the “+=” wording comes from the C programming language
where it means “add on to”). Adding to a combination
always adds slots to the end of the list.
You can take the opportunity of adding the slot to also
change the name, if you like:
combo "four stooges" 3 += 4
The reply would be
adding to combo slot #3(four stooges): 4
A file slot can be a component of any particular combo slot
only once. When reporting the created or added slot numbers,
a number will appear in parentheses if it had
already been a member of the list.
Furthermore, only file slots can be component members of
combo slots. Attempting to combine combo slot X to combo
slot Y will result in having X's component file slots
(rather than the combo slot itself) added to Y.
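The membership rules of the combine command can be sketched in
Python. This is a hypothetical model, not lookup's implementation: the
tuple layout (kind, name, members) and the function names flatten and
combine are assumptions made for the example.

```python
def flatten(slots, nums):
    """Expand any combo slots in `nums` into their component file slots."""
    out = []
    for n in nums:
        kind, _name, members = slots[n]
        out.extend(members if kind == "combo" else [n])
    return out


def combine(slots, nums, name=None, target=None):
    """Create a combo slot (target is None) or append to an existing one.

    Returns (combo slot number, slot numbers actually added).
    """
    components = flatten(slots, nums)
    if target is None:
        slots.append(("combo", name, []))
        target = len(slots) - 1
    kind, old_name, members = slots[target]
    added = []
    for n in components:
        if n not in members:      # a file slot may appear only once
            members.append(n)     # additions always go at the end
            added.append(n)
    slots[target] = (kind, name or old_name, members)
    return target, added


slots = [("file", "curly", None), ("file", "moe", None), ("file", "larry", None)]
stooges, _ = combine(slots, [2, 0, 1], name="three stooges")
slots.append(("file", "jeffrey", None))
combine(slots, [4], name="four stooges", target=stooges)
search_order = [slots[n][1] for n in slots[stooges][2]]
```

After these calls search_order lists larry, curly, moe, and jeffrey,
in that order, and the combo is named “four stooges”.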
command debug [boolean]
Sets the internal command parser debugging flag on or off
(default is off).
debug [boolean]
Sets the internal general-debugging flag on or off (default
is off).
describe specifier
This command will tell you how a character (or each character
in a string) is encoded in the various encoding methods:
lookup command> describe "気"
“気” as EUC is 0xb5a4 (181 164; 265 \244)
as JIS is 0x3524 ( 53 36; 65 \044 "5$")
as KUTEN is 2104 ( 0x1504; 25 \004)
as S-JIS is 0x8b1f (139 31; 213 \037)
The quotes surrounding the character or string to describe
are optional. You can also give a regular ASCII character
and have the double-width version of the character
described; indicating “A”, for example, would
describe “Ａ”. Specifier can also be a four-digit kuten
value, in which case the character with that kuten will be
described.
If a four-digit specifier has a hex digit in it, or if it
is preceded by “0x”, the value is taken as a JIS code. You
can precede the value by “jis”, “sjis”, “euc”,
or “kuten” to force interpretation to the requested code.
Finally, specifier can be a string of stripped JIS (JIS w/o
the kanji-in and kanji-out codes, or with the codes but
without the escape characters in them). For example,
“F|K\” would describe the two characters 日 and 本.
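The relationship between the EUC, JIS, and kuten values reported by
describe can be reproduced with Python's standard euc_jp codec. This
is a sketch for two-byte JIS X 0208 characters only; the function
below is hypothetical and merely shares its name with the command.

```python
def describe(ch):
    """Return the EUC, JIS, and kuten codes of one JIS X 0208 character."""
    euc = ch.encode("euc_jp")
    if len(euc) != 2 or euc[0] < 0xA1:
        raise ValueError("not a two-byte JIS X 0208 character: %r" % ch)
    # EUC is the JIS code with the high bit of each byte set.
    jis = bytes(b & 0x7F for b in euc)
    # Kuten is the row and cell number, each counted from 1.
    row, cell = jis[0] - 0x20, jis[1] - 0x20
    return {
        "EUC": "0x" + euc.hex(),
        "JIS": "0x" + jis.hex(),
        "KUTEN": "%02d%02d" % (row, cell),
    }
```

For the character used in the example above, this returns 0xb5a4,
0x3524, and 2104 for the EUC, JIS, and KUTEN entries respectively.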
encoding [euc|sjis|jis]
The same as the -euc, -jis, and -sjis command-line options,
sets the encoding method for interactive input and output
(or reports the current status). Finer control over the
output encoding is available with the output encoding
command. A separate encoding for input can be set with the
input encoding command.
files [ - | long ]
Lists what files are loaded in what slots, and some status
information about them, as with:
┃*0┃F wcfh d│a I ┃ 3749k┃/usr/jeff/lib/edict
┃ 1┃FM cf  d│a I ┃  754k┃/usr/jeff/lib/kanjidic
┏━┳━━━━━┯━━┳━━━┳━━━━━━━━━━━━━━
┃ 0┃F wcf h d │a I ┃ 2762k┃/usr/jfriedl/lib/edict
┃ 1┃FM cf   d │a I ┃  705k┃/usr/jfriedl/lib/kanjidic
┃ 2┃F  cfWh@d │a   ┃    1k┃/usr/jfriedl/lib/local.words
┃*3┃FM cf htd │a   ┃ combo┃kotoba (#2, #0)
┃ 4┃   cf   d │a   ┃  205k┃/usr/dict/words
┗━┻━━━━━┷━━┻━━━┻━━━━━━━━━━━━━━
The first section is the slot number, with a “*” beside the
default slot (as set by the select command).
The second section shows per-slot flags and status. Letters
are shown if the flag is on, omitted if off. In the list
below, related commands are given for each item:
F … if there is a filter {but '#' if disabled}. (filter)
M … if there is a modify spec {but '%' if disabled}. (modify)
w … if word-preference mode is turned on. (word)
c … if case folding is turned on. (fold)
f … if fuzzification is turned on. (fuzz)
W … if wildcard-pattern mode is turned on (wildcard)
h … if highlighting is turned on. (highlight)
t … if there is a tag {but @ if disabled} (tag)
d … if found lines should be displayed (display)
─────────────────────────────────
a … if autokana is turned on (autokana)
P … if there is a file-specific local prompt (prompt)
I … if the file is loaded with a precomputed index (load)
d … if the display flag is on (display)
Note that the letters in the upper section directly correspond
to the “!!” sequence characters described in the
INPUT SYNTAX section.
If there is a digit at the end of the flag section, it
indicates that only #/10 of the file is actually loaded
into memory (as opposed to the file having been completely
loaded). Unloaded files will be loaded while lookup is
idle, or when first used.
If the slot is a combination slot (as slot #3 is in the
example above), that is noted in the third section, and the
combination name and component slot numbers are noted in
the fourth. Also, for combination slots (which have no filter
or modify specifications, only the flags), F and/or M
are shown if the corresponding mode is allowed during
searches via the combo slot. See the tag command for info
about t with respect to combination slots.
If an argument (either “-” or “long” will work) is given to
the command, a short message about what the flags mean is
also printed.
filter ["label"] [!] /regex/[i]
Sets the filter for the selected slot (which must contain a
file and not a combination). If a filter is set and active
for a file, any line matching the given regex is filtered
from the output (if the ‘!’ is put before the regex, any
line not matching the regex is filtered). The label,
which isn't required, merely acts as documentation in
various diagnostics.
As an example, consider that edict lines often
have “(pn)” on them to indicate that the given English is a
place name. Often these place names can be a bother, so it
would be nice to elide them from the output unless
specifically requested. Consider the example:
lookup command> filter "name" /(pn)/
search [edict]> [きの]
機能 [きのう] /function/faculty/
帰納 [きのう] /inductive/
昨日 [きのう] /yesterday/
≪3 "name" lines filtered≫
In the example, ‘/’ characters are used to delimit the
start and stop of the regex (as is common with many
programs). However, any character can be used. A final ‘i’,
if present, indicates that the regex should be applied in a
case-insensitive manner.
The filter, once set, can be enabled or disabled with the
other form of the “filter” command (described below). It
can also be temporarily turned off (or, if disabled,
temporarily turned on) by the “!F!” line prefix.
Filtered lines can optionally be saved and then displayed
if you so desire. See the “saved list size” and “show”
commands.
Note that if you have saving enabled and only one line
would be filtered, it is simply printed at the end (rather
than printing a one-line message about how one line was
filtered).
By the way, a better “name” filter for edict would be:
filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
as it would filter all entries that had only one English
section, that section being a name. It is also an example
of using something other than ‘/’ to delimit a regex, as
it makes things a bit easier to read.
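The filtering behaviour described for this command can be modelled
with a few lines of Python. The function name apply_filter and its
parameters are hypothetical, and the regex is written in Python's re
syntax rather than lookup's, so the parentheses are escaped.

```python
import re


def apply_filter(lines, regex, negate=False, ignore_case=False):
    """Split matched lines into (shown, filtered).

    A line is filtered when it matches `regex`; with negate=True (the
    "!" form) a line is filtered when it does NOT match.
    """
    rx = re.compile(regex, re.IGNORECASE if ignore_case else 0)
    shown, filtered = [], []
    for line in lines:
        matched = rx.search(line) is not None
        (filtered if matched != negate else shown).append(line)
    return shown, filtered


# Made-up sample lines in an edict-like layout.
sample = [
    "kinou /function/faculty/",
    "kinou /Kinou (pn)/",
    "kinou /yesterday/",
]
shown, filtered = apply_filter(sample, r"\(pn\)")
```

The filtered list is what a saved-list mechanism would retain for
later display, and its length is the count a diagnostic such as the
one in the example above would report.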
filter [boolean]
Enables or disables the filter for the selected slot. If
no argument is given, displays the current filter and
status.
[default] fold [boolean]
The selected slot's case folding is turned on or off
(default is on), or reported if no argument is given.
However, if “default” is specified, the value to be inherited
as the default by subsequently-loaded files is set (or
reported).
Can be temporarily toggled by the “!c!” line prefix.
[default] fuzz [boolean]
The selected slot's fuzzification is turned on or off
(default is on), or reported if no argument is given.
However, if “default” is specified, the value to be inherited
as the default by subsequently-loaded files is set (or
reported).
Can be temporarily toggled by the “!f!” line prefix.
help [regex]
Without an argument gives a short help list. With an
argument, lists only commands whose help string is picked up by
the given regex.
[default] highlight [boolean]
Sets matched-string highlighting on or off for the selected
slot (default off), or reports the current status if no
argument is given. However, if “default” is specified, the
value to be inherited as the default by subsequently-loaded
files is set (or reported).
If on, shows in bold or reverse video (see below) that part
of the line which was matched by the search regex. If
multiple regexes were given, that part matched by the first
regex is shown.
Note that a regex might match a portion of a line which is
later removed by a modify parameter. In this case, no
highlighting is done.
Can be temporarily toggled by the “!h!” line prefix.
highlight style [bold | inverse | standout | <___>]
Sets the style of highlighting for when highlighting is
done. Inverse (inverse video) and standout are the same.
The default is bold. You can also give an HTML tag, such
as “<BOLD>”, and items will be wrapped by <BOLD>...</BOLD>.
This would be particularly useful when the output is going
to a CGI, as when lookup has been built in a server
configuration.
Note that the highlighting is effected by using raw
VT100/xterm control sequences. This isn't particularly
nice if your terminal doesn't understand them. Sorry.
if {expression} command...
If the evaluated expression is non-zero, the command will
be executed.
Note that {} rather than () surround the expression.
Expression may be comprised of numbers, operators,
parentheses, etc. In addition to the normal +, -, *, and /,
there are:
!x … yields 0 if x is non-zero, 1 if x is zero.
x && y …
!x … ‘not’ Yields 1 if x is zero, 0 if non-zero.
x & y … ‘and’ Yields 1 if both x and y are non-zero, 0 otherwise.
x | y … ‘or’ Yields 1 if x or y (or both) is non-zero, 0 otherwise.
There may also be the special tokens true and false which
are 1 and 0 respectively.
There are also checked, matched, printed, nonword, and
filtered which correspond to the values printed by the stats
command.
An example use might be the following kind of thing in a
computer-generated script:
!d!expect this line
if {!printed} msg Oops! couldn't find "expect this line"
input encoding [ euc | sjis ]
Used to set (or report) what encoding to use when 8-bit
bytes are found in the interactive input (all flavors of
JIS are always recognized). Also see the encoding and
output encoding commands.
limit [value]
Sets the number of lines to print during any search before
aborting (or reports the current number if no value given).
Default is 100.
Output limiting is disabled if set to zero.
log [ to [+] file ]
Begins logging the program output to file (the Japanese
encoding method being the same as for screen output).
If “+” is given, the log is appended to any text that might
have previously been in file, in which case a leading
dashed line is inserted into the file.
If no arguments are given, reports the current logging
status.
log - | off
If only “-” or off is given, any currently-opened log file
is closed.
load [-now|-whenneeded] "filename"
Loads the named file to the next available slot. If a
precomputed index is found (as “filename.jin”) it is loaded as
well. Otherwise, an index is generated internally.
The file to be loaded (and the index, if loaded) will be
loaded during idle times. This allows a startup file to
list many files to be loaded, but not have to wait for each
of them to load in turn. Using the “-now” flag causes the
load to happen immediately, while using the “-whenneeded”
option (can be shortened to “-wn”) causes the load
to happen only when the slot is first accessed.
Invoke lookup as
% lookup -writeindex filename
to generate and write an index file, which will then be
automatically used in the future.
If the file has already been loaded, the file is not
re-read, but the previously-read file is shared. The new slot
will, however, have its own separate flags, prompt, filter,
etc.
modify /regex/replace/[ig]
Sets the modify parameter for the selected file. If a file
has a modify parameter associated with it, each line
selected during a search will have that part of the line
which matches regex (if any) replaced by the replacement
string before being printed.
Like the filter command, the delimiter need not be ‘/’;
any non-space character is fine. If a final ‘i’ is given,
the regex is applied in a case-insensitive manner. If a
final ‘g’ is given, the replacement is done to all matches
in the line, not just the first part that might match
regex.
The replacement may have embedded “\1”, etc. in it to refer
to parts of the matched text (see the tutorial on regular
expressions).
The modify parameter, once set, may be enabled or disabled
with the other form of the modify command (described
below). It may also be temporarily toggled via
the “!m!” line prefix.
A silly example for the ultra-nationalist might be:
modify /<Japan>/Dainippon Teikoku/g
So that a line such as
日銀 [にちぎん] /Bank of Japan/
would come out as
日銀 [にちぎん] /Bank of Dainippon Teikoku/
As a real example of the modify command with kanjidic,
consider that it is likely that one is not interested in all
the various fields each entry has. The following can be
used to remove the info on the U, N, Q, M, E, B, C, and Y
fields from the output:
modify /( [UNQMECBY]\S+)+//g,1
It's sort of complex, but works. Note that here the
replacement part is empty, meaning to just remove those
parts which matched. The result of such a search of 日
would normally print
日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 ＼
MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}
but with the above modify spec, appears more simply as
日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}
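The effect of a modify spec can be approximated with Python's re.sub.
The sketch below is not lookup's implementation: the function name
modify and its keyword arguments are hypothetical stand-ins for the
‘i’ and ‘g’ suffixes, and the sample line keeps only the ASCII fields
of the kanjidic entry shown above.

```python
import re


def modify(line, regex, replacement, ignore_case=False, replace_all=False):
    """Apply a /regex/replacement/[ig]-style substitution to one line."""
    flags = re.IGNORECASE if ignore_case else 0
    # count=0 replaces every match ("g"); count=1 only the first.
    count = 0 if replace_all else 1
    return re.sub(regex, replacement, line, count=count, flags=flags)


entry = ("467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 "
         "MN13733 E62 Yri4 P3-3-1 {day}")
trimmed = modify(entry, r"( [UNQMECBY]\S+)+", "", replace_all=True)
```

Here trimmed keeps only the 467c, S4, G1, H3027, F1, and P3-3-1
fields before {day}, mirroring the shortened output above. Without
replace_all, only the first run of unwanted fields would be removed.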
modify [boolean]
Enables or disables the modify parameter for the selected
file, or reports the current status if no argument is given.
msg string
The given string is printed.
Most likely used in a script as the target command of an if
command.
output encoding [ euc | sjis | jis...]
Used to set exactly what kind of encoding should be used
for program output (also see the input encoding command).
Used when the encoding command is not detailed enough for
one's needs.
If no argument is given, reports the current output
encoding. Otherwise, arguments can usually be any reasonable
dash-separated combination of:
euc
Selects EUC for the output encoding.
sjis
Selects Shift-JIS for the output encoding.
jis[78|83|90][-ascii|-roman]
Selects JIS for the output encoding. If no year (78,
83, or 90) is given, 78 is used. Can optionally specify
that “English” should be encoded as regular ASCII (the
default when JIS is selected) or as JIS-ROMAN.
212
Indicates that JIS X0212-1990 should be supported
(ignored for Shift-JIS output).
no212
Indicates that JIS X0212-1990 should not be supported
(default setting). This places JIS X0212-1990
characters under the domain of disp, nodisp, code, or
mark (described below).
hwk
Indicates that half-width kana should be left as-is
(default setting).
nohwk
Indicates that half-width kana should be stripped from
the output (not yet implemented).
foldhwk
Indicates that half-width kana should be folded to
their full-width counterparts (not yet implemented).
disp
Indicates that non-displayable characters (such as JIS
X0212-1990 while the output encoding method is Shift-JIS)
should be passed along anyway (most likely
resulting in screen garbage).
nodisp
Indicates that non-displayable characters should be
quietly stripped from the output.
code
Indicates that non-displayable characters should be
printed as their octal codes (default setting).
mark
Indicates that non-displayable characters should be
printed as “★”.
Of course, not all options make sense in all combinations,
or at all times. When the current (or new) output
encoding is reported, a complete and exact specifier
representing the selected output encoding is shown. An
example might be “jis78-ascii-no212-hwk-code”.
pager [ boolean | size ]
Turns on or off an output pager, sets its idea of the
screen size, or reports the current status.
Size can be a single number indicating the number of lines
to be printed between “MORE?” prompts (usually a few lines
less than the total screen height, the default being 20
lines). It can also be two numbers in the form “#x#” where
the first number is the width (in half-width characters;
default 80) and the second is the lines-per-page as above.
If the pager is on, every page of output will result in
a “MORE?” prompt, at which there are four possible
responses. A space will allow one more full page to print.
A return will allow one more line. A ‘c’ (for “continue”)
will allow all the rest of the output (for the current
command) to proceed without pause, while
a ‘q’ (for “quit”) will flush the output for the current
command.
If supported by the OS, the pager size parameters are set
appropriately from the window size upon startup or window
resize.
The default pager status is “off”.
[local] prompt "string"
Sets the prompt string. If “local” is indicated, sets the
prompt string for the selected slot only. Otherwise, sets
the global default prompt string.
Prompt strings may have the special %-sequences shown
below, with related commands given in parentheses:
%N … the default slot's file or combo name.
%n … like %N, but any leading path is not shown if a filename.
%# … the default slot's number.
%S … the “command-introduction” character (cmdchar)
%0 … the running program's name
%F='string' … string shown if filtering enabled (filter)
%M='string' … string shown if modification enabled (modify)
%w='string' … string shown if word mode on (word)
%c='string' … string shown if case folding on (fold)
%f='string' … string shown if fuzzification on (fuzz).
%W='string' … string shown if wildcard-pat. mode on (wildcard).
%d='string' … string shown if displaying on (display).
%C='string' … string shown if currently entering a command.
%l='string' … string shown if logging is on (log).
%L … the name of the current output log, if any (log)
For the tests (%f, etc), you can put ‘!’ just after
the ‘%’ to reverse the sense of the test (i.e. %!f="no
fuzz"). The reverse of %F is if a filter is installed but
disabled (i.e. string will never be shown if there is no
filter for the default file). The modify %M works
comparably.
Also, you can use an alternative form for the items that
take an argument string. Replacing the quotes with
parentheses will treat string as a recursive prompt specifier.
For example, the specifier
%C='command'%!C(%f='fuzzy 'search:)
would result in a “command” prompt if entering a command,
while it would result in either a “fuzzy search:” or
a “search:” prompt if not entering a command. The
parenthesized constructs may be nested.
Note that the letters of the test constructs are the same
as the letters for the “!!” sequences described in INPUT
SYNTAX.
An example of a nice prompt command might be:
prompt "%C(%0 command)%!C(%w'*'%!f'raw '%n)> "
With this prompt specification, the prompt would normally
appear as “filename> ” but when fuzzification is turned off
as “raw filename> ”. And if word-preference mode is on,
the whole thing has a “*” prepended. However, if a command
is being entered, the prompt would then become “name
command”, where name is the program's name (system
dependent, but most likely “lookup”).
The default prompt format string is
“%C(%0 command)%!C(search [%n])> ”.
regex debug [boolean]
Sets the internal regex debugging flag (turn on if you want
billions of lines of stuff spewed to your screen).
saved list size [value]
During a search, lines that match might be elided from the
output due to filters or word-preference mode. This command
sets the number of such lines to remember during any
one search, such that they may be later displayed (before
the next search) by the show command.
The default is 100.
select [ num | name | . ]
If num is given, sets the default slot to that slot number.
If name is given, sets the default slot to the first slot
found with a file (or combination) loaded with that name.
The incantation “select .” merely sets the default slot to
itself, which can be useful in script files where you want
to indicate that any subsequent flag changes should work
with whatever file was the default at the time the script
was sourced.
If no argument is given, simply reports the current default
slot (also see the files command).
In command files loaded via the source command, or as the
startup file, commands dealing with per-slot items (flags,
local prompt, filters, etc.) work with the file or slot
last selected. The last such selected slot remains
selected once the load is complete.
Interactively, the default slot will become the selected
slot for subsequent searches and commands that aren't augmented
with an appended “,#” (as described in the INPUT
SYNTAX section).
show
Shows any lines elided from the previous search (either due
to a filter or word-preference mode).
Will apply any modifications (see the “modify” command) if
modifications are enabled for the file. You can use
the “!m!” line prefix as well with this command (in this
case, put the “!m!” before the command-indicator character).
The length of the list is controlled by the “saved list
size” command.
source "filename"
Commands are read from filename and executed.
In the file, all lines beginning with “#” are ignored as
comments (note that comments must appear on a line by themselves,
as “#” is a reasonable character to have within
commands).
Lines whose first non-blank character
is “=”, “!”, or “+” are considered searches, while all
other non-blank lines are considered lookup commands.
Therefore, there is no need for lines to begin with the
command-introduction character. However, leading whitespace
is always OK.
For search lines, take care that any trailing whitespace is
deleted if undesired, as trailing whitespace (like all non-leading
whitespace) is kept as part of the regular expression.
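The line rules above can be sketched in Python as follows. This is only an illustration of the rules as stated, not lookup's own parser: the function name classify_line is invented, and it assumes that a comment's “#” may be preceded by whitespace, which the text does not actually say.

```python
def classify_line(line):
    """Classify one line of a command file (illustrative sketch only)."""
    stripped = line.lstrip()             # leading whitespace is always OK
    if not stripped:
        return ('blank', '')
    if stripped.startswith('#'):         # assumption: '#' may be indented
        return ('comment', '')
    if stripped[0] in '=!+':
        # Searches keep their trailing whitespace: it is part of the regex.
        return ('search', stripped.rstrip('\n'))
    return ('command', stripped.strip())
```

Note that a search line such as “=grey ” keeps its trailing space, which is exactly the trap the paragraph above warns about.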
Within a command file, commands that modify per-file flags
and such always work with the most-recently loaded (or
selected) file. Therefore, something along the lines of
    load "my.word.list"
    set word on
    load "my.kanji.list"
    set word off
    set local prompt "enter kanji> "
would work as might make intuitive sense.
Since a script file must have a load or select before any
per-slot flag is set, one can use “select .” to facilitate
command scripts that are to work with “the current slot”.
spinner [value]
Set the value of the spinner (a silly little feature). If
set to a non-zero value, will cause a spinner to spin while
a file is being checked, one increment per value lines in
the file actually checked against the search specifier.
Default is off (i.e. zero).
stats
Shows information about how many lines of the text file
were checked against the last search specifier, and how
many lines matched and were printed.
tag [boolean] ["string"]
Enable, disable, or set the tag for the selected slot.
If the slot is not a combination slot, a tag string may be
set (the quotes are required).
If a tag string is set and enabled for a file, the string
is prepended to each matching output line printed.
Unlike the filter and modify commands, which automatically
enable the function when a parameter is set, a tag is not
automatically enabled when set. It can be enabled while
being set via “tag on "string"”, or could be enabled
subsequently via just “tag on”.
If the selected slot is a combination slot,
only the enable/disable status may be changed (on by
default). No tag string may be set.
The reason for the special treatment lies in the special
nature of how tags work in conjunction with combination
files.
During a search when the selected slot is a combination
slot, each file which is a member of the combination has
its per-file flags disabled if their corresponding flag is
disabled in the original combination slot. This allows the
combination slot's flags to act as a “mask” to blot out
each component file's per-file flags.
The tag flag, however, is special in that the component
file's tag flag is turned on if the combination slot's tag
flag is turned on (and, of course, the component file has a
tag string registered).
The intended use of this is that one might set a (disabled)
tag to a file, yet direct searches against that file will
have no prepended tag. However, if the file is searched as
part of a combination slot (and the combination slot's tag
flag is on), the tag will be prepended, allowing one to
easily understand from which file an output line comes.
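The masking rule, and the way the tag flag departs from it, can be shown with a small Python model. Everything here (the Slot class, its field names, the choice of just three flags) is hypothetical and only mirrors the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    filter_on: bool = False
    word_on: bool = False
    tag_on: bool = False
    tag_string: Optional[str] = None     # None: no tag string registered

def effective_flags(member: Slot, combo: Slot) -> Slot:
    """Flags used for one member file while its combination is searched."""
    return Slot(
        # Ordinary flags: the combination slot can only mask them off.
        filter_on=member.filter_on and combo.filter_on,
        word_on=member.word_on and combo.word_on,
        # The tag flag follows the combination slot instead, provided
        # the member file has a tag string at all.
        tag_on=combo.tag_on and member.tag_string is not None,
        tag_string=member.tag_string,
    )
```

So a file with a registered but disabled tag is untagged when searched directly, yet tagged when searched through a combination whose tag flag is on.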
verbose [boolean]
Sets verbose mode on or off, or reports the current status
(default on). Many commands reply with a confirmation if
verbose mode is turned on.
version
Reports the current version of the program.
[default] wildcard [boolean]
The selected slot's patterns are considered wildcard patterns
if turned on, regular expressions if turned off. The
current status is reported if no argument is given. However,
if “default” is specified, the pattern type to be inherited
as the default by subsequently-loaded files is set (or
reported).
Can be temporarily toggled by the “!W!” line prefix.
When wildcard patterns are selected, the changed metacharacters
are: “*” means “any stuff”, “?” means “any one
character”, while “+” and “.” become unspecial. Other regex
items such as “|”, “(”, “[”, etc. are unchanged.
What “*” and “?” will actually match depends upon the status
of word-mode, as well as on the pattern itself. If
word-mode is on, or if the pattern begins with the start-of-word
“<” or “[”, only non-spaces will be matched. Otherwise,
any character will be matched.
In summary, when wildcard mode is on, the input pattern is
affected in the following ways:
    * is changed to the regular expression .* or \S*
    ? is changed to the regular expression . or \S
    + is changed to the regular expression \+
    . is changed to the regular expression \.
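A rough Python rendering of this translation follows. It is a sketch under assumptions: the name wildcard_to_regex and its nonspace_only parameter are invented, the non-space forms are written as \S because the text says “only non-spaces will be matched”, and characters inside a character class or after a backslash get no special handling.

```python
def wildcard_to_regex(pattern, nonspace_only=False):
    """Translate a wildcard pattern to a regex, per the summary above."""
    any_char = r'\S' if nonspace_only else '.'
    table = {
        '*': any_char + '*',   # "any stuff"
        '?': any_char,         # "any one character"
        '+': r'\+',            # becomes unspecial
        '.': r'\.',            # becomes unspecial
    }
    # Everything else ('|', '(', '[', ...) passes through unchanged.
    return ''.join(table.get(ch, ch) for ch in pattern)
```

For instance, the wildcard “gr?y*” comes out as the regex gr.y.* in the default case.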
Because filename patterns are often called “filename
globs”, the command “glob” can be used in place of “wildcard”.
[default] word|wordpreference [boolean]
The selected file's word-preference mode is turned on or
off (default is off), or the current setting is reported if no
argument is specified. However, if “default” is specified,
the value to be inherited as the default by subsequently-loaded
files is set (or reported).
In word-preference mode, entries are searched for as if the
search regex had a leading ‘<’ and a trailing ‘>’,
resulting in a list of entries with a whole-word match of
the regex. However, if there are none, but there are non-word
entries, the non-word entries are shown (the “saved
list” is used for this -- see that command). This makes it
an “if there are whole words like this, show me, otherwise
show me whatever you've got” mode.
If there are both word and non-word entries, the non-word
entries are remembered in the saved list (rather than any
possible filtered entries being remembered there).
One caveat: if a search matches a line in more than one
place, and the first is not a whole word, while one of the
others is, the line will be considered non-whole-word.
For example, the search 「japan」 with word-preference
mode on will not list an entry such as “/Japanese/language
in Japan/”, as the first “Japan” is part of “Japanese” and
not a whole word. If you really need just whole-word
entries, use the ‘<’ and ‘>’ yourself.
The mode may be temporarily toggled via the “!w!” line prefix.
The rules defining what lines are filtered, remembered,
discarded, and shown for each permutation of search are
rather complex, but the end result is rather intuitive.
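The behavior described here, including the first-match caveat, can be modeled in a few lines of Python. This is a simplified sketch and not lookup's algorithm: the function names are invented, filters are ignored, case is folded (an assumption, so that 「japan」 finds “Japan”), and a “word” is taken to be a run of letters, digits, and underscores.

```python
import re

def _is_word_char(ch):
    return ch.isalnum() or ch == '_'

def word_preference_search(pattern, lines):
    """Return (shown, saved) lists for one search in word-preference mode."""
    regex = re.compile(pattern, re.IGNORECASE)   # assumption: case folding
    word_hits, other_hits = [], []
    for line in lines:
        m = regex.search(line)
        if m is None:
            continue
        # Only the FIRST match on the line is examined (the caveat above).
        starts_word = m.start() == 0 or not _is_word_char(line[m.start() - 1])
        ends_word = m.end() == len(line) or not _is_word_char(line[m.end()])
        (word_hits if starts_word and ends_word else other_hits).append(line)
    if word_hits:
        return word_hits, other_hits    # non-word entries go to the saved list
    return other_hits, []               # no whole words: show what we've got
```

Searching for “japan” among “/Japanese/language in Japan/” and “Japan /Nihon/” shows only the second entry and saves the first, as the caveat predicts.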
quit | leave | bye | exit
Exits the program.
STARTUP FILE
If the file “~/.lookup” is present, commands are read from it
during lookup startup.
The file is read in the same way as the source command reads
files (see that entry for more information on file format,
etc.).
However, if there had been files loaded via command-line arguments,
commands within the startup file to load files (and
their associated commands such as to set per-file flags) are
ignored.
Similarly, any use of the command-line flags -euc, -jis, or
-sjis will disable in the startup file the commands dealing
with setting the input and/or output encodings.
The special treatment mentioned in the above two paragraphs
only applies to commands within the startup file itself, and
does not apply to commands in command-files that might be
sourced from within the startup file.
The following is a reasonable example of a startup file:
## turn verbose mode off during startup file processing
verbose off
prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
spinner 200
pager on
## The filter for edict will hit for entries that
## have only one English part, and that English part
## having a pl or pn designation.
load ~/lib/edict
filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
highlight on
word on
## The filter for kanjidic will hit for entries without a
## frequency-of-use number. The modify spec will remove
## fields with the named initial code (U,N,Q,M,E, and Y)
load ~/lib/kanjidic
filter "uncommon" !/<F\d+>/
modify /( [UNQMEY])+//g
## Use the same filter for my local word file,
## but turn off by default.
load ~/lib/local.words
filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
filter off
highlight on
word on
## Want a tag for my local words, but only when
## accessed via the combo below
tag off "》"
combine "words" 2 0
select words
## turn verbosity back on for interactive use.
verbose on
COMMAND-LINE ARGUMENTS
With the use of a startup file, command-line arguments are
rarely needed. In practical use, they are only needed to create
an index file, as in:
    lookup -write textfile
Any command line arguments that aren't flags are taken to be
files which are loaded in turn during startup. In this case,
any “load”, “filter”, etc. commands in the startup file are
ignored.
The following flags are supported:
-help
Reports a short help message and exits.
-write
Creates index files for the named files and exits. No
startup file is read.
-euc
Sets the input and output encoding method to EUC (currently
the default). Exactly the same as the “encoding euc” command.
-jis
Sets the input and output encoding method to JIS. Exactly
the same as the “encoding jis” command.
-sjis
Sets the input and output encoding method to Shift-JIS.
Exactly the same as the “encoding sjis” command.
-v -version
Prints the version string and exits.
-norc
Indicates that the startup file should not be read.
-rc file
The named file is used as the startup file, rather than the
default “~/.lookup”. It is an error for the file not to
exist.
-percent num
When an index is built, letters that appear on more than
num percent (default 50) of the lines are elided from the
index. The thought is that if a search will have to check
most of the lines in a file anyway, one may as well save
the large amount of space in the index file needed to represent
that information, and the time/space tradeoff
shifts, as the indexing of oft-occurring letters provides a
diminishing return.
Smaller indexes can be made by using a smaller number.
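The idea behind the cutoff can be illustrated with a toy index in Python. The real index file format is not described here, so this is purely a sketch: build_index is an invented name, and the index is just a dictionary from each character to the list of line numbers it appears on.

```python
from collections import defaultdict

def build_index(lines, percent=50):
    """Map each character to the lines containing it, dropping common ones."""
    postings = defaultdict(set)
    for lineno, line in enumerate(lines):
        for ch in set(line):
            postings[ch].add(lineno)
    limit = len(lines) * percent / 100
    # Characters on MORE than `percent` percent of the lines are elided:
    # a search on them would have to check most of the file anyway.
    return {ch: sorted(nums) for ch, nums in postings.items()
            if len(nums) <= limit}
```

Lowering percent drops more characters and so shrinks the index, at the cost of more lines having to be checked directly.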
-noindex
Indicates that any files loaded via the command line should
not be loaded with any precomputed index, but recalculated
on the fly.
-verbose
Has metric tons of stats spewed whenever an index is created.
-port ###
For the (undocumented) server configuration only, tells
which port to listen on.
OPERATING SYSTEM CONSIDERATIONS
I/O primitives and behaviors vary with the operating system.
On my operating system, I can “read” a file by mapping it into
memory, which is a pretty much instant procedure regardless of
the size of the file. When I later access that memory, the
appropriate sections of the file are automatically read into
memory by the operating system as needed.
This results in lookup starting up and presenting a prompt
very quickly, but causes the first few searches that need to
check a lot of lines in the file to go more slowly (as lots of
the file will need to be read in). However, once the bulk of
the file is in, searches will go very fast. The win here is
that the rather long file-load times are amortized over the
first few (or few dozen, depending upon the situation)
searches rather than always faced right at command startup
time.
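For readers unfamiliar with memory mapping, the following Python sketch shows the general technique using the standard mmap module. It says nothing about how lookup itself is written; grep_mapped is an invented name, and the example assumes a non-empty file (an empty file cannot be mapped this way).

```python
import mmap

def grep_mapped(path, needle):
    """Return the lines of a file that contain the byte string `needle`."""
    hits = []
    with open(path, 'rb') as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # Mapping returns almost at once; pages of the file are only
        # read from disk as the loop below touches them.
        for line in iter(mapped.readline, b''):
            if needle in line:
                hits.append(line.rstrip(b'\n'))
    return hits
```

The first call on a large file pays for the disk reads; later calls typically find the pages already cached by the operating system.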
On the other hand, on an operating system without the mapping
ability, lookup would start up very slowly as all the files
and indexes are read into memory, but would then search
quickly from the beginning, all the file already having been
read.
To get around the slow startup, particularly when many files
are loaded, lookup uses lazy loading if it can: a file is not
actually read into memory at the time the load command is
given. Rather, it will be read when first actually accessed.
Furthermore, files are loaded while lookup is idle, such as
when waiting for user input. See the files command for more
information.
REGULAR EXPRESSIONS, A BRIEF TUTORIAL
Regular expressions (“regex” for short) are a “code” used to
indicate what kind of text you're looking for. They're how
one searches for things in the editors “vi”, “stevie”,
“mifes”, etc., or with the grep commands. There are
differences among the various regex flavors in use -- I'll
describe the flavor used by lookup here. Also, in order to be
clear for the common case, I might tell a few lies, but nothing
too heinous.
The regex 「a」 means “any line with an ‘a’ in it.” Simple
enough.
The regex 「ab」 means “any line with an ‘a’ immediately followed
by a ‘b’”. So the line
    I am feeling flabby
would “match” the regex 「ab」 because, indeed, there's
an “ab” on that line. But it wouldn't match the line
    this line has no a followed _immediately_ by a b
because, well, what the line says is true.
In most cases, letters and numbers in a regex just mean that
you're looking for those letters and numbers in the order
given. However, there are some special characters used within
a regex.
A simple example would be a period. Rather than indicate that
you're looking for a period, it means “any character”. So
the silly regex 「.」 would mean “any line that has any character
on it.” Well, maybe not so silly... you can use it to find
non-blank lines.
But more commonly it's used as part of a larger regex. Consider
the regex 「gray」. It wouldn't match the line
    The sky was grey and cloudy.
because of the different spelling (grey vs. gray). But the
regex 「gr.y」 asks for “any line with a ‘g’, ‘r’, some
character, and then a ‘y’”. So this would
get “grey” and “gray”.
A special construct somewhat similar
to ‘.’ would be the character class. A character class
starts with a ‘[’ and ends with a ‘]’, and will match any
character given in between. An example might be
    gr[ea]y
which would match lines with a ‘g’, ‘r’, an ‘e’ or
an ‘a’, and then a ‘y’. Inside a character class you can
list as many characters as you want to.
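If you want to try these three patterns outside of lookup, Python's re module treats them the same way (Python's regex flavor differs from lookup's in other respects, so take this only as a playground). The helper name matching is made up.

```python
import re

def matching(pattern, lines):
    """Return the lines that contain a match for `pattern`."""
    return [line for line in lines if re.search(pattern, line)]

LINES = [
    "The sky was grey and cloudy.",
    "A gray cat sat there.",
    "A groy typo.",
]
```

Here 「gray」 finds only the second line, 「gr.y」 finds all three (the period accepts the ‘o’ as well), and 「gr[ea]y」 finds just the first two.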
For example, the simple regex 「x[0123456789]y」 would match any
line with a digit sandwiched between an ‘x’ and a ‘y’.
The order of the characters within the character class doesn't
really matter... 「[5134672890]」 would be the same
as 「[0123456789]」.
But as a short cut, you could put 「[0-9]」 instead
of 「[0123456789]」. So the character class 「[a-z]」 would
match any lower-case letter, while the character
class 「[a-zA-Z0-9]」 would match any letter or digit.
The character ‘-’ is special within a character class, but
only if it's not the first thing. Another character that's
special in a character class is ‘^’, if it is the first
thing. It “inverts” the class so that it will match any character
not listed. The class 「[^a-zA-Z0-9]」 would match any
line with spaces or punctuation on it.
There are some special short-hand sequences for some common
character classes. The sequence 「\d」 means “digit”, and is
the same as 「[0-9]」. 「\w」 means “word element” and is the
same as 「[0-9a-zA-Z_]」. 「\s」 means “space-type thing” and is
the same as 「[ \t]」 (「\t」 means tab).
You can also use 「\D」, 「\W」, and 「\S」 to mean things not a
digit, word element, or space-type thing.
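These classes can be tried out in Python as well. One difference to be aware of: in Python 3, 「\d」 and 「\w」 also accept non-ASCII digits and letters unless the re.ASCII flag is given, so the sketch below passes that flag to stay close to the definitions above. The function names are invented.

```python
import re

def has_sandwiched_digit(line):
    """True if a digit sits between an 'x' and a 'y' somewhere in the line."""
    return re.search(r'x[0-9]y', line) is not None

def has_non_alnum(line):
    """True if the line has a character that is not a letter or digit."""
    return re.search(r'[^a-zA-Z0-9]', line) is not None

def is_word(text):
    """True if the text consists only of word elements ([0-9a-zA-Z_])."""
    return re.fullmatch(r'\w+', text, re.ASCII) is not None
```

So has_sandwiched_digit accepts “ax5yb” but not “xay”, and is_word accepts “foo_42” but not “foo bar”.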
Another special character would be ‘?’. This means “maybe
one of whatever was just before it, but none is fine too”. In the
regex 「bikes? for rent」, the “whatever” would be the ‘s’,
so this would match lines with either “bikes for
rent” or “bike for rent”.
Parentheses are also special, and can group things together.
In the regex
    big (fat harry)? deal
the “whatever” for the ‘?’ would be “fat harry”. But be
careful to pay attention to details... this regex would match
    I don't see what the big fat harry deal is!
but not
    I don't see what the big deal is!
That's because if you take away the “whatever” of the ‘?’,
you end up with
    big  deal
Notice that there are two spaces between the words, and the
regex didn't allow for that. The regex to get either line
above would be
    big (fat harry )?deal
or
    big( fat harry)? deal
Do you see how they're essentially the same?
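The space pitfall is easy to check with Python's re module, which handles ‘?’ and parentheses the same way for these patterns. The helper name matches is made up for the example.

```python
import re

WITH_HARRY = "I don't see what the big fat harry deal is!"
WITHOUT_HARRY = "I don't see what the big deal is!"

def matches(pattern, line):
    """True if `pattern` matches somewhere in `line`."""
    return re.search(pattern, line) is not None
```

The first pattern, 「big (fat harry)? deal」, matches only WITH_HARRY, while both corrected patterns match both lines.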
Similar to ‘?’ is ‘*’, which means “any number, including
none, of whatever's right in front”. It more or less means
that whatever is tagged with ‘*’ is allowed, but not
required, so something like
    I (really )*hate peas
would match “I hate peas”, “I really hate peas!”, “I really
really hate peas”, etc.
Similar to both ‘?’ and ‘*’ is ‘+’, which means “at least
one of whatever just in front, but more is fine too”. The
regex 「mis+pelling」 would match “mispelling”, “misspelling”,
“missspelling”, etc. Actually, it's just the same
as 「miss*pelling」 but simpler to type. The
regex 「ss*」 means “an ‘s’, followed by zero or more ‘s’”,
while 「s+」 means “one or more ‘s’”. Both are really the same.
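A quick way to convince yourself that 「s+」 and 「ss*」 are interchangeable is to run both over the same sample lines. The sketch below does that with Python's re module; same_matches and SAMPLES are names invented for the example.

```python
import re

SAMPLES = ["mispelling", "misspelling", "missspelling", "mipelling"]

def same_matches(pattern_a, pattern_b, lines):
    """True if both patterns accept and reject exactly the same lines."""
    return all(
        bool(re.search(pattern_a, line)) == bool(re.search(pattern_b, line))
        for line in lines
    )
```

Both patterns accept the first three samples and reject “mipelling”, which has no ‘s’ at all.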
The special character ‘|’ means “or”. Unlike ‘+’, ‘*’,
and ‘?’, which act on the thing immediately before,
the ‘|’ is more “global”.
    give me (this|that) one
would match lines that had “give me this one” or “give me that
one” in them.
You can even combine more than two:
    give me (this|that|the other) one
How about:
    [Ii]t is a (nice |sunny |bright |clear )*day
Here, the “whatever” immediately before the ‘*’ is
    (nice |sunny |bright |clear )
So this regex would match all the following lines:
    It is a day.
    I think it is a nice day.
    It is a clear sunny day today.
    If it is a clear sunny nice sunny sunny sunny bright day then....
Notice how the 「[Ii]t」 matches either “It” or “it”?
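To see exactly which part of each line the regex 「[Ii]t is a (nice |sunny |bright |clear )*day」 picks out, one can ask Python's re module for the matched text. The names DAY and matched_text are made up for this sketch.

```python
import re

DAY = re.compile(r'[Ii]t is a (nice |sunny |bright |clear )*day')

def matched_text(line):
    """Return the part of the line the regex matched, or None."""
    m = DAY.search(line)
    return m.group(0) if m else None
```

For “I think it is a nice day.” this returns “it is a nice day”, and for a line such as “It is a rainy day” it returns None, since “rainy ” is not one of the alternatives.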
Note that the above regex would also match
    fruit is a day
because it indeed fulfills all requirements of the regex, even
though the “it” is really part of the word “fruit”. To
answer concerns like this, which are common,
there are ‘<’ and ‘>’, which mean “word break”. The
regex 「<it」 would match any line with “it” beginning a word,
while 「it>」 would match any line with “it” ending a word.
And, of course, 「<it>」 would match any line with the
word “it” in it.
Going back to the regex to find grey/gray, that would make
more sense, then, as
    <gr[ae]y>
which would match only the words “grey” and “gray”. Somewhat
similar are ‘^’ and ‘$’, which mean “beginning of
line” and “end of line”, respectively (but not in a character
class, of course). So the regex 「^fun」 would find any
line that begins with the letters “fun”, while 「^fun>」 would
find any line that begins with the word “fun”.
「^fun$」 would find any line that was exactly “fun”.
Finally, 「^\s*fun\s*$」 would match any line
that was “fun” exactly, but perhaps also had leading and/or trailing
whitespace.
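Python's re module has no ‘<’ or ‘>’ word-break characters, but its 「\b」 plays the same role, so the examples above can be tried by writing 「\b」 wherever lookup would use ‘<’ or ‘>’. That substitution is an assumption made for this sketch, and find_all is an invented helper name.

```python
import re

def find_all(pattern, lines):
    """Return the lines that contain a match for `pattern`."""
    return [line for line in lines if re.search(pattern, line)]

# lookup pattern      rough Python equivalent
#   <it>                \bit\b
#   <gr[ae]y>           \bgr[ae]y\b
#   ^fun>               ^fun\b
WORD_IT = r'\bit\b'
WORD_GREY = r'\bgr[ae]y\b'
LINE_STARTS_WITH_FUN = r'^fun\b'
ONLY_FUN = r'^\s*fun\s*$'
```

With these, “fruit is a day” no longer matches WORD_IT, and “funny” no longer matches LINE_STARTS_WITH_FUN.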
That's pretty much it. There are more complex things, some of
which I'll mention in the list below, but even with these few
simple constructs one can specify very detailed and complex
patterns.
Let's summarize some of the special things in regular expressions:
Items that are basic units:
    char       Any non-special character matches itself.
    \char      Special chars, when preceded by \, become non-special.
    .          Matches any one character (except \n).
    \n         Newline.
    \t         Tab.
    \r         Carriage return.
    \f         Formfeed.
    \d         Digit. Just a short-hand for [0-9].
    \w         Word element. Just a short-hand for [0-9a-zA-Z_].
    \s         Whitespace. Just a short-hand for [\t \n\r\f].
    \## \###   Two or three digit octal number indicating a single byte.
    [chars]    Matches a character if it's one of the characters listed.
    [^chars]   Matches a character if it's not one of the ones listed.
               The \char items above can be used within a character class,
               but not the items below.
    \D         Anything not \d.
    \W         Anything not \w.
    \S         Anything not \s.
    \a         Any ASCII character.
    \A         Any multibyte character.
    \k         Any (not half-width) katakana character (including ー).
    \K         Any character not \k (except \n).
    \h         Any hiragana character.
    \H         Any character not \h (except \n).
    (regex)    Parens make the regex one unit.
    (?:regex)  [from perl5] Grouping-only parens -- can't use for \# (below).
    \c         Any JISX0208 kanji (kuten rows 16-84).
    \C         Any character not \c (except \n).
    \#         Match whatever was matched by the #th paren from the left.
With “☆” to indicate one “unit” as above, the following may be used:
    ☆?         A ☆ allowed, but not required.
    ☆+         At least one ☆ required, but more ok.
    ☆*         Any number of ☆ ok, but none required.
There are also ways to match “situations”:
    \b         A word boundary.
    <          Same as \b.
    >          Same as \b.
    ^          Matches the beginning of the line.
    $          Matches the end of the line.
Finally, the “or” is
    reg1|reg2  Match if either reg1 or reg2 match.
Note that “\k” and the like aren't allowed in character classes, so
something such as 「[\k\h]」 to try to get all kana won't work.
Use 「(\k|\h)」 instead.
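Python's re module has no 「\k」 or 「\h」, but the “alternation instead of a combined class” idea can still be illustrated by standing in Unicode ranges for them. The ranges below (hiragana U+3041 to U+3096, katakana U+30A1 to U+30FA plus the long-vowel mark ー at U+30FC) are my approximation and may not coincide exactly with what lookup's 「\h」 and 「\k」 accept; all names are invented.

```python
import re

HIRAGANA = r'[\u3041-\u3096]'            # rough stand-in for \h
KATAKANA = r'[\u30a1-\u30fa\u30fc]'      # rough stand-in for \k (with ー)

# The analogue of (\k|\h): an alternation of the two units, repeated.
KANA_RUN = re.compile(f'(?:{KATAKANA}|{HIRAGANA})+')

def kana_runs(text):
    """Return each maximal run of kana found in the text."""
    return KANA_RUN.findall(text)
```

The grouping-only parens 「(?:...)」 are used so that findall returns the whole run rather than just its last character.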
BUGS
Needs full support for half-width katakana and JIS X
0212-1990.
Non-EUC (JIS & SJIS) items not tested well.
Probably won't work on non-UNIX systems.
Screen control codes (for clear and highlight commands) are
hard-coded for ANSI/VT100/kterm.
AUTHOR
Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)
INFO
Jim Breen's text files edict and kanjidic and their documentation
can be found in “pub/nihongo” on ftp.cc.monash.edu.au
(130.194.1.106).
Information on input and output encoding and codes can be
found in Ken Lunde's Understanding Japanese Information Processing
(日本語情報処理) published by O'Reilly and Associates.
ISBN 1-56592-043-0. There is also a Japanese edition
published by SoftBank.
A program to convert files among the various encoding methods
is Dr. Ken Lunde's jconv, which can also be found on
ftp.cc.monash.edu.au. Jconv is also useful for converting
halfwidth katakana (which lookup doesn't yet support well) to
full-width.