.if \n1 .ll \n1n \" for page width.. . use cmd line arg -r1# to set width to #
.de Q \" puts quotes around the argument. End previous line with \c
\& ``\\$1''\\$2
..
.TH LOOKUP 1
.nr IN 3n
.ce 1
April 22nd, 1994
.SH NAME
lookup \- interactive file search and display
.SH SYNOPSIS
.B lookup
[
args
]
[
.I file ...
]
.br
.SH DESCRIPTION
.I Lookup
allows the quick interactive search of text files. It supports ASCII,
JIS-ROMAN, and Japanese EUC packed-format text, and has an
integrated romaji-to-kana converter.
.SH THIS MANUAL
.I Lookup
is flexible for a variety of applications. This manual will, however,
focus on the application of searching Jim Breen's
.I edict
(Japanese-English dictionary) and
.I kanjidic
(kanji database). Being familiar with the content and format of these
files would be helpful. See the INFO section near the end of this
manual for information on how
to obtain these files and their documentation.
.SH OVERVIEW OF MAJOR FEATURES
The following just mentions some major features to whet your appetite
to actually read the whole manual (-:
.TP
Romaji-to-Kana Converter
.I Lookup
can convert romaji to kana for you, even\c
.Q "on the fly"
as you type.
.TP
Fuzzy Searching
Searches can be a bit\c
.Q vague
or\c
.Q fuzzy ", "
so that you'll be able to
find\c
.Q 東京
even if you try to search for\c
.Q ときょ
(the proper yomikata being\c
.Q とうきょう "). "
.TP
Regular Expressions
Uses the powerful and expressive
.I "regular expression"
for searching. One can easily specify complex searches that amount to
``I want lines that look like such-and-such, but not like this-and-that,
but that also have this particular characteristic....''
.TP
Wildcard ``Glob'' Patterns
Optionally, can use well-known filename wildcard patterns instead of
full-fledged regular expressions.
.TP
Filters
You can have
.I lookup
not list certain lines that would otherwise match your search, yet can
optionally save them for quick review. For example, you could have all
name-only entries from
.I edict
filtered from normal output.
.TP
Automatic Modifications
Similarly, you can do a standard search-and-replace on lines just before
they print, perhaps to remove information you don't care to see on
most searches. For example, if you're generally not interested in
.IR kanjidic "'s"
info on Chinese readings, you can have them removed from lines before
printing.
.TP
Smart Word-Preference Mode
You can have
.I lookup
list only entries with
.I "whole words"
that match your search (as opposed to an
.I embedded
match, such as finding\c
.Q the
inside\c
.Q them "), "
but if no whole-word
matches exist, will go ahead and list any entry that matches the
search.
.TP
Handy Features
Other handy features include a dynamically settable and
parameterized prompt, automatic highlighting of that part of the line
that matches your search, an output pager, readline-like input with
horizontal scrolling for long input lines, a\c
.Q .lookup
startup file, automated programmability, and much more. Read on!
.SH REGULAR EXPRESSIONS
.I Lookup
makes liberal use of
.I "regular expressions"
(or
.I regex
for short) in controlling various aspects of the searches. If you are
not familiar with the important concepts of regexes, read the tutorial
appendix of this manual before continuing.
.SH JAPANESE CHARACTER ENCODING METHODS
Internally,
.I lookup
works with Japanese packed-format EUC, and all files loaded must be
encoded similarly. If you have files encoded in JIS or Shift-JIS, you
must first convert them to EUC before loading (see the INFO section for
programs that can do this).
Interactive input and output encoding, however,
may be selected via the -jis, -sjis, and -euc invocation flags
(default is -euc),
or by various commands to the program (described later).
Make sure to use the encoding appropriate for your system. If you're
using kterm under the X Window System, you can use
.IR lookup "'s"
-jis flag to match kterm's default JIS encoding. Or, you might use
kterm's\c
.Q "-km euc"
startup option (or menu selection) to put kterm into
EUC mode. Also, I have found kterm's scrollbar (``-sb -sl 500'')
to be quite useful.
With many\c
.Q English
fonts in Japan, the character that normally prints
as a backslash (the half-width version of \&＼) in The States appears as a
yen symbol (the half-width version of \&￥). How it will appear on your
system is a function of what font you use and what output encoding
method you choose, which may be different from the font and method
that was used to print this manual (both of which may be different
from what's printed on your keyboard's appropriate key). Make sure to
keep this in mind while reading.
.SH STARTUP
Let's assume that your copy of
.I edict
is in ~/lib/edict. You can start the program simply with
.nf
lookup ~/lib/edict
.fi
You'll note that
.I lookup
spends some time building an index before the default\c
.Q "lookup>\ "
prompt appears.
.I Lookup
gains much of its search speed by constructing an index of the file(s)
to be searched. Since building the index can be time consuming itself,
you can have
.I lookup
write the built index to a file that can be
quickly loaded the next time you run the program.
Index files will be given a\c
.Q .jin
(Jeffrey's Index) ending.
Let's build the indices for
.I edict
and
.I kanjidic
now:
.nf
lookup -write ~/lib/edict ~/lib/kanjidic
.fi
This will create the index files
.nf
~/lib/edict.jin
~/lib/kanjidic.jin
.fi
and exit.
You can now restart
.IR lookup ,
which will automatically use the pre-computed index files:
.nf
lookup ~/lib/edict ~/lib/kanjidic
.fi
You should then be presented with the prompt without having to wait
for the index to be constructed (but see the section on Operating
System concerns for possible causes of delay).
.SH INPUT
There are basically two types of input: searches and commands.
Commands do such things as tell
.I lookup
to load more files or set flags. Searches report lines of a file that
match some search specifier (where lines to search for are specified by
one or more regular expressions).
The input syntax may perhaps at first seem odd, but has been designed
to be powerful and concise. A bit of time invested to learn it
well will pay off greatly when you need it.
.SH BRIEF EXAMPLE
Assuming you've started
.I lookup
with
.I edict
and
.I kanjidic
as noted above, let's try a few searches. In these examples, the
.nf
search [edict]>
.fi
is the prompt.
Note that the space after the ``>'' is part of the prompt.
Given the input:
.nf
search [edict]> tranquil
.fi
.I lookup
will report all lines with the string\c
.Q tranquil
in them. There are currently about
a dozen such lines, two of which look like:
.nf
安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
安らぎ [やすらぎ] /peace/tranquility/
.fi
Notice that lines with\c
.Q tranquil
\fIand\fP\c
.Q tranquility
matched? This is because\c
.Q tranquil
was embedded in the
word ``tranquility''.
You could restrict the search to only the
\fIword\fP\c
.Q tranquil
by prepending the special\c
.Q "start of word"
symbol ``<'' and appending the special\c
.Q "end of word"
symbol ``>'' to the regex, as in:
.nf
search [edict]> <tranquil>
.fi
This is the regular expression that says ``the beginning of a word,
followed by a `t', `r', ..., `l', which is at the end of a word.'' The
current version of
.I edict
has just three matching entries.
Let's try another:
.nf
search [edict]> fukushima
.fi
This is a search for the\c
.Q English
fukushima -- ways to search for
kana or kanji will be explored later. Note that among the several
lines selected and printed are:
.nf
福島 [ふくしま] /Fukushima (pn,pl)/
木曽福島 [きそふくしま] /Kisofukushima (pl)/
.fi
By default, searches are done in a case-insensitive
manner -- ``F'' and ``f'' are treated the same by
.IR lookup ,
at least so far as the matching goes. This is called
.IR "case folding" .
Let's give a command to turn this option off,
so that ``f'' and ``F'' won't
be considered the same. Here's an odd point about
.I "lookup's"
input syntax: the default setting is that all command lines must begin
with a space. The space is the (default) command-introduction
character and tells the input parser to expect a command rather than a
search regular expression.
.I
It is a common mistake at first to forget the leading space when
issuing a command. Be careful.
Try the command\c
.Q "\ fold"
to report the current status of case-folding.
Notice that as soon as you type the space, the prompt changes to
.nf
lookup command>
.fi
as a reminder that now you're typing a command rather than a search
specification.
.nf
lookup command> fold
.fi
The reply should be\c
.Q "file #0's case folding is on"
.br
You can actually turn it off with\c
.Q " fold off" ". "
Now try the search for\c
.Q fukushima
again. Notice that this time the entries with\c
.Q Fukushima
aren't listed? Now try the search string\c
.Q Fukushima
and see that the entries with\c
.Q fukushima
aren't listed.
Case folding is usually very convenient (it also makes corresponding
katakana and hiragana match the same), so don't forget to turn it back on:
.nf
lookup command> fold on
.fi
.SH JAPANESE INPUT
.I Lookup
has an automatic romaji-to-kana converter. A leading ``/'' indicates that
romaji is to follow. Try typing\c
.Q /tokyo
and you'll see it convert to\c
.Q /\&ときょ
as you type. When you hit return,
.I lookup
will list all lines that have a ``ときょ'' somewhere in them. Well, sort
of. Look carefully at the lines which match. Among them (if you had
case folding back on) you'll see:
.nf
キリスト教 [キリストきょう] /Christianity/
東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
凸鏡 [とっきょう] /convex lens/
.fi
The first one has ``ときょ'' in it (as ``トきょ'',
where the katakana ``ト'' matches in a case-insensitive
manner to the hiragana ``と''), but you
might consider the others unexpected, since they don't
have\c
.Q ときょ
in them.
They're close (``とうきょ'' and ``とっきょ''),
but not exact. This is the result of
.IR lookup "'s\c"
.Q fuzzification "\&."
Try the command\c
.Q "\ fuzz"
(again, don't forget the command-introduction space).
You'll see that fuzzification is turned on. Turn it off with\c
.Q "\ fuzz off"
and try\c
.Q /tokyo
(which will convert as you type) again.
This time you only get the lines which have ``ときょ'' exactly
(well, case folding is still on, so it might match katakana as well).
In a fuzzy search, length of vowels is ignored -- ``と'' is
considered the same as ``とう'', for example. Also, the
presence or absence of any ``っ'' character is ignored, and the
pairs じ ぢ, ず づ, え ゑ, and お を are considered identical in a
fuzzy search.
It might be convenient to consider a fuzzy search to be a\c
.Q "pronunciation search" ". "
Special note: fuzzification will not be performed if a regular expression\c
.Q "*" ,
.Q "+" ,
or\c
.Q "?"
modifies a non-ASCII character. This is not an issue when input patterns
are filename-like wildcard patterns (discussed below).
In addition to kana fuzziness, there's one special case for kanji when
fuzziness is on. The kanji repeater mark\c
.Q 々
will be recognized such that\c
.Q 時々
and\c
.Q 時時
will match each other.
Turn fuzzification back on (``fuzz on''), and search for all
.I "whole words"
which sound like ``tokyo''. That search would be specified as:
.nf
search [edict]> /<tokyo>
.fi
(again, the\c
.Q tokyo
will be converted to\c
.Q ときょ
as you type).
My copy of
.I edict
has the three lines
.nf
東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
特許 [とっきょ] /special permission/patent/
凸鏡 [とっきょう] /convex lens/
.fi
This kind of whole-word romaji-to-kana search is so common, there's a
special short cut. Instead of typing ``/<tokyo>'', you can
type\c
.Q [tokyo] ". "
The leading ``['' means ``start romaji''
.I and\c
.Q "start of word" ". "
Were you to type\c
.Q <tokyo>
instead (without a
leading ``/'' or ``['' to indicate romaji-to-kana conversion), you would
get all lines with the
.I English
whole-word\c
.Q tokyo
in them.
That would be a reasonable request as well, but not what we want at the moment.
Besides the kana conversion, you can use any cut-and-paste that your
windowing system might provide to get Japanese text onto the search
line. Cut\c
.Q ときょ
from somewhere and paste onto the search line. When
hitting enter to run the search, you'll notice that it is done without
fuzzification (even if the fuzzification flag was\c
.Q on "). "
That's because
there's no leading ``/''. Not only does a leading ``/'' indicate that you
want the romaji-to-kana conversion, but that you want it done fuzzily.
So, if you'd like fuzzy cut-and-paste, just type a leading ``/'' before
pasting (or go back and prepend one after pasting).
These examples have all been pretty simple, but you can use all the
power that regexes have to offer. As a slightly more complex example,
the search\c
.Q <gr[ea]y>
would look for all lines with
the words\c
.Q grey
or\c
.Q gray
in them. Since the ``['' isn't the first character
of the line, it doesn't mean what was mentioned above (start-of-word romaji).
In this case, it's just the regular-expression\c
.Q class
indicator.
If you feel more comfortable using filename-like\c
.Q "*.txt"
wildcard patterns, you can use the\c
.Q "wildcard on"
command to have patterns be considered this way.
This has been a quick introduction to the basics of
.IR lookup .
It can be very powerful and much more complex. Below is a detailed
description of its various parts and features.
.SH READLINE INPUT
The actual keystrokes are read by a readline-ish package that is
pretty standard. In addition to just typing away, the following
keystrokes are available:
.nf
^B / ^F     move left/right one character on the line
^A / ^E     move to the start/end of the line
^H / ^G     delete one character to the left/right of the cursor
^U / ^K     delete all characters to the left/right of the cursor
^P / ^N     previous/next lines on the history list
^L or ^R    redraw the line
^D          delete char under the cursor, or EOF if line is empty
^space      force romaji conversion (^@ on some systems)
.fi
If automatic romaji-to-kana conversion is turned on (as it is by
default), there are certain situations where the conversion will be
done, as we saw above. Lower-case romaji will be converted to
hiragana, while upper-case romaji to katakana. This usually won't
matter, though, as case folding will treat hiragana and katakana the
same in the searches.
In exactly what situations the automatic conversion will be done is
intended to be rather intuitive once the basic idea is learned.
However, at
.IR "any time" ,
one can use control-space to convert the ASCII to the left of the
cursor to kana. This can be particularly useful when needing to enter
kana on a command line (where auto conversion is never done; see below).
.SH ROMAJI FLAVOR
Most flavors of romaji are recognized. Special or non-obvious items are
mentioned below. Lowercase romaji is converted to hiragana, uppercase to katakana.
Long vowels can be entered by repeating the vowel, or with ``-'' or ``^''.
In situations where an ``n'' could be vague, as
in ``na'' being な or んあ, use a single quote to force ん.
Therefore, ``kenichi'' → けにち while ``ken'ichi'' → けんいち.
The romaji has been richly extended with many non-standard
combinations such as ふぁ or ちぇ, which are represented in
intuitive ways: ``fa'' → ふぁ, ``che'' → ちぇ, etc.
Various other mappings of interest:
.nf
wo → を     we → ゑ     wi → ゐ
VA → ヴァ   VI → ヴィ   VU → ヴ     VE → ヴェ   VO → ヴォ
di → ぢ     dzi → ぢ    dya → ぢゃ  dyu → ぢゅ  dyo → ぢょ
du → づ     tzu → づ    dzu → づ
(the following kana are all smaller versions of the regular kana)
xa → ぁ     xi → ぃ     xu → ぅ     xe → ぇ     xo → ぉ
xu → ぅ     xtu → っ    xwa → ゎ    xka → ヵ    xke → ヶ
xya → ゃ    xyu → ゅ    xyo → ょ
.fi
.SH INPUT SYNTAX
Any input line beginning with a space (or whichever character is set as
the command-introduction character) is processed as a command to
.I lookup
rather than a search spec.
.I Automatic
kana conversion is never done on these lines (but
.I forced
conversion with control-space may be done at any time).
Other lines are taken as search regular expressions, with the
following special cases:
.TP
?
A line consisting of a single question mark will report the current
command-introduction character (the default is a space, but can be
changed with the\c
.Q cmdchar
command).
.TP
=
If a line begins with ``='', the line (without the ``='') is taken as a
search regular expression, and no automatic (or internal -- see below)
kana conversion is done anywhere on the line (although again,
conversion can always be forced with control-space). This can be used
to initiate a search where the beginning of the regex is the
command-introduction character, or in certain situations where automatic kana
conversion is temporarily not desired.
.TP
/
A line beginning with ``/'' indicates romaji input for the whole line.
If automatic kana conversion is turned on, the conversion will be done
in real-time, as the romaji is typed. Otherwise it will be done
internally once the line is entered.
.IR Regardless ,
the presence of the leading ``/'' indicates that any kana (either
converted or cut-and-pasted in) should be\c
.Q fuzzified
if fuzzification is turned on.
As an addition to the above, if the line doesn't begin with ``='' or the
command-introduction character (and automatic conversion is turned
on), a ``/''
.I anywhere
on the line initiates automatic conversion for the following word.
.TP
[
A line beginning with ``['' is taken to be romaji (just as a line
beginning with ``/''), and the converted romaji is subject to
fuzzification (if turned on). However, if ``['' is used rather
than ``/'', an implied ``<''\c
.Q "beginning of word"
is prepended to the resulting
kana regex. Also, any ending ``]'' on such a line is converted to the\c
.Q "ending of word"
specifier ``>'' in the resulting regex.
.PP
In addition to the above, lines may have certain prefixes and suffixes
to control aspects of the search or command:
.TP
!
Various flags can be toggled for the duration of a particular search
by prepending a\c
.Q !!
sequence to the input line.
Sequences are shown below, along with commands related to each:
.nf
!F! Filtration is toggled for this line (filter)
!M! Modification is toggled for this line (modify)
!w! Word-preference mode is toggled for this line (word)
!c! Case folding is toggled for this line (fold)
!f! Fuzzification is toggled for this line (fuzz)
!W! Wildcard-pattern mode is toggled for this line (wildcard)
!r! Raw. Force fuzzification off for this line
!h! Highlighting is toggled for this line (highlight)
!t! Tagging is toggled for this line (tag)
!d! Displaying is on for this line (display)
.fi
The letters can be combined, as in\c
.Q "!cf!" .
The final ``!'' can be omitted if the first character
after the sequence is not an ASCII letter.
If no letters are given (``!!''), ``!f!'' is the default.
These last two points can be conveniently combined in the common case of\c
.Q !/romaji
which would be the same as\c
.Q !f!/romaji ". "
The special sequence\c
.Q !?
lists the above, as well as indicates which are currently turned on.
Note that the letters accepted in a\c
.Q !!
sequence are many of the indicators shown by the\c
.Q files
command.
.TP
+
A ``+'' prepended to anything above will cause the final search
regex to be printed. This can be useful to see when and what kind of
fuzzification and/or internal kana conversion is happening. Consider:
.nf
search [edict]> +/狼
a match isȤ[]*?[]*[]*
.fi
Due to the leading ``/'',
the kana is fuzzified, which explains the
somewhat complex resulting regex. For comparison, note:
.nf
search [edict]> +狼
a match isȤ狼
search [edict]> +!/狼
a match isȤ狼
.fi
As the ``+'' shows, these are not fuzzified. The first one has no
leading ``/'' or ``['' to induce fuzzification, while the second has
the ``!'' line prefix (which is the default version of\c
.Q !f! "), "
which toggles fuzzification mode to\c
.Q off
for that line.
.TP
\&,
The default of all searches and most commands is to work with the
first file loaded (\fIedict\fP in these examples). One can change this
default (see the\c
.Q select
command) or, by appending a comma+digit
sequence at the end of an input line, force that line to work with
another previously-loaded file. An appended\c
.Q ,1
works with the first
extra file loaded (in these examples, \fIkanjidic\fP). An appended\c
.Q ,2
works with the 2nd extra file loaded, etc.
An appended\c
.Q ,0
works with the original first file (and can be useful
if the default file has been changed via the\c
.Q select
command).
The following sequence shows a common usage:
.nf
search [edict]> [とうきょうと]
東京都 [とうきょうと] /Tokyo Metropolitan area/
.fi
Cutting and pasting the 都 from above, and adding a\c
.Q ,1
to search
.IR kanjidic :
.nf
search [edict]> 都,1
都 4554 N4769 S11 ..... みやこ {metropolis} {capital}
.fi
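.PP
The file-selection suffix can be pictured with a small Python sketch. It is
only an illustration under stated assumptions: the function name is
hypothetical, and it handles a single trailing digit, which may be narrower
than what
.I lookup
itself accepts.
.nf
def split_slot_suffix(line, default_slot=0):
    # Split a trailing comma+digit from an input line.
    # Returns (text, slot); without a suffix the default slot is used.
    if len(line) >= 2 and line[-2] == "," and line[-1] in "0123456789":
        return line[:-2], int(line[-1])
    return line, default_slot

text, slot = split_slot_suffix("<tranquil>,1")
.fi
Here the search text is ``<tranquil>'' and the slot is 1, so the search
would run against the first extra file.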
.SH FILENAME-LIKE WILDCARD MATCHING
When wildcard-pattern mode is selected, patterns are considered as
extended\c
.Q "*.txt" "-like"
patterns. This is often more convenient for users not familiar with
regular expressions. To have this mode selected by default, put
.nf
default wildcard on
.fi
into your\c
.Q ".lookup"
file (see\c
.Q "STARTUP FILE"
below).
When wildcard mode is on, only\c
.Q "*" ,
.Q "?" ,
.Q "+" ,
and\c
.Q "."
are affected.
See the entry for the\c
.Q wildcard
command below for details.
Other features, such as the multiple-pattern searches (described below)
and other regular-expression metacharacters, remain available.
.SH MULTIPLE-PATTERN SEARCHES
You can put multiple patterns in a single search specifier.
For example consider
.nf
search [edict]> china||japan
.fi
The first part (``china'') will select all lines that have\c
.Q china
in them. Then,
.IR "from among those lines" ,
the second part will select lines that have\c
.Q japan
in them. The\c
.Q ||
is not part of any pattern -- it is
.IR lookup "'s\c"
.Q pipe
mechanism.
The above example is very different from the single pattern
``china|japan'', which would select any line that
had either ``china''
.I or\c
.Q japan ". "
With\c
.Q china||japan ", "
you get lines that have\c
.Q china
.I "and then also"
have\c
.Q japan
as well.
Note that it is also different from the regular expression\c
.Q china.*japan
(or the wildcard pattern\c
.Q china*japan ")"
which would select lines having\c
.Q "china, then maybe some stuff, then japan" ". "
But consider the case when\c
.Q japan
comes on the line before\c
.Q china .
Just for your comparison, the multiple-pattern
specifier ``china||japan'' is pretty
much the same as the single regular
expression ``china.*japan|japan.*china''.
If you use ``|!|'' instead of ``||'',
it will mean ``...and then lines
.I not
matching...''.
Consider a way to find all lines of
.I kanjidic
that do have a Halpern number, but don't have a Nelson number:
.nf
search [edict]> <H\\d+>|!|<N\\d+>
.fi
If you then wanted to restrict the listing to those that
.I also
had a ``jinmeiyou'' marking (\fIkanjidic\fP's ``G9'' field)
and had a reading of あき, you could make it:
.nf
search [edict]> <H\\d+>|!|<N\\d+>||<G9>||<あき>
.fi
A prepended ``+'' would explain:
.nf
a match is ``<H\\d+>''
and not ``<N\\d+>''
and ``<G9>''
and ``<あき>''
.fi
The ``|!|'' and ``||'' can be used to make up to ten
separate regular expressions in any one search specification.
Again, it is important to stress that ``||'' does not
mean ``or'' (as it does in a C program,
or as ``|'' does within a regular expression).
You might find it convenient to read ``||'' as ``\fIand\fP also'',
while reading ``|!|'' as ``but \fInot\fP''.
It is also important to stress that any whitespace around the\c
.Q ||
and\c
.Q |!|
construct is
.I not
ignored, but kept as part of the regex on either side.
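.PP
The pipe semantics described above can be sketched in a few lines of
Python. This is only an illustration of the described behavior, not
.IR lookup 's
implementation: the function name, the stage representation, and the sample
lines are all hypothetical, and Python's regex flavor stands in for
.IR lookup 's.
.nf
import re

def multi_pattern_search(lines, stages):
    # stages: list of (pattern, keep_if_match) pairs.
    # ("china", True), ("japan", True) models china||japan;
    # a False second field models a stage introduced by |!|.
    selected = list(lines)
    for pattern, keep_if_match in stages:
        regex = re.compile(pattern, re.IGNORECASE)  # case folding on
        selected = [line for line in selected
                    if bool(regex.search(line)) == keep_if_match]
    return selected

sample = [
    "trade between China and Japan",
    "Japan, then China",
    "China only",
]
both = multi_pattern_search(sample, [("china", True), ("japan", True)])
china_not_japan = multi_pattern_search(sample, [("china", True), ("japan", False)])
.fi
Here
.I both
holds the first two sample lines regardless of word order, while
.I china_not_japan
holds only the third.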
.SH COMBINATION SLOTS
Each file, when loaded, is assigned to a\c
.Q slot
via which subsequent references to the file are then made.
The slot may then be searched, have filters and flags set, etc.
A special kind of slot, called a\c
.Q "combination slot" ,
rather than representing a single file, can represent multiple
previously-loaded slots. Searches against a combination slot
(or\c
.Q "combo slot"
for short) search all those previously-loaded slots associated with it
(called\c
.Q "component slots" "). "
Combo slots are set up with the
.I combine
command.
A combo slot has no filter or modify spec, but can have a local prompt
and flags just like normal file slots. The flags, however, have
special meanings with combo slots. Most combo-slot flags act as a mask
against the component-slot flags; when acted upon as a member of the
combo, a component-slot's flag will be disabled if the corresponding
combo-slot's flag is disabled.
Exceptions to this are the
.IR autokana ,
.IR fuzz ,
and
.I tag
flags.
The
.I autokana
and
.I fuzz
flags govern a combo slot exactly as they do a regular file slot.
When a slot is searched as a component of a combination slot, the
component slot's
.I fuzz
(and
.IR autokana )
flags, or lack thereof, are ignored.
The
.I tag
flag is quite different altogether; see the
.I tag
command for complete information.
Consider the following output from the
.I files
command:
.nf
0F wcfh da I 2762k /usr/jfriedl/lib/edict
1FM cf da I 705k /usr/jfriedl/lib/kanjidic
2F cfh@da 1k /usr/jfriedl/lib/local.words
*3FM cfhtda combo kotoba (#2, #0)
.fi
See the discussion of the
.I files
command below for basic explanation of the output.
As can be seen, slot #3 is a
.I "combination slot"
with the name\c
.Q kotoba
with
.I "component slots"
two and zero. When a search is initiated on this slot, first slot #2\c
.Q "local.words"
will be searched, then slot #0\c
.Q edict ". "
Because the combo slot's
.I filter
flag is
.IR on ,
the component slots'
.I filter
flag will remain on during the search.
The combo slot's
.I word
flag is
.IR off ,
however, so slot #0's
.I word
flag will be forced off during the search.
See the
.I combine
command for information about creating combo slots.
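.PP
As a rough model of the masking rule, the following Python sketch computes
the flags in effect for a component slot during a combo search. The function
and flag names are hypothetical, and every flag other than
.I fuzz
and
.I autokana
is treated as masked, which simplifies the ``most flags'' wording above. The
.I tag
flag is left out because its behavior is described under the
.I tag
command.
.nf
def effective_flags(component, combo):
    # Flags are modeled as dicts mapping flag name to bool.
    effective = {}
    for name, value in component.items():
        if name in ("fuzz", "autokana"):
            # Taken from the combo slot; the component's setting is ignored.
            effective[name] = combo.get(name, False)
        else:
            # Masked: enabled only if enabled in both slots.
            effective[name] = value and combo.get(name, False)
    return effective

edict = {"filter": True, "word": True, "fold": True, "fuzz": True}
kotoba = {"filter": True, "word": False, "fold": True, "fuzz": True}
during_search = effective_flags(edict, kotoba)
.fi
With these values
.I during_search
has
.I filter
on and
.I word
off, matching the example above.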
.SH PAGER
.I Lookup
has a built-in pager (a la \fImore\fP). Upon filling a screen with
text, the string
.nf
--MORE [space,return,c,q]--
.fi
is shown. A space will allow another screen of text; a return will allow
one more line. A ``c'' will allow output text to continue unpaged until
the next command. A ``q'' will flush output of the current command.
If supported by the OS,
.IR lookup 's
idea of the screen size is automatically set upon startup and window resize.
.I Lookup
must know the width of the screen both for horizontal
input-line scrolling and for knowing when a long line wraps on the screen.
The pager parameters can be set manually with the\c
.Q pager
command.
.SH COMMANDS
Any line intended to be a command must begin with the
command-introduction character (the default is a space, but can be set
via the ``cmdchar'' command). However, that character is not part of
the command itself and won't be shown in the following list of
commands.
There are a number of commands that work with the
.I "selected file"
or
.I "selected slot"
(both meaning the same thing).
The selected file is the one indicated by an appended comma+digit, as
mentioned above. If no such indication is given, the default
.I "selected file"
is used (usually the first file loaded, but can be changed with
the "select" command).
Some commands accept a
.I boolean
argument, such as to turn a flag on or off. In all such cases,
a "1" or "on" means to turn the flag on,
while a "0" or "off" is used to
turn it off. Some flags are per-file
("fuzz", "fold", etc.), and a
command to set such a flag
normally sets the flag for the selected file only. However, the
default value inherited by subsequently loaded files can be set
by prepending\c
.Q default
to the command. This is particularly useful in the startup file
before any files are loaded (see the section STARTUP FILE).
Items separated by "|" are mutually exclusive possibilities (i.e. a
boolean argument is "1|on|0|off").
Items shown in brackets ("[" and "]")
are optional. All commands that
accept a boolean argument to set a flag or mode do so optionally --
with no argument the command will report the current status of the
mode or flag.
Any command that allows an argument in quotes (such as load, etc.)
allows the use of either single or double quotes.
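.PP
The handling of boolean arguments described above can be sketched as
follows. This is an illustrative Python model only; the function name and
return convention are hypothetical, and exact lowercase matching of the
argument is an assumption.
.nf
```python
def parse_boolean(arg, current):
    """Interpret a boolean command argument.

    Returns (value, report). With no argument the value is unchanged and
    report is True, meaning the current status should be reported.
    """
    if arg is None or arg == "":
        return current, True
    if arg in ("1", "on"):
        return True, False
    if arg in ("0", "off"):
        return False, False
    raise ValueError("expected 1|on|0|off, got %r" % arg)


print(parse_boolean("on", False))  # (True, False)
print(parse_boolean(None, True))   # (True, True)
```
.fi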
.PP
The commands:
.br
.so c_autokana.so
.so c_clear.so
.so c_cmdchar.so
.so c_combine.so
.so c_cmd_debug.so
.so c_debug.so
.so c_describe.so
.so c_encoding.so
.so c_files.so
.so c_filter.so
.so c_fold.so
.so c_fuzz.so
.so c_help.so
.so c_highlight.so
.so c_if.so
.so c_in_code.so
.so c_limit.so
.so c_log.so
.so c_load.so
.so c_modify.so
.so c_msg.so
.so c_out_code.so
.so c_pager.so
.so c_prompt.so
.so c_rdebug.so
.so c_list_size.so
.so c_select.so
.so c_show.so
.so c_source.so
.so c_spinner.so
.so c_stats.so
.so c_tag.so
.so c_verbose.so
.so c_version.so
.so c_wild.so
.so c_word.so
.so c_quit.so
.SH STARTUP FILE
If the file\c
.Q ~/.lookup
is present, commands are read from it during
.I lookup
startup.
The file is read in the same way as the
.I source
command reads files (see that entry for more information on file
format, etc.).
However, if files were loaded via command-line arguments,
commands within the startup file to load files (and their associated
commands such as to set per-file flags) are ignored.
Similarly, any use of the command-line flags -euc, -jis, or -sjis
disables any commands in the startup file that set the
input and/or output encodings.
The special treatment mentioned in the above two paragraphs only applies
to commands within the startup file itself, and does not apply to commands
in command-files that might be
.IR source d
from within the startup file.
The following is a reasonable example of a startup file:
.nf
## turn verbose mode off during startup file processing
verbose off
prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
spinner 200
pager on
## The filter for edict will hit for entries that
## have only one English part, and that English part
## having a pl or pn designation.
load ~/lib/edict
filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
highlight on
word on
## The filter for kanjidic will hit for entries without a
## frequency-of-use number. The modify spec will remove
## fields with the named initial code (U,N,Q,M,E, and Y)
load ~/lib/kanjidic
filter "uncommon" !/<F\\d+>/
modify /( [UNQMEY]\\S+)+//g
## Use the same filter for my local word file,
## but turn off by default.
load ~/lib/local.words
filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
filter off
highlight on
word on
## Want a tag for my local words, but only when
## accessed via the combo below
tag off ""
combine "words" 2 0
select words
## turn verbosity back on for interactive use.
verbose on
.fi
.SH "COMMAND-LINE ARGUMENTS"
With the use of a startup file, command-line arguments are rarely needed.
In practical use, they are only needed to create an index file, as in:
.nf
lookup -write \fItextfile\fP
.fi
Any command-line arguments that aren't flags are taken to be files
which are loaded in turn during startup.
In this case, any "load", "filter", etc.
commands in the startup file are ignored.
The following flags are supported:
.TP
\-help
Reports a short help message and exits.
.TP
\-write
Creates index files for the named files and exits. No
.I "startup file"
is read.
.TP
\-euc
Sets the input and output encoding method to EUC (currently the default).
Exactly the same as the "encoding euc" command.
.TP
\-jis
Sets the input and output encoding method to JIS.
Exactly the same as the "encoding jis" command.
.TP
\-sjis
Sets the input and output encoding method to Shift-JIS.
Exactly the same as the "encoding sjis" command.
.TP
\-v \-version
Prints the version string and exits.
.TP
\-norc
.br
Indicates that the startup file should not be read.
.TP
\-rc \fIfile\fP
The named file is used as the startup file, rather than the
default\c
.Q "~/.lookup" ". "
It is an error for the file not to exist.
.TP
\-percent \fInum\fP
.br
When an index is built, letters that appear on more than
.I num
percent (default 50) of the lines are elided from the index.
If a search will have to check most of the lines in a file anyway,
indexing those lines saves little time, so one may as well save the
large amount of space in the index file needed to represent them:
the indexing of oft-occurring letters provides a diminishing return.
Smaller indexes can be made by using a smaller number.
.TP
\-noindex
.br
Indicates that any files loaded via the command line should
not be loaded with any precomputed index, but recalculated on the fly.
.TP
\-verbose
.br
Has metric tons of stats spewed whenever an index is created.
.TP
\-port ###
For the (undocumented) server configuration only, tells which port to
listen on.
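.PP
The elision rule of the \-percent flag above can be sketched as follows.
This is a toy Python model under stated assumptions, not the real index
format (which is not described here): the function name is hypothetical,
a letter is taken to be a single character, and the index is a plain
mapping from character to line numbers.
.nf
```python
from collections import defaultdict


def build_index(lines, percent=50):
    """Map each character to the set of line numbers containing it,
    omitting characters found on more than the given percent of lines."""
    index = defaultdict(set)
    for number, line in enumerate(lines):
        for ch in set(line):
            index[ch].add(number)
    limit = len(lines) * percent / 100
    return {ch: hits for ch, hits in index.items() if len(hits) <= limit}


lines = ["abc", "abd", "aef", "xyz"]
index = build_index(lines)
# "a" is on 3 of 4 lines (75 percent), so it is elided;
# "b" is on exactly 50 percent of the lines, so it is kept.
print(sorted(index))  # ['b', 'c', 'd', 'e', 'f', 'x', 'y', 'z']
```
.fi
A search for an elided character simply has to check every line, which is
what it would nearly have had to do anyway.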
.SH OPERATING SYSTEM CONSIDERATIONS
I/O primitives and behaviors vary with the operating system. On my
operating system, I can "read" a file by mapping it into memory, which
is nearly instantaneous regardless of the size of the file.
When I later access that memory, the appropriate sections of the file
are automatically read into memory by the operating system as needed.
This results in
.I lookup
starting up and presenting a prompt very quickly, but causes the first
few searches that need to check a lot of lines in the file to go more
slowly (as lots of the file will need to be read in). However, once
the bulk of the file is in, searches will go very fast. The win here is
that the rather long file-load times are amortized over the first few
(or few dozen, depending upon the situation) searches rather than always
faced right at command startup time.
On the other hand, on an operating system without the mapping ability,
.I lookup
would start up very slowly as all the files and indexes are read into memory,
but would then search quickly from the beginning, all the file already
having been read.
To get around the slow startup, particularly when many files are loaded,
.I lookup
uses
.I "lazy loading"
if it can: a file is not actually read into memory at the time the
.I load
command is given. Rather, it will be read when first actually accessed.
Furthermore, files are loaded while
.I lookup
is idle, such as when waiting for user input. See the
.I files
command for more information.
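.PP
The memory-mapping behavior described above can be illustrated with a
short Python sketch. It is not taken from the program itself; the function
name and the temporary sample file are hypothetical, and it assumes an
operating system on which Python's mmap module is available and a file
that is not empty.
.nf
```python
import mmap
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing needle (bytes) using a memory-mapped file."""
    with open(path, "rb") as f:
        # Mapping is quick; pages are read by the OS only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            count = 0
            for line in iter(data.readline, b""):
                if needle in line:
                    count += 1
            return count


# Small demonstration with a temporary file standing in for a dictionary.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    for entry in ("one /first/", "two /second/", "three /third/"):
        print(entry, file=tmp)

print(count_matching_lines(tmp.name, b"th"))  # 1
os.remove(tmp.name)
```
.fi
The first scan pays the cost of reading the file in; later scans of the
same mapping are typically served from memory already read.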
.SH REGULAR EXPRESSIONS, A BRIEF TUTORIAL
.so regex.so
.SH BUGS
Needs full support for half-width katakana and JIS X 0212-1990.
.br
Non-EUC (JIS & SJIS) items not tested well.
.br
Probably won't work on non-UNIX systems.
.br
Screen control codes (for clear and highlight commands) are hard-coded
for ANSI/VT100/kterm.
.SH AUTHOR
Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)
.SH INFO
Jim Breen's text files
.I edict
and
.I kanjidic
and their documentation can be found in\c
.Q pub/nihongo
on ftp.cc.monash.edu.au (130.194.1.106).
Information on input and output encoding and codes can be found in
Ken Lunde's
.I "Understanding Japanese Information Processing"
(\&日本語情報処理\&) published by O'Reilly and Associates.
ISBN 1-56592-043-0. There is also a Japanese edition published
by SoftBank.
A program to convert files among the various encoding methods is
Dr. Ken Lunde's
.IR jconv ,
which can also be found on ftp.cc.monash.edu.au.
.I Jconv
is also useful for converting halfwidth katakana (which
.I lookup
doesn't yet support well) to full-width.