.\" Copyright (c) 2004 William S\&. Yerazunis\&. Manpage typesetting by Joost van Baal and Shalendra Chhabra
.TH "crm" 1 "19 Aug 2004" "crm114 20040816\&.BlameClockworkOrange-auto\&.3" "CRM114"
.po 2m
.de ZI
.\" Zoem Indent/Itemize macro I.
.br
'in +\\$1
.nr xa 0
.nr xa -\\$1
.nr xb \\$1
.nr xb -\\w'\\$2'
\h'|\\n(xau'\\$2\h'\\n(xbu'\\
..
.de ZJ
.br
.\" Zoem Indent/Itemize macro II.
'in +\\$1
'in +\\$2
.nr xa 0
.nr xa -\\$2
.nr xa -\\w'\\$3'
.nr xb \\$2
\h'|\\n(xau'\\$3\h'\\n(xbu'\\
..
.if n .ll -2m
.am SH
.ie n .in 4m
.el .in 8m
..
.SH NAME
crm \- The Controllable Regex Mutilator
.SH SYNOPSIS
.B crm
[\fIOPTION\fR]... \fICRMFILE\fR
.SH WARNING
This man page is taken from an older CRM114 version. It is provided as a
convenience to Debian users and may not be up-to-date. If you would like to
update it, please send appropriate patches to the Debian bug tracking system.
.SH OPTIONS
.PP
\fB-d\fP N (\fIenter debugger after running N cycles\&. Omitting N means N equals 0\&.\fP)\fB\fP
.TP
\fB-e\fP (\fIdo not import any environment variables\fP)\fB\fP
.TP
\fB-h\fP (\fIprint help text\fP)\fB\fP
.TP
\fB-p\fP (\fIgenerate an execution-time-spent profile on exit\fP)\fB\fP
.TP
\fB-P\fP N (\fImax program lines\fP)\fB\fP
.TP
\fB-q\fP m (\fImathmode (0,1 = alg/RPN only in EVAL, 2,3 = alg/RPN everywhere)\fP)\fB\fP
.TP
\fB-s\fP N (\fInew feature file (\&.css) size is N (default 1 meg+1 featureslots)\fP)\fB\fP
.TP
\fB-S\fP N (\fInew feature file (\&.css) size is N rounded to 2^I+1 featureslots\fP)\fB\fP
.TP
\fB-t\fP (\fIuser trace output\fP)\fB\fP
.TP
\fB-T\fP (\fIimplementors trace output (only for the masochistic!)\fP)\fB\fP
.TP
\fB-u\fP dir (\fIchdir to directory dir before starting execution\fP)\fB\fP
.TP
\fB-v\fP (\fIprint CRM114 version identification and exit\fP)\fB\fP
.TP
\fB-w\fP N (\fImax data window (bytes, default 16 megs)\fP)\fB\fP
.TP
\fB--\fP (\fIsignals the end of CRM114 flags; prior flags are not seen by the user program; subsequent args are not processed by CRM114\fP)\fB\fP
.TP
\fB--foo\fP (\fIcreates the user variable :foo: with the value SET\fP)\fB\fP
.TP
\fB--x=y\fP (\fIcreates the user variable :x: with the value y\fP)\fB\fP
.TP
\fB-{\fP stmts} (\fIexecute the statements inside the {} brackets\fP)\fB\fP
.TP
\fBcrmfile\fP (\fI\&.crm file name\fP)
.SH DESCRIPTION
CRM114 is a language designed to write filters in\&. It caters to
filtering email, system log streams, html, and other marginally
human-readable ASCII that may occasion to grace your computer\&.
CRM114\&'s unique strengths are the data structure (everything is
a string and a string can overlap another string), its ability
to work on truly infinitely long input streams, its ability to
use extremely advanced classifiers to sort text, and the ability
to do approximate regular expressions (that is, regexes that
don\&'t quite match) via the TRE regex library\&.
CRM114 also sports a very powerful subprocess control facility, and
a unique syntax and program structure that puts the fun back in
programming (OK, you can run away screaming now)\&. The syntax is
declensional rather than positional; the type of quote marks around
an argument determine what that argument will be used for\&.
The typical CRM114 program uses regex operations more often
than addition (in fact, math was only added to TRE in the waning
days of 2003, well after CRM114 had been in daily use for over
a year and a half)\&.
In other words, crm114 is a very \fBvery\fP powerful mutagenic filter that
happens to be a programming language as well\&.
The filtering style of the CRM-114 discriminator is based on the fact
that most spam, normal log file messages, or other uninteresting data
is easily categorized by a few characteristic patterns (such as
"Mortgage leads", "advertise on the internet", and "mail-order toner
cartridges")\&. CRM114 may also be useful to folks who are on multiple
interlocking mailing lists\&.
In a bow to Unix-style flexibility, by default CRM114 reads its
input from standard input, and by default sends its output to
standard output\&. Note that the default action has a zero-length
output\&. Redirection and use of other input or output files is
possible, as well as the use of windowing, either delimiter-based or
time-based, for real-time continuous applications\&.
CRM114 can be used for other than mail filtering; consider it to be a version
of \fIgrep\fP with super powers\&. If perl is a seventy-bladed swiss army knife,
CRM114 is a razor-sharp katana that can talk\&.
.SH INVOCATION
Absent the -{ program } flag, the first argument is taken to be the name of
a file containing a crm114 program; subsequent arguments are merely supplied
as :_argN: values\&. Use single quotes around command-line programs
\&'-{ like this }\&' to prevent the shell from doing odd things to them\&.
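As a minimal sketch (not taken from the CRM114 distribution; the input
file name input\&.txt is a placeholder), a one-statement command-line
program that simply copies its input to standard output could look like:
.di ZV
.in 0
.nf \fC
crm \&'-{ accept }\&' < input\&.txt
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
The single quotes keep the shell from interpreting the braces and spaces
itself, so the whole program reaches CRM114 as one argument\&.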
CRM114 can be directly invoked by the shell if the first line of your
program file uses the shell standard, as in:
.di ZV
.in 0
.nf \fC
#! /usr/bin/crm
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
You can use CRM114 flags on the shell-standard invocation line, and
hide them with \&'--\&' from the program itself; \&'--\&' incidentally prevents
the invoking user from changing any CRM114 invocation flags\&.
Flags should be located after any positional variables on the command
line\&. Flags \fIare\fP visible as :_argN: variables, so you can create
your own flags for your own programs (separate CRM114 and user flags
with \&'--\&')\&.
Two examples of how to do this:
.di ZV
.in 0
.nf \fC
\&./foo\&.crm bar mugga < baz -t -w 150000
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
.di ZV
.in 0
.nf \fC
\&./foo\&.crm -t -w 1500000 -- bar < baz mugga
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
One example of how \fBnot\fP to do this:
.di ZV
.in 0
.nf \fC
\&./foo\&.crm -t -w 150000 bar < baz mugga
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
(That\&'s WRONG!)
You can put a list of user-settable vars on the \fC#!/usr/bin/crm\fR
invocation line\&. CRM114 will print these out when a program is
invoked directly (e\&.g\&. "\&./myprog\&.crm -h", not "crm myprog\&.crm -h")
with the -h (for help) flag\&. (Note that this works ONLY on bash
on Linux; the *BSDs have a different bash interpretation, so this
doesn\&'t work there\&.)
Example:
.di ZV
.in 0
.nf \fC
#!/usr/bin/crm -( var1 var2=A var2=B var2=C )
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
This allows only \fCvar1\fR and \fCvar2\fR to be set on the command line\&. If a
variable is not assigned a value, the user can set any value desired\&. If the
variable is equated to a set of values, those are the \fIonly\fP values allowed\&.
Another example:
.di ZV
.in 0
.nf \fC
#!/usr/bin/crm -( var1 var2=foo ) --
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
This allows \fCvar1\fR to
be set to any value; \fCvar2\fR may only be set to \fCfoo\fR or left unset;
and no other variables may be set, nor may invocation flags be changed (because
of the trailing "--")\&. Since "--" also blocks \&'-h\&' for help, such programs
should provide their own help facility\&.
.SH VARIABLES
Variable names and locations start with a : , end with a : , and may
contain only characters that have ink (i\&.e\&. the [:graph:] class) with
few exceptions\&.
Examples: \fC:here:\fR, \fC:ThErE:\fR, \fC:every-where_0123+45%6789:\fR,
\fC:this_is_a_very_very_long_var_name_that_does_not_tell_us_much:\fR\&.
Builtin variables:
.ZI 21m ":_nl:"
newline
.in -21m
.ZI 21m ":_ht:"
horizontal tab
.in -21m
.ZI 21m ":_bs:"
backspace
.in -21m
.ZI 21m ":_sl:"
a slash
.in -21m
.ZI 21m ":_sc:"
a semicolon
.in -21m
.ZI 21m ":_arg0: thru :_argN:"
command-line args, including \fIall\fP flags
.in -21m
.ZI 21m ":_argc:"
how many command line arguments there were
.in -21m
.ZI 21m ":_pos0: thru :_posN:"
positional args (\&'-\&' or \&'--\&' args deleted)
.in -21m
.ZI 21m ":_posc:"
how many positional arguments there were
.in -21m
.ZI 21m ":_pos_str:"
all positional arguments concatenated
.in -21m
.ZI 21m ":_env_whatever:"
environment value \&'whatever\&'
.in -21m
.ZI 21m ":_env_string:"
all environmental arguments concatenated
.in -21m
.ZI 21m ":_crm_version:"
the version of the CRM system
.in -21m
.ZI 21m ":_dw:"
the current data window contents
.in -21m
.SH VARIABLE EXPANSION
Variables are expanded by the \fC:*:\fR var-expansion operator,
e\&.g\&. \fC:*:_nl:\fR expands to a newline character\&. Uninitialized vars
evaluate to their text name (and the colons stay)\&.
You can also use the standard constant C \&'\e\&' characters, such as "\en"
for newline, as well as escaped hexadecimal and octal characters like
\exHH and \eoOOO but these are constants, not variables, and cannot be
redefined\&.
Depending on the value of "math mode" (flag -q), you can also use
\fC:#:string_or_var:\fR to get the length of a string, and \fC:@:string_or_var:\fR
to do basic mathematics and inequality testing, either only in EVALs
or for all var-expanded expressions\&. See "Sequence of Evaluation"
below for more details\&.
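As a small sketch of the difference (the output text is invented for the
example; the OUTPUT statement is described under STATEMENTS AND STUFF),
the two statements below should each print one line, the first ending with
the expanded \fC:_nl:\fR variable and the second with the "\en" constant:
.di ZV
.in 0
.nf \fC
# illustrative only: newline from a variable, then from a constant
output /first line:*:_nl:/
output /second line\en/
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR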
.SH PROGRAM BEHAVIOR
Default behavior is to read all of standard input till EOF into the
default data window (named \fC:_dw:\fR), then execute the program (this is
overridden if the first executable statement is a WINDOW statement)\&.
Variables don\&'t get their own storage unless you ISOLATE them (see
below); instead, variables are start/length pairs indexing into the
default data window\&. Thus, ALTERing an unISOLATEd variable changes
the value of the default data buffer itself\&. This is a great power,
so use it only for good, and never for evil\&.
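A minimal sketch of this behavior, assuming the input contains the word
"colour" (the variable name \fC:word:\fR and the patterns are made up for
the example; MATCH, ALTER, and ACCEPT are described in the next section):
.di ZV
.in 0
.nf \fC
{
    # :word: is not ISOLATEd, so it indexes into the data window
    match (:word:) /colour/
    # altering it rewrites that span of the data window itself
    alter (:word:) /color/
    # write the (now modified) data window to standard output
    accept
}
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
If the MATCH fails, execution skips to the closing brace and nothing is
written\&.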
.SH STATEMENTS AND STUFF
Statements are separated with a \&';\&' or with a newline\&.
.ZI 8m "\e"
\&
.br
\&'\e\&' is the string-text escape character\&. You only \fIneed\fP to
escape the literal representation of closing delimiters inside var-expanded
arguments\&.
You can use the classic C/C++ \e-escapes, such as \en, \er,
\et, \ea, \eb, \ev, \ef, \e0, and also \exHH and \eoOOO for hex and
octal characters, respectively\&.
A \&'\e\&' as the \fIlast\fP character of a line means the next line
is just a continuation of this one\&.
A \e-escape that isn\&'t recognized as something special isn\&'t
an error; you may \fIoptionally\fP escape any of the delimiters
\fC>\fR, \fC)\fR, \fC]\fR, \fC}\fR, \fC;\fR, \fC/\fR, \fC#\fR, and \fC\e\fR and get just
that character\&.
A \&'\e\&' anywhere else is just a literal backslash, so the regex
([abc])\e1 is written just that way; there is no need to
double-backslash the \e1 (although it will work if you do)\&.
.in -8m
.ZI 8m "# this is a comment"
\&
'in -8m
.ZI 8m "# and this too \e#"
\&
'in -8m
'in +8m
\&
.br
A comment is not a piece of preprocessor
sugar -- it is a \fIstatement\fP and ends at the newline or at "\e#"\&.
.in -8m
.ZI 8m "insert filename"
\&
.br
inserts the file verbatim at this line at compile
time\&.
.in -8m
.ZI 8m ";"
\&
.br
statement separator - must ALWAYS be escaped as \e; unless it\&'s
inside delimiters or else it will mark the end of the statement\&.
.in -8m
.ZI 8m "{ and }"
\&
.br
start and end blocks of statements\&. Must always be \&'\e\&'
escaped or inside delimiters or these will mark the start/end of a
block\&.
.in -8m
.ZI 8m "noop"
\&
.br
no-op statement
.in -8m
.ZI 8m ":label:"
\&
.br
define a GOTOable label
.in -8m
.ZI 8m "accept"
\&
.br
writes the current data window to standard output; execution
continues\&.
.in -8m
.ZI 8m "alius"
\&
.br
if the last bracket-group succeeded, ALIUS skips to end of {}
block (a skip, not a FAIL); if the prior group FAILed, ALIUS does
nothing\&. Thus, ALIUS is both an ELSE clause and a CASE statement\&.
.in -8m
.ZI 8m "alter (:var:) /new-val/"
\&
.br
destructively change the value of var to new-val;
(:var:) is var to change (var-expanded); /new-val/ is value to change
to (var-expanded)\&.
.in -8m
.ZI 8m "classify <flags> (:c1:\&.\&.\&.|\&.\&.\&.:cN:) (:stats:) [:in:] /word-pat/"
\&
.br
compare the statistics of the current data window buffer with classfiles
c1\&.\&.\&.cN\&.
.ZI 17m "<flags>"
If <flags> is set to <nocase>, ignore case in
word-pat; this does not change case in the hash (use tr() to do that on
:in: if you want it)\&.
.in -17m
.ZI 17m "(:c1: \&.\&.\&."
file or files to consider "success" files\&. The
CLASSIFY succeeds if these files as a group match best\&. If not,
the CLASSIFY does a FAIL\&.
.in -17m
.ZI 17m "|"
optional separator\&. Spaces on each side of the " | " are
required\&.
.in -17m
.ZI 17m "\&.\&.\&.\&. :cN:)"
optional files to the right of " | " are considered
as a group to "fail"\&. If statement fails, execution skips to end
of enclosing {\&.\&.} block, which exits with a FAIL status (see
ALIUS for why this is useful)\&.
.in -17m
.ZI 17m "(:stats:)"
optional var that will get a text formatted matching
summary
.in -17m
.ZI 17m "[:in:]"
restrict statistical measure to the string inside :in:
.in -17m
.ZI 17m "/word-pat/"
regex to describe what a parseable word is\&.
.in -17m
.in -8m
.ZI 8m "eval (:result:) /instring/"
\&
.br
repeatedly evaluates /instring/ until it
ceases to change, then places that result
as the value of :result: \&. EVAL uses
smart (but foolable) heuristics to avoid
infinite loops, like evaluating a string
that evaluates to a request to evaluate
itself again\&. The heuristics have an error
rate of about 1 / 2^62 and detect chain
groups of length 255 or less\&.
If the instring uses math evaluation
(see section below on math operations)
and the evaluation has an inequality
test, (>, < or =) then if the inequality
fails, the EVAL will FAIL to the end of
block\&. If the evaluation has a numeric
fault (e\&.g\&. divide-by-zero) the EVAL will
do a TRAPpable FAULT\&.
.in -8m
.ZI 8m "exit /:retval:/"
\&
.br
ends program execution\&. If supplied, the
return value is converted to an integer
and returned as the exit code of the
crm114 program\&. If no retval is supplied,
the return value is 0\&.
.in -8m
.ZI 8m "fail"
\&
.br
skips down to end of the current { } block
and causes that block to exit with a FAIL
status (see ALIUS for why this is useful)
.in -8m
.ZI 8m "fault /faultstr/"
\&
.br
forces a FAULT with the given string as
the reason\&. The fault string is
var-expanded\&.
.in -8m
.ZI 8m "goto /:label:/"
\&
.br
unconditional branch (you can use a variable as the
goal, e\&.g\&. /:*:there:/ )
.in -8m
.ZI 8m "hash (:result:) /input/"
\&
.br
compute a fast 32-bit hash of the /input/,
and ALTER :result: to the
hexadecimal hash value\&. HASH is
\fInot\fP warranted to be constant across
major releases of CRM114, nor is it
cryptographically secure\&.
.ZI 17m "(:result:)"
value that gets result\&.
.in -17m
.ZI 17m "/input/"
string to be hashed (can contain expanded :*:vars:,
defaults to the data window :_dw:)
.in -17m
.in -8m
.ZI 8m "intersect (:out:) [:var1: :var2: \&.\&.\&.]"
\&
.br
makes :out: contain the part of
the data window that is the intersection of
:var1: :var2: \&.\&.\&. ISOLATEd vars are ignored\&.
This only resets the value of the captured
variable, and does NOT alter any text in
the data window\&.
.in -8m
.ZI 8m "isolate (:var:) /initial-value/"
\&
.br
puts :var: into a data area outside of
the data buffer; subsequent changes to this
var don\&'t change the data buffer (though
they may change the value of any var
subsequently set inside of this var)\&.
If the var already was ISOLATED, this is
a noop\&.
.ZI 17m "(:var:)"
name of ISOLATEd var (var-expanded)
.in -17m
.ZI 17m "/initial-value/"
optional initial value for :var:
(var-expanded)\&. If no value is supplied,
the previous value is retained/copied\&.
.in -17m
.in -8m
.ZI 8m "input <flags> (:result:) [:filename:]"
\&
.br
read in the content of filename\&.
If no filename, then read stdin
.ZI 17m "<byline>"
read one line only
.in -17m
.ZI 17m "(:result:)"
var that gets the input value
.in -17m
.ZI 17m "[:filename:]"
the file to read
.in -17m
.in -8m
.ZI 8m "learn <flags> (:class:) [:in:] /word-pat/"
\&
.br
learn the statistics of
the :in: var (or the input window if no
var) as an example of class :class:
.ZI 17m "<flags>"
can be any of <nocase>, <refute> and <microgroom>\&.
<nocase>: ignore case in matching word-pat (does not ignore case in
hash- use tr() to do that on :in: if you want it)\&. <refute>: this is
an anti-example of this class- unlearn it! <microgroom>: enable the
microgroomer to purge less-important information automatically
whenever the statistics file gets too crowded\&.
.in -17m
.ZI 17m "(:class:)"
name of file holding hashed results; nominal file
extension is \&.css
.in -17m
.ZI 17m "[:in:]"
captured var containing the text to be learned (if
omitted, the full contents of the data window is used)
.in -17m
.ZI 17m "/word-pat/"
regex that defines a "word"\&. Things that aren\&'t
"words" are ignored\&.
.in -17m
.in -8m
.ZI 8m "liaf"
\&
.br
skips UP to START of the current {} block (LIAF is FAIL
spelled backwards)
.in -8m
.ZI 8m "match <flags> (:var1: \&.\&.\&.) [:in:] /regex/"
\&
.br
Attempt to match the given
regex; if match succeeds, variables are bound; if match fails, program
skips to the closing \&'}\&' of this block
.ZI 17m "<flags>"
flags can be any of
.ZI 3m "<absent>"
statement succeeds if match not present
.in -3m
.ZI 3m "<nocase>"
ignore case when matching
.in -3m
.ZI 3m "<fromstart>"
start match at start of the [:in:] var
.in -3m
.ZI 3m "<fromcurrent>"
start match at start of previous successful
match on the [:in:] var
.in -3m
.ZI 3m "<fromnext>"
start match at one character past the start of
the previous successful match on the [:in:] var
.in -3m
.ZI 3m "<fromend>"
start match at one character past the end of
prev\&. match on this [:in:] var
.in -3m
.ZI 3m "<newend>"
require match to end after end of prev\&. match on
this [:in:] var
.in -3m
.ZI 3m "<backwards>"
search backward in the [:in:] variable from the
last successful match\&.
.in -3m
.ZI 3m "<nomultiline>"
don\&'t allow this match to span lines
.in -3m
.in -17m
.ZI 17m "(:var1: \&.\&.\&.)"
optional variables to bind to regex result and
\&'(\&' \&')\&' subregexes
.in -17m
.ZI 17m "[:in:]"
search only in the variable specified; if omitted, :_dw:
(the full input data window) is used
.in -17m
.ZI 17m "/regex/"
POSIX regex (with \e escapes as needed)
.in -17m
If you build CRM114 to use the GNU regex library for MATCHing,
be warned that GNU REGEX has numerous issues\&. See the
KNOWN_BUGS file for a detailed listing\&.
.in -8m
.ZI 8m "output <flags> [filename] /output-text/"
\&
.br
output an arbitrary string
with captured values expanded\&.
.ZI 17m "<flags>"
<append>: append to the file (otherwise, overwrites)
.in -17m
.ZI 17m "[filename]"
filename to send output (var-expanded), default output
is to stdout
.in -17m
.ZI 17m "/output-text/"
string to output (var-expanded)
.in -17m
.in -8m
.ZI 8m "syscall <flags> (:in:) (:out:) (:status:) /command/"
\&
.br
execute a shell
command
.ZI 17m "<flags>"
can be any of <keep> and <async>\&. <keep>: keep this
process around; if kept, then a syscall with the same :keep: var will
continue feeding to and reading from the kept proc\&. <async>: don\&'t
wait for process to send an EOF; just grab what\&'s available in
the process\&'s output pipe and proceed (limit per syscall is 256 Kbytes)
.in -17m
.ZI 17m "(:in:)"
var-expanded string to feed to command as input (can be
null if you don\&'t want to send the process something\&.) You \fBmust\fP
specify this if you want to specify an :out: variable\&.
.in -17m
.ZI 17m "(:out:)"
var-expanded varname to place results into (\fBmust\fP
pre-exist, can be null if you don\&'t want to read the process\&'s
output (yet, or at all)\&. Limit per syscall is 256 Kbytes\&. You
\fBmust\fP specify this if you want to use the :status: variable)
.in -17m
.ZI 17m "(:status:)"
if you want to keep a minion proc around, or catch the
exit status of the process, specify a var here\&. The minion process\&'s
PID and pipes will be stored here\&. The program can access the proc
again with another syscall by using this var again\&. When the process
exits, its exit code will be stored here\&.
.in -17m
.in -8m
.ZI 8m "trap (:reason:) /trap_regex/"
\&
.br
traps faults from both FAULT statements
and program errors occurring anywhere in the preceding bracket-block\&. If
no fault exists, TRAP does a SKIP to end of block\&. If there is a fault
and the fault reason string matches the trap_regex, the fault is trapped,
and execution continues with the line after the TRAP, otherwise the fault
is passed up to the next surrounding trapped bracket block\&.
.ZI 17m "(:reason:)"
the fault message that caused this FAULT\&. If it was a
user fault, this is the text the user supplied in the FAULT statement\&.
.in -17m
.ZI 17m "/trap_regex/"
the regex that determines what kind of faults this
TRAP will accept\&. Putting a wildcard here (e\&.g\&. /\&.*/) means that ALL
faults will be trapped here\&.
.in -17m
.in -8m
.ZI 8m "union (:out:) [:var1: :var2: \&.\&.\&.]"
\&
.br
makes :out: contain the union of
the data window segments that contain var1, var2\&.\&.\&. plus any intervening
text as well\&. Any ISOLATEd var is ignored\&. This is non-surgical, and
does not alter the data window
.in -8m
.ZI 8m "window <flags> (:w-var:) (:s-var:) /cut-regex/ /add-regex/"
\&
.br
window
slider\&. This deletes up to and including the cut-regex from :w-var: (default:
use the data window), then reads and adds input from standard input till add-regex
(inclusive)\&.
.ZI 17m "<flags>"
flags can be any of
.ZI 17m "<nocase>"
ignore case when matching cut- and add- regexes
.in -17m
.ZI 17m "<bychar>"
check input for add-regex every character
.in -17m
.ZI 17m "<byline>"
check input for add-regex every line
.in -17m
.ZI 17m "<byeof>"
wait for EOF to check for add-regex (extra characters
are kept around for later)
.in -17m
.ZI 17m "<eofends>"
read lots of input; the input is up to the regex
match OR the contents till EOF
.in -17m
.in -17m
.ZI 17m "(:w-var:)"
what var to window
.in -17m
.ZI 17m "(:s-var:)"
what var to use for source (defaults to stdin, if you
use a source var you \fBmust\fP specify the windowed var)\&.
.in -17m
.ZI 17m "/cut-regex/"
var-expanded cut pattern
.in -17m
.ZI 17m "/add-regex/"
var-expanded add pattern, if absent reads till EOF
.in -17m
If both cut-regex and add-regex are omitted, and this window statement is
the \fIfirst executable\fP statement in the program, then CRM114 does
\fInot\fP wait to read anything from standard input before starting
program execution\&.
.in -8m
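As an illustration of how MATCH, ALIUS, and OUTPUT combine, the following
sketch (not taken from the CRM114 distribution; the patterns and messages
are invented for the example) tries each inner block in turn and stops at
the first one that succeeds:
.di ZV
.in 0
.nf \fC
{
    {
        # succeeds only if the data window contains the phrase
        match <nocase> /mortgage leads/
        output /looks like spam\en/
    }
    alius
    {
        # reached only if the block above FAILed
        match <nocase> /mailing list/
        output /looks like list traffic\en/
    }
    alius
    {
        output /no known pattern found\en/
    }
}
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
A failed MATCH makes its inner block exit with a FAIL status, so the
following ALIUS does nothing and the next block is tried; a block that
succeeds makes ALIUS skip to the end of the outer block\&.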
.SH A QUICK REGEX INTRO
A regex is a pattern match\&. Do a "man 7 regex" for details\&.
Matches are, by default "first starting point that matches, then
longest match possible that can fit"\&.
.ZI 8m "a through z"
\&
'in -8m
.ZI 8m "A through Z"
\&
'in -8m
.ZI 8m "0 through 9"
\&
'in -8m
'in +8m
\&
.br
all match themselves\&.
.in -8m
.ZI 8m "most punctuation"
\&
.br
matches itself, but check below!
.in -8m
.ZI 8m "*"
\&
.br
repeat preceding 0 or more times
.in -8m
.ZI 8m "+"
\&
.br
repeat preceding 1 or more times
.in -8m
.ZI 8m "?"
\&
.br
repeat preceding 0 or 1 time
.in -8m
.ZI 8m "*?, +?, ??"
\&
.br
repeat preceding, but \fIshortest\fP match that fits, given
the already-selected start point of the regex\&. (only
supported by TRE regex, not GNU regex)
.in -8m
.ZI 8m "[abcde]"
\&
.br
any one of the letters a, b, c, d, or e
.in -8m
.ZI 8m "[a-q]"
\&
.br
the letters a through q (just one of them)
.in -8m
.ZI 8m "{n,m}"
\&
.br
repetition count: match the preceding at least n and no more
than m times (POSIX restricts this to a maximum of 255
repeats)
.in -8m
.ZI 8m "[[:<:]]"
\&
.br
matches at the start of a word (GNU regex only)
.in -8m
.ZI 8m "[[:>:]]"
\&
.br
matches the end of a word (GNU regex only)
.in -8m
.ZI 8m "^"
\&
.br
as first char of a match, matches the start of a line (ONLY in
<nomultiline> matches)\&.
.in -8m
.ZI 8m "$"
\&
.br
as last char of a match, matches at the end of a line (ONLY in
<nomultiline> matches)
.in -8m
.ZI 8m "\&."
\&
.br
(a period) matches any single character (except start-of-line or
end of line "virtual characters", but it does match a newline)\&.
.in -8m
.ZI 8m "a|b"
\&
.br
match a or b
.in -8m
.ZI 8m "(match)"
\&
.br
the () go away, and the string that matched inside is
available for capturing\&. Use \e\e( and \e\e) to match actual
parentheses (the first \&'\e\&' says "show the second \&'\e\&' to
the regex engine", and the second \&'\e\&' forces a literalization
onto the parenthesis character)\&.
.in -8m
.ZI 8m "\en"
\&
.br
matches the N\&'th parenthesized subexpression\&. Remember to
backslash-escape the backslash (e\&.g\&. write this as \e\e1)\&.
This works only if you\&'re using TRE, not GNU regex\&.
.in -8m
The following are other POSIX expressions, which mostly do what you\&'d
guess they\&'d do from their names\&.
.di ZV
.in 0
.nf \fC
[[:alnum:]]
[[:alpha:]]
[[:blank:]]
[[:cntrl:]]
[[:digit:]]
[[:lower:]]
[[:upper:]]
[[:graph:]]
[[:print:]]
[[:punct:]]
[[:space:]]
[[:xdigit:]]
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
[[:graph:]] matches any character that puts ink on paper or lights a pixel\&.
[[:print:]] matches any character that moves the "print head" or cursor\&.
.SH NOTES ON SEQUENCE OF EVALUATION
By default, CRM114 supports string length and mathematical evaluation
only in an EVAL statement, although it can be set to allow these in
any place where a var-expanded variable is allowed (see the -q flag)\&.
The default value ( zero ) allows stringlength and math evaluation
only in EVAL statements, and uses non-precedence (that is, strict
left-to-right unless parentheses are used) algebraic notation\&. -q 1
uses RPN instead of algebraic, again allowing stringlength and math
evaluation only in EVAL expressions\&. Modes 2 and 3 allow stringlength
and math evaluation in \fIany\fP var-expanded expression, with
non-precedence algebraic notation and RPN notation respectively\&.
Evaluation is always left-to-right; there is no precedence of
operators beyond the sequential passes noted below\&.
The evaluation is done in four sequential passes:
.ZI 3m "1"
\e-constants like \en, \eo377 and \ex3F are substituted
.in -3m
.ZI 3m "2"
:*:var: variables are substituted (note the difference between
a constant like \&'\en\&' and a variable like ":*:_nl:" here - constants
are substituted first, then variables are substituted\&.)
.in -3m
.ZI 3m "3"
:#:var: string-length operations are performed
.in -3m
.ZI 3m "4"
:@:expression: mathematical expressions are performed; syntax is
either RPN or non-precedenced (parens required) algebraic
notation\&. Embedded non-evaluated strings in a mathematical
expression are currently a no-no\&.
Allowed operators are: + - * / % > < = only\&.
Only >, <, and = set logical results; they also evaluate to
1 and 0 for continued chain operations - e\&.g\&.
.di ZV
.in 0
.nf \fC
((:*:a: > 3) + (:*:b: > 5) + (:*:c: > 9) > 1)
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
is true IFF any of the following is true
.ZI 3m "\(bu"
a > 3 and b > 5
.in -3m
.ZI 3m "\(bu"
a > 3 and c > 9
.in -3m
.ZI 3m "\(bu"
b > 5 and c > 9
.in -3m
.in -3m
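As a sketch of the non-precedence rule under the default math mode (the
variable name \fC:r:\fR is made up for the example, and the exact numeric
formatting of the result is not specified here), the expression below
should be evaluated strictly left to right, that is as (2 + 3) * 4 rather
than 2 + (3 * 4):
.di ZV
.in 0
.nf \fC
# illustrative only: give :r: its own storage, then evaluate into it
isolate (:r:)
eval (:r:) /:@: 2 + 3 * 4 :/
output /:*:r:\en/
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
Writing the expression as 2 + (3 * 4) forces the conventional grouping\&.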
.SH NOTES ON APPROXIMATE REGEX MATCHING
Only the TRE engine supports approximate matching\&. The GNU engine does
not support approximate matching\&.
Approximate matching is specified similarly to a "repetition count" in
a regular regex, using braces\&. This approximation applies to the
previous parenthesized expression (again, just like repetition counts)\&.
You can specify maximum total changes, and how many inserts, deletes,
and substitutions you wish to allow\&. The minimum-error match is found
and reported, if it exists within the bounds you state\&.
The basic syntax is:
.di ZV
.in 0
.nf \fC
(text-to-match){~[maxerrs] [#maxsubsts] [+maxinserts] [-maxdeletes]}
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
Note that the \&'~\&' (with an optional maxerrs count) is \fIrequired\fP (that\&'s how
we know it\&'s an approximate regex rather than just a rep-count)\&. If you
don\&'t specify a maximum error count, you get the best match; if you do,
the match will have at most that many errors\&.
Remember that you specify the changes to the text in the \fIpattern\fP
necessary to make it match the text in the string being searched\&.
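To make the direction of the edits concrete, the following Python sketch
counts the inserts, deletes, and substitutions needed to turn a pattern
into a candidate text, using ordinary unit-cost edit distance\&. It is an
illustration only and not TRE\&'s matching algorithm (a real approximate
regex search also has to locate the match within the string); the
function name edit_counts is hypothetical\&.
.nf \fC
```python
def edit_counts(pattern, text):
    """Return (inserts, deletes, substitutions) for one cheapest way of
    editing pattern into text, with unit cost for every edit."""
    rows, cols = len(pattern) + 1, len(text) + 1
    # best[i][j] = (total, inserts, deletes, substs) for
    # turning pattern[:i] into text[:j]
    best = [[None] * cols for _ in range(rows)]
    best[0][0] = (0, 0, 0, 0)
    for j in range(1, cols):
        best[0][j] = (j, j, 0, 0)  # insert every character of text
    for i in range(1, rows):
        best[i][0] = (i, 0, i, 0)  # delete every character of pattern
    for i in range(1, rows):
        for j in range(1, cols):
            t, a, d, s = best[i - 1][j - 1]
            if pattern[i - 1] == text[j - 1]:
                options = [(t, a, d, s)]          # characters agree
            else:
                options = [(t + 1, a, d, s + 1)]  # substitute
            t, a, d, s = best[i][j - 1]
            options.append((t + 1, a + 1, d, s))  # insert into pattern
            t, a, d, s = best[i - 1][j]
            options.append((t + 1, a, d + 1, s))  # delete from pattern
            best[i][j] = min(options)             # lowest total wins
    return best[rows - 1][cols - 1][1:]


print(edit_counts("foobar", "fooXbar"))  # (1, 0, 0): one insert
print(edit_counts("foobar", "fobar"))    # (0, 1, 0): one delete
print(edit_counts("foobar", "foobaz"))   # (0, 0, 1): one substitution
```
.fi \fR
An "insert" here is a character that the searched text has and the
pattern lacks; a "delete" is the reverse\&.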
You cannot use approximate regexes and backrefs (like \e1) in the same
regex\&. This is a limitation of TRE at this point\&.
You can also use an inequality in addition to the basic syntax above:
.di ZV
.in 0
.nf \fC
(text-to-match){~[maxerrs] [basic-syntax] [nI + mD + oS < K] }
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
where n, m, and o are the costs per insertion, deletion, and substitution
respectively, \&'I\&', \&'D\&', and \&'S\&' are indicators to tell which cost goes
with which kind of error, and K is the total cost of the errors; the cost
of the errors is always strictly less than K\&.
Here are some examples\&.
.ZI 8m "(foobar)"
\&
.br
exactly matches "foobar"
.in -8m
.ZI 8m "(foobar){~}"
\&
.br
finds the closest match to "foobar", with the minimum
number of inserts, deletes, and substitutions\&. Always succeeds\&.
.in -8m
.ZI 8m "(foobar){~3}"
\&
.br
finds the closest match to "foobar", with no more than
3 inserts, deletes, or substitutions\&.
.in -8m
.ZI 8m "(foobar){~2 +2 -1 #1}"
\&
.br
finds the closest match to "foobar", with at
most two errors total, and at most two inserts, one delete, and one
substitution\&.
.in -8m
.ZI 8m "(foobar){~4 #1 1i + 2d < 5 }"
\&
.br
finds the closest match to "foobar",
with at most four errors total, at most one substitution, and with the
number of insertions plus 2x the number of deletions less than 5\&.
.in -8m
.ZI 8m "(foo){~1}(bar){~1}"
\&
.br
finds the closest match to "foobar", with at
most one error in the "foo" and one error in the "bar"\&.
.in -8m
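The way the bounds combine can be sketched in Python as follows\&. This is
only a model of the description above, not TRE code: the function name
within_bounds and its parameters are hypothetical, and a bound that is not
given is assumed to be unlimited\&.
.nf \fC
```python
def within_bounds(inserts, deletes, substs, max_errs=None,
                  max_inserts=None, max_deletes=None, max_substs=None,
                  costs=None, cost_limit=None):
    """Return True if an error profile satisfies every stated bound.
    A bound of None means that bound was not given."""
    checks = [
        (inserts + deletes + substs, max_errs),
        (inserts, max_inserts),
        (deletes, max_deletes),
        (substs, max_substs),
    ]
    for value, limit in checks:
        if limit is not None and value > limit:
            return False
    if costs is not None and cost_limit is not None:
        per_insert, per_delete, per_subst = costs
        total = (per_insert * inserts + per_delete * deletes
                 + per_subst * substs)
        if not total < cost_limit:  # strict inequality, as in the text
            return False
    return True


# Bounds modelled on (foobar){~4 #1 1i + 2d < 5 }
bounds = dict(max_errs=4, max_substs=1, costs=(1, 2, 0), cost_limit=5)
print(within_bounds(2, 1, 1, **bounds))  # True: cost 2 + 2 = 4 < 5
print(within_bounds(1, 2, 0, **bounds))  # False: cost 1 + 4 = 5 is not < 5
```
.fi \fR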
.SH OVERALL LANGUAGE NOTES
Here\&'s how to remember what goes where in the CRM114 language\&.
Unlike most computer languages, CRM114 uses inflection (or declension)
rather than position to describe what role each part of a statement
plays\&. The declensions are marked by the delimiters: the /, ( and ), <
and >, and [ and ]\&.
By and large, you can mix up the arguments to each kind of statement
without changing their meaning\&. Only the ACTION needs to be first\&.
Other parts of the statement can occur in any order, save that
multiple (paren_args) and /pattern_args/ must stay in their nominal
order but can go anywhere in the statement\&. They do not need to be
consecutive\&.
The parts of a CRM114 statement are:
.ZI 17m "ACTION"
the verb\&. This is at the start of the statement\&.
.in -17m
.ZI 17m "/pattern/"
the overall pattern the verb should use, analogous to the
"subject" of the statement\&.
.in -17m
.ZI 17m "<flags>"
modifies how the ACTION does the work\&. You\&'d call these
"adverbs" in human languages\&.
.in -17m
.ZI 17m "(vars)"
what variables to use as adjuncts in the action (what would
be called the "direct objects")\&. These can get changed when the action
happens\&.
.in -17m
.ZI 17m "[limited-to]"
where the action is allowed to take place (think of it
as the "indirect object")\&. These are not directly changed by the action\&.
.in -17m
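The idea that delimiters, not position, identify each part can be
sketched in Python\&. The splitter below is a simplified illustration and
not the CRM114 parser: the names split_statement, CLOSERS, and ROLES are
hypothetical, the sample statement is only an example, and escaped or
nested delimiters inside a part are assumed not to occur\&.
.nf \fC
```python
CLOSERS = {"/": "/", "<": ">", "(": ")", "[": "]"}
ROLES = {"/": "pattern", "<": "flags", "(": "vars", "[": "limited-to"}


def split_statement(statement):
    """Return (action, parts); parts is a list of (role, body) pairs."""
    action, _, rest = statement.strip().partition(" ")
    parts, pos = [], 0
    while pos < len(rest):
        opener = rest[pos]
        if opener.isspace():
            pos += 1
            continue
        if opener not in CLOSERS:
            raise ValueError("unexpected character: " + opener)
        # Find the matching closer; raises ValueError if it is missing.
        end = rest.index(CLOSERS[opener], pos + 1)
        parts.append((ROLES[opener], rest[pos + 1:end]))
        pos = end + 1
    return action, parts


first = split_statement("match <nocase> (:found:) /foo.*bar/")
second = split_statement("match /foo.*bar/ (:found:) <nocase>")
print(first[0])                               # match
print(sorted(first[1]) == sorted(second[1]))  # True: same roles either way
```
.fi \fR
Both orderings yield the same set of roles, which is the sense in which
the arguments can be mixed up without changing their meaning\&.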
.SH SEE ALSO
\fBcssmerge(1)\fP, \fBcssdiff(1)\fP, \fBcssutil(1)\fP
The CRM114 homepage is at http://crm114\&.sf\&.net/ \&.
.SH VERSION
This manpage: $Id: crm114\&.azm,v 1\&.12 2004/08/19 11:10:49 vanbaal Exp $
This manpage describes the crm114 utility as it has been described by
QUICKREF\&.txt, shipped with crm114-20040212-BlameJetlag\&.src\&.tar\&.gz\&. The
DESCRIPTION section is copy-and-pasted from INTRO\&.txt as distributed with the
same source tarball\&.
Converted from plain ascii to zoem by Joost van Baal\&.
.SH COPYRIGHT
Copyright (C) 2001, 2002, 2003, 2004 William S\&. Yerazunis
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version\&.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. See the
GNU General Public License for more details\&.
You should have received a copy of the GNU General Public License
along with this program (see COPYING); if not, check with
http://www\&.gnu\&.org/copyleft/gpl\&.html or write to the Free Software
Foundation, Inc\&., 59 Temple Place - Suite 330, Boston, MA 02111, USA\&.
.SH AUTHOR
William S\&. Yerazunis\&. Manpage typesetting by Joost van Baal and Shalendra Chhabra\&.