.\" Copyright (c) 2004 William S\&. Yerazunis\&. Manpage typesetting by Joost van Baal and Shalendra Chhabra
.TH "crm" 1 "19 Aug 2004" "crm114 20040816\&.BlameClockworkOrange-auto\&.3" "CRM114"
.po 2m
.de ZI
.\" Zoem Indent/Itemize macro I.
.br
'in +\\$1
.nr xa 0
.nr xa -\\$1
.nr xb \\$1
.nr xb -\\w'\\$2'
\h'|\\n(xau'\\$2\h'\\n(xbu'\\
..
.de ZJ
.br
.\" Zoem Indent/Itemize macro II.
'in +\\$1
'in +\\$2
.nr xa 0
.nr xa -\\$2
.nr xa -\\w'\\$3'
.nr xb \\$2
\h'|\\n(xau'\\$3\h'\\n(xbu'\\
..
.if n .ll -2m
.am SH
.ie n .in 4m
.el .in 8m
..
.SH NAME
crm \- The Controllable Regex Mutilator
.SH SYNOPSIS
.B crm
[\fIOPTION\fR]... \fICRMFILE\fR
.SH WARNING
This man page is taken from an older CRM114 version.  It is provided as a
convenience to Debian users and may not be up-to-date.  If you would like to
update it, please send appropriate patches to the Debian bug tracking system.
.SH OPTIONS
.PP
\fB-d\fP N (\fIenter debugger after running N cycles\&. Omitting N means N equals 0\&.\fP)\fB\fP
.TP
\fB-e\fP (\fIdo not import any environment variables\fP)\fB\fP
.TP
\fB-h\fP (\fIprint help text\fP)\fB\fP
.TP
\fB-p\fP (\fIgenerate an execution-time-spent profile on exit\fP)\fB\fP
.TP
\fB-P\fP N (\fImax program lines\fP)\fB\fP
.TP
\fB-q\fP m (\fImathmode (0,1 = alg/RPN only in EVAL, 2,3 = alg/RPN everywhere)\fP)\fB\fP
.TP
\fB-s\fP N (\fInew feature file (\&.css) size is N (default 1 meg+1 featureslots)\fP)\fB\fP
.TP
\fB-S\fP N (\fInew feature file (\&.css) size is N rounded to 2^I+1 featureslots\fP)\fB\fP
.TP
\fB-t\fP (\fIuser trace output\fP)\fB\fP
.TP
\fB-T\fP (\fIimplementors trace output (only for the masochistic!)\fP)\fB\fP
.TP
\fB-u\fP dir (\fIchdir to directory dir before starting execution\fP)\fB\fP
.TP
\fB-v\fP (\fIprint CRM114 version identification and exit\fP)\fB\fP
.TP
\fB-w\fP N (\fImax data window (bytes, default 16 megs)\fP)\fB\fP
.TP
\fB--\fP (\fIsignals the end of CRM114 flags; prior flags are not seen by the user program; subsequent args are not processed by CRM114\fP)\fB\fP
.TP
\fB--foo\fP (\fIcreates the user variable :foo: with the value SET\fP)\fB\fP
.TP
\fB--x=y\fP (\fIcreates the user variable :x: with the value y\fP)\fB\fP
.TP
\fB-{\fP stmts} (\fIexecute the statements inside the {} brackets\fP)\fB\fP
.TP
\fBcrmfile\fP (\fI\&.crm file name\fP)
.SH DESCRIPTION
CRM114 is a language designed to write filters in\&. It caters to
filtering email, system log streams, HTML, and other marginally
human-readable ASCII that may happen to grace your computer\&.

CRM114\&'s unique strengths are the data structure (everything is
a string and a string can overlap another string), its ability
to work on truly infinitely long input streams, its ability to
use extremely advanced classifiers to sort text, and the ability
to do approximate regular expressions (that is, regexes that
don\&'t quite match) via the TRE regex library\&.

CRM114 also sports a very powerful subprocess control facility, and
a unique syntax and program structure that puts the fun back in
programming (OK, you can run away screaming now)\&. The syntax is
declensional rather than positional; the type of quote marks around
an argument determines what that argument will be used for\&.

The typical CRM114 program uses regex operations more often
than addition (in fact, math was only added to CRM114 in the waning
days of 2003, well after CRM114 had been in daily use for over
a year and a half)\&.

In other words, crm114 is a very \fBvery\fP powerful mutagenic filter that
happens to be a programming language as well\&.

The filtering style of the CRM-114 discriminator is based on the fact
that most spam, normal log file messages, or other uninteresting data
is easily categorized by a few characteristic patterns (such as
"Mortgage leads", "advertise on the internet", and "mail-order toner
cartridges")\&. CRM114 may also be useful to folks who are on multiple
interlocking mailing lists\&.

In a bow to Unix-style flexibility, by default CRM114 reads its
input from standard input, and by default sends its output to
standard output\&. Note that the default action has a zero-length
output\&. Redirection and use of other input or output files is
possible, as well as the use of windowing, either delimiter-based or
time-based, for real-time continuous applications\&.

CRM114 can be used for other than mail filtering; consider it to be a version
of \fIgrep\fP with super powers\&. If Perl is a seventy-bladed Swiss Army knife,
CRM114 is a razor-sharp katana that can talk\&.
.SH INVOCATION
Absent the -{ program } flag, the first argument is taken to be the name of
a file containing a crm114 program; subsequent arguments are merely supplied
as :_argN: values\&. Use single quotes around command-line programs
\&'-{ like this }\&' to prevent the shell from doing odd things to your
command-line programs\&.

CRM114 can be directly invoked by the shell if the first line of your
program file uses the shell standard, as in:

.di ZV
.in 0
.nf \fC
#! /usr/bin/crm
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

You can use CRM114 flags on the shell-standard invocation line, and
hide them with \&'--\&' from the program itself; \&'--\&' incidentally prevents
the invoking user from changing any CRM114 invocation flags\&.

Flags should be located after any positional variables on the command
line\&. Flags \fIare\fP visible as :_argN: variables, so you can create
your own flags for your own programs (separate CRM114 and user flags
with \&'--\&')\&.
Two examples of how to do this:

.di ZV
.in 0
.nf \fC
\&./foo\&.crm bar mugga < baz  -t -w 150000
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

.di ZV
.in 0
.nf \fC
\&./foo\&.crm -t -w 1500000 -- bar < baz mugga
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

One example of how \fBnot\fP to do this:

.di ZV
.in 0
.nf \fC
\&./foo\&.crm -t -w 150000 bar < baz mugga
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

(That\&'s WRONG!)

You can put a list of user-settable vars on the \fC#!/usr/bin/crm\fR
invocation line\&. CRM114 will print these out when a program is
invoked directly (e\&.g\&. "\&./myprog\&.crm -h", not "crm myprog\&.crm -h")
with the -h (for help) flag\&. (Note that this works ONLY with bash
on Linux; *BSD systems interpret the invocation line differently, and
there this does not work\&.)

Example:

.di ZV
.in 0
.nf \fC
#!/usr/bin/crm  -( var1 var2=A var2=B var2=C )
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

This allows only \fCvar1\fR and \fCvar2\fR to be set on the command line\&. If a
variable is not assigned a value, the user can set any value desired\&. If the
variable is equated to a set of values, those are the \fIonly\fP values allowed\&.

Another example:

.di ZV
.in 0
.nf \fC
#!/usr/bin/crm  -( var1 var2=foo )  --
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

This allows \fCvar1\fR to
be set to any value, \fCvar2\fR may only be set to \fCfoo\fR or not set at all,
and no other variables may be set nor may invocation flags be changed (because
of the trailing "--")\&. Since "--" also blocks \&'-h\&' for help, such programs
should provide their own help facility\&.
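
The restriction rules above can be modeled with the short Python sketch
below\&. It is an illustration under stated assumptions, not CRM114\&'s own
parser: the functions parse_allowed and may_set are hypothetical, and the
only behavior modeled is the documented one (a bare name accepts any value,
a name listed with values accepts only those values, and unlisted names
are rejected)\&.

.nf \fC
```python
# Hypothetical model of the "-( var1 var2=A var2=B )" restriction list;
# not CRM114's own parser.


def parse_allowed(spec):
    """Map each variable name to its set of allowed values.

    An empty set means the variable may take any value.
    """
    allowed = {}
    for item in spec.split():
        name, has_value, value = item.partition("=")
        values = allowed.setdefault(name, set())
        if has_value:
            values.add(value)
    return allowed


def may_set(allowed, name, value):
    """True if the command line may set variable name to value."""
    if name not in allowed:
        return False              # unlisted variables cannot be set at all
    choices = allowed[name]
    return not choices or value in choices


rules = parse_allowed("var1 var2=A var2=B var2=C")
print(may_set(rules, "var1", "anything"))  # True
print(may_set(rules, "var2", "D"))         # False
```
.fi \fR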
.SH VARIABLES
Variable names and locations start with a : , end with a : , and may
contain only characters that have ink (i\&.e\&. the [:graph:] class) with
few exceptions\&.

Examples: \fC:here:\fR, \fC:ThErE:\fR, \fC:every-where_0123+45%6789:\fR,
\fC:this_is_a_very_very_long_var_name_that_does_not_tell_us_much:\fR\&.
Builtin variables:

.ZI 21m ":_nl:"
newline
.in -21m
.ZI 21m ":_ht:"
horizontal tab
.in -21m
.ZI 21m ":_bs:"
backspace
.in -21m
.ZI 21m ":_sl:"
a slash
.in -21m
.ZI 21m ":_sc:"
a semicolon
.in -21m
.ZI 21m ":_arg0: thru :_argN:"
command-line args, including \fIall\fP flags
.in -21m
.ZI 21m ":_argc:"
how many command line arguments there were
.in -21m
.ZI 21m ":_pos0: thru :_posN:"
positional args (\&'-\&' or \&'--\&' args deleted)
.in -21m
.ZI 21m ":_posc:"
how many positional arguments there were
.in -21m
.ZI 21m ":_pos_str:"
all positional arguments concatenated
.in -21m
.ZI 21m ":_env_whatever:"
environment value \&'whatever\&'
.in -21m
.ZI 21m ":_env_string:"
all environmental arguments concatenated
.in -21m
.ZI 21m ":_crm_version:"
the version of the CRM system
.in -21m
.ZI 21m ":_dw:"
the current data window contents
.in -21m
.SH VARIABLE EXPANSION
Variables are expanded by the \fC:*:\fR var-expansion operator,
e\&.g\&. \fC:*:_nl:\fR expands to a newline character\&. Uninitialized vars
evaluate to their text name (and the colons stay)\&.

You can also use the standard constant C \&'\e\&' characters, such as "\en"
for newline, as well as escaped hexadecimal and octal characters like
\exHH and \eoOOO but these are constants, not variables, and cannot be
redefined\&.

Depending on the value of "math mode" (flag -q), you can also use
\fC:#:string_or_var:\fR to get the length of a string, and \fC:@:string_or_var:\fR
to do basic mathematics and inequality testing, either only in EVALs
or for all var-expanded expressions\&. See "Sequence of Evaluation"
below for more details\&.
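
The :*: expansion rule can be sketched in Python as follows, assuming
variables are held in a dictionary keyed by their full names (colons
included)\&. The function expand is hypothetical; it does a single
left-to-right pass and deliberately ignores the \e-constant, :#: and :@:
passes described in the section on sequence of evaluation below\&.

.nf \fC
```python
import re

# Matches ":*:name:"; the name itself may not contain a colon.
VAR_REF = re.compile(":[*]:([^:]+):")


def expand(text, variables):
    """Single-pass :*: expansion; keys of variables include the colons."""
    def substitute(match):
        name = ":" + match.group(1) + ":"
        # An uninitialized var expands to its own name, colons included.
        return variables.get(name, name)
    return VAR_REF.sub(substitute, text)


print(expand("say :*:greet: to :*:who:", {":greet:": "hello"}))
# say hello to :who:
```
.fi \fR

Because the sketch makes only one pass, a value that itself contains
:*: text is not expanded again\&.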
.SH PROGRAM BEHAVIOR
Default behavior is to read all of standard input till EOF into the
default data window (named \fC:_dw:\fR), then execute the program (this is
overridden if the first executable statement is a WINDOW statement)\&.

Variables don\&'t get their own storage unless you ISOLATE them (see
below); instead, variables are start/length pairs indexing into the
default data window\&. Thus, ALTERing an unISOLATEd variable changes
the value of the default data buffer itself\&. This is a great power,
so use it only for good, and never for evil\&.
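
To make the start/length idea concrete, here is a toy Python model of a
data window\&. The class DataWindow and its methods are hypothetical and
only approximate CRM114: a variable is a (start, length) view into one
shared string, so ALTERing one variable rewrites the window and changes
what every overlapping variable sees\&. Variables after the altered text
are shifted and variables that enclose it are resized; the sketch leaves
partially overlapping variables untouched, because this page does not say
how CRM114 adjusts them\&.

.nf \fC
```python
class DataWindow:
    """Toy model: variables are (start, length) views into one buffer."""

    def __init__(self, text):
        self.text = text
        self.vars = {}                      # name -> (start, length)

    def bind(self, name, start, length):
        self.vars[name] = (start, length)

    def value(self, name):
        start, length = self.vars[name]
        return self.text[start:start + length]

    def alter(self, name, new_value):
        """Overwrite the bound region in place, as ALTER does."""
        start, length = self.vars[name]
        end = start + length
        delta = len(new_value) - length
        self.text = self.text[:start] + new_value + self.text[end:]
        for other, (s, n) in list(self.vars.items()):
            if other == name:
                self.vars[other] = (s, len(new_value))
            elif s >= end:
                self.vars[other] = (s + delta, n)   # after the edit: shift
            elif s <= start and s + n >= end:
                self.vars[other] = (s, n + delta)   # encloses the edit: resize
            # partial overlaps are left as-is in this simplified sketch


window = DataWindow("hello world")
window.bind(":a:", 0, 5)
window.bind(":b:", 6, 5)
window.bind(":all:", 0, 11)
window.alter(":a:", "howdy there")
print(window.text)            # howdy there world
print(window.value(":b:"))    # world
print(window.value(":all:"))  # howdy there world
```
.fi \fR

An ISOLATEd variable would correspond to keeping its text in separate
storage instead of in the shared string\&.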
.SH STATEMENTS AND STUFF
Statements are separated with a \&';\&' or with a newline\&.

.ZI 8m "\e"
\&
.br
\&'\e\&' is the string-text escape character\&. You only \fIneed\fP to
escape the literal representation of closing delimiters inside var-expanded
arguments\&.

You can use the classic C/C++ \e-escapes, such as \en, \er,
\et, \ea, \eb, \ev, \ef, \e0, and also \exHH and \eoOOO for hex and
octal characters, respectively\&.

A \&'\e\&' as the \fIlast\fP character of a line means the next line
is just a continuation of this one\&.

A \e-escape that isn\&'t recognized as something special isn\&'t
an error; you may \fIoptionally\fP escape any of the delimiters
\fC>\fR, \fC)\fR, \fC]\fR, \fC}\fR, \fC;\fR, \fC/\fR, \fC#\fR, or \fC\e\fR to get just
that character\&.

A \&'\e\&' anywhere else is just a literal backslash, so the regex
([abc])\e1 is written just that way; there is no need to
double-backslash the \e1 (although it will work if you do)\&.
.in -8m
.ZI 8m "# this is a comment"
\&
'in -8m
.ZI 8m "# and this too \e#"
\&
'in -8m
'in +8m
\&
.br
A comment is not a piece of preprocessor
sugar -- it is a \fIstatement\fP and ends at the newline or at "\e#"\&.
.in -8m
.ZI 8m "insert filename"
\&
.br
inserts the file verbatim at this line at compile
time\&.
.in -8m
.ZI 8m ";"
\&
.br
statement separator; it must ALWAYS be escaped as \e; unless it\&'s
inside delimiters, or else it will mark the end of the statement\&.
.in -8m
.ZI 8m "{ and }"
\&
.br
start and end blocks of statements\&. They must always be \&'\e\&'
escaped or inside delimiters, or they will mark the start/end of a
block\&.
.in -8m
.ZI 8m "noop"
\&
.br
no-op statement
.in -8m
.ZI 8m ":label:"
\&
.br
define a GOTOable label
.in -8m
.ZI 8m "accept"
\&
.br
writes the current data window to standard output; execution
continues\&.
.in -8m
.ZI 8m "alius"
\&
.br
if the last bracket-group succeeded, ALIUS skips to end of {}
block (a skip, not a FAIL); if the prior group FAILed, ALIUS does
nothing\&. Thus, ALIUS is both an ELSE clause and a CASE statement\&.
.in -8m
.ZI 8m "alter (:var:) /new-val/"
\&
.br
destructively change the value of var to new-val;
(:var:) is var to change (var-expanded); /new-val/ is value to change
to (var-expanded)\&.
.in -8m
.ZI 8m "classify <flags> (:c1:\&.\&.\&.|\&.\&.\&.:cN:) (:stats:) [:in:] /word-pat/"
\&
.br
compare the statistics of the current data window buffer with classfiles
c1\&.\&.\&.cN\&.

.ZI 17m "<flags>"
If <flags> is set to <nocase>, ignore case in
word-pat, does not change case in hash (use tr() to do that on
:in: if you want it)\&.
.in -17m
.ZI 17m "(:c1: \&.\&.\&."
file or files to consider "success" files\&. The
CLASSIFY succeeds if these files as a group match best\&. If not,
the CLASSIFY does a FAIL\&.
.in -17m
.ZI 17m "|"
optional separator\&. Spaces on each side of the " | " are
required\&.
.in -17m
.ZI 17m "\&.\&.\&.\&. :cN:)"
optional files to the right of " | " are considered
as a group to "fail"\&. If the statement fails, execution skips to end
of enclosing {\&.\&.} block, which exits with a FAIL status (see
ALIUS for why this is useful)\&.
.in -17m
.ZI 17m "(:stats:)"
optional var that will get a text formatted matching
summary
.in -17m
.ZI 17m "[:in:]"
restrict statistical measure to the string inside :in:
.in -17m
.ZI 17m "/word-pat/"
regex to describe what a parseable word is\&.
.in -17m
.in -8m
.ZI 8m "eval (:result:) /instring/"
\&
.br
repeatedly evaluates /instring/ until it
ceases to change, then places that result
as the value of :result: \&. EVAL uses
smart (but foolable) heuristics to avoid
infinite loops, like evaluating a string
that evaluates to a request to evaluate
itself again\&. The error rate is about
1 / 2^62 and will detect chain groups of
length 255 or less\&.
If the instring uses math evaluation
(see section below on math operations)
and the evaluation has an inequality
test, (>, < or =) then if the inequality
fails, the EVAL will FAIL to the end of
block\&. If the evaluation has a numeric
fault (e\&.g\&. divide-by-zero) the EVAL will
do a TRAPpable FAULT\&.
.in -8m
.ZI 8m "exit /:retval:/"
\&
.br
ends program execution\&. If supplied, the
return value is converted to an integer
and returned as the exit code of the
crm114 program\&. If no retval is supplied,
the return value is 0\&.
.in -8m
.ZI 8m "fail"
\&
.br
skips down to end of the current { } block
and causes that block to exit with a FAIL
status (see ALIUS for why this is useful)
.in -8m
.ZI 8m "fault /faultstr/"
\&
.br
forces a FAULT with the given string as
the reason\&. The fault string is
var-expanded\&.
.in -8m
.ZI 8m "goto /:label:/"
\&
.br
unconditional branch (you can use a variable as the
goal, e\&.g\&. /:*:there:/ )
.in -8m
.ZI 8m "hash (:result:) /input/"
\&
.br
compute a fast 32-bit hash of the /input/,
and ALTER :result: to the
hexadecimal hash value\&. HASH is
\fInot\fP warranted to be constant across
major releases of CRM114, nor is it
cryptographically secure\&.

.ZI 17m "(:result:)"
var that gets the result\&.
.in -17m
.ZI 17m "/input/"
string to be hashed (can contain expanded :*:vars:,
defaults to the data window :_dw:)
.in -17m
.in -8m
.ZI 8m "intersect (:out:) [:var1: :var2: \&.\&.\&.]"
\&
.br
makes :out: contain the part of
the data window that is the intersection of
:var1: :var2: \&.\&.\&. ISOLATEd vars are ignored\&.
This only resets the value of the captured
variable, and does NOT alter any text in
the data window\&.
.in -8m
.ZI 8m "isolate (:var:) /initial-value/"
\&
.br
puts :var: into a data area outside of
the data buffer; subsequent changes to this
var don\&'t change the data buffer (though
they may change the value of any var
subsequently set inside of this var)\&.
If the var already was ISOLATED, this is
a noop\&.

.ZI 17m "(:var:)"
name of ISOLATEd var (var-expanded)
.in -17m
.ZI 17m "/initial-value/"
optional initial value for :var:
(var-expanded)\&. If no value is supplied,
the previous value is retained/copied\&.
.in -17m
.in -8m
.ZI 8m "input <flags> (:result:) [:filename:]"
\&
.br
read in the content of filename\&.
If no filename is given, read stdin\&.

.ZI 17m "<byline>"
read one line only
.in -17m
.ZI 17m "(:result:)"
var that gets the input value
.in -17m
.ZI 17m "[:filename:]"
the file to read
.in -17m
.in -8m
.ZI 8m "learn <flags> (:class:) [:in:] /word-pat/"
\&
.br
learn the statistics of
the :in: var (or the input window if no
var) as an example of class :class:

.ZI 17m "<flags>"
can be any of <nocase>, <refute> and <microgroom>\&.
<nocase>: ignore case in matching word-pat (does not ignore case in
hash- use tr() to do that on :in: if you want it)\&. <refute>: this is
an anti-example of this class- unlearn it! <microgroom>: enable the
microgroomer to purge less-important information automatically
whenever the statistics file gets too crowded\&.
.in -17m
.ZI 17m "(:class:)"
name of file holding hashed results; nominal file
extension is \&.css
.in -17m
.ZI 17m "[:in:]"
captured var containing the text to be learned (if
omitted, the full contents of the data window is used)
.in -17m
.ZI 17m "/word-pat/"
regex that defines a "word"\&. Things that aren\&'t
"words" are ignored\&.
.in -17m
.in -8m
.ZI 8m "liaf"
\&
.br
skips UP to START of the current {} block (LIAF is FAIL
spelled backwards)
.in -8m
.ZI 8m "match <flags> (:var1: \&.\&.\&.) [:in:] /regex/"
\&
.br
Attempt to match the given
regex; if the match succeeds, variables are bound; if it fails, the program
skips to the closing \&'}\&' of this block

.ZI 17m "<flags>"
flags can be any of

.ZI 3m "<absent>"
statement succeeds if match not present
.in -3m
.ZI 3m "<nocase>"
ignore case when matching
.in -3m
.ZI 3m "<fromstart>"
start match at start of the [:in:] var
.in -3m
.ZI 3m "<fromcurrent>"
start match at start of previous successful
match on the [:in:] var
.in -3m
.ZI 3m "<fromnext>"
start match at one character past the start of
the previous successful match on the [:in:] var
.in -3m
.ZI 3m "<fromend>"
start match at one character past the end of
prev\&. match on this [:in:] var
.in -3m
.ZI 3m "<newend>"
require match to end after end of prev\&. match on
this [:in:] var
.in -3m
.ZI 3m "<backwards>"
search backward in the [:in:] variable from the
last successful match\&.
.in -3m
.ZI 3m "<nomultiline>"
don\&'t allow this match to span lines
.in -3m
.in -17m
.ZI 17m "(:var1: \&.\&.\&.)"
optional variables to bind to regex result and
\&'(\&' \&')\&' subregexes
.in -17m
.ZI 17m "[:in:]"
search only in the variable specified; if omitted, :_dw:
(the full input data window) is used
.in -17m
.ZI 17m "/regex/"
POSIX regex (with \e escapes as needed)
.in -17m
If you build CRM114 to use the GNU regex library for MATCHing,
be warned that GNU REGEX has numerous issues\&. See the
KNOWN_BUGS file for a detailed listing\&.
.in -8m
.ZI 8m "output <flags> [filename] /output-text/"
\&
.br
output an arbitrary string
with captured values expanded\&.

.ZI 17m "<flags>"
<append>: append to the file (otherwise, overwrites)
.in -17m
.ZI 17m "[filename]"
filename to send output (var-expanded), default output
is to stdout
.in -17m
.ZI 17m "/output-text/"
string to output (var-expanded)
.in -17m
.in -8m
.ZI 8m "syscall <flags> (:in:) (:out:) (:status:) /command/"
\&
.br
execute a shell
command

.ZI 17m "<flags>"
can be any of <keep> and <async>\&. <keep>: keep this
process around; if kept, then a syscall with the same :keep: var will
continue feeding to and reading from the kept proc\&. <async>: don\&'t
wait for process to send an EOF; just grab what\&'s available in
the process\&'s output pipe and proceed (limit per syscall is 256 Kbytes)
.in -17m
.ZI 17m "(:in:)"
var-expanded string to feed to command as input (can be
null if you don\&'t want to send the process something\&.) You \fBmust\fP
specify this if you want to specify an :out: variable\&.
.in -17m
.ZI 17m "(:out:)"
var-expanded varname to place results into (\fBmust\fP
pre-exist, can be null if you don\&'t want to read the process\&'s
output (yet, or at all)\&. Limit per syscall is 256 Kbytes\&. You
\fBmust\fP specify this if you want to use the :status: variable)
.in -17m
.ZI 17m "(:status:)"
if you want to keep a minion proc around, or catch the
exit status of the process, specify a var here\&. The minion process\&'s
PID and pipes will be stored here\&. The program can access the proc
again with another syscall by using this var again\&. When the process
exits, its exit code will be stored here\&.
.in -17m
.in -8m
.ZI 8m "trap (:reason:) /trap_regex/"
\&
.br
traps faults from both FAULT statements
and program errors occurring anywhere in the preceding bracket-block\&. If
no fault exists, TRAP does a SKIP to end of block\&. If there is a fault
and the fault reason string matches the trap_regex, the fault is trapped,
and execution continues with the line after the TRAP, otherwise the fault
is passed up to the next surrounding trapped bracket block\&.

.ZI 17m "(:reason:)"
the fault message that caused this FAULT\&. If it was a
user fault, this is the text the user supplied in the FAULT statement\&.
.in -17m
.ZI 17m "/trap_regex/"
the regex that determines what kind of faults this
TRAP will accept\&. Putting a wildcard here (e\&.g\&. /\&.*/) means that ALL
faults will be trapped here\&.
.in -17m
.in -8m
.ZI 8m "union (:out:) [:var1: :var2: \&.\&.\&.]"
\&
.br
makes :out: contain the union of
the data window segments that contain var1, var2\&.\&.\&. plus any intervening
text as well\&. Any ISOLATEd var is ignored\&. This is non-surgical, and
does not alter the data window\&.
.in -8m
.ZI 8m "window <flags> (:w-var:) (:s-var:) /cut-regex/ /add-regex/"
\&
.br
window
slider\&. This deletes everything up to and including the cut-regex match from
:w-var: (default: use the data window), then reads from standard input (or
from :s-var:) and appends text up to and including the add-regex match\&.

.ZI 17m "<flags>"
flags can be any of

.ZI 17m "<nocase>"
ignore case when matching cut- and add- regexes
.in -17m
.ZI 17m "<bychar>"
check input for add-regex every character
.in -17m
.ZI 17m "<byline>"
check input for add-regex every line
.in -17m
.ZI 17m "<byeof>"
wait for EOF to check for add-regex (extra characters
are kept around for later)
.in -17m
.ZI 17m "<eofends>"
read lots of input; the input is up to the regex
match OR the contents till EOF
.in -17m
.in -17m
.ZI 17m "(:w-var:)"
what var to window
.in -17m
.ZI 17m "(:s-var:)"
what var to use for source (defaults to stdin; if you
use a source var, you \fBmust\fP specify the windowed var)\&.
.in -17m
.ZI 17m "/cut-regex/"
var-expanded cut pattern
.in -17m
.ZI 17m "/add-regex/"
var-expanded add pattern, if absent reads till EOF
.in -17m
If both cut-regex and add-regex are omitted, and this window statement is
the \fIfirst executable\fP statement in the program, then CRM114 does
\fInot\fP wait to read anything from standard input before starting
program execution\&.
.in -8m
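
The EVAL statement described above re-evaluates its string until it stops
changing while guarding against evaluation loops\&. The Python sketch below
shows that idea under explicit assumptions: the step argument is a
hypothetical stand-in for one evaluation pass, and loops are caught by
remembering truncated SHA-256 digests of the most recent max_chain + 1
intermediate strings\&. This loosely mirrors the hashed, length-limited loop
detection the EVAL entry mentions, but it is not CRM114\&'s actual heuristic\&.

.nf \fC
```python
import hashlib
from collections import deque


def eval_to_fixed_point(text, step, max_chain=255):
    """Apply step until the text stops changing; fail on a detected loop."""
    seen = deque(maxlen=max_chain + 1)   # digests of recent intermediate strings
    while True:
        digest = hashlib.sha256(text.encode("utf-8")).digest()[:8]
        if digest in seen:
            raise RuntimeError("EVAL loop detected")
        seen.append(digest)
        new_text = step(text)
        if new_text == text:
            return text                  # fixed point reached
        text = new_text


# One hypothetical "evaluation pass": collapse "ab" into "b".
print(eval_to_fixed_point("aaab", lambda s: s.replace("ab", "b")))  # b
```
.fi \fR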
.SH A QUICK REGEX INTRO
A regex is a pattern match\&. Do a "man 7 regex" for details\&.

Matches are, by default, "first starting point that matches, then
longest match possible that can fit"\&.
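
This POSIX "leftmost, then longest" rule differs from backtracking engines
such as the Python re module, which takes the first alternative that
succeeds\&. The Python sketch below (the function leftmost_longest is
hypothetical) emulates the POSIX rule by brute force, trying every start
point from the left and, for each, the longest end point first; it ignores
how subexpressions are assigned and is only meant to show the difference\&.

.nf \fC
```python
import re


def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost, then longest, match or None."""
    regex = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate first, shrinking toward the start point.
        for end in range(len(text), start - 1, -1):
            if regex.fullmatch(text, start, end):
                return (start, end)
    return None


print(re.search("a|ab", "xab").span())   # (1, 2): first alternative wins
print(leftmost_longest("a|ab", "xab"))   # (1, 3): longest match wins
```
.fi \fR

The nested loops make this quadratic in the text length, which is fine for
illustration but not for real filtering\&.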

.ZI 8m "a through z"
\&
'in -8m
.ZI 8m "A through Z"
\&
'in -8m
.ZI 8m "0 through 9"
\&
'in -8m
'in +8m
\&
.br
all match themselves\&.
.in -8m
.ZI 8m "most punctuation"
\&
.br
matches itself, but check below!
.in -8m
.ZI 8m "*"
\&
.br
repeat preceding 0 or more times
.in -8m
.ZI 8m "+"
\&
.br
repeat preceding 1 or more times
.in -8m
.ZI 8m "?"
\&
.br
repeat preceding 0 or 1 time
.in -8m
.ZI 8m "*?, +?, ??"
\&
.br
repeat preceding, but \fIshortest\fP match that fits, given
the already-selected start point of the regex\&. (only
supported by TRE regex, not GNU regex)
.in -8m
.ZI 8m "[abcde]"
\&
.br
any one of the letters a, b, c, d, or e
.in -8m
.ZI 8m "[a-q]"
\&
.br
the letters a through q (just one of them)
.in -8m
.ZI 8m "{n,m}"
\&
.br
repetition count: match the preceding at least n and no more
than m times (POSIX only guarantees support for counts up to 255)
.in -8m
.ZI 8m "[[:<:]]"
\&
.br
matches at the start of a word (GNU regex only)
.in -8m
.ZI 8m "[[:>:]]"
\&
.br
matches the end of a word (GNU regex only)
.in -8m
.ZI 8m "^"
\&
.br
as first char of a match, matches the start of a line (ONLY in
<nomultiline> matches)\&.
.in -8m
.ZI 8m "$"
\&
.br
as last char of a match, matches at the end of a line (ONLY in
<nomultiline> matches)
.in -8m
.ZI 8m "\&."
\&
.br
(a period) matches any single character (except start-of-line or
end of line "virtual characters", but it does match a newline)\&.
.in -8m
.ZI 8m "a|b"
\&
.br
match a or b
.in -8m
.ZI 8m "(match)"
\&
.br
the () go away, and the string that matched inside is
available for capturing\&. Use \e\e( and \e\e) to match actual
parentheses (the first \&'\e\&' says "pass the second \&'\e\&' through to
the regex engine", and the second \&'\e\&' forces a literal reading
of the parenthesis character)\&.
.in -8m
.ZI 8m "\en"
\&
.br
matches the N\&'th parenthesized subexpression\&. Remember to
backslash-escape the backslash (e\&.g\&. write this as \e\e1)\&.
This works only with TRE, not with GNU regex\&.
.in -8m
The following are other POSIX expressions, which mostly do what you\&'d
guess they\&'d do from their names\&.

.di ZV
.in 0
.nf \fC

  [[:alnum:]]
  [[:alpha:]]
  [[:blank:]]
  [[:cntrl:]]
  [[:digit:]]
  [[:lower:]]
  [[:upper:]]
  [[:graph:]]
  [[:print:]]
  [[:punct:]]
  [[:space:]]
  [[:xdigit:]]
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

[[:graph:]] matches any character that puts ink on paper or lights a pixel\&.
[[:print:]] matches any character that moves the "print head" or cursor\&.
.SH NOTES ON SEQUENCE OF EVALUATION
By default, CRM114 supports string length and mathematical evaluation
only in an EVAL statement, although it can be set to allow these in
any place where a var-expanded variable is allowed (see the -q flag)\&.
The default value (zero) allows stringlength and math evaluation
only in EVAL statements, and uses non-precedence (that is, strict
left-to-right unless parentheses are used) algebraic notation\&. -q 1
uses RPN instead of algebraic, again allowing stringlength and math
evaluation only in EVAL expressions\&. Modes 2 and 3 allow stringlength
and math evaluation in \fIany\fP var-expanded expression, with
non-precedence algebraic notation and RPN notation respectively\&.
Evaluation is always left-to-right; there is no precedence of
operators beyond the sequential passes noted below\&.
The evaluation is done in four sequential passes:

.ZI 3m "1"
\e-constants like \en, \eo377 and \ex3F are substituted
.in -3m
.ZI 3m "2"
:*:var: variables are substituted (note the difference between
a constant like \&'\en\&' and a variable like ":*:_nl:" here - constants
are substituted first, then variables are substituted\&.)
.in -3m
.ZI 3m "3"
:#:var: string-length operations are performed
.in -3m
.ZI 3m "4"
:@:expression: mathematical expressions are performed; syntax is
either RPN or non-precedenced (parens required) algebraic
notation\&. Embedded non-evaluated strings in a mathematical
expression are currently a no-no\&.

Allowed operators are: + - * / % > < = only\&.

Only >, <, and = set logical results; they also evaluate to
1 and 0 for continued chain operations - e\&.g\&.

.di ZV
.in 0
.nf \fC
((:*:a: > 3) + (:*:b: > 5) + (:*:c: > 9) > 1)
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

is true IFF any of the following is true

.ZI 3m "\(bu"
a > 3 and b > 5
.in -3m
.ZI 3m "\(bu"
a > 3 and c > 9
.in -3m
.ZI 3m "\(bu"
b > 5 and c > 9
.in -3m
.in -3m
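
The default (-q 0) algebraic mode and the -q 1 RPN mode can be modeled with
the Python sketch below\&. It is an illustration, not CRM114\&'s evaluator:
the function names are hypothetical, numbers are parsed as floats, the
tokenizer accepts only unsigned numbers, the eight operators listed above,
and parentheses, and the exact RPN token order CRM114 expects is an
assumption\&. Because there is no precedence, 2 + 3 * 4 evaluates to 20
here, not 14\&.

.nf \fC
```python
import operator

# Comparison operators yield 1 or 0 so they can feed later arithmetic.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
    ">": lambda a, b: 1 if a > b else 0,
    "<": lambda a, b: 1 if a < b else 0,
    "=": lambda a, b: 1 if a == b else 0,
}


def tokenize(expr):
    tokens, number = [], ""
    for ch in expr + " ":            # trailing space flushes the last number
        if ch.isdigit() or ch == ".":
            number += ch
            continue
        if number:
            tokens.append(float(number))
            number = ""
        if ch in OPS or ch in "()":
            tokens.append(ch)
        elif not ch.isspace():
            raise ValueError("unexpected character " + repr(ch))
    return tokens


def eval_algebraic(expr):
    """Strict left-to-right evaluation; only parentheses group."""
    tokens = tokenize(expr)
    pos = 0

    def operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = sequence()
            pos += 1                 # step over the matching ")"
            return value
        return token

    def sequence():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            pos += 1
            value = OPS[op](value, operand())
        return value

    return sequence()


def eval_rpn(expr):
    """Postfix evaluation with an explicit stack (token order assumed)."""
    stack = []
    for token in tokenize(expr):
        if isinstance(token, float):
            stack.append(token)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[token](left, right))
    return stack.pop()


print(eval_algebraic("2 + 3 * 4"))    # 20.0
print(eval_rpn("2 3 4 * +"))          # 14.0
```
.fi \fR

A division by zero raises ZeroDivisionError in this sketch, which plays the
role of the numeric FAULT that EVAL reports\&.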
.SH NOTES ON APPROXIMATE REGEX MATCHING
Only the TRE engine supports approximate matching\&. The GNU engine does
not support approximate matching\&.

Approximate matching is specified much like a "repetition count" in
an ordinary regex, using curly braces\&. The approximation applies to the
preceding parenthesized expression (again, just like repetition counts)\&.
You can specify the maximum total number of changes, and how many inserts,
deletes, and substitutions you wish to allow\&. The minimum-error match is
found and reported if one exists within the bounds you state\&.

The basic syntax is:

.di ZV
.in 0
.nf \fC
(text-to-match){~[maxerrs] [#maxsubsts] [+maxinserts] [-maxdeletes]}
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

Note that the \&'~\&' (with an optional maxerrs count) is \fIrequired\fP; that\&'s how
we know it\&'s an approximate regex rather than just a repetition count\&. If you
don\&'t specify a maximum error count, you get the best match; if you do,
the match will have at most that many errors\&.

Remember that you specify the changes to the text in the \fIpattern\fP
necessary to make it match the text in the string being searched\&.

You cannot use approximate regexes and backrefs (like \e1) in the same
regex\&. This is a current limitation of TRE\&.

You can also use an inequality in addition to the basic syntax above:

.di ZV
.in 0
.nf \fC
(text-to-match){~[maxerrs] [basic-syntax] [nI + mD + oS < K] }
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

where n, m, and o are the costs per insertion, deletion, and substitution,
respectively; \&'I\&', \&'D\&', and \&'S\&' indicate which cost goes
with which kind of error; and K is the cost bound: the total cost
of the errors is always strictly less than K\&.
Here are some examples\&.

.ZI 8m "(foobar)"
\&
.br
exactly matches "foobar"
.in -8m
.ZI 8m "(foobar){~}"
\&
.br
finds the closest match to "foobar", with the minimum
number of inserts, deletes, and substitutions\&. Always succeeds\&.
.in -8m
.ZI 8m "(foobar){~3}"
\&
.br
finds the closest match to "foobar", with no more than
3 inserts, deletes, and substitutions in total\&.
.in -8m
.ZI 8m "(foobar){~2 +2 -1 #1}"
\&
.br
finds the closest match to "foobar", with at
most two errors total, and at most two inserts, one delete, and one
substitution\&.
.in -8m
.ZI 8m "(foobar){~4 #1 1i + 2d < 5 }"
\&
.br
finds the closest match to "foobar",
with at most four errors total, at most one substitution, and with the
number of insertions plus twice the number of deletions less than 5\&.
.in -8m
.ZI 8m "(foo){~1}(bar){~1}"
\&
.br
finds the closest match to "foobar", with at
most one error in the "foo" and one error in the "bar"\&.
.in -8m
.SH OVERALL LANGUAGE NOTES
Here\&'s how to remember what goes where in the CRM114 language\&.

Unlike most computer languages, CRM114 uses inflection (or declension)
rather than position to describe what role each part of a statement
plays\&. The declensions are marked by the delimiters: / and /, ( and ),
< and >, and [ and ]\&.

By and large, you can mix up the arguments to each kind of statement
without changing their meaning\&. Only the ACTION needs to be first\&.
Other parts of the statement can occur in any order, except that
multiple (paren_args) and multiple /pattern_args/ must keep their nominal
order relative to one another; they can still go anywhere in the statement
and do not need to be consecutive\&.

The parts of a CRM114 statement are:

.ZI 17m "ACTION"
the verb\&. This is at the start of the statement\&.
.in -17m
.ZI 17m "/pattern/"
the overall pattern the verb should use, analogous to the
"subject" of the statement\&.
.in -17m
.ZI 17m "<flags>"
modifies how the ACTION does the work\&. You\&'d call these
"adverbs" in human languages\&.
.in -17m
.ZI 17m "(vars)"
what variables to use as adjuncts in the action (what would
be called the "direct objects")\&. These can get changed when the action
happens\&.
.in -17m
.ZI 17m "[limited-to]"
where the action is allowed to take place (think of it
as the "indirect object")\&. These are not directly changed by the action\&.
.in -17m
.SH SEE ALSO
\fBcssmerge(1)\fP, \fBcssdiff(1)\fP, \fBcssutil(1)\fP

The CRM114 homepage is at http://crm114\&.sf\&.net/ \&.
.SH VERSION
This manpage: $Id: crm114\&.azm,v 1\&.12 2004/08/19 11:10:49 vanbaal Exp $

This manpage describes the crm114 utility as documented in
QUICKREF\&.txt, shipped with crm114-20040212-BlameJetlag\&.src\&.tar\&.gz\&. The
DESCRIPTION section is copy-and-pasted from INTRO\&.txt as distributed with the
same source tarball\&.

Converted from plain ASCII to zoem by Joost van Baal\&.
.SH COPYRIGHT
Copyright (C) 2001, 2002, 2003, 2004 William S\&. Yerazunis

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version\&.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE\&. See the
GNU General Public License for more details\&.

You should have received a copy of the GNU General Public License
along with this program (see COPYING); if not, check with
http://www\&.gnu\&.org/copyleft/gpl\&.html or write to the Free Software
Foundation, Inc\&., 59 Temple Place - Suite 330, Boston, MA 02111, USA\&.
.SH AUTHOR
William S\&. Yerazunis\&. Manpage typesetting by Joost van Baal and Shalendra Chhabra