File: vmod_re2.vcc

#-
# Copyright (c) 2016-2018 UPLEX Nils Goroll Systemoptimierung
# All rights reserved
#
# Author: Geoffrey Simmons <geoffrey.simmons@uplex.de>
#
# See LICENSE
#

$Module re2 3 "Varnish Module for access to the Google RE2 regular expression engine"
$Prefix vmod

$Synopsis manual

$ABI vrt

SYNOPSIS
========

::

  import re2;

  # regex object interface
  new OBJECT = re2.regex(STRING pattern [, <regex options>])
  BOOL <obj>.match(STRING)
  STRING <obj>.backref(INT ref)
  STRING <obj>.namedref(STRING name)
  STRING <obj>.sub(STRING text, STRING rewrite)
  STRING <obj>.suball(STRING text, STRING rewrite)
  STRING <obj>.extract(STRING text, STRING rewrite)
  INT <obj>.cost()
  
  # regex function interface
  BOOL re2.match(STRING pattern, STRING subject [, <regex options>])
  STRING re2.backref(INT ref)
  STRING re2.namedref(STRING name)
  STRING re2.sub(STRING pattern, STRING text, STRING rewrite
                 [, <regex options>])
  STRING re2.suball(STRING pattern, STRING text, STRING rewrite
                    [, <regex options>])
  STRING re2.extract(STRING pattern, STRING text, STRING rewrite
                     [, <regex options>])
  INT re2.cost(STRING pattern [, <regex options>])

  # set object interface
  new OBJECT = re2.set([ENUM anchor] [, <regex options>])
  VOID <obj>.add(STRING [, BOOL save] [, BOOL never_capture] [, STRING string]
                 [, BACKEND backend] [, INT integer] [, SUB sub])
  BOOL <obj>.match(STRING)
  INT <obj>.nmatches()
  BOOL <obj>.matched(INT)
  INT <obj>.which([ENUM select])
  STRING <obj>.string([INT n] [, ENUM select])
  BACKEND <obj>.backend([INT n] [, ENUM select])
  INT <obj>.integer([INT n] [, ENUM select])
  SUB <obj>.subroutine([INT n] [, ENUM select])
  BOOL <obj>.check_call([INT n] [, ENUM select])
  STRING <obj>.sub(STRING text, STRING rewrite [, INT n]
                   [, ENUM select])
  STRING <obj>.suball(STRING text, STRING rewrite [, INT n]
                      [, ENUM select])
  STRING <obj>.extract(STRING text, STRING rewrite [, INT n]
                       [, ENUM select])
  BOOL <obj>.saved([ENUM {REGEX, STR, BE, INT, SUB} which] [, INT n]
                   [, ENUM select])
  VOID <obj>.hdr_filter(HTTP [, BOOL])

  # utility function
  STRING re2.quotemeta(STRING)

  # VMOD version
  STRING re2.version()

DESCRIPTION
===========

Varnish Module (VMOD) for access to the Google RE2 regular expression engine.

Varnish VCL uses the PCRE library (Perl Compatible Regular Expressions) for
its native regular expressions, which runs very efficiently for many common
uses of pattern matching in VCL, as attested by years of successful use of
PCRE with Varnish.

But for certain kinds of patterns, the worst-case running time of the PCRE
matcher is exponential in the length of the string to be matched. The
matcher uses backtracking, implemented with recursive calls to the internal
``match()`` function. In principle there is no upper bound to the possible
depth of backtracking and recursion, except as imposed by the ``varnishd``
runtime parameters ``pcre_match_limit`` and ``pcre_match_limit_recursion``;
matches fail if either of these limits is met. Stack overflow caused by
deep backtracking has occasionally been the subject of ``varnishd`` issues.

RE2 differs from PCRE in that it limits the syntax of patterns so that they
always specify a regular language in the formally strict sense. Most notably,
backreferences within a pattern are not permitted, for example ``(foo|bar)\1``
to match ``foofoo`` and ``barbar``, but not ``foobar`` or ``barfoo``. See the
link in ``SEE ALSO`` for the specification of RE2 syntax.

This means that an RE2 matcher runs as a finite automaton, which guarantees
linear running time in the length of the matched string. There is no
backtracking, and hence no risk of deep recursion or stack overflow.
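
Where a pattern would otherwise rely on a backreference, the alternatives
can sometimes be written out explicitly. The following is a minimal sketch,
not a general recipe: the object name ``twice`` and the header ``X-Twice``
are hypothetical, and the pattern ``foofoo|barbar`` is chosen because it
matches the same strings as the ``(foo|bar)\1`` example mentioned above::

  import re2;

  sub vcl_init {
      # "(foo|bar)\1" is not valid RE2 syntax, so the two permitted
      # repetitions are spelled out as a plain alternation instead.
      new twice = re2.regex("foofoo|barbar");
  }

  sub vcl_recv {
      if (twice.match(req.url)) {
          # req.url contains "foofoo" or "barbar".
          set req.http.X-Twice = "true";
      }
  }

This only works when the set of strings that the backreference could
repeat is small and known in advance.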

The relative advantages and disadvantages of RE2 and PCRE are a broad subject,
beyond the scope of this manual. See the references in ``SEE ALSO`` for more
in-depth discussion.

regex object and function interfaces
------------------------------------

The VMOD provides regular expression operations by way of the ``regex`` object
interface and a functional interface. For ``regex`` objects, the pattern is
compiled at VCL initialization time, and the compiled pattern is re-used for
each invocation of its methods. Compilation failures (due to errors in the
pattern) cause failure at initialization time, and the VCL fails to load. The
``.backref()`` and ``.namedref()`` methods refer back to the last invocation
of the ``.match()`` method for the same object.

The functional interface provides the same set of operations, but the pattern
is compiled at runtime on each invocation (and then discarded). Compilation
failures are reported as errors in the Varnish log. The ``backref()`` and
``namedref()`` functions refer back to the last invocation of the ``match()``
function, for any pattern.

Compiling a pattern at runtime on each invocation is considerably more costly
than re-using a compiled pattern. So for patterns that are fixed and known
at VCL initialization, the object interface should be used. The functional
interface should only be used for patterns whose contents are not known until
runtime.
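
The choice between the two interfaces might look as follows. This is a
sketch under assumptions: the object name ``apipath``, the pattern, and the
headers ``X-Api`` and ``X-Pattern`` are hypothetical and only serve to
illustrate the distinction::

  import re2;

  sub vcl_init {
      # The pattern is fixed and known when the VCL is loaded, so it
      # is compiled once here and re-used for every request.
      new apipath = re2.regex("^/api/v\d+/");
  }

  sub vcl_recv {
      if (apipath.match(req.url)) {
          set req.http.X-Api = "true";
      }
  }

  sub vcl_backend_response {
      # The pattern arrives in a response header and is not known until
      # runtime, so the functional interface compiles it on each call.
      if (beresp.http.X-Pattern
          && re2.match(pattern=beresp.http.X-Pattern,
                       subject=bereq.url)) {
          set beresp.http.X-Pattern-Matched = "true";
      }
  }

The check for ``beresp.http.X-Pattern`` is a precaution in this sketch, so
that ``re2.match()`` is only called when the header is actually present.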

set object interface
--------------------

``set`` objects provide a shorthand for constructing patterns that consist of
an alternation -- a group of patterns combined with ``|`` for "or". For
example::

  import re2;
  
  sub vcl_init {
      new myset = re2.set();
      myset.add("foo");    # Pattern 1
      myset.add("bar");    # Pattern 2
      myset.add("baz");    # Pattern 3
  }

``myset.match(<string>)`` can now be used to match a string against
the pattern ``foo|bar|baz``. When a match is successful, the matcher
has determined all of the patterns that matched. These can then be
retrieved with the method ``.nmatches()`` for the number of matched
patterns, and with ``.matched(n)``, which returns ``true`` if the
``nth`` pattern matched, where the patterns are numbered in the order
in which they were added::

  if (myset.match("foobar")) {
      std.log("Matched " + myset.nmatches() + " patterns");
      if (myset.matched(1)) {
          # Pattern /foo/ matched
          call do_foo;
      }
      if (myset.matched(2)) {
          # Pattern /bar/ matched
          call do_bar;
      }
      if (myset.matched(3)) {
          # Pattern /baz/ matched
          call do_baz;
      }
  }

An advantage of alternations and sets with RE2, as opposed to an
alternation in PCRE or a series of separate matches in an
if-elsif-elsif sequence, comes from the fact that the matcher is
implemented as a state machine. That means that the matcher progresses
through the string to be matched just once, following patterns in the
set that match through the state machine, or determining that there is
no match as soon as there are no more possible paths in the state
machine. So a string can be matched against a large set of patterns in
time that is proportional to the length of the string to be
matched. In contrast, PCRE matches patterns in an alternation one
after another, stopping after the first matching pattern, or
attempting matches against all of them if there is no match. Thus a
match against an alternation in PCRE is not unlike an if-elsif-elsif
sequence of individual matches, and requires the time needed for each
individual match, overall in proportion with the number of patterns to
be matched.
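
The contrast can be sketched in VCL as follows, assuming the ``myset``
object from the example above and that the subroutines ``do_foo``,
``do_bar`` and ``do_baz`` are defined elsewhere in the VCL. The two
``vcl_recv`` fragments are alternatives, shown together only for
comparison::

  # Native VCL: each pattern is tried in turn against req.url until
  # one of them matches.
  sub vcl_recv {
      if (req.url ~ "foo") {
          call do_foo;
      } elsif (req.url ~ "bar") {
          call do_bar;
      } elsif (req.url ~ "baz") {
          call do_baz;
      }
  }

  # Set object: req.url is matched once against all three patterns;
  # the calls to .matched() only inspect the result of that match.
  sub vcl_recv {
      if (myset.match(req.url)) {
          if (myset.matched(1)) {
              call do_foo;
          } elsif (myset.matched(2)) {
              call do_bar;
          } elsif (myset.matched(3)) {
              call do_baz;
          }
      }
  }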

Another advantage of the VMOD's set object is the ability to associate
strings or backends with the patterns added to the set with the
``.add()`` method::

  sub vcl_init {
	new prefix = re2.set(anchor=start);
	prefix.add("/foo", string="www.domain1.com");
	prefix.add("/bar", string="www.domain2.com");
	prefix.add("/baz", string="www.domain3.com");
	prefix.add("/quux", string="www.domain4.com");

	new appmatcher = re2.set(anchor=start);
	appmatcher.add("/foo", backend=app1);
	appmatcher.add("/bar", backend=app2);
	appmatcher.add("/baz", backend=app3);
	appmatcher.add("/quux", backend=app4);
  }

After a successful match, the string or backend associated with the
matching pattern can be retrieved with the ``.string()`` and
``.backend()`` methods. This makes it possible, for example, to
construct a redirect response or choose the backend with code that is
both efficient and compact, even with a large set of patterns to be
matched::

  # Use the prefix object to construct a redirect response from
  # a matching request URL.
  sub vcl_recv {
      if (prefix.match(req.url)) {
          # Pass the string associated with the matching pattern
          # to vcl_synth.
          return(synth(1301, prefix.string()));
      }
  }

  sub vcl_synth {
      # The string associated with the matching pattern is in
      # resp.reason.
      if (resp.status == 1301) {
          set resp.http.Location = "http://" + resp.reason + req.url;
          set resp.status = 301;
          set resp.reason = "Moved Permanently";
      }
  }

  # Use the appmatcher object to choose a backend based on the
  # request URL prefix.
  sub vcl_recv {
      if (appmatcher.match(req.url)) {
          set req.backend_hint = appmatcher.backend();
      }
  }

regex options
-------------

Where a pattern is compiled -- in the ``regex`` and ``set`` constructors, and
in functions that require compilation -- options may be specified that can
affect the interpretation of the pattern or the operation of the matcher. There
are default values for each option, and it is only necessary to specify options
in VCL that differ from the defaults. Options specified in a ``set``
constructor apply to all of the patterns in the resulting alternation.

``utf8``
  If true, characters in a pattern match Unicode code points, and hence may
  match more than one byte. If false, the pattern and strings to be matched
  are interpreted as Latin-1 (ISO 8859-1), and a pattern character matches
  exactly one byte. Default is **false**. Note that this differs from the
  RE2 default.
``posix_syntax``
  If true, patterns are restricted to POSIX (egrep) syntax. Otherwise,
  the pattern syntax resembles that of PCRE, with some deviations. See the
  link in ``SEE ALSO`` for the syntax specification. Default is **false**.
  The options ``perl_classes``, ``word_boundary`` and ``one_line`` are
  only consulted when this option is true.
``longest_match``
  If true, the matcher searches for the longest possible match where
  alternatives are possible. Otherwise, search for the first match. For
  example with the pattern ``a(b|bb)`` and the string ``abb``, ``abb``
  matches when ``longest_match`` is true, and backref 1 is ``bb``. Otherwise,
  ``ab`` matches, and backref 1 is ``b``. Default is **false**.
``max_mem``
  An upper bound (in bytes) for the size of the compiled pattern. If ``max_mem``
  is too small, the matcher may fall back to less efficient algorithms, or the
  pattern may fail to compile. Default is the RE2 default (8MB), which should
  suffice for typical patterns.
``literal``
  If true, the pattern is interpreted as a literal string, and no regex
  metacharacters (such as ``*``, ``+``, ``^`` and so forth) have their special
  meaning. Default is **false**.
``never_nl``
  If true, the newline character ``\n`` in a string is never matched, even if it
  appears in the pattern. Default is **false**.
``dot_nl``
  If true, then the dot character ``.`` in a pattern matches everything,
  including newline. Otherwise, ``.`` never matches newline. Default is
  **false**.
``never_capture``
  If true, parentheses in a pattern are interpreted as non-capturing,
  and all invocations of the ``backref`` and ``namedref`` methods or
  functions will lead to VCL failure (see `ERRORS`_), including
  ``backref(0)`` after a successful match. Default is **false**,
  except for set objects, for which ``never_capture`` is always true
  (and cannot be changed), since back references are not possible with
  sets.
``case_sensitive``
  If true, matches are case-sensitive. A pattern can override this option with
  the ``(?i)`` flag, unless ``posix_syntax`` is true. Default is **true**.

The following options are only consulted when ``posix_syntax`` is true. If
``posix_syntax`` is false, then these features are always enabled and cannot be
turned off.

``perl_classes``
  If true, then the perl character classes ``\d``, ``\s``, ``\w``, ``\D``,
  ``\S`` and ``\W`` are permitted in a pattern. Default is **false**.
``word_boundary``
  If true, the perl assertions ``\b`` and ``\B`` (word boundary and not a word
  boundary) are permitted. Default is **false**.
``one_line``
  If true, then ``^`` and ``$`` only match at the beginning and end of the
  string to be matched, regardless of newlines. Otherwise, ``^`` also matches
  just after a newline, and ``$`` also matches just before a newline. Default is
  **false**.
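
A few of these options in use, as a minimal sketch. The object names and
patterns are hypothetical; only the option names and their meanings are
taken from the descriptions above::

  import re2;

  sub vcl_init {
      # literal: "." and "+" are matched as ordinary characters.
      new plain = re2.regex("a.b+c", literal=true);

      # Case-insensitive matching by option, without a (?i) flag in
      # the pattern.
      new images = re2.regex("^/images/", case_sensitive=false);

      # POSIX syntax; perl_classes must be enabled explicitly for \d
      # to be permitted in the pattern.
      new posix = re2.regex("^[[:alpha:]]+\d+$", posix_syntax=true,
                            perl_classes=true);

      # longest_match: for the string "abb", backref 1 is "bb"
      # rather than "b".
      new longest = re2.regex("a(b|bb)", longest_match=true);

      # Options given to the set constructor apply to every pattern
      # added to the set.
      new hosts = re2.set(case_sensitive=false);
      hosts.add("^www\.");
      hosts.add("^static\.");
  }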

$Object regex(STRING pattern, BOOL utf8=0, BOOL posix_syntax=0,
              BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0,
	      BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0,
	      BOOL case_sensitive=1, BOOL perl_classes=0,
	      BOOL word_boundary=0, BOOL one_line=0)

Create a regex object from ``pattern`` and the given options (or
option defaults). If the pattern is invalid, then VCL will fail to
load and the VCC compiler will emit an error message.

Example::

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.([^.]+)\.com$");
      new maxagematcher = re2.regex("max-age\s*=\s*(\d+)");

      # Group possible subdomains without capturing
      new submatcher = re2.regex("^www\.(domain1|domain2)\.com$",
	                         never_capture=true);
  }

$Method BOOL .match(STRING)

Returns ``true`` if and only if the compiled regex matches the given
string; corresponds to VCL's infix operator ``~``.

Example::

  if (myregex.match(req.http.Host)) {
     call do_on_match;
  }

$Method STRING .backref(INT ref, STRING fallback = "**BACKREF METHOD FAILED**")

Returns the ``nth`` captured subexpression from the most recent
successful call of the ``.match()`` method for this object in the same
client or backend context, or a fallback string in case the capture
fails. Backref 0 indicates the entire matched string. Thus this
function behaves like the ``\n`` notation in the native VCL functions
``regsub`` and ``regsuball``, and the ``$1``, ``$2`` ... variables in
Perl.

Since Varnish client and backend operations run in different threads,
``.backref()`` can only refer back to a ``.match()`` call in the same
thread. Thus a ``.backref()`` call in any of the ``vcl_backend_*``
subroutines -- the backend context -- refers back to a previous
``.match()`` in any of those same subroutines; and a call in any of
the other VCL subroutines -- the client context -- refers back to a
``.match()`` in the same client context.
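
In practice this means that a match has to be repeated in the backend
context before ``.backref()`` can be used there. A sketch, assuming the
``domainmatcher`` object from the constructor example above; the header
``X-Domain`` is hypothetical::

  sub vcl_recv {
      # Client context: .backref() refers to this .match() call.
      if (domainmatcher.match(req.http.Host)) {
          set req.http.X-Domain = domainmatcher.backref(1);
      }
  }

  sub vcl_backend_fetch {
      # Backend context: the match performed in vcl_recv is not
      # visible here, so the object is matched again before
      # .backref() is called.
      if (domainmatcher.match(bereq.http.Host)) {
          set bereq.http.X-Domain = domainmatcher.backref(1);
      }
  }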

``.backref()`` may return ``fallback`` after a successful match, if no
captured group in the matching string corresponds to the backref
number. For example, when the pattern ``(a|(b))c`` matches the string
``ac``, there is no backref 2, since nothing matches ``b`` in the
string. The default value of ``fallback`` is ``"**BACKREF METHOD
FAILED**"``, but you may set another value (such as the empty string).
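
The pattern from the paragraph above can be used to sketch the effect of
``fallback``. The object name ``ab`` and the headers are hypothetical::

  sub vcl_init {
      new ab = re2.regex("(a|(b))c");
  }

  sub vcl_recv {
      if (ab.match(req.url)) {
          # Group 1 takes part in every successful match of this
          # pattern.
          set req.http.X-Group1 = ab.backref(1);

          # Group 2 only takes part when "bc" matched. For "ac" the
          # fallback is returned, here the empty string instead of
          # the default.
          set req.http.X-Group2 = ab.backref(2, fallback="");
      }
  }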

After unsuccessful matches, ``.backref()`` invokes VCL failure (see
`ERRORS`_).  ``.backref()`` always fails after a failed match, even if
``.match()`` had been called successfully before the failure.

The VCL infix operators ``~`` and ``!~`` do not affect this method,
nor do the functions ``regsub`` or ``regsuball``. Nor is it affected
by the matches performed by any other method or function in this VMOD
(such as the ``sub()``, ``suball()`` or ``extract()`` methods or
functions, or the ``set`` object's ``.match()`` method).

``.backref()`` invokes VCL failure under the following conditions,
even if a previous match was successful and a substring could have
been captured (see `ERRORS`_):

* The ``fallback`` string is undefined, for example if set from an unset
  header variable.
* The ``never_capture`` option was set to ``true`` for this object. In this
  case, even ``.backref(0)`` fails after a successful match (otherwise, backref
  0 always returns the full matched string).
* ``ref`` (the backref number) is out of range, i.e. it is larger than the
  highest number for a capturing group in the pattern.
* ``.match()`` was never called for this object prior to calling ``.backref()``.
* There is insufficient workspace for the string to be returned.

Example::

  if (domainmatcher.match(req.http.Host)) {
     set req.http.X-Domain = domainmatcher.backref(1);
  }

$Method STRING .namedref(STRING name,
                         STRING fallback = "**NAMEDREF METHOD FAILED**")

Returns the captured subexpression designated by ``name`` from the
most recent successful call to ``.match()`` in the current context
(client or backend).

Named capturing groups are written in RE2 as: ``(?P<name>re)``. (Note
that this syntax with ``P``, inspired by Python, differs from the
notation for named capturing groups in PCRE.) Thus when
``(?P<foo>.+)bar$`` matches ``bazbar``, then ``.namedref("foo")``
returns ``baz``.

Note that a named capturing group can also be referenced as a numbered
group. So in the previous example, ``.backref(1)`` also returns
``baz``.

``fallback`` is returned when the named reference did not match. The
default fallback is ``"**NAMEDREF METHOD FAILED**"``.

Like ``.backref()``, ``.namedref()`` is not affected by native VCL
regex operations, nor by any other matches performed by methods or
functions of the VMOD, except for a prior ``.match()`` for the same
object.

``.namedref()`` invokes VCL failure (see `ERRORS`_) if:

* The ``fallback`` string is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``.match()`` was not called for this object.
* There is insufficient workspace for the string to be returned.

Example::

  sub vcl_init {
      new domainmatcher = re2.regex("^www\.(?P<domain>[^.]+)\.com$");
  }

  sub vcl_recv {
      if (domainmatcher.match(req.http.Host)) {
          set req.http.X-Domain = domainmatcher.namedref("domain");
      }
  }

$Method STRING .sub(STRING text, STRING rewrite,
                    STRING fallback = "**SUB METHOD FAILED**")

If the compiled pattern for this regex object matches ``text``, then
return the result of replacing the first match in ``text`` with
``rewrite``. Within ``rewrite``, ``\1`` through ``\9`` can be used to
insert the numbered capturing group from the pattern, and ``\0``
to insert the entire matching text. This method corresponds to the VCL
native function ``regsub()``.

``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB METHOD FAILED**"``.

``.sub()`` invokes VCL failure (see `ERRORS`_) if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then this will
      # set X-Yada to "www.yada.dabba.doo.com".
      set req.http.X-Yada = bmatcher.sub(req.http.Host, "d");
  }

$Method STRING .suball(STRING text, STRING rewrite,
                       STRING fallback = "**SUBALL METHOD FAILED**")

Like ``.sub()``, except that all successive non-overlapping matches in
``text`` are replaced with ``rewrite``. This method corresponds to VCL
native ``regsuball()``.

The default fallback is ``"**SUBALL METHOD FAILED**"``. ``.suball()``
fails under the same conditions as ``.sub()``.

Since only non-overlapping matches are substituted, replacing
``"ana"`` within ``"banana"`` only results in one substitution, not
two.

Example::

  sub vcl_init {
      new bmatcher = re2.regex("b+");
  }

  sub vcl_recv {
      # If Host contains "www.yabba.dabba.doo.com", then set X-Yada to
      # "www.yada.dada.doo.com".
      set req.http.X-Yada = bmatcher.suball(req.http.Host, "d");
  }
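
The non-overlapping rule described above can be illustrated with the
``"banana"`` case. This is a sketch; the object name ``ana`` and the header
``X-Demo`` are hypothetical::

  sub vcl_init {
      new ana = re2.regex("ana");
  }

  sub vcl_deliver {
      # The first match covers b[ana]na. The remaining "na" no longer
      # contains "ana", so only one substitution takes place and the
      # header is set to "bXna".
      set resp.http.X-Demo = ana.suball("banana", "X");
  }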

$Method STRING .extract(STRING text, STRING rewrite,
                        STRING fallback = "**EXTRACT METHOD FAILED**")

If the compiled pattern for this regex object matches ``text``, then
return ``rewrite`` with substitutions from the matching portions of
``text``. Non-matching substrings of ``text`` are ignored.

The default fallback is ``"**EXTRACT METHOD FAILED**"``. Like
``.sub()`` and ``.suball()``, ``.extract()`` fails if:

* Any of ``text``, ``rewrite`` or ``fallback`` are undefined.
* There is insufficient workspace for the rewritten string.

Example::

	sub vcl_init {
	    new email = re2.regex("(.*)@([^.]*)");
	}

	sub vcl_deliver {
	    # Sets X-UUCP to "kremvax!boris"
	    set resp.http.X-UUCP = email.extract("boris@kremvax.ru", "\2!\1");
	}

$Method INT .cost()

Return a numeric measurement > 0 for this regex object from the RE2
library.  According to the RE2 documentation:

  ... a very approximate measure of a regexp's "cost". Larger numbers
  are more expensive than smaller numbers.

The absolute numeric values are opaque and not relevant, but they are
meaningful relative to one another -- more complex regexen have a
higher cost than less complex regexen. This may be useful during
development and optimization of regular expressions.

Example::

  std.log("r1 cost=" + r1.cost() + " r_alt cost=" + r_alt.cost());
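
A fuller sketch of how such a comparison might be set up during
development, assuming that the ``std`` VMOD is available. The object names
and patterns are hypothetical, and no particular cost values are implied::

  import std;
  import re2;

  sub vcl_init {
      new r1 = re2.regex("foo");
      # A pattern with more alternatives, to compare against r1.
      new r_alt = re2.regex("foo|bar|baz|quux");
  }

  sub vcl_recv {
      # Only the relation between the two logged numbers is
      # meaningful, not their absolute values.
      std.log("r1 cost=" + r1.cost() + " r_alt cost=" + r_alt.cost());
  }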

regex functional interface
==========================

$Function BOOL match(PRIV_TASK, STRING pattern, STRING subject, BOOL utf8=0,
                     BOOL posix_syntax=0, BOOL longest_match=0,
		     INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0,
		     BOOL dot_nl=0, BOOL never_capture=0,
		     BOOL case_sensitive=1, BOOL perl_classes=0,
		     BOOL word_boundary=0, BOOL one_line=0)

Like the ``regex.match()`` method, return ``true`` if ``pattern``
matches ``subject``, where ``pattern`` is compiled with the given
options (or default options) on each invocation.

If ``pattern`` fails to compile, then VCL failure is invoked (see
`ERRORS`_).

Example::

  # Match the bereq Host header against a backend response header
  if (re2.match(pattern=bereq.http.Host, subject=beresp.http.X-Host)) {
     call do_on_match;
  }

$Function STRING backref(PRIV_TASK, INT ref,
                         STRING fallback = "**BACKREF FUNCTION FAILED**")

Returns the ``nth`` captured subexpression from the most recent
successful call of the ``match()`` function in the current client or
backend context, or a fallback string if the capture fails. The
default ``fallback`` is ``"**BACKREF FUNCTION FAILED**"``.

Similarly to the ``regex.backref()`` method, ``fallback`` is returned
if there is no captured group corresponding to the backref number. The
function is not affected by native VCL regex operations, or any other
method or function of the VMOD except for the ``match()`` function.

The function invokes VCL failure under the same conditions as the
corresponding method (see `ERRORS`_):

* ``fallback`` is undefined.
* ``never_capture`` was true in the previous invocation of the ``match()``
  function.
* ``ref`` is out of range.
* The ``match()`` function was never called in this context, or if the
  previous ``match()`` call failed (returned ``false``).
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured subexpression.

Example::

  # Match against a pattern provided in a beresp header, and capture
  # subexpression 1.
  if (re2.match(pattern=beresp.http.X-Pattern, subject=bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.backref(1);
  }

$Function STRING namedref(PRIV_TASK, STRING name,
                          STRING fallback = "**NAMEDREF FUNCTION FAILED**")

Returns the captured subexpression designated by ``name`` from the
most recent successful call to the ``match()`` function in the current
context, or ``fallback`` if the corresponding group did not match. The
default fallback is ``"**NAMEDREF FUNCTION FAILED**"``.

The function invokes VCL failure under the same conditions as the
corresponding method (see `ERRORS`_):

* ``fallback`` is undefined.
* ``name`` is undefined or the empty string.
* The ``never_capture`` option was set to ``true``.
* There is no such named group.
* ``match()`` was not called in this context, or the previous call failed.
* The pattern failed to compile for the previous ``match()`` call.
* There is insufficient workspace for the captured expression.

Example::

  if (re2.match(beresp.http.X-Pattern-With-Names, bereq.http.X-Foo)) {
     set beresp.http.X-Capture = re2.namedref("foo");
  }
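
A variant with a literal pattern makes the named group visible. This
is only a sketch: the group name ``section`` and the header
``X-Section`` are hypothetical, and ``(?P<name>...)`` is the RE2
syntax for a named capturing group::

  sub vcl_backend_fetch {
      # Capture the first path component under the name "section".
      if (re2.match("^/(?P<section>[^/]+)/", bereq.url)) {
          set bereq.http.X-Section
              = re2.namedref("section", fallback="unknown");
      }
  }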

$Function STRING sub(STRING pattern, STRING text, STRING rewrite,
                     STRING fallback = "**SUB FUNCTION FAILED**",
		     BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0,
		     INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0,
		     BOOL dot_nl=0, BOOL never_capture=0, BOOL case_sensitive=1,
		     BOOL perl_classes=0, BOOL word_boundary=0, BOOL one_line=0)

Compiles ``pattern`` with the given options, and if it matches
``text``, then return the result of replacing the first match in
``text`` with ``rewrite``. As with the ``regex.sub()`` method, ``\0``
through ``\9`` may be used in ``rewrite`` to substitute captured
groups from the pattern.

``fallback`` is returned if the pattern does not match ``text``. The
default fallback is ``"**SUB FUNCTION FAILED**"``.

``sub()`` invokes VCL failure (see `ERRORS`_) if:

* ``pattern`` cannot be compiled.
* Any of ``text``, ``rewrite`` or ``fallback`` is undefined.
* There is insufficient workspace for the rewritten string.

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dabba.doo.com".
  set beresp.http.X-Yada = re2.sub(beresp.http.X-Sub-Letters,
                                   bereq.http.Host, "d");
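
Options and numbered references can be combined in one call, as in the
following sketch. The pattern, and the choice to pass the unchanged
Host header as ``fallback``, are assumptions made for illustration::

  sub vcl_backend_fetch {
      # Strip a leading "www." in any letter case; \1 is the rest of
      # the host name. If the pattern does not match, the fallback
      # leaves Host as it was.
      if (bereq.http.Host) {
          set bereq.http.Host = re2.sub("^www\.(.+)$", bereq.http.Host,
                                        "\1",
                                        fallback=bereq.http.Host,
                                        case_sensitive=false);
      }
  }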

$Function STRING suball(STRING pattern, STRING text, STRING rewrite,
                        STRING fallback = "**SUBALL FUNCTION FAILED**",
		        BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0,
		        INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0,
		        BOOL dot_nl=0, BOOL never_capture=0,
			BOOL case_sensitive=1, BOOL perl_classes=0,
			BOOL word_boundary=0, BOOL one_line=0)

Like the ``sub()`` function, except that all successive
non-overlapping matches in ``text`` are replaced with ``rewrite``.

The default fallback is ``"**SUBALL FUNCTION FAILED**"``. The
``suball()`` function fails under the same conditions as ``sub()``.

Example::

  # If the beresp header X-Sub-Letters contains "b+", and Host contains
  # "www.yabba.dabba.doo.com", then set X-Yada to
  # "www.yada.dada.doo.com".
  set beresp.http.X-Yada = re2.suball(beresp.http.X-Sub-Letters,
                                      bereq.http.Host, "d");

$Function STRING extract(STRING pattern, STRING text, STRING rewrite,
                         STRING fallback = "**EXTRACT FUNCTION FAILED**",
		         BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0,
		         INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0,
		         BOOL dot_nl=0, BOOL never_capture=0,
			 BOOL case_sensitive=1, BOOL perl_classes=0,
			 BOOL word_boundary=0, BOOL one_line=0)

Compiles ``pattern`` with the given options, and if it matches
``text``, then return ``rewrite`` with substitutions from the matching
portions of ``text``, ignoring the non-matching portions.

The default fallback is ``"**EXTRACT FUNCTION FAILED**"``. The
``extract()`` function fails under the same conditions as ``sub()``
and ``suball()``.

Example::

  # If beresp header X-Params contains "(foo|bar)=(baz|quux)", and the
  # URL contains "bar=quux", then set X-Query to "bar:quux".
  set beresp.http.X-Query = re2.extract(beresp.http.X-Params, bereq.url,
                                        "\1:\2");

$Function INT cost(STRING pattern, BOOL utf8=0, BOOL posix_syntax=0,
	           BOOL longest_match=0, INT max_mem=8388608, BOOL literal=0,
		   BOOL never_nl=0, BOOL dot_nl=0, BOOL never_capture=0,
		   BOOL case_sensitive=1, BOOL perl_classes=0,
		   BOOL word_boundary=0, BOOL one_line=0)

Like the ``.cost()`` method above, return a numeric measurement > 0
from the RE2 library for ``pattern`` with the given options. More
complex regexen have a higher cost than less complex regexen.

Invokes VCL failure if ``pattern`` cannot be compiled (see `ERRORS`_).

Example::

  std.log("simple cost=" + re2.cost("simple")
          + " complex cost=" + re2.cost("complex{1,128}"));
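
The options can be passed to ``cost()`` as well, which makes it
possible to compare the same pattern under different settings. A
minimal sketch, assuming ``import std;`` appears at the top of the VCL
and using an arbitrary pattern (no particular result is implied)::

  sub vcl_recv {
      # Log the cost of one pattern with and without capturing.
      std.log("default cost=" + re2.cost("(foo|bar)+baz")
              + " never_capture cost="
              + re2.cost("(foo|bar)+baz", never_capture=true));
  }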

$Object set(PRIV_TASK, ENUM { none, start, both } anchor="none",
            BOOL utf8=0, BOOL posix_syntax=0, BOOL longest_match=0,
	    INT max_mem=8388608, BOOL literal=0, BOOL never_nl=0, BOOL dot_nl=0,
	    BOOL case_sensitive=1, BOOL perl_classes=0, BOOL word_boundary=0,
	    BOOL one_line=0)

Initialize a set object that represents several patterns combined by
alternation -- ``|`` for "or".

Optional parameters control the interpretation of the resulting
composed pattern. The ``anchor`` parameter is an enum that can have
the values ``none``, ``start`` or ``both``, where ``none`` is the
default. ``start`` means that each pattern is matched as if it begins
with ``^`` for start-of-text, and ``both`` means that each pattern is
anchored with both ``^`` at the beginning and ``$`` for end-of-text at
the end. ``none`` means that each pattern is interpreted as a partial
match (although individual patterns within the set may have either of
``^`` or ``$``).

For example, if a set is initialized with ``anchor=both``, and the
patterns ``foo`` and ``bar`` are added, then matches against the set
match a string against ``^foo$|^bar$``, or equivalently
``^(foo|bar)$``.

The usual regex options can be set, which then control matching
against the resulting composed pattern. However, the ``never_capture``
option cannot be set, and is always implicitly true, since backrefs
and namedrefs are not possible with sets.

Sets are compiled automatically when ``vcl_init`` finishes (or when
the deprecated ``.compile()`` method is called). Compilation fails if
any of the added patterns cannot be compiled, or if no patterns were
added to the set. It may also fail if the ``max_mem`` setting is not
large enough for the composed pattern. In that case, the VCL load will
fail with an error message; consider setting a larger value for
``max_mem`` in the set constructor.

Example::

  sub vcl_init {
	# Initialize a regex set for partial matches
	# with default options
	new foo = re2.set();

	# Initialize a regex set for case insensitive matches
	# with anchors on both ends (^ and $).
	new bar = re2.set(anchor=both, case_sensitive=false);

	# Initialize a regex set using POSIX syntax, but allowing
	# Perl character classes, and anchoring at the left (^).
	new baz = re2.set(anchor=start, posix_syntax=true,
	                  perl_classes=true);
  }
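
The effect of ``anchor`` can be sketched with two sets that hold the
same patterns, using the ``.add()`` and ``.match()`` methods
documented below. The object names ``exact`` and ``partial`` and the
headers ``X-Word`` and ``X-Kind`` are hypothetical::

  sub vcl_init {
      new exact = re2.set(anchor=both);
      exact.add("foo");
      exact.add("bar");

      new partial = re2.set();
      partial.add("foo");
      partial.add("bar");
  }

  sub vcl_recv {
      # exact matches as ^foo$|^bar$, so "foo" matches but "foobar"
      # does not. partial matches anywhere in the string, so "foobar"
      # matches, and both patterns are among those that matched.
      if (req.http.X-Word) {
          if (exact.match(req.http.X-Word)) {
              set req.http.X-Kind = "exact";
          }
          elsif (partial.match(req.http.X-Word)) {
              set req.http.X-Kind = "partial";
          }
      }
  }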

$Method VOID .add(STRING, [STRING string], [BACKEND backend], [BOOL save],
                  [BOOL never_capture], [INT integer], [SUB sub])

Add the given pattern to the set. If the pattern is invalid,
``.add()`` fails, and the VCL will fail to load, with an error message
describing the problem.

If values for the ``string``, ``backend``, ``integer`` and/or ``sub``
parameters are provided, then these values can be retrieved with the
``.string()``, ``.backend()``, ``.integer()`` and ``.subroutine()``
methods, respectively, as described below. This makes it possible to
associate data with the added pattern after it matches
successfully. By default the pattern is not associated with any such
value.

If ``save`` is true, then the given pattern is compiled and saved as a
``regex`` object, just as if the ``regex`` constructor described above
were invoked. This object is stored internally in the ``set`` object as
an independent matcher, separate from the "compound" pattern formed by
the set as an alternation of the patterns added to it. By default,
``save`` is **false**.

When the ``.match()`` method on the set is successful, and one of the
patterns that matched is associated with a saved internal ``regex``
object, then that object may be used for subsequent method invocations
such as ``.sub()`` on the set object, whose meanings are the same as
documented above for ``regex`` objects. Details are described below.

When an internal ``regex`` object is saved (i.e. when ``save`` is
true), it is compiled with the same options that were provided to the
set object in the constructor. The ``never_capture`` option can also
be set to false for the individual regex, even though it is implicitly
set to true for the full set object (default is false).

``.add()`` MUST be called in ``vcl_init``, and MAY NOT be called after
``.compile()``.  VCL failure is invoked if ``.add()`` is called in any
other subroutine (see `ERRORS`_). If it is called in ``vcl_init``
after ``.compile()``, then the VCL load will fail with an error
message. Note that ``.compile()`` is now unnecessary and deprecated.

When the ``.matched(INT)`` method is called after a successful match,
the numbering corresponds to the order in which patterns were added.
The same is true of the INT arguments that may be given for methods
such as ``.string()``, ``.backend()`` or ``.sub()``, as described
below.

Example::

  sub vcl_init {
      # literal=true means that the dots are interpreted as literal
      # dots, not "match any character".
      new hostmatcher = re2.set(anchor=both, case_sensitive=false,
                                literal=true);
      hostmatcher.add("www.domain1.com");
      hostmatcher.add("www.domain2.com");
      hostmatcher.add("www.domain3.com");
  }

  # See the documentation of the .string() and .backend() methods
  # below for uses of the parameters string and backend for .add().

$Method VOID .compile()

**This method is deprecated**, and will be removed in a future
version.  ``.compile()`` may be omitted, since compilation now happens
automatically when ``vcl_init`` finishes.

Compile the compound pattern represented by the set -- an alternation
of all patterns added by ``.add()``.

Compilation may fail for any of the reasons described for automatic
compilation of set objects as described above.

``.compile()`` MUST be called in ``vcl_init``, and MAY NOT be called
more than once for a set object. VCL failure is invoked if it is
called in any other subroutine. If it is called a second time in
``vcl_init``, the VCL load will fail.

$Method BOOL .match(STRING)

Returns ``true`` if the given string matches the compound pattern
represented by the set, i.e. if it matches any of the patterns that
were added to the set.

The matcher identifies all of the patterns that were added to the set
and match the given string. These can be determined after a successful
match using the ``.matched(INT)`` and ``.nmatches()`` methods
described below.

A match may also fail (leading to VCL failure) if the internal memory
limit imposed by the ``max_mem`` parameter in the constructor is
exceeded. (With the default value of ``max_mem``, this ordinarily
requires very large patterns and/or a very large string to be
matched.)  Since about version 2017-12-01, the RE2 library reports
this condition. If matches fail due to the out-of-memory condition,
increase the ``max_mem`` parameter in the constructor.

Example::

  if (hostmatcher.match(req.http.Host)) {
     call do_when_a_host_matched;
  }

$Method BOOL .matched(INT)

Returns ``true`` after a successful match if the ``nth`` pattern that
was added to the set is among the patterns that matched, ``false``
otherwise. The numbering of the patterns corresponds to the order in
which patterns were added in ``vcl_init``, counting from 1.

The method refers back to the most recent invocation of ``.match()``
for the same object in the same client or backend context. It always
returns ``false``, for every value of the parameter, if it is called
after an unsuccessful match (``.match()`` returned ``false``).

``.matched()`` invokes VCL failure (see `ERRORS`_) if:

* The ``.match()`` method was not called for this object in the same
  client or backend scope.

* The integer parameter is out of range; that is, if it is less than 1
  or greater than the number of patterns added to the set.

Example::

  if (hostmatcher.match(req.http.Host)) {
      if (hostmatcher.matched(1)) {
          call do_domain1;
      }
      if (hostmatcher.matched(2)) {
          call do_domain2;
      }
      if (hostmatcher.matched(3)) {
          call do_domain3;
      }
  }

$Method INT .nmatches()

Returns the number of patterns that were matched by the most recent
invocation of ``.match()`` for the same object in the same client or
backend context. The method always returns 0 after an unsuccessful
match (``.match()`` returned ``false``).

If ``.match()`` was not called for this object in the same client or
backend scope, ``.nmatches()`` invokes VCL failure (see `ERRORS`_).

Example::

  if (myset.match(req.url)) {
      std.log("URL matched " + myset.nmatches()
              + " patterns from the set");
  }

$Method INT .which(ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns a number indicating which pattern in a set matched in the most
recent invocation of ``.match()`` in the client or backend
context. The number corresponds to the order in which patterns were
added to the set in ``vcl_init``, counting from 1.

If exactly one pattern matched in the most recent ``.match()`` call
(so that ``.nmatches()`` returns 1), and the ``select`` ENUM is set to
``UNIQUE``, then the number for that pattern is returned. ``select``
defaults to ``UNIQUE``, so it can be left out in this case.

If more than one pattern matched in the most recent ``.match()`` call
(``.nmatches()`` > 1), then the ``select`` ENUM determines the integer
that is returned. The values ``FIRST`` and ``LAST`` specify that, of
the patterns that matched, the first or last one added via the
``.add()`` method is chosen, and the number for that pattern is
returned.

``.which()`` invokes VCL failure (see `ERRORS`_) if:

* ``.match()`` was not called for the set in the current client or
  backend transaction, or if the previous call returned ``false``.

* More than one pattern in the set matched in the previous
  ``.match()`` call, but the ``select`` parameter is set to ``UNIQUE``
  (or left out, since ``select`` defaults to ``UNIQUE``).

Examples::

  sub vcl_init {
      new myset = re2.set();
      myset.add("foo");	# Pattern 1
      myset.add("bar");	# Pattern 2
      myset.add("baz");	# Pattern 3
      myset.compile();
  }

  sub vcl_recv {
      if (myset.match("bar")) {
          # myset.which() returns 2.
      }
      if (myset.match("foobaz")) {
          # myset.which() fails and returns 0, with a log
          #               message indicating that 2 patterns
          #               matched.
          # myset.which(FIRST) returns 1.
          # myset.which(LAST) returns 3.
      }
      if (myset.match("quux")) {
          # ...
      }
      else {
          # myset.which() fails and returns 0, with either or
          # no value for the select ENUM, with a log message
          # indicating that the previous .match() call was
          # unsuccessful.
      }
  }
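
Since ``.which()`` with ``select=UNIQUE`` invokes VCL failure when
more than one pattern matched, a guard on ``.nmatches()`` is one way
to avoid it. A sketch, assuming the set ``myset`` from the example
above and a hypothetical header ``X-Pattern``::

  sub vcl_recv {
      if (myset.match(req.url)) {
          if (myset.nmatches() == 1) {
              # Exactly one pattern matched, so UNIQUE is safe.
              set req.http.X-Pattern = myset.which();
          }
          else {
              # Several patterns matched; choose the first one added.
              set req.http.X-Pattern = myset.which(select=FIRST);
          }
      }
  }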

$Method STRING .string(INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns the string associated with the `nth` pattern added to the set,
or with the pattern in the set that matched in the most recent call to
``.match()`` in the same task scope (client or backend context). The
string set with the ``string`` parameter of the ``.add()`` method
during ``vcl_init`` is returned.

The pattern is identified with the parameters ``n`` and ``select``
according to these rules, which also hold for all of the ``set``
methods documented below.

* If ``n`` > 0, then select the `nth` pattern added to the set with
  the ``.add()`` method, counting from 1. This identifies the `nth`
  pattern in any context, regardless of whether ``.match()`` was
  called previously, or whether a previous call returned ``true`` or
  ``false``. The ``select`` parameter is ignored in this case.

* If ``n`` <= 0, then select a pattern in the set that matched
  successfully in the most recent call to ``.match()`` in the same
  task scope. Since ``n`` is 0 by default, ``n`` can be left out for
  this purpose.

* If ``n`` <= 0 and exactly one pattern in the set matched in the most
  recent invocation of ``.match()`` (and hence ``.nmatches()`` returns
  1), and ``select`` is set to ``UNIQUE``, then select that
  pattern. ``select`` defaults to ``UNIQUE``, so when exactly one
  pattern in the set matched, both ``n`` and ``select`` can be left
  out.

* If ``n`` <= 0 and more than one pattern matched in the most recent
  ``.match()`` call (``.nmatches()`` > 1), then the selection of a
  pattern is determined by the ``select`` parameter. As with
  ``.which()``, ``FIRST`` and ``LAST`` specify the first or last
  matching pattern added via the ``.add()`` method.

For the pattern selected by these rules, return the string that was
set with the ``string`` parameter in the ``.add()`` method that added
the pattern to the set.

``.string()`` invokes VCL failure (see `ERRORS`_) if:

* The values of ``n`` and ``select`` are invalid:

  * ``n`` is greater than the number of patterns in the set.

  * ``n`` <= 0 (or left to the default), but ``.match()`` was not
    called earlier in the same task scope (client or backend context).

  * ``n`` <= 0, but the previous ``.match()`` call returned ``false``.

  * ``n`` <= 0 and the ``select`` ENUM is ``UNIQUE`` (or default), but
    more than one pattern matched in the previous ``.match()`` call.
    This can be avoided by checking for ``.nmatches() == 1``.

* No string was associated with the pattern selected by ``n`` and
  ``select``; that is, the ``string`` parameter was not set in the
  ``.add()`` call that added the pattern. This can be avoided by
  checking the ``.saved()`` method (see below).

Examples::

  # Match the request URL against a set of patterns, and generate
  # a synthetic redirect response with a Location header derived
  # from the string associated with the matching pattern.

  # In the first example, exactly one pattern in the set matches.

  sub vcl_init {
      # With anchor=both, we specify exact matches.
      new matcher = re2.set(anchor=both);
      matcher.add("/foo/bar", "/baz/quux");
      matcher.add("/baz/bar/foo", "/baz/quux/foo");
      matcher.add("/quux/bar/baz/foo", "/baz/quux/foo/bar");
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Confirm that there was exactly one match
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          # Divert to vcl_synth, sending the string associated
          # with the matching pattern in the "reason" field.
          return(synth(1301, matcher.string()));
      }
  }

  sub vcl_synth {
      # Construct a redirect response, using the path set in
      # resp.reason.
      if (resp.status == 1301) {
          set resp.http.Location
              = "http://otherdomain.org" + resp.reason;
          set resp.status = 301;
          set resp.reason = "Moved Permanently";
          return(deliver);
      }
  }

  # In the second example, the patterns that may match have
  # common prefixes, and more than one pattern may match. We
  # add patterns to the set in a "more specific" to "less
  # specific" order, and we choose the most specific pattern
  # that matches, by specifying the first matching pattern in
  # the set.

  sub vcl_init {
      # With anchor=start, we specify matching prefixes.
      new matcher = re2.set(anchor=start);
      matcher.add("/foo/bar/baz/quux", "/baz/quux");
      matcher.add("/foo/bar/baz", "/baz/quux/foo");
      matcher.add("/foo/bar", "/baz/quux/foo/bar");
      matcher.add("/foo", "/baz");
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Select the first matching pattern
          return(synth(1301, matcher.string(select=FIRST)));
      }
  }

  # vcl_synth is implemented as shown above
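
If some patterns in a set have no associated string, the failure
described above can be avoided by checking ``.saved()`` first. This
sketch assumes the ``matcher`` set from the second example and uses
the same ``select`` value in both calls, so that both refer to the
same pattern::

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Only read the string if one was set in .add() for the
          # first matching pattern.
          if (matcher.saved(which=STR, select=FIRST)) {
              return(synth(1301, matcher.string(select=FIRST)));
          }
      }
  }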

$Method BACKEND .backend(INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns the backend associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope (client or backend
context).

The rules for selecting a pattern from the set and its associated
backend based on ``n`` and ``select`` are the same as described above
for ``.string()``.

``.backend()`` invokes VCL failure under the same conditions described
for ``.string()`` above -- ``n`` and ``select`` are invalid, or no
backend was associated with the selected pattern with the ``.add()``
method (see `ERRORS`_).

Example::

  # Choose a backend based on the URL prefix.

  # In this example, assume that backends b1 through b4
  # have been defined.

  sub vcl_init {
      # Use anchor=start to match prefixes.
      # The prefixes are unique, so exactly one will match.
      new matcher = re2.set(anchor=start);
      matcher.add("/foo", backend=b1);
      matcher.add("/bar", backend=b2);
      matcher.add("/baz", backend=b3);
      matcher.add("/quux", backend=b4);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          # Confirm that there was exactly one match
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          # Set the backend hint to the backend associated
          # with the matching pattern.
          set req.backend_hint = matcher.backend();
      }
  }
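
Because ``n`` > 0 selects a pattern by position without requiring a
previous match, it can be used to name a default. The following sketch
assumes the ``matcher`` set from the example above; treating the
backend of pattern 1 as the default is an arbitrary choice made for
illustration::

  sub vcl_recv {
      if (matcher.match(req.url)) {
          set req.backend_hint = matcher.backend(select=FIRST);
      }
      else {
          # No pattern matched: use the backend that was added
          # with pattern 1 (b1).
          set req.backend_hint = matcher.backend(n=1);
      }
  }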

$Method INT .integer(INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns the integer associated with the `nth` pattern added to the
set, or with the pattern in the set that matched in the most recent
call to ``.match()`` in the same task scope.

The rules for selecting a pattern from the set and its associated
integer based on ``n`` and ``select`` are the same as described above
for ``.string()``.

``.integer()`` invokes VCL failure under the same error conditions
described for ``.string()`` above -- ``n`` and ``select`` are invalid,
or no integer was associated with the selected pattern with the
``.add()`` method (see `ERRORS`_).

Example::

  # Generate redirect responses based on the Host header. In the
  # example, subdomains are removed in the new Location, and the
  # associated integer is used to set the redirect status code.

  sub vcl_init {
      # No more than one pattern can match the same string. So it
      # is safe to call .integer() with default select=UNIQUE in
      # vcl_recv below (no risk of VCL failure).
      new redir = re2.set(anchor=both);
      redir.add("www\.[^.]+\.foo\.com", integer=301, string="www.foo.com");
      redir.add("www\.[^.]+\.bar\.com", integer=302, string="www.bar.com");
      redir.add("www\.[^.]+\.baz\.com", integer=303, string="www.baz.com");
      redir.add("www\.[^.]+\.quux\.com", integer=307, string="www.quux.com");
      redir.compile();
  }

  sub vcl_recv {
      if (redir.match(req.http.Host)) {
          # Construct a Location header that will be used in the
          # synthetic redirect response.
          set req.http.Location = "http://" + redir.string() + req.url;

          # Set the response status from the associated integer.
          return( synth(redir.integer()) );
      }
  }

  sub vcl_synth {
      if (resp.status >= 301 && resp.status <= 307) {
          # We come here from the synth return for the redirect
          # response. The status code was set from .integer().
          set resp.http.Location = req.http.Location;
          return(deliver);
      }
  }

$Method STRING .sub(STRING text, STRING rewrite,
                    STRING fallback="**SUB METHOD FAILED**",
		    INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns the result of the method call ``.sub(text, rewrite, fallback)``,
as documented above for the ``regex`` interface, invoked on the `nth`
pattern added to the set, or on the pattern in the set that matched in
the most recent call to ``.match()`` in the same task scope.

``.sub()`` requires that the pattern it identifies was saved as an
internal ``regex`` object, by setting ``save`` to true when it was
added with the ``.add()`` method.

The associated pattern is determined by ``n`` and ``select`` according
to the rules given above. If an internal ``regex`` object was saved
for that pattern, then the result of the ``.sub()`` method invoked on
that object is returned.

``.sub()`` invokes VCL failure (see `ERRORS`_) if:

* The values of ``n`` and ``select`` are invalid, according to the
  rules given above.

* ``save`` was false in the ``.add()`` method for the pattern
  identified by ``n`` and ``select``; that is, no internal ``regex``
  object was saved on which the ``.sub()`` method could have been
  invoked.

* The ``.sub()`` method invoked on the ``regex`` object fails for any
  of the reasons described for ``regex.sub()``.

Examples::

  # Generate synthetic redirect responses on URLs that match a set of
  # patterns, rewriting the URL according to the matched pattern.

  # In this example, we set the new URL in the redirect location to
  # the path that comes after the prefix of the original req.url.
  sub vcl_init {
      new matcher = re2.set(anchor=start);
      matcher.add("/foo/(.*)", save=true);
      matcher.add("/bar/(.*)", save=true);
      matcher.add("/baz/(.*)", save=true);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          return(synth(1301));
      }
  }

  sub vcl_synth {
      if (resp.status == 1301) {
          # matcher.sub() rewrites the URL to the subpath after the
          # original prefix.
          set resp.http.Location
              = "http://www.otherdomain.org" + matcher.sub(req.url, "/\1");
          set resp.status = 301;
          return(deliver);
      }
  }
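
With ``n`` > 0, the saved regex of a specific pattern can be applied
without a previous ``.match()`` call. A sketch, assuming the
``matcher`` set from the example above and a hypothetical header
``X-Subpath``; the unchanged URL is passed as ``fallback`` and should
be the result when the URL does not match pattern 2::

  sub vcl_recv {
      # Apply the regex saved for pattern 2 ("/bar/(.*)") directly.
      set req.http.X-Subpath = matcher.sub(req.url, "/\1",
                                           fallback=req.url, n=2);
  }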

$Method STRING .suball(STRING text, STRING rewrite,
                       STRING fallback="**SUBALL METHOD FAILED**",
		       INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Like the ``.sub()`` method, this returns the result of calling
``.suball(text, rewrite, fallback)`` from the regex interface on the
`nth` pattern added to the set, or the pattern that most recently
matched in a ``.match()`` call.

``.suball()`` is subject to the same conditions as the ``.sub()`` method:

* The pattern to which it is applied is identified by ``n`` and
  ``select`` according to the rules given above.

* It fails if:

  * The pattern that it identifies was not saved with ``.add(save=true)``.

  * The values of ``n`` or ``select`` are invalid.

  * The ``.suball()`` method invoked on the saved ``regex`` object
    fails.

Example::

  # In any URL that matches one of the words given below, replace all
  # occurrences of the matching word with "quux" (for example to
  # rewrite path components or elements of query strings).
  sub vcl_init {
      new matcher = re2.set();
      matcher.add("\bfoo\b", save=true);
      matcher.add("\bbar\b", save=true);
      matcher.add("\bbaz\b", save=true);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          set req.url = matcher.suball(req.url, "quux");
      }
  }

$Method STRING .extract(STRING text, STRING rewrite,
                        STRING fallback="**EXTRACT METHOD FAILED**",
		        INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Like the ``.sub()`` and ``.suball()`` methods, this method returns the
result of calling ``.extract(text, rewrite, fallback)`` from the regex
interface on the `nth` pattern added to the set, or the pattern that most
recently matched in a ``.match()`` call.

``.extract()`` is subject to the same conditions as the other rewrite
methods:

* The pattern to which it is applied is identified by ``n`` and
  ``select`` according to the rules given above.

* It fails if:

  * The pattern that it identifies was not saved with ``.add(save=true)``.

  * The values of ``n`` or ``select`` are invalid.

  * The ``.extract()`` method invoked on the saved ``regex`` object
    fails.

Example::

  # Rewrite any URL that matches one of the patterns in the set
  # by exchanging the path components.
  sub vcl_init {
      new matcher = re2.set(anchor=both);
      matcher.add("/(foo)/(bar)/", save=true);
      matcher.add("/(bar)/(baz)/", save=true);
      matcher.add("/(baz)/(quux)/", save=true);
      matcher.compile();
  }

  sub vcl_recv {
      if (matcher.match(req.url)) {
          if (matcher.nmatches() != 1) {
              return(fail);
          }
          set req.url = matcher.extract(req.url, "/\2/\1/");
      }
  }
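
The two failure conditions that depend on the set can be checked
before the rewrite instead of returning ``fail``. This is a sketch
under the assumption that the ``matcher`` set from the example above
is in use; for the URL ``/foo/bar/``, the rewrite is expected to yield
``/bar/foo/``::

  sub vcl_recv {
      if (matcher.match(req.url) && matcher.nmatches() == 1) {
          # Rewrite only if a regex was saved for the pattern
          # that matched.
          if (matcher.saved(which=REGEX)) {
              set req.url = matcher.extract(req.url, "/\2/\1/",
                                            fallback=req.url);
          }
      }
  }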

$Method SUB .subroutine(INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns the subroutine set by the ``sub`` parameter for the element of
the set indicated by ``n`` and ``select``, according to the rules
given above. The subroutine may be invoked with VCL ``call``.

**Note**: you must ensure that the subroutine may be invoked legally
in the context in which it is called. This means that:

* The subroutine may only refer to VCL elements that are legal in the
  invocation context. For example, if the subroutine only refers to
  headers in ``req.http.*``, then it may be called in ``vcl_recv``,
  but not if it refers to any header in ``resp.http.*``. See
  ``vcl-var(7)`` for the specification of which VCL variables may be
  used in which contexts.

* Recursive subroutine calls are not permitted in VCL. The subroutine
  invocation may not appear anywhere in its own call stack.

For standard subroutine invocations with ``call``, the VCL compiler
checks these conditions and issues a compile-time error if either one
is violated. This is not possible with invocations using
``.subroutine()``; the error can only be determined at runtime. So it
is advisable to test the use of ``.subroutine()`` carefully before
using it in production. You can use the ``.check_call()`` method
described below to determine if the subroutine call is legal.

``.subroutine()`` invokes VCL failure (see `ERRORS`_) if:

* The rules for ``n`` and ``select`` indicate failure.

* No subroutine was set with the ``sub`` parameter in ``.add()``.

* The subroutine is invoked with ``call``, but the call is not legal
  in the invocation context, for the reasons given above.

Example::

  # Due to the use of resp.http.*, this subroutine may only be invoked
  # in vcl_deliver or vcl_synth, as documented in vcl-var(7). Note
  # that subroutine definitions must appear before vcl_init to be
  # permitted for the sub parameter in .add().
  sub resp_sub {
      set resp.http.Call-Me = "but only in deliver or synth";
  }

  sub vcl_init {
      new myset = re2.set();
      myset.add("/foo", sub=resp_sub);
      myset.add("/foo/bar", sub=some_other_sub);
      # ...
  }

  sub vcl_deliver {
      if (myset.match(req.url)) {
          call myset.subroutine(select=FIRST);
      }
  }

$Method BOOL .check_call(INT n=0, ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns ``true`` iff the subroutine returned by ``.subroutine()`` for
the element of the set indicated by ``n`` and ``select`` may be
invoked legally in the current context. The conditions for legal
invocation are documented for ``.subroutine()`` above.

``.check_call()`` never invokes VCL failure, but rather returns
``false`` under conditions for which the use of ``.subroutine()``
would invoke VCL failure. In that case, a message is emitted to the
Varnish log using the ``Notice`` tag (the same message that would
appear with the ``VCL_Error`` tag if the subroutine were called).

``Notice`` messages in the log produced by this VMOD are always
prefixed with the string ``vmod_re2: ``.

Example::

  # Assume that myset is declared as in the example above.
  sub vcl_deliver {
      if (myset.match(req.url)) {
          if (myset.check_call(select=FIRST)) {
              call myset.subroutine(select=FIRST);
          }
          else {
              call do_if_resp_sub_is_illegal;
          }
      }
  }

$Method BOOL .saved(ENUM {REGEX, STR, BE, INT, SUB} which=REGEX, INT n=0,
                    ENUM {FIRST, LAST, UNIQUE} select=UNIQUE)

Returns true if and only if an object of the type indicated by
``which`` was saved at initialization time for the ``nth`` pattern
added to the set, or for the pattern indicated by ``select`` after the
most recent ``.match()`` call.

In other words, ``.saved()`` returns true:

* for ``which=REGEX`` if the individual regex was saved with
  ``.add(save=true)`` for the indicated pattern

* for ``which=STR`` if a string was stored with the ``string``
  parameter in ``.add()``

* for ``which=BE`` if a backend was stored with the ``backend``
  parameter

* for ``which=INT`` if an integer was stored with the ``integer``
  parameter

* for ``which=SUB`` if a subroutine was stored with the ``sub``
  parameter

The default value of ``which`` is ``REGEX``.

The pattern in the set is identified by ``n`` and ``select`` according
to the rules given above. ``.saved()`` invokes VCL failure if the
values of ``n`` or ``select`` are invalid (see `ERRORS`_).

Example::

  sub vcl_init {
      new s = re2.set();
      s.add("1", save=true, string="1", backend=b1);
      s.add("2", save=true, string="2");
      s.add("3", save=true, backend=b3);
      s.add("4", save=true);
      s.add("5", string="5", backend=b5);
      s.add("6", string="6");
      s.add("7", backend=b7);
      s.add("8");
      s.compile();
  }

  # Then the following holds for this set:
  # s.saved(n=1) == true    # also for which=STR and which=BE
  # s.saved(which=REGEX, n=2) == true
  # s.saved(which=STR, n=2)   == true
  # s.saved(which=BE, n=2)    == false
  # s.saved(which=REGEX, n=3) == true
  # s.saved(which=STR, n=3)   == false
  # s.saved(which=BE, n=3)    == true
  # s.saved(which=REGEX, n=4) == true
  # s.saved(which=STR, n=4)   == false
  # s.saved(which=BE, n=4)    == false
  # s.saved(which=REGEX, n=5) == false
  # s.saved(which=STR, n=5)   == true
  # s.saved(which=BE, n=5)    == true
  # s.saved(which=REGEX, n=6) == false
  # s.saved(which=STR, n=6)   == true
  # s.saved(which=BE, n=6)    == false
  # s.saved(which=REGEX, n=7) == false
  # s.saved(which=STR, n=7)   == false
  # s.saved(which=BE, n=7)    == true
  # s.saved(n=8) == false	# for any value of which

  if (s.match("4")) {
     # The fourth pattern has been uniquely matched.
     # So in this context: s.saved() == true
     # Since save=true was used in .add() for the 4th pattern,
     # and which=REGEX by default.
  }
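
As a minimal sketch, ``.saved()`` can guard data retrieval so that a
missing object does not lead to VCL failure. This assumes the set
``s`` from the example above, and the retrieval methods ``.string()``
and ``.backend()`` documented for set objects; the header name
``X-Set-String`` is hypothetical::

  sub vcl_recv {
      if (s.match(req.url)) {
          # select=FIRST is used for both the check and the retrieval,
          # so that neither fails if more than one pattern matched.
          if (s.saved(which=STR, select=FIRST)) {
              set req.http.X-Set-String = s.string(select=FIRST);
          }
          if (s.saved(which=BE, select=FIRST)) {
              set req.backend_hint = s.backend(select=FIRST);
          }
      }
  }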

$Method VOID .hdr_filter(HTTP, BOOL whitelist=1)

Filters the headers in the HTTP object, which may be one of ``req``,
``resp``, ``bereq``, or ``beresp``. In other words, filter the headers
in the client or backend request or response.

If ``whitelist`` is ``true``, then headers that match one of the
patterns in the set are retained, and all other headers are removed.
Otherwise, headers that match a pattern in the set are removed, and
all others are retained. By default, ``whitelist`` is ``true``.

Example::

  sub vcl_init {
	# Header whitelist
	new white = re2.set(anchor=start);
	white.add("Foo:");
	white.add("Bar:");
	white.add("Baz: baz$");
	white.compile();

	# Header blacklist
	new black = re2.set(anchor=start);
	black.add("Chaotic:");
	black.add("Evil:");
	black.add("Wicked: wicked$");
	black.compile();
  }

  sub vcl_recv {
	# Filter the client request header with the whitelist.
	# Headers that do not match any pattern in the set are removed.
	white.hdr_filter(req);
  }

  sub vcl_deliver {
	# Filter the client response header with the blacklist.
	# Headers that match any pattern in the set are removed.
	black.hdr_filter(resp, false);
  }

$Function STRING quotemeta(STRING)

Returns a copy of the argument string with all regex metacharacters
escaped via backslash. When the returned string is used as a regular
expression, it will exactly match the original string, regardless of
any special characters. This function has a purpose similar to a
``\Q..\E`` sequence within a regex, or the ``literal=true`` setting in
a regex constructor.

The function invokes VCL failure if there is insufficient workspace
for the return string (see `ERRORS`_).

Example::

  # The following are always true:
  re2.quotemeta("1.5-2.0?") == "1\.5\-2\.0\?"
  re2.match(re2.quotemeta("1.5-2.0?"), "1.5-2.0?")
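
A further sketch, under the assumption that a literal header value
should be embedded in a larger pattern: ``quotemeta()`` escapes the
``Host`` value so that characters such as ``.`` match only
themselves. The header name ``X-Same-Site`` is hypothetical. Both
headers are checked first, since undefined pattern or subject strings
lead to VCL failure, and note that the pattern is compiled at runtime
on each call::

  sub vcl_recv {
      if (req.http.Host && req.http.Referer) {
          # Does the Referer point to the host named in the request?
          if (re2.match("^https?://" + re2.quotemeta(req.http.Host) + "/",
                        req.http.Referer)) {
              set req.http.X-Same-Site = "true";
          }
      }
  }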

$Function STRING version()

Return the version string for this VMOD.

Example::

  std.log("Using VMOD re2 version: " + re2.version());

ERRORS
======

Functions and methods of the VMOD may invoke VCL failure under
unrecoverable error conditions. The effects of VCL failure depend on
the VCL subroutine in which it takes place:

* If invoked during ``vcl_init``, then the VCL load fails, and an
  error message is returned over the CLI (reported by
  ``varnishadm(1)``).

* If invoked during any other subroutine besides ``vcl_synth``, then
  an error message is recorded in the log with the ``VCL_Error`` tag,
  further processing is aborted immediately, and a response with
  status 503 (Service Unavailable) is returned with the reason
  string "VCL failed".

* If invoked during ``vcl_synth``, then further processing is aborted,
  the error message is logged with ``VCL_Error``, and the client
  connection is immediately closed -- the client receives no response.

Errors that lead to VCL failure include:

* Any regex compilation failure.

* Out of workspace errors (see `LIMITATIONS`_).

* Failures reported by the RE2 library for: matches, backrefs,
  namedrefs, the rewrite operations (sub, suball and extract), the
  ``.cost()`` function or method, and the ``quotemeta()``
  function. The VMOD detects most common errors that would lead to
  library errors, and invokes VCL failure in such cases without
  calling the library. But library errors may happen under conditions
  such as out of memory.

* Functions and methods that require a previous successful match when
  there was no prior match, or when the previous match was
  unsuccessful.  These include backrefs, namedrefs, and the data
  retrieval methods for set objects.

* Any of the following parameters are undefined, for example when
  set from an unset header: fallbacks; patterns for the regex functions
  (which are compiled at runtime); the text and rewrite parameters
  for rewrite operations; the name parameter for namedrefs.

* The name parameter for namedrefs is the empty string.

* Backref number is out of range (greater than the number of backrefs
  in the pattern).

* Backref or namedref attempted when the ``never_capture`` option was
  set to ``true`` for the pattern.

* For set objects:

  * Numeric index (parameter ``n``) is out of range (greater than the
    number of patterns in the set).

  * Use of ``select=UNIQUE`` after more than one pattern was matched.
    The ``.nmatches()`` method can be used to check for this condition,
    to avoid VCL failure -- ``UNIQUE`` will fail if ``.nmatches()`` > 1.

  * Retrieval of data from a set (such as a string, backend etc) by
    numeric index (``n``) or "associatively" (after a match) when no
    such object was saved for the corresponding pattern. Use the
    ``.saved()`` and ``.check_call()`` methods to check for this.

  * Calling the subroutine returned by ``.subroutine()`` may be
    illegal, if it is not permitted in the subroutine from which it is
    called, or if it would lead to recursive calls. Use the
    ``.check_call()`` method to check for this.
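
The checks named above for set objects can be combined as in the
following sketch. It assumes a compiled set ``s`` for which strings
were stored with the ``string`` parameter of ``.add()`` for some
patterns; the header name ``X-Match`` is hypothetical::

  sub vcl_recv {
      if (s.match(req.url)) {
          if (s.nmatches() == 1) {
              # Exactly one pattern matched, so the default
              # select=UNIQUE cannot fail.
              if (s.saved(which=STR)) {
                  set req.http.X-Match = s.string();
              }
          }
          else {
              # More than one pattern matched; choose one explicitly.
              if (s.saved(which=STR, select=FIRST)) {
                  set req.http.X-Match = s.string(select=FIRST);
              }
          }
      }
  }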

REQUIREMENTS
============

The VMOD requires Varnish version 6.6 or later, or the master branch.
See the source repository for versions of the VMOD that are compatible
with other Varnish versions.

It requires the RE2 library, and has been tested against RE2 versions
since 2015-06-01 (through 2021-04-01 at the time of writing).

If the VMOD is built against versions of RE2 since 2017-12-01, it uses
a version of the set match operation that reports out-of-memory
conditions during a match. (Versions of RE2 since June 2019 no longer
have this error, but nevertheless the different internal call is used
for set matches.) In that case, the VMOD is not compatible with
earlier versions of RE2. This is only a problem if the runtime version
of the library differs from the version against which the VMOD was
built. If you encounter this error, consider re-building the VMOD
against the runtime version of RE2, or installing a newer version of
RE2.

LIMITATIONS
===========

The VMOD allocates Varnish workspace for captured groups and rewritten
strings. If operations fail with "insufficient workspace" error
messages in the Varnish log (with the ``VCL_Error`` tag), increase the
varnishd runtime parameters ``workspace_client`` and/or
``workspace_backend``.

The RE2 documentation states that successful matches are slowed quite
a bit when they also capture substrings. There is also additional
overhead from the VMOD, unless the ``never_capture`` flag is true, to
manage data about captured groups in the workspace. This overhead is
incurred even if there are no capturing expressions in a pattern,
since it is always possible to call ``backref(0)`` to obtain the
matched portion of a string.

So if you are using a pattern only to match against strings, and never
to capture subexpressions, consider setting the ``never_capture``
option to true, to eliminate the extra work for both RE2 and the VMOD.
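
A minimal sketch of a match-only regex, assuming that static assets
are recognized by file extension (the object name ``is_static`` and
the pattern are hypothetical). Since ``never_capture`` is ``true``,
backrefs and namedrefs must not be used with this object::

  sub vcl_init {
      # Used only for matching, never for backrefs.
      new is_static = re2.regex("\.(?:css|js|png)$", never_capture=true);
  }

  sub vcl_recv {
      if (is_static.match(req.url)) {
          unset req.http.Cookie;
      }
  }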

AUTHOR
======

* Geoffrey Simmons <geoff@uplex.de>

UPLEX Nils Goroll Systemoptimierung

SEE ALSO
========

* varnishd(1)

* vcl(7)

* VMOD source repository: https://code.uplex.de/uplex-varnish/libvmod-re2

  * Gitlab mirror: https://gitlab.com/uplex/varnish/libvmod-re2

* RE2 git repo: https://github.com/google/re2

* RE2 syntax: https://github.com/google/re2/wiki/Syntax

* "Implementing Regular Expressions": https://swtch.com/~rsc/regexp/

  * Series of articles motivating the design of RE2, with discussion
    of how RE2 compares with PCRE

$Event event