File: writingrules.rst

package info (click to toggle)
yara 4.5.5-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 13,884 kB
  • sloc: ansic: 52,295; yacc: 2,895; lex: 2,019; cpp: 863; makefile: 479; javascript: 85; sh: 47; python: 35
file content (1858 lines) | stat: -rw-r--r-- 58,466 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
*******************
Writing YARA rules
*******************

YARA rules are easy to write and understand, and they have a syntax that
resembles the C language. Here is the simplest rule that you can write for
YARA, which does absolutely nothing:

.. code-block:: yara

    rule dummy
    {
        condition:
            false
    }

Each rule in YARA starts with the keyword ``rule`` followed by a rule
identifier. Identifiers must follow the same lexical conventions of the C
programming language, they can contain any alphanumeric character and the
underscore character, but the first character cannot be a digit. Rule
identifiers are case sensitive and cannot exceed 128 characters. The following
keywords are reserved and cannot be used as an identifier:


.. list-table:: YARA keywords
   :widths: 10 10 10 10 10 10 10 10

   * - all
     - and
     - any
     - ascii
     - at
     - base64
     - base64wide
     - condition
   * - contains
     - endswith
     - entrypoint
     - false
     - filesize
     - for
     - fullword
     - global
   * - import
     - icontains
     - iendswith
     - iequals
     - in
     - include
     - int16
     - int16be
   * - int32
     - int32be
     - int8
     - int8be
     - istartswith
     - matches
     - meta
     - nocase
   * - none
     - not
     - of
     - or
     - private
     - rule
     - startswith
     - strings
   * - them
     - true
     - uint16
     - uint16be
     - uint32
     - uint32be
     - uint8
     - uint8be
   * - wide
     - xor
     - defined
     -
     -
     -
     -
     -

Rules are generally composed of two sections: strings definition and condition.
The strings definition section can be omitted if the rule doesn't rely on any
string, but the condition section is always required. The strings definition
section is where the strings that will be part of the rule are defined. Each
string has an identifier consisting of a $ character followed by a sequence of
alphanumeric characters and underscores, these identifiers can be used in the
condition section to refer to the corresponding string. Strings can be defined
in text or hexadecimal form, as shown in the following example:

.. code-block:: yara

    rule ExampleRule
    {
        strings:
            $my_text_string = "text here"
            $my_hex_string = { E2 34 A1 C8 23 FB }

        condition:
            $my_text_string or $my_hex_string
    }

Text strings are enclosed in double quotes just like in the C language. Hex
strings are enclosed by curly brackets, and they are composed by a sequence of
hexadecimal numbers that can appear contiguously or separated by spaces. Decimal
numbers are not allowed in hex strings.

The condition section is where the logic of the rule resides. This section must
contain a boolean expression telling under which circumstances a file or process
satisfies the rule or not. Generally, the condition will refer to previously
defined strings by using their identifiers. In this context the string
identifier acts as a boolean variable which evaluate to true if the string was
found in the file or process memory, or false if otherwise.

Comments
========

You can add comments to your YARA rules just as if it was a C source file, both
single-line and multi-line C-style comments are supported.

.. code-block:: yara

    /*
        This is a multi-line comment ...
    */

    rule CommentExample   // ... and this is single-line comment
    {
        condition:
            false  // just a dummy rule, don't do this
    }

Strings
=======

There are three types of strings in YARA: hexadecimal strings, text strings and
regular expressions. Hexadecimal strings are used for defining raw sequences of
bytes, while text strings and regular expressions are useful for defining
portions of legible text. However text strings and regular expressions can be
also used for representing raw bytes by mean of escape sequences as will be
shown below.

Hexadecimal strings
-------------------

Hexadecimal strings allow four special constructions that make them more
flexible: wild-cards, not operators, jumps, and alternatives. Wild-cards are just placeholders
that you can put into the string indicating that some bytes are unknown and they
should match anything. The placeholder character is the question mark (?). Here
you have an example of a hexadecimal string with wild-cards:

.. code-block:: yara

    rule WildcardExample
    {
        strings:
            $hex_string = { E2 34 ?? C8 A? FB }

        condition:
            $hex_string
    }

As shown in the example the wild-cards are nibble-wise, which means that you can
define just one nibble of the byte and leave the other unknown.

Starting with version 4.3.0, you may specify that a byte is not a specific
value. For that you can use the not operator with a byte value:

.. code-block:: yara

    rule NotExample
    {
        strings:
            $hex_string = { F4 23 ~00 62 B4 }
            $hex_string2 = { F4 23 ~?0 62 B4 }
        condition:
            $hex_string and $hex_string2
    }

In the example above we have a byte prefixed with a tilde (~), which is the not operator.
This defines that the byte in that location can take any value except the value specified.
In this case the first string will only match if the byte is not 00. The not operator can
also be used with nibble-wise wild-cards, so the second string will only match if the
second nibble is not zero.

Wild-cards and not operators are useful when defining strings whose content can vary but you know
the length of the variable chunks, however, this is not always the case. In some
circumstances you may need to define strings with chunks of variable content and
length. In those situations you can use jumps instead of wild-cards:

.. code-block:: yara

    rule JumpExample
    {
        strings:
            $hex_string = { F4 23 [4-6] 62 B4 }

        condition:
            $hex_string
    }

In the example above we have a pair of numbers enclosed in square brackets and
separated by a hyphen, that's a jump. This jump is indicating that any arbitrary
sequence from 4 to 6 bytes can occupy the position of the jump. Any of the
following strings will match the pattern::

    F4 23 01 02 03 04 62 B4
    F4 23 00 00 00 00 00 62 B4
    F4 23 15 82 A3 04 45 22 62 B4

Any jump [X-Y] must meet the condition 0 <= X <= Y. In previous versions of
YARA both X and Y must be lower than 256, but starting with YARA 2.0 there is
no limit for X and Y.

These are valid jumps::

    FE 39 45 [0-8] 89 00
    FE 39 45 [23-45] 89 00
    FE 39 45 [1000-2000] 89 00

This is invalid::

    FE 39 45 [10-7] 89 00

If the lower and higher bounds are equal you can write a single number enclosed
in brackets, like this::

    FE 39 45 [6] 89 00

The above string is equivalent to both of these::

    FE 39 45 [6-6] 89 00
    FE 39 45 ?? ?? ?? ?? ?? ?? 89 00

Starting with YARA 2.0 you can also use unbounded jumps::

    FE 39 45 [10-] 89 00
    FE 39 45 [-] 89 00

The first one means ``[10-infinite]``, the second one means ``[0-infinite]``.

There are also situations in which you may want to provide different
alternatives for a given fragment of your hex string. In those situations you
can use a syntax which resembles a regular expression:

.. code-block:: yara

    rule AlternativesExample1
    {
        strings:
            $hex_string = { F4 23 ( 62 B4 | 56 ) 45 }

        condition:
            $hex_string
    }

This rule will match any file containing ``F42362B445`` or ``F4235645``.

But more than two alternatives can be also expressed. In fact, there are no
limits to the amount of alternative sequences you can provide, and neither to
their lengths.

.. code-block:: yara

    rule AlternativesExample2
    {
        strings:
            $hex_string = { F4 23 ( 62 B4 | 56 | 45 ?? 67 ) 45 }

        condition:
            $hex_string
    }

As can be seen also in the above example, strings containing wild-cards are
allowed as part of alternative sequences.

Text strings
------------

As shown in previous sections, text strings are generally defined like this:

.. code-block:: yara

    rule TextExample
    {
        strings:
            $text_string = "foobar"

        condition:
            $text_string
    }

This is the simplest case: an ASCII-encoded, case-sensitive string. However,
text strings can be accompanied by some useful modifiers that alter the way in
which the string will be interpreted. Those modifiers are appended at the end of
the string definition separated by spaces, as will be discussed below.

Text strings can also contain the following subset of the escape sequences
available in the C language:

.. list-table::
   :widths: 3 10

   * - ``\"``
     - Double quote
   * - ``\\``
     - Backslash
   * - ``\r``
     - Carriage return
   * - ``\t``
     - Horizontal tab
   * - ``\n``
     - New line
   * - ``\xdd``
     - Any byte in hexadecimal notation

In all versions of YARA before 4.1.0 text strings accepted any kind of unicode
characters, regardless of their encoding. Those characters were interpreted by
YARA as raw bytes, and therefore the final string was actually determined by the
encoding format used by your text editor. This never meant to be a feature, the
original intention always was that YARA strings should be ASCII-only and YARA
4.1.0 started to raise warnings about non-ASCII characters in strings. This
limitation does not apply to strings in the metadata section or comments. See
more details `here <https://github.com/VirusTotal/yara/wiki/Unicode-characters-in-YARA>`_.


Case-insensitive strings
^^^^^^^^^^^^^^^^^^^^^^^^

Text strings in YARA are case-sensitive by default, however you can turn your
string into case-insensitive mode by appending the modifier ``nocase`` at the end
of the string definition, in the same line:

.. code-block:: yara

    rule CaseInsensitiveTextExample
    {
        strings:
            $text_string = "foobar" nocase

        condition:
            $text_string
    }

With the ``nocase`` modifier the string *foobar* will match *Foobar*, *FOOBAR*,
and *fOoBaR*. This modifier can be used in conjunction with any modifier,
except ``base64``, ``base64wide`` and ``xor``.

Wide-character strings
^^^^^^^^^^^^^^^^^^^^^^

The ``wide`` modifier can be used to search for strings encoded with two bytes
per character, something typical in many executable binaries.

For example, if the string "Borland" appears encoded as two bytes per
character (i.e. ``B\x00o\x00r\x00l\x00a\x00n\x00d\x00``), then the following rule will match:

.. code-block:: yara

    rule WideCharTextExample1
    {
        strings:
            $wide_string = "Borland" wide

        condition:
            $wide_string
    }

However, keep in mind that this modifier just interleaves the ASCII codes of
the characters in the string with zeroes, it does not support truly UTF-16
strings containing non-English characters. If you want to search for strings
in both ASCII and wide form, you can use the ``ascii`` modifier in conjunction
with ``wide`` , no matter the order in which they appear.

.. code-block:: yara

    rule WideCharTextExample2
    {
        strings:
            $wide_and_ascii_string = "Borland" wide ascii

        condition:
            $wide_and_ascii_string
    }

The ``ascii`` modifier can appear alone, without an accompanying ``wide``
modifier, but it's not necessary to write it because in absence of ``wide`` the
string is assumed to be ASCII by default.

XOR strings
^^^^^^^^^^^

The ``xor`` modifier can be used to search for strings with a single byte XOR
applied to them.

The following rule will search for every single byte XOR applied to the string
"This program cannot" (including the plaintext string):

.. code-block:: yara

    rule XorExample1
    {
        strings:
            $xor_string = "This program cannot" xor

        condition:
            $xor_string
    }

The above rule is logically equivalent to:

.. code-block:: yara

    rule XorExample2
    {
        strings:
            $xor_string_00 = "This program cannot"
            $xor_string_01 = "Uihr!qsnfs`l!b`oonu"
            $xor_string_02 = "Vjkq\"rpmepco\"acllmv"
            // Repeat for every single byte XOR
        condition:
            any of them
    }

You can also combine the ``xor`` modifier with ``fullword``, ``wide``, and ``ascii``
modifiers. For example, to search for the ``wide`` and ``ascii`` versions of a
string after every single byte XOR has been applied you would use:

.. code-block:: yara

    rule XorExample3
    {
        strings:
            $xor_string = "This program cannot" xor wide ascii
        condition:
            $xor_string
    }

The ``xor`` modifier is applied after every other modifier. This means that
using the ``xor`` and ``wide`` together results in the XOR applying to the
interleaved zero bytes. For example, the following two rules are logically
equivalent:

.. code-block:: yara

    rule XorExample4
    {
        strings:
            $xor_string = "This program cannot" xor wide
        condition:
            $xor_string
    }

    rule XorExample4
    {
        strings:
            $xor_string_00 = "T\x00h\x00i\x00s\x00 \x00p\x00r\x00o\x00g\x00r\x00a\x00m\x00 \x00c\x00a\x00n\x00n\x00o\x00t\x00"
            $xor_string_01 = "U\x01i\x01h\x01r\x01!\x01q\x01s\x01n\x01f\x01s\x01`\x01l\x01!\x01b\x01`\x01o\x01o\x01n\x01u\x01"
            $xor_string_02 = "V\x02j\x02k\x02q\x02\"\x02r\x02p\x02m\x02e\x02p\x02c\x02o\x02\"\x02a\x02c\x02l\x02l\x02m\x02v\x02"
            // Repeat for every single byte XOR operation.
        condition:
            any of them
    }

Since YARA 3.11, if you want more control over the range of bytes used with the ``xor`` modifier, use:

.. code-block:: yara

    rule XorExample5
    {
        strings:
            $xor_string = "This program cannot" xor(0x01-0xff)
        condition:
            $xor_string
    }

The above example will apply the bytes from 0x01 to 0xff, inclusively, to the
string when searching. The general syntax is ``xor(minimum-maximum)``.

Base64 strings
^^^^^^^^^^^^^^

The ``base64`` modifier can be used to search for strings that have been base64
encoded. A good explanation of the technique is at:

https://www.leeholmes.com/searching-for-content-in-base-64-strings/

The following rule will search for the three base64 permutations of the string
"This program cannot":

.. code-block:: yara

    rule Base64Example1
    {
        strings:
            $a = "This program cannot" base64

        condition:
            $a
    }

This will cause YARA to search for these three permutations:

| VGhpcyBwcm9ncmFtIGNhbm5vd
| RoaXMgcHJvZ3JhbSBjYW5ub3
| UaGlzIHByb2dyYW0gY2Fubm90

The ``base64wide`` modifier works just like the ``base64`` modifier but the results
of the ``base64`` modifier are converted to wide.

The interaction between ``base64`` (or ``base64wide``) and ``wide`` and
``ascii`` is as you might expect. ``wide`` and ``ascii`` are applied to the
string first, and then the ``base64`` and ``base64wide`` modifiers are applied.
At no point is the plaintext of the ``ascii`` or ``wide`` versions of the
strings included in the search. If you want to also include those you can put
them in a secondary string.

The ``base64`` and ``base64wide`` modifiers also support a custom alphabet. For
example:

.. code-block:: yara

    rule Base64Example2
    {
        strings:
            $a = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")

        condition:
            $a
    }

The alphabet must be 64 bytes long.

The ``base64`` and ``base64wide`` modifiers are only supported with text
strings. Using these modifiers with a hexadecimal string or a regular expression
will cause a compiler error. Also, the ``xor``, ``fullword``, and ``nocase``
modifiers used in combination with ``base64`` or ``base64wide`` will cause
a compiler error.

Because of the way that YARA strips the leading and trailing characters after
base64 encoding, one of the base64 encodings of "Dhis program cannow" and
"This program cannot" are identical. Similarly, using the ``base64`` keyword on
single ASCII characters is not recommended. For example, "a" with the
``base64`` keyword matches "\`", "b", "c", "!", "\\xA1", or "\\xE1" after base64
encoding, and will not match where the base64 encoding matches the
``[GWm2][EFGH]`` regular expression.

Searching for full words
^^^^^^^^^^^^^^^^^^^^^^^^

Another modifier that can be applied to text strings is ``fullword``. This
modifier guarantees that the string will match only if it appears in the file
delimited by non-alphanumeric characters. For example the string *domain*, if
defined as ``fullword``, doesn't match *www.mydomain.com* but it matches
*www.my-domain.com* and *www.domain.com*.

Regular expressions
-------------------

Regular expressions are one of the most powerful features of YARA. They are
defined in the same way as text strings, but enclosed in forward slashes instead
of double-quotes, like in the Perl programming language.

.. code-block:: yara

    rule RegExpExample1
    {
        strings:
            $re1 = /md5: [0-9a-fA-F]{32}/
            $re2 = /state: (on|off)/

        condition:
            $re1 and $re2
    }

Regular expressions can be also followed by ``nocase``, ``ascii``, ``wide``,
and ``fullword`` modifiers just like in text strings. The semantics of these
modifiers are the same in both cases.

Additionally, they can be followed by the characters ``i`` and ``s`` just after
the closing slash, which is a very common convention for specifying that the
regular expression is case-insensitive and that the dot (``.``) can match
new-line characters. For example:

.. code-block:: yara

    rule RegExpExample2
    {
        strings:
            $re1 = /foo/i    // This regexp is case-insentitive
            $re2 = /bar./s   // In this regexp the dot matches everything, including new-line
            $re3 = /baz./is  // Both modifiers can be used together
        condition:
            any of them
    }

Notice that ``/foo/i`` is equivalent to ``/foo/ nocase``, but we recommend the
latter when defining strings. The ``/foo/i`` syntax is useful when writing
case-insensitive regular expressions for the ``matches`` operator.

In previous versions of YARA, external libraries like PCRE and RE2 were used
to perform regular expression matching, but starting with version 2.0 YARA uses
its own regular expression engine. This new engine implements most features
found in PCRE, except a few of them like capture groups, POSIX character
classes ([[:isalpha:]], [[:isdigit:]], etc) and backreferences.

YARA’s regular expressions recognise the following metacharacters:

.. list-table::
   :widths: 3 10

   * - ``\``
     - Quote the next metacharacter
   * - ``^``
     - Match the beginning of the file or negates a character class when used
       as the first character after the opening bracket
   * - ``$``
     - Match the end of the file
   * - ``.``
     - Matches any single character except a newline character
   * - ``|``
     - Alternation
   * - ``()``
     - Grouping
   * - ``[]``
     - Bracketed character class

The following quantifiers are recognised as well:

.. list-table::
   :widths: 3 10

   * - ``*``
     - Match 0 or more times
   * - ``+``
     - Match 1 or more times
   * - ``?``
     - Match 0 or 1 times
   * - ``{n}``
     - Match exactly n times
   * - ``{n,}``
     - Match at least n times
   * - ``{,m}``
     - Match at most m times
   * - ``{n,m}``
     - Match n to m times

All these quantifiers have a non-greedy variant, followed by a question
mark (?):

.. list-table::
   :widths: 3 10

   * - ``*?``
     - Match 0 or more times, non-greedy
   * - ``+?``
     - Match 1 or more times, non-greedy
   * - ``??``
     - Match 0 or 1 times, non-greedy
   * - ``{n}?``
     - Match exactly n times, non-greedy
   * - ``{n,}?``
     - Match at least n times, non-greedy
   * - ``{,m}?``
     - Match at most m times, non-greedy
   * - ``{n,m}?``
     - Match n to m times, non-greedy

The following escape sequences are recognised:

.. list-table::
   :widths: 3 10

   * - ``\t``
     - Tab (HT, TAB)
   * - ``\n``
     - New line (LF, NL)
   * - ``\r``
     - Return (CR)
   * - ``\f``
     - Form feed (FF)
   * - ``\a``
     - Alarm bell
   * - ``\xNN``
     - Character whose ordinal number is the given hexadecimal number


These are the recognised character classes:

.. list-table::
   :widths: 3 10

   * - ``\w``
     - Match a *word* character (alphanumeric plus “_”)
   * - ``\W``
     - Match a *non-word* character
   * - ``\s``
     - Match a whitespace character
   * - ``\S``
     - Match a non-whitespace character
   * - ``\d``
     - Match a decimal digit character
   * - ``\D``
     - Match a non-digit character


Starting with version 3.3.0 these zero-width assertions are also recognized:

.. list-table::
   :widths: 3 10

   * - ``\b``
     - Match a word boundary
   * - ``\B``
     - Match except at a word boundary


Private strings
---------------

All strings in YARA can be marked as ``private`` which means they will never be
included in the output of YARA. They are treated as normal strings everywhere
else, so you can still use them as you wish in the condition, but they will
never be shown with the ``-s`` flag or seen in the YARA callback if you're using
the C API.

.. code-block:: yara

    rule PrivateStringExample
    {
        strings:
            $text_string = "foobar" private

        condition:
            $text_string
    }

Unreferenced strings
--------------------

YARA 4.5.0 allows for unreferenced strings in the condition. If a string
identifier starts with an ``_`` then it does not have to be referenced in the
condition. Any other string must be referenced in the condition. This is useful
if you want to search for particular strings and handle them in a custom
callback but don't really need them for your condition logic.

.. code-block:: yara

    rule PrivateStringExample
    {
        strings:
            $_unreferenced = "AXSERS"

        condition:
            true
    }

String Modifier Summary
-----------------------

The following string modifiers are processed in the following order, but are only applicable
to the string types listed.

.. list-table:: Text string modifiers
   :widths: 3 5 10 10
   :header-rows: 1

   * - Keyword
     - String Types
     - Summary
     - Restrictions
   * - ``nocase``
     - Text, Regex
     - Ignore case
     - Cannot use with ``xor``, ``base64``, or ``base64wide``
   * - ``wide``
     - Text, Regex
     - Emulate UTF16 by interleaving null (0x00) characters
     - None
   * - ``ascii``
     - Text, Regex
     - Also match ASCII characters, only required if ``wide`` is used
     - None
   * - ``xor``
     - Text
     - XOR text string with single byte keys
     - Cannot use with ``nocase``, ``base64``, or ``base64wide``
   * - ``base64``
     - Text
     - Convert to 3 base64 encoded strings
     - Cannot use with ``nocase``, ``xor``, or ``fullword``
   * - ``base64wide``
     - Text
     - Convert to 3 base64 encoded strings, then interleaving null characters like ``wide``
     - Cannot use with ``nocase``, ``xor``, or ``fullword``
   * - ``fullword``
     - Text, Regex
     - Match is not preceded or followed by an alphanumeric character
     - Cannot use with ``base64`` or ``base64wide``
   * - ``private``
     - Hex, Text, Regex
     - Match never included in output
     - None


Conditions
==========

Conditions are nothing more than Boolean expressions as those that can be found
in all programming languages, for example in an *if* statement. They can contain
the typical Boolean operators ``and``, ``or``, and ``not``, and relational operators
``>=``, ``<=``, ``<``, ``>``, ``==`` and ``!=``. Also, the arithmetic operators
(``+``, ``-``, ``*``, ``\``, ``%``) and bitwise operators
(``&``, ``|``, ``<<``, ``>>``, ``~``, ``^``) can be used on numerical expressions.

Integers are always 64-bits long, even the results of functions like `uint8`,
`uint16` and `uint32` are promoted to 64-bits. This is something you must take
into account, specially while using bitwise operators (for example, ~0x01 is not
0xFE but 0xFFFFFFFFFFFFFFFE).

The following table lists the precedence and associativity of all operators. The
table is sorted in descending precedence order, which means that operators listed
on a higher row in the list are grouped prior operators listed in rows further
below it. Operators within the same row have the same precedence, if they appear
together in a expression the associativity determines how they are grouped.

==========  ===========  =========================================  =============
Precedence  Operator     Description                                Associativity
==========  ===========  =========================================  =============
1           []           Array subscripting                         Left-to-right

            .            Structure member access
----------  -----------  -----------------------------------------  -------------
2           `-`          Unary minus                                Right-to-left

            `~`          Bitwise not
----------  -----------  -----------------------------------------  -------------
3           `*`          Multiplication                             Left-to-right

            \\           Division

            %            Remainder
----------  -----------  -----------------------------------------  -------------
4           `+`          Addition                                   Left-to-right

            `-`          Subtraction
----------  -----------  -----------------------------------------  -------------
5           `<<`         Bitwise left shift                         Left-to-right

            `>>`         Bitwise right shift
----------  -----------  -----------------------------------------  -------------
6           &            Bitwise AND                                Left-to-right
----------  -----------  -----------------------------------------  -------------
7           ^            Bitwise XOR                                Left-to-right
----------  -----------  -----------------------------------------  -------------
8           `|`          Bitwise OR                                 Left-to-right
----------  -----------  -----------------------------------------  -------------
9           <            Less than                                  Left-to-right

            <=           Less than or equal to

            >            Greater than

            >=           Greater than or equal to
----------  -----------  -----------------------------------------  -------------
10          ==           Equal to                                   Left-to-right

            !=           Not equal to

            contains     String contains substring

            icontains    Like contains but case-insensitive

            startswith   String starts with substring

            istartswith  Like startswith but case-insensitive

            endswith     String ends with substring

            iendswith    Like endswith but case-insensitive

            iequals      Case-insensitive string comparison

            matches      String matches regular expression
----------  -----------  -----------------------------------------  -------------
11          not          Logical NOT                                Right-to-left
            defined      Check if an expression is defined
----------  -----------  -----------------------------------------  -------------
12          and          Logical AND                                Left-to-right
----------  -----------  -----------------------------------------  -------------
13          or           Logical OR                                 Left-to-right
==========  ===========  =========================================  =============


String identifiers can be also used within a condition, acting as Boolean
variables whose value depends on the presence or not of the associated string
in the file.

.. code-block:: yara

    rule Example
    {
        strings:
            $a = "text1"
            $b = "text2"
            $c = "text3"
            $d = "text4"

        condition:
            ($a or $b) and ($c or $d)
    }



Counting strings
----------------

Sometimes we need to know not only if a certain string is present or not,
but how many times the string appears in the file or process memory. The number
of occurrences of each string is represented by a variable whose name is the
string identifier but with a # character in place of the $ character.
For example:

.. code-block:: yara

    rule CountExample
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"

        condition:
            #a == 6 and #b > 10
    }


This rule matches any file or process containing the string $a exactly six times,
and more than ten occurrences of string $b.

Starting with YARA 4.2.0 it is possible to express the count of a string in an
integer range, like this:

.. code-block:: yara

    #a in (filesize-500..filesize) == 2

In this example the number of 'a' strings in the last 500 bytes of the file must
equal exactly 2.

.. _string-offsets:

String offsets or virtual addresses
-----------------------------------

In the majority of cases, when a string identifier is used in a condition, we
are willing to know if the associated string is anywhere within the file or
process memory, but sometimes we need to know if the string is at some specific
offset on the file or at some virtual address within the process address space.
In such situations the operator ``at`` is what we need. This operator is used as
shown in the following example:

.. code-block:: yara

    rule AtExample
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"

        condition:
            $a at 100 and $b at 200
    }

The expression ``$a at 100`` in the above example is true only if string $a is
found at offset 100 within the file (or at virtual address 100 if applied to
a running process). The string $b should appear at offset 200. Please note
that both offsets are decimal, however hexadecimal numbers can be written by
adding the prefix 0x before the number as in the C language, which comes very
handy when writing virtual addresses. Also note the higher precedence of the
operator ``at`` over the ``and``.

While the ``at`` operator allows to search for a string at some fixed offset in
the file or virtual address in a process memory space, the ``in`` operator
allows to search for the string within a range of offsets or addresses.

.. code-block:: yara

    rule InExample
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"

        condition:
            $a in (0..100) and $b in (100..filesize)
    }

In the example above the string $a must be found at an offset between 0 and
100, while string $b must be at an offset between 100 and the end of the file.
Again, numbers are decimal by default.

You can also get the offset or virtual address of the i-th occurrence of string
$a by using @a[i]. The indexes are one-based, so the first occurrence would be
@a[1] the second one @a[2] and so on. If you provide an index greater than the
number of occurrences of the string, the result will be a NaN (Not A Number)
value.


Match length
------------

For many regular expressions and hex strings containing jumps, the length of
the match is variable. If you have the regular expression /fo*/ the strings
"fo", "foo" and "fooo" can be matches, all of them with a different length.

You can use the length of the matches as part of your condition by using the
character ! in front of the string identifier, in a similar way you use the @
character for the offset. !a[1] is the length for the first match of $a, !a[2]
is the length for the second match, and so on. !a is a abbreviated form of
!a[1].


File size
---------

String identifiers are not the only variables that can appear in a condition
(in fact, rules can be defined without any string definition as will be shown
below), there are other special variables that can be used as well. One of
these special variables is ``filesize``, which holds, as its name indicates,
the size of the file being scanned. The size is expressed in bytes.

.. code-block:: yara

    rule FileSizeExample
    {
        condition:
            filesize > 200KB
    }

The previous example also demonstrates the use of the ``KB`` postfix. This
postfix, when attached to a numerical constant, automatically multiplies the
value of the constant by 1024. The ``MB`` postfix can be used to multiply the
value by 2^20. Both postfixes can be used only with decimal constants.

The use of ``filesize`` only makes sense when the rule is applied to a file. If
the rule is applied to a running process it won’t ever match because
``filesize`` doesn’t make sense in this context.

Executable entry point
----------------------

Another special variable than can be used in a rule is ``entrypoint``. If the
file is a Portable Executable (PE) or Executable and Linkable Format (ELF),
this variable holds the raw offset of the executable’s entry point in case we
are scanning a file. If we are scanning a running process, the entrypoint will
hold the virtual address of the main executable’s entry point. A typical use of
this variable is to look for some pattern at the entry point to detect packers
or simple file infectors.

.. code-block:: yara

    rule EntryPointExample1
    {
        strings:
            $a = { E8 00 00 00 00 }

        condition:
            $a at entrypoint
    }

    rule EntryPointExample2
    {
        strings:
            $a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }

        condition:
            $a in (entrypoint..entrypoint + 10)
    }

The presence of the ``entrypoint`` variable in a rule implies that only PE or
ELF files can satisfy that rule. If the file is not a PE or ELF, any rule using
this variable evaluates to false.

.. warning:: The ``entrypoint`` variable is deprecated, you should use the
    equivalent ``pe.entry_point`` from the :ref:`pe-module` instead. Starting
    with YARA 3.0 you'll get a warning if you use ``entrypoint`` and it will be
    completely removed in future versions.


Accessing data at a given position
----------------------------------

There are many situations in which you may want to write conditions that depend
on data stored at a certain file offset or virtual memory address, depending on
if we are scanning a file or a running process. In those situations you can use
one of the following functions to read data from the file at the given offset::

    int8(<offset or virtual address>)
    int16(<offset or virtual address>)
    int32(<offset or virtual address>)

    uint8(<offset or virtual address>)
    uint16(<offset or virtual address>)
    uint32(<offset or virtual address>)

    int8be(<offset or virtual address>)
    int16be(<offset or virtual address>)
    int32be(<offset or virtual address>)

    uint8be(<offset or virtual address>)
    uint16be(<offset or virtual address>)
    uint32be(<offset or virtual address>)

The ``intXX`` functions read 8, 16, and 32 bits signed integers from
<offset or virtual address>, while functions ``uintXX`` read unsigned integers.
Both 16 and 32 bit integers are considered to be little-endian. If you
want to read a big-endian integer use the corresponding function ending
in ``be``. The <offset or virtual address> parameter can be any expression returning
an unsigned integer, including the return value of one the ``uintXX`` functions
itself. As an example let's see a rule to distinguish PE files:

.. code-block:: yara

    rule IsPE
    {
        condition:
            // MZ signature at offset 0 and ...
            uint16(0) == 0x5A4D and
            // ... PE signature at offset stored in MZ header at 0x3C
            uint32(uint32(0x3C)) == 0x00004550
    }


.. _sets-of-strings:

Sets of strings
---------------

There are circumstances in which it is necessary to express that the file should
contain a certain number strings from a given set. None of the strings in the
set are required to be present, but at least some of them should be. In these
situations the ``of`` operator can be used.

.. code-block:: yara

    rule OfExample1
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"
            $c = "dummy3"

        condition:
            2 of ($a,$b,$c)
    }

This rule requires that at least two of the strings in the set ($a,$b,$c)
must be present in the file, but it does not matter which two. Of course, when
using this operator, the number before the ``of`` keyword must be less than or
equal to the number of strings in the set.

The elements of the set can be explicitly enumerated like in the previous
example, or can be specified by using wild cards. For example:

.. code-block:: yara

    rule OfExample2
    {
        strings:
            $foo1 = "foo1"
            $foo2 = "foo2"
            $foo3 = "foo3"

        condition:
            2 of ($foo*)  // equivalent to 2 of ($foo1,$foo2,$foo3)
    }

    rule OfExample3
    {
        strings:
            $foo1 = "foo1"
            $foo2 = "foo2"

            $bar1 = "bar1"
            $bar2 = "bar2"

        condition:
            3 of ($foo*,$bar1,$bar2)
    }

You can even use ``($*)`` to refer to all the strings in your rule, or write
the equivalent keyword ``them`` for more legibility.

.. code-block:: yara

    rule OfExample4
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"
            $c = "dummy3"

        condition:
            1 of them // equivalent to 1 of ($*)
    }

In all the examples above, the number of strings have been specified by a
numeric constant, but any expression returning a numeric value can be used.
The keywords ``any``, ``all`` and ``none`` can be used as well.

.. code-block:: yara

    all of them       // all strings in the rule
    any of them       // any string in the rule
    all of ($a*)      // all strings whose identifier starts by $a
    any of ($a,$b,$c) // any of $a, $b or $c
    1 of ($*)         // same that "any of them"
    none of ($b*)     // zero of the set of strings that start with "$b"

.. warning:: Due to the way YARA works internally, using "0 of them" is an
    ambiguous part of the language which should be avoided in favor of "none
    of them". To understand this, consider the meaning of "2 of them", which
    is true if 2 or more of the strings match. Historically, "0 of them"
    followed this principle and would evaluate to true if at least one of the
    strings matched. This ambiguity is resolved in YARA 4.3.0 by making "0 of
    them" evaluate to true if exactly 0 of the strings match. To improve on
    the situation and make the intent clear, it is encouraged to use "none" in
    place of 0. By not using an integer it is easier to reason about the meaning
    of "none of them" without the historical understanding of "at least 0"
    clouding the issue.


Starting with YARA 4.2.0 it is possible to express a set of strings in an
integer range, like this:

.. code-block:: yara

    all of ($a*) in (filesize-500..filesize)
    any of ($a*, $b*) in (1000..2000)

Starting with YARA 4.3.0 it is possible to express a set of strings at a
specific offset, like this:

.. code-block:: yara

    any of ($a*) at 0


Applying the same condition to many strings
-------------------------------------------

There is another operator very similar to ``of`` but even more powerful, the
``for..of`` operator. The syntax is:

.. code-block:: yara

    for expression of string_set : ( boolean_expression )

And its meaning is: from those strings in ``string_set`` at least ``expression``
of them must satisfy ``boolean_expression``.

In other words: ``boolean_expression`` is evaluated for every string in
``string_set`` and there must be at least ``expression`` of them returning
True.

Of course, ``boolean_expression`` can be any boolean expression accepted in
the condition section of a rule, except for one important detail: here you
can (and should) use a dollar sign ($) as a place-holder for the string being
evaluated. Take a look at the following expression:

.. code-block:: yara

    for any of ($a,$b,$c) : ( $ at pe.entry_point  )

The $ symbol in the boolean expression is not tied to any particular string,
it will be $a, and then $b, and then $c in the three successive evaluations
of the expression.

Maybe you already realised that the ``of`` operator is a special case of
``for..of``. The following expressions are the same:

.. code-block:: yara

    any of ($a,$b,$c)
    for any of ($a,$b,$c) : ( $ )

You can also employ the symbols #, @, and ! to make reference to the number of
occurrences, the first offset, and the length of each string respectively.

.. code-block:: yara

    for all of them : ( # > 3 )
    for all of ($a*) : ( @ > @b )


Starting with YARA 4.3.0 you can express conditions over text strings like this:

.. code-block:: yara

    for any s in ("71b36345516e076a0663e0bea97759e4", "1e7f7edeb06de02f2c2a9319de99e033") : ( pe.imphash() == s )

It is worth remembering here that the two hashes referenced in the rule are
normal text strings, and have nothing to do with the string section of the rule.
Inside the loop condition the result of the `pe.imphash()` function is compared
to each of the text strings, resulting in a more concise rule.


Using anonymous strings with ``of`` and ``for..of``
---------------------------------------------------

When using the ``of`` and ``for..of`` operators followed by ``them``, the
identifier assigned to each string of the rule is usually superfluous. As
we are not referencing any string individually we do not need to provide
a unique identifier for each of them. In those situations you can declare
anonymous strings with identifiers consisting only of the $ character, as in
the following example:

.. code-block:: yara

    rule AnonymousStrings
    {
        strings:
            $ = "dummy1"
            $ = "dummy2"

        condition:
            1 of them
    }


Iterating over string occurrences
---------------------------------

As seen in :ref:`string-offsets`, the offsets or virtual addresses where a given
string appears within a file or process address space can be accessed by
using the syntax: @a[i], where i is an index indicating which occurrence
of the string $a you are referring to. (@a[1], @a[2],...).

Sometimes you will need to iterate over some of these offsets and guarantee
they satisfy a given condition. In such cases you can use the ``for..in`` syntax,
for example:

.. code-block:: yara

    rule Occurrences
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"

        condition:
            for all i in (1,2,3) : ( @a[i] + 10 == @b[i] )
    }

The previous rule says that the first occurrence of $b should be 10 bytes
after the first occurrence of $a, and the same should happen with the second
and third occurrences of the two strings.

The same condition could be written also as:

.. code-block:: yara

    for all i in (1..3) : ( @a[i] + 10 == @b[i] )

Notice that we’re using a range (1..3) instead of enumerating the index
values (1,2,3). Of course, we’re not forced to use constants to specify range
boundaries, we can use expressions as well like in the following example:

.. code-block:: yara

    for all i in (1..#a) : ( @a[i] < 100 )

In this case we’re iterating over every occurrence of $a (remember that #a
represents the number of occurrences of $a). This rule is specifying that every
occurrence of $a should be within the first 100 bytes of the file.

In case you want to express that only some occurrences of the string
should satisfy your condition, the same logic seen in the ``for..of`` operator
applies here:

.. code-block:: yara

    for any i in (1..#a) : ( @a[i] < 100 )
    for 2 i in (1..#a) : ( @a[i] < 100 )

The ``for..in`` operator is similar to ``for..of``, but the latter iterates over
a set of strings, while the former iterates over ranges, enumerations, arrays and
dictionaries.


Iterators
---------

In YARA 4.0 the ``for..in`` operator was improved and now it can be used to
iterate not only over integer enumerations and ranges (e.g: 1,2,3,4 and 1..4),
but also over any kind of iterable data type, like arrays and dictionaries
defined by YARA modules. For example, the following expression is valid in
YARA 4.0:

.. code-block:: yara

    for any section in pe.sections : ( section.name == ".text" )

This is equivalent to:

.. code-block:: yara

    for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )

The new syntax is more natural and easy to understand, and is the recommended
way of expressing this type of conditions in newer versions of YARA.

While iterating dictionaries you must provide two variable names that will
hold the key and value for each entry in the dictionary, for example:

.. code-block:: yara

    for any k,v in some_dict : ( k == "foo" and v == "bar" )

In general the ``for..in`` operator has the form:

.. code-block:: yara

    for <quantifier> <variables> in <iterable> : ( <some condition using the loop variables> )

Where `<quantifier>` is either `any`, `all` or an expression that evaluates to
the number of items in the iterator that must satisfy the condition, `<variables>`
is a comma-separated list of variable names that holds the values for the
current item (the number of variables depend on the type of `<iterable>`) and
`<iterable>` is something that can be iterated.


.. _referencing-rules:

Referencing other rules
-----------------------

When writing the condition for a rule you can also make reference to a
previously defined rule in a manner that resembles a function invocation of
traditional programming languages. In this way you can create rules that
depend on others. Let's see an example:

.. code-block:: yara

    rule Rule1
    {
        strings:
            $a = "dummy1"

        condition:
            $a
    }

    rule Rule2
    {
        strings:
            $a = "dummy2"

        condition:
            $a and Rule1
    }

As can be seen in the example, a file will satisfy Rule2 only if it contains
the string "dummy2" and satisfies Rule1. Note that it is strictly necessary to
define the rule being invoked before the one that will make the invocation.

Another way to reference other rules was introduced in 4.2.0 and that is sets
of rules, which operate similarly to sets of strings (see
:ref:`sets-of-strings)`. For example:

.. code-block:: yara

    rule Rule1
    {
        strings:
            $a = "dummy1"

        condition:
            $a
    }

    rule Rule2
    {
        strings:
            $a = "dummy2"

        condition:
            $a
    }

    rule MainRule
    {
        strings:
            $a = "dummy2"

        condition:
            any of (Rule*)
    }

This example demonstrates how to use rule sets to describe higher order logic
in a way which automatically grows with your rules. If you define another rule
named ``Rule3`` before ``MainRule`` then it will automatically be included in
the expansion of ``Rule*`` in the condition for MainRule.

To use rule sets all of the rules included in the set **must** exist prior to
the rule set being used. For example, the following will produce a compiler
error because ``a2`` is defined after the rule set is used in ``x``:

.. code-block:: yara

    rule a1 { condition: true }
    rule x { condition: 1 of (a*) }
    rule a2 { condition: true }

More about rules
================

There are some aspects of YARA rules that have not been covered yet, but are
still very important. These are: global rules, private rules, tags and
metadata.

Global rules
------------

Global rules give you the possibility of imposing restrictions in all your
rules at once. For example, suppose that you want all your rules to ignore
files that exceed a certain size limit. You could go rule by rule making
the required modifications to their conditions, or just write a global rule
like this one:

.. code-block:: yara

    global rule SizeLimit
    {
        condition:
            filesize < 2MB
    }

You can define as many global rules as you want, they will be evaluated
before the rest of the rules, which in turn will be evaluated only if all
global rules are satisfied.

Private rules
-------------

Private rules are a very simple concept. They are just rules that are not
reported by YARA when they match on a given file. Rules that are not reported
at all may seem sterile at first glance, but when mixed with the possibility
offered by YARA of referencing one rule from another (see
:ref:`referencing-rules`) they become useful. Private rules can serve as
building blocks for other rules, and at the same time prevent cluttering
YARA's output with irrelevant information. To declare a rule as private
just add the keyword ``private`` before the rule declaration.

.. code-block:: yara

    private rule PrivateRuleExample
    {
        ...
    }

You can apply both ``private`` and ``global`` modifiers to a rule, resulting in
a global rule that does not get reported by YARA but must be satisfied.

Rule tags
---------

Another useful feature of YARA is the possibility of adding tags to rules.
Those tags can be used later to filter YARA's output and show only the rules
that you are interested in. You can add as many tags as you want to a rule,
they are declared after the rule identifier as shown below:

.. code-block:: yara

    rule TagsExample1 : Foo Bar Baz
    {
        ...
    }

    rule TagsExample2 : Bar
    {
        ...
    }


Tags must follow the same lexical convention of rule identifiers, therefore
only alphanumeric characters and underscores are allowed, and the tag cannot
start with a digit. They are also case sensitive.

When using YARA you can output only those rules which are tagged with the tag
or tags that you provide.


Metadata
--------

Besides the string definition and condition sections, rules can also have a
metadata section where you can put additional information about your rule.
The metadata section is defined with the keyword ``meta`` and contains
identifier/value pairs like in the following example:

.. code-block:: yara

    rule MetadataExample
    {
        meta:
            my_identifier_1 = "Some string data"
            my_identifier_2 = 24
            my_identifier_3 = true

        strings:
            $my_text_string = "text here"
            $my_hex_string = { E2 34 A1 C8 23 FB }

        condition:
            $my_text_string or $my_hex_string
    }

As can be seen in the example, metadata identifiers are always followed by
an equals sign and the value assigned to them. The assigned values can be
strings (valid UTF8 only), integers, or one of the boolean values true or false.
Note that identifier/value pairs defined in the metadata section cannot be used
in the condition section, their only purpose is to store additional information
about the rule.

.. _using-modules:

Using modules
=============

Modules are extensions to YARA's core functionality. Some modules like
the :ref:`PE module <pe-module>` and the :ref:`Cuckoo module <cuckoo-module>`
are officially distributed with YARA and additional ones can be created by
third-parties or even yourself as described in :ref:`writing-modules`.

The first step to using a module is importing it with the ``import`` statement.
These statements must be placed outside any rule definition and followed by
the module name enclosed in double-quotes. Like this:

.. code-block:: yara

    import "pe"
    import "cuckoo"

After importing the module you can make use of its features, always using
``<module name>.`` as a prefix to any variable or function exported by the
module. For example:

.. code-block:: yara

    pe.entry_point == 0x1000
    cuckoo.http_request(/someregexp/)

.. _undefined-values:

Undefined values
================

Modules often leave variables in an undefined state, for example when the
variable doesn't make sense in the current context (think of ``pe.entry_point``
while scanning a non-PE file). YARA handles undefined values in a way that allows
the rule to keep its meaningfulness. Take a look at this rule:

.. code-block:: yara

    import "pe"

    rule Test
    {
        strings:
            $a = "some string"

        condition:
            $a and pe.entry_point == 0x1000
    }

If the scanned file is not a PE you wouldn't expect this rule to match the file,
even if it contains the string, because **both** conditions (the presence of
the string and the right value for the entry point) must be satisfied. However,
if the condition is changed to:

.. code-block:: yara

    $a or pe.entry_point == 0x1000

You would expect the rule to match in this case if the file contains the string,
even if it isn't a PE file. That's exactly how YARA behaves. The logic is as
follows:

* If the expression in the condition is undefined, it would be translated to
  ``false`` and the rule won't match.

* Boolean operators ``and`` and ``or`` will treat undefined operands as ``false``,
  Which means that:

  * ``undefined and true`` is ``false``
  * ``undefined and false`` is ``false``
  * ``undefined or true`` is ``true``
  * ``undefined or false`` is ``false``

* All the remaining operators, including the ``not`` operator, return undefined
  if any of their operands is undefined.

In the expression above, ``pe.entry_point == 0x1000`` will be undefined for non-PE
files, because ``pe.entry_point`` is undefined for those files. This implies that
``$a or pe.entry_point == 0x1000`` will be ``true`` if and only if ``$a`` is ``true``.

If the condition is ``pe.entry_point == 0x1000`` alone, it will evaluate to ``false``
for non-PE files, and so will do ``pe.entry_point != 0x1000`` and
``not pe.entry_point == 0x1000``, as none of these expressions make sense for non-PE
files.

To check if expression is defined use unary operator ``defined``. Example:

.. code-block:: yara

    defined pe.entry_point



External variables
==================

External variables allow you to define rules that depend on values provided
from the outside. For example, you can write the following rule:

.. code-block:: yara

    rule ExternalVariableExample1
    {
        condition:
            ext_var == 10
    }

In this case ``ext_var`` is an external variable whose value is assigned at
run-time (see ``-d`` option of command-line tool, and ``externals`` parameter of
``compile`` and ``match`` methods in yara-python). External variables could be
of types: integer, string or boolean; their type depends on the value assigned
to them. An integer variable can substitute any integer constant in the
condition and boolean variables can occupy the place of boolean expressions.
For example:

.. code-block:: yara

    rule ExternalVariableExample2
    {
        condition:
            bool_ext_var or filesize < int_ext_var
    }

External variables of type string can be used with the operators: ``contains``,
``startswith``, ``endswith`` and their case-insensitive counterparts: ``icontains``,
``istartswith`` and ``iendswith``. They can be used also with the ``matches``
operator, which returns true if the string matches a given regular expression.
Case-insensitive string comparison can be done through special operator ``iequals``
which only works with strings. For case-sensitive comparison use regular ``==``.

.. code-block:: yara

    rule ContainsExample
    {
        condition:
            string_ext_var contains "text"
    }

    rule CaseInsensitiveContainsExample
    {
        condition:
            string_ext_var icontains "text"
    }

    rule StartsWithExample
    {
        condition:
            string_ext_var startswith "prefix"
    }

    rule EndsWithExample
    {
        condition:
            string_ext_var endswith "suffix"
    }

    rule IequalsExample
    {
        condition:
            string_ext_var iequals "string"
    }

    rule MatchesExample
    {
        condition:
            string_ext_var matches /[a-z]+/
    }

You can use regular expression modifiers along with the ``matches`` operator,
for example, if you want the regular expression from the previous example
to be case insensitive you can use ``/[a-z]+/i``. Notice the ``i`` following the
regular expression in a Perl-like manner. You can also use the ``s`` modifier
for single-line mode, in this mode the dot matches all characters including
line breaks. Of course both modifiers can be used simultaneously, like in the
following example:

.. code-block:: yara

    rule ExternalVariableExample5
    {
        condition:
            /* case insensitive single-line mode */
            string_ext_var matches /[a-z]+/is
    }

Keep in mind that every external variable used in your rules must be defined
at run-time, either by using the ``-d`` option of the command-line tool, or by
providing the ``externals`` parameter to the appropriate method in
``yara-python``.


Including files
===============

In order to allow for more flexible organization of your rules files,
YARA provides the ``include`` directive. This directive works in a similar way
to the *#include* pre-processor directive in C programs, which inserts the
content of the specified source file into the current file during compilation.
The following example will include the content of *other.yar* into the current
file:

.. code-block:: yara

    include "other.yar"

The base path when searching for a file in an ``include`` directive will be the
directory where the current file resides. For this reason, the file *other.yar*
in the previous example should be located in the same directory of the current
file. However, you can also specify relative paths like these:

.. code-block:: yara

    include "./includes/other.yar"
    include "../includes/other.yar"

Or use absolute paths:

.. code-block:: yara

    include "/home/plusvic/yara/includes/other.yar"

In Windows, both forward and back slashes are accepted, but don’t forget to
write the drive letter:

.. code-block:: yara

    include "c:/yara/includes/other.yar"
    include "c:\\yara\\includes\\other.yar"