*******************
Writing YARA rules
*******************
YARA rules are easy to write and understand, and they have a syntax that
resembles the C language. Here is the simplest rule that you can write for
YARA, which does absolutely nothing:

.. code-block:: yara

    rule dummy
    {
        condition:
            false
    }

Each rule in YARA starts with the keyword ``rule`` followed by a rule
identifier. Identifiers follow the same lexical conventions as the C
programming language: they can contain any alphanumeric character and the
underscore character, but the first character cannot be a digit. Rule
identifiers are case sensitive and cannot exceed 128 characters. The following
keywords are reserved and cannot be used as identifiers:
.. list-table:: YARA keywords
:widths: 10 10 10 10 10 10 10 10
* - all
- and
- any
- ascii
- at
- base64
- base64wide
- condition
* - contains
- endswith
- entrypoint
- false
- filesize
- for
- fullword
- global
* - import
- icontains
- iendswith
- iequals
- in
- include
- int16
- int16be
* - int32
- int32be
- int8
- int8be
- istartswith
- matches
- meta
- nocase
* - none
- not
- of
- or
- private
- rule
- startswith
- strings
* - them
- true
- uint16
- uint16be
- uint32
- uint32be
- uint8
- uint8be
* - wide
- xor
- defined
-
-
-
-
-
Rules are generally composed of two sections: the strings definition and the condition.
The strings definition section can be omitted if the rule doesn't rely on any
string, but the condition section is always required. The strings definition
section is where the strings that will be part of the rule are defined. Each
string has an identifier consisting of a ``$`` character followed by a sequence of
alphanumeric characters and underscores. These identifiers can be used in the
condition section to refer to the corresponding string. Strings can be defined
in text or hexadecimal form, as shown in the following example:

.. code-block:: yara

    rule ExampleRule
    {
        strings:
            $my_text_string = "text here"
            $my_hex_string = { E2 34 A1 C8 23 FB }
        condition:
            $my_text_string or $my_hex_string
    }

Text strings are enclosed in double quotes just like in the C language. Hex
strings are enclosed in curly brackets and consist of a sequence of
hexadecimal numbers that can appear contiguously or separated by spaces. Decimal
numbers are not allowed in hex strings.
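
As a minimal sketch of the spacing rule, the two hex strings below should
describe the same six bytes, one written with spaces and one written
contiguously. The rule and string names are made up for this illustration:

.. code-block:: yara

    rule HexSpacingExample
    {
        strings:
            // Same byte sequence, with and without separating spaces
            $spaced = { E2 34 A1 C8 23 FB }
            $contiguous = { E234A1C823FB }
        condition:
            $spaced and $contiguous
    }
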
The condition section is where the logic of the rule resides. This section must
contain a boolean expression stating under which circumstances a file or process
satisfies the rule. Generally, the condition will refer to previously
defined strings by using their identifiers. In this context the string
identifier acts as a boolean variable which evaluates to true if the string was
found in the file or process memory, or false otherwise.
Comments
========
You can add comments to your YARA rules just as if it were a C source file. Both
single-line and multi-line C-style comments are supported.

.. code-block:: yara

    /*
        This is a multi-line comment ...
    */
    rule CommentExample // ... and this is a single-line comment
    {
        condition:
            false // just a dummy rule, don't do this
    }

Strings
=======
There are three types of strings in YARA: hexadecimal strings, text strings and
regular expressions. Hexadecimal strings are used for defining raw sequences of
bytes, while text strings and regular expressions are useful for defining
portions of legible text. However, text strings and regular expressions can
also be used for representing raw bytes by means of escape sequences, as will be
shown below.
Hexadecimal strings
-------------------
Hexadecimal strings allow four special constructions that make them more
flexible: wild-cards, not operators, jumps, and alternatives. Wild-cards are just placeholders
that you can put into the string to indicate that some bytes are unknown and
should match anything. The placeholder character is the question mark (``?``). Here
is an example of a hexadecimal string with wild-cards:

.. code-block:: yara

    rule WildcardExample
    {
        strings:
            $hex_string = { E2 34 ?? C8 A? FB }
        condition:
            $hex_string
    }

As shown in the example the wild-cards are nibble-wise, which means that you can
define just one nibble of the byte and leave the other unknown.
Starting with version 4.3.0, you may specify that a byte is not a specific
value. For that you can use the not operator with a byte value:

.. code-block:: yara

    rule NotExample
    {
        strings:
            $hex_string = { F4 23 ~00 62 B4 }
            $hex_string2 = { F4 23 ~?0 62 B4 }
        condition:
            $hex_string and $hex_string2
    }

In the example above we have a byte prefixed with a tilde (~), which is the not operator.
This defines that the byte in that location can take any value except the value specified.
In this case the first string will only match if the byte is not 00. The not operator can
also be used with nibble-wise wild-cards, so the second string will only match if the
second nibble is not zero.
Wild-cards and not operators are useful when defining strings whose content can vary but you know
the length of the variable chunks. However, this is not always the case. In some
circumstances you may need to define strings with chunks of variable content and
length. In those situations you can use jumps instead of wild-cards:

.. code-block:: yara

    rule JumpExample
    {
        strings:
            $hex_string = { F4 23 [4-6] 62 B4 }
        condition:
            $hex_string
    }

In the example above we have a pair of numbers enclosed in square brackets and
separated by a hyphen; that's a jump. This jump indicates that any arbitrary
sequence of 4 to 6 bytes can occupy the position of the jump. Any of the
following strings will match the pattern::

    F4 23 01 02 03 04 62 B4
    F4 23 00 00 00 00 00 62 B4
    F4 23 15 82 A3 04 45 22 62 B4

Any jump ``[X-Y]`` must meet the condition ``0 <= X <= Y``. In versions of
YARA before 2.0, both X and Y had to be lower than 256, but starting with YARA 2.0 there is
no limit for X and Y.
These are valid jumps::

    FE 39 45 [0-8] 89 00
    FE 39 45 [23-45] 89 00
    FE 39 45 [1000-2000] 89 00

This is invalid::

    FE 39 45 [10-7] 89 00

If the lower and higher bounds are equal you can write a single number enclosed
in brackets, like this::

    FE 39 45 [6] 89 00

The above string is equivalent to both of these::

    FE 39 45 [6-6] 89 00
    FE 39 45 ?? ?? ?? ?? ?? ?? 89 00

Starting with YARA 2.0 you can also use unbounded jumps::

    FE 39 45 [10-] 89 00
    FE 39 45 [-] 89 00

The first one means ``[10-infinite]``, the second one means ``[0-infinite]``.
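
The jump forms described above can be placed in a complete rule. The following
is an illustrative sketch only; the rule and string names are made up for the
example, and the unbounded jump assumes YARA 2.0 or later:

.. code-block:: yara

    rule JumpFormsExample
    {
        strings:
            // 0 to 8 arbitrary bytes between the two fixed parts
            $bounded = { FE 39 45 [0-8] 89 00 }
            // 10 or more arbitrary bytes between the two fixed parts
            $unbounded = { FE 39 45 [10-] 89 00 }
        condition:
            $bounded or $unbounded
    }
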
There are also situations in which you may want to provide different
alternatives for a given fragment of your hex string. In those situations you
can use a syntax which resembles a regular expression:

.. code-block:: yara

    rule AlternativesExample1
    {
        strings:
            $hex_string = { F4 23 ( 62 B4 | 56 ) 45 }
        condition:
            $hex_string
    }

This rule will match any file containing ``F42362B445`` or ``F4235645``.
More than two alternatives can also be expressed. In fact, there is no
limit on the number of alternative sequences you can provide, nor on
their lengths.

.. code-block:: yara

    rule AlternativesExample2
    {
        strings:
            $hex_string = { F4 23 ( 62 B4 | 56 | 45 ?? 67 ) 45 }
        condition:
            $hex_string
    }

As the above example also shows, strings containing wild-cards are
allowed as part of alternative sequences.
Text strings
------------
As shown in previous sections, text strings are generally defined like this:

.. code-block:: yara

    rule TextExample
    {
        strings:
            $text_string = "foobar"
        condition:
            $text_string
    }

This is the simplest case: an ASCII-encoded, case-sensitive string. However,
text strings can be accompanied by some useful modifiers that alter the way in
which the string will be interpreted. Those modifiers are appended at the end of
the string definition separated by spaces, as will be discussed below.
Text strings can also contain the following subset of the escape sequences
available in the C language:
.. list-table::
:widths: 3 10
* - ``\"``
- Double quote
* - ``\\``
- Backslash
* - ``\r``
- Carriage return
* - ``\t``
- Horizontal tab
* - ``\n``
- New line
* - ``\xdd``
- Any byte in hexadecimal notation
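
As an illustrative sketch of these escape sequences, the rule below looks for a
line ending in a carriage return and a new line, a quoted Windows path, and a
short byte sequence written with ``\xdd`` escapes. The rule name, string names,
and string contents are assumptions chosen for the example:

.. code-block:: yara

    rule EscapeSequenceExample
    {
        strings:
            // Text followed by a carriage return and a new line
            $crlf_line = "Content-Type: text/html\r\n"
            // Double quotes and backslashes must be escaped
            $quoted_path = "\"C:\\Windows\\System32\""
            // Raw bytes 0x4D 0x5A 0x90 0x00 in hexadecimal notation
            $raw_bytes = "\x4d\x5a\x90\x00"
        condition:
            any of them
    }
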
In all versions of YARA before 4.1.0 text strings accepted any kind of Unicode
characters, regardless of their encoding. Those characters were interpreted by
YARA as raw bytes, and therefore the final string was actually determined by the
encoding format used by your text editor. This was never meant to be a feature; the
original intention was always that YARA strings should be ASCII-only, and YARA
4.1.0 started to raise warnings about non-ASCII characters in strings. This
limitation does not apply to strings in the metadata section or comments. See
more details `here <https://github.com/VirusTotal/yara/wiki/Unicode-characters-in-YARA>`_.
Case-insensitive strings
^^^^^^^^^^^^^^^^^^^^^^^^
Text strings in YARA are case-sensitive by default. However, you can make a
string case-insensitive by appending the modifier ``nocase`` at the end
of the string definition, on the same line:

.. code-block:: yara

    rule CaseInsensitiveTextExample
    {
        strings:
            $text_string = "foobar" nocase
        condition:
            $text_string
    }

With the ``nocase`` modifier the string *foobar* will match *Foobar*, *FOOBAR*,
and *fOoBaR*. This modifier can be used in conjunction with any modifier,
except ``base64``, ``base64wide`` and ``xor``.
Wide-character strings
^^^^^^^^^^^^^^^^^^^^^^
The ``wide`` modifier can be used to search for strings encoded with two bytes
per character, something typical in many executable binaries.
For example, if the string "Borland" appears encoded as two bytes per
character (i.e. ``B\x00o\x00r\x00l\x00a\x00n\x00d\x00``), then the following rule will match:

.. code-block:: yara

    rule WideCharTextExample1
    {
        strings:
            $wide_string = "Borland" wide
        condition:
            $wide_string
    }

However, keep in mind that this modifier just interleaves the ASCII codes of
the characters in the string with zeroes; it does not support true UTF-16
strings containing non-English characters. If you want to search for strings
in both ASCII and wide form, you can use the ``ascii`` modifier in conjunction
with ``wide``, no matter the order in which they appear.

.. code-block:: yara

    rule WideCharTextExample2
    {
        strings:
            $wide_and_ascii_string = "Borland" wide ascii
        condition:
            $wide_and_ascii_string
    }

The ``ascii`` modifier can appear alone, without an accompanying ``wide``
modifier, but it's not necessary to write it because in absence of ``wide`` the
string is assumed to be ASCII by default.
XOR strings
^^^^^^^^^^^
The ``xor`` modifier can be used to search for strings with a single byte XOR
applied to them.
The following rule will search for every single byte XOR applied to the string
"This program cannot" (including the plaintext string):

.. code-block:: yara

    rule XorExample1
    {
        strings:
            $xor_string = "This program cannot" xor
        condition:
            $xor_string
    }

The above rule is logically equivalent to:

.. code-block:: yara

    rule XorExample2
    {
        strings:
            $xor_string_00 = "This program cannot"
            $xor_string_01 = "Uihr!qsnfs`l!b`oonu"
            $xor_string_02 = "Vjkq\"rpmepco\"acllmv"
            // Repeat for every single byte XOR
        condition:
            any of them
    }

You can also combine the ``xor`` modifier with ``fullword``, ``wide``, and ``ascii``
modifiers. For example, to search for the ``wide`` and ``ascii`` versions of a
string after every single byte XOR has been applied you would use:

.. code-block:: yara

    rule XorExample3
    {
        strings:
            $xor_string = "This program cannot" xor wide ascii
        condition:
            $xor_string
    }

The ``xor`` modifier is applied after every other modifier. This means that
using the ``xor`` and ``wide`` together results in the XOR applying to the
interleaved zero bytes. For example, the following two rules are logically
equivalent:

.. code-block:: yara

    rule XorExample4
    {
        strings:
            $xor_string = "This program cannot" xor wide
        condition:
            $xor_string
    }

    rule XorExample4Expanded
    {
        strings:
            $xor_string_00 = "T\x00h\x00i\x00s\x00 \x00p\x00r\x00o\x00g\x00r\x00a\x00m\x00 \x00c\x00a\x00n\x00n\x00o\x00t\x00"
            $xor_string_01 = "U\x01i\x01h\x01r\x01!\x01q\x01s\x01n\x01f\x01s\x01`\x01l\x01!\x01b\x01`\x01o\x01o\x01n\x01u\x01"
            $xor_string_02 = "V\x02j\x02k\x02q\x02\"\x02r\x02p\x02m\x02e\x02p\x02c\x02o\x02\"\x02a\x02c\x02l\x02l\x02m\x02v\x02"
            // Repeat for every single byte XOR operation.
        condition:
            any of them
    }

Since YARA 3.11, if you want more control over the range of bytes used with the ``xor`` modifier, use:

.. code-block:: yara

    rule XorExample5
    {
        strings:
            $xor_string = "This program cannot" xor(0x01-0xff)
        condition:
            $xor_string
    }

The above example will apply the bytes from 0x01 to 0xff, inclusive, to the
string when searching. The general syntax is ``xor(minimum-maximum)``.
Base64 strings
^^^^^^^^^^^^^^
The ``base64`` modifier can be used to search for strings that have been base64
encoded. A good explanation of the technique is at:
https://www.leeholmes.com/searching-for-content-in-base-64-strings/
The following rule will search for the three base64 permutations of the string
"This program cannot":

.. code-block:: yara

    rule Base64Example1
    {
        strings:
            $a = "This program cannot" base64
        condition:
            $a
    }

This will cause YARA to search for these three permutations:
| VGhpcyBwcm9ncmFtIGNhbm5vd
| RoaXMgcHJvZ3JhbSBjYW5ub3
| UaGlzIHByb2dyYW0gY2Fubm90
The ``base64wide`` modifier works just like the ``base64`` modifier but the results
of the ``base64`` modifier are converted to wide.
The interaction between ``base64`` (or ``base64wide``) and ``wide`` and
``ascii`` is as you might expect. ``wide`` and ``ascii`` are applied to the
string first, and then the ``base64`` and ``base64wide`` modifiers are applied.
At no point is the plaintext of the ``ascii`` or ``wide`` versions of the
strings included in the search. If you want to also include those you can put
them in a secondary string.
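
The secondary-string approach mentioned above might look like the following
sketch. The rule and string names are illustrative; the first string covers the
base64 forms only, and the second adds the plaintext:

.. code-block:: yara

    rule Base64WithPlaintextExample
    {
        strings:
            // Matches only the base64-encoded permutations
            $encoded = "This program cannot" base64
            // Matches the plaintext, which the base64 modifier leaves out
            $plain = "This program cannot"
        condition:
            $encoded or $plain
    }
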
The ``base64`` and ``base64wide`` modifiers also support a custom alphabet. For
example:

.. code-block:: yara

    rule Base64Example2
    {
        strings:
            $a = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")
        condition:
            $a
    }

The alphabet must be 64 bytes long.
The ``base64`` and ``base64wide`` modifiers are only supported with text
strings. Using these modifiers with a hexadecimal string or a regular expression
will cause a compiler error. Also, the ``xor``, ``fullword``, and ``nocase``
modifiers used in combination with ``base64`` or ``base64wide`` will cause
a compiler error.
Because of the way that YARA strips the leading and trailing characters after
base64 encoding, "Dhis program cannow" and "This program cannot" share one of
their base64 encodings. Similarly, using the ``base64`` keyword on
single ASCII characters is not recommended. For example, "a" with the
``base64`` keyword matches "\`", "b", "c", "!", "\\xA1", or "\\xE1" after base64
encoding, and will not match where the base64 encoding matches the
``[GWm2][EFGH]`` regular expression.
Searching for full words
^^^^^^^^^^^^^^^^^^^^^^^^
Another modifier that can be applied to text strings is ``fullword``. This
modifier guarantees that the string will match only if it appears in the file
delimited by non-alphanumeric characters. For example the string *domain*, if
defined as ``fullword``, doesn't match *www.mydomain.com* but it matches
*www.my-domain.com* and *www.domain.com*.
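
A minimal rule for the *domain* case described above could be sketched as
follows (the rule and string names are illustrative). It should match data
containing *www.my-domain.com* or *www.domain.com*, but not data in which
*domain* appears only inside *www.mydomain.com*:

.. code-block:: yara

    rule FullwordExample
    {
        strings:
            // Only matches when delimited by non-alphanumeric characters
            $domain = "domain" fullword
        condition:
            $domain
    }
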
Regular expressions
-------------------
Regular expressions are one of the most powerful features of YARA. They are
defined in the same way as text strings, but enclosed in forward slashes instead
of double-quotes, like in the Perl programming language.

.. code-block:: yara

    rule RegExpExample1
    {
        strings:
            $re1 = /md5: [0-9a-fA-F]{32}/
            $re2 = /state: (on|off)/
        condition:
            $re1 and $re2
    }

Regular expressions can also be followed by the ``nocase``, ``ascii``, ``wide``,
and ``fullword`` modifiers, just like text strings. The semantics of these
modifiers are the same in both cases.
Additionally, they can be followed by the characters ``i`` and ``s`` just after
the closing slash, which is a very common convention for specifying that the
regular expression is case-insensitive and that the dot (``.``) can match
new-line characters. For example:

.. code-block:: yara

    rule RegExpExample2
    {
        strings:
            $re1 = /foo/i    // This regexp is case-insensitive
            $re2 = /bar./s   // In this regexp the dot matches everything, including new-line
            $re3 = /baz./is  // Both modifiers can be used together
        condition:
            any of them
    }

Notice that ``/foo/i`` is equivalent to ``/foo/ nocase``, but we recommend the
latter when defining strings. The ``/foo/i`` syntax is useful when writing
case-insensitive regular expressions for the ``matches`` operator.
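
As a hedged illustration of the ``matches`` case, the sketch below applies a
case-insensitive regular expression to a string value rather than to the
scanned data. It assumes the ``pe`` module is available and that the file has a
``CompanyName`` entry in its version information; the rule name and the company
text are made up for the example:

.. code-block:: yara

    import "pe"

    rule MatchesOperatorExample
    {
        condition:
            // The /i flag makes the comparison case-insensitive
            pe.version_info["CompanyName"] matches /example corp/i
    }
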
In previous versions of YARA, external libraries like PCRE and RE2 were used
to perform regular expression matching, but starting with version 2.0 YARA uses
its own regular expression engine. This new engine implements most features
found in PCRE, with a few exceptions such as capture groups, POSIX character
classes (``[[:alpha:]]``, ``[[:digit:]]``, etc.) and backreferences.
YARA’s regular expressions recognise the following metacharacters:
.. list-table::
:widths: 3 10
* - ``\``
- Quote the next metacharacter
* - ``^``
- Match the beginning of the file or negates a character class when used
as the first character after the opening bracket
* - ``$``
- Match the end of the file
* - ``.``
- Matches any single character except a newline character
* - ``|``
- Alternation
* - ``()``
- Grouping
* - ``[]``
- Bracketed character class
The following quantifiers are recognised as well:
.. list-table::
:widths: 3 10
* - ``*``
- Match 0 or more times
* - ``+``
- Match 1 or more times
* - ``?``
- Match 0 or 1 times
* - ``{n}``
- Match exactly n times
* - ``{n,}``
- Match at least n times
* - ``{,m}``
- Match at most m times
* - ``{n,m}``
- Match n to m times
All these quantifiers have a non-greedy variant, followed by a question
mark (?):
.. list-table::
:widths: 3 10
* - ``*?``
- Match 0 or more times, non-greedy
* - ``+?``
- Match 1 or more times, non-greedy
* - ``??``
- Match 0 or 1 times, non-greedy
* - ``{n}?``
- Match exactly n times, non-greedy
* - ``{n,}?``
- Match at least n times, non-greedy
* - ``{,m}?``
- Match at most m times, non-greedy
* - ``{n,m}?``
- Match n to m times, non-greedy
The following escape sequences are recognised:
.. list-table::
:widths: 3 10
* - ``\t``
- Tab (HT, TAB)
* - ``\n``
- New line (LF, NL)
* - ``\r``
- Return (CR)
* - ``\f``
- Form feed (FF)
* - ``\a``
- Alarm bell
* - ``\xNN``
- Character whose ordinal number is the given hexadecimal number
These are the recognised character classes:
.. list-table::
:widths: 3 10
* - ``\w``
- Match a *word* character (alphanumeric plus “_”)
* - ``\W``
- Match a *non-word* character
* - ``\s``
- Match a whitespace character
* - ``\S``
- Match a non-whitespace character
* - ``\d``
- Match a decimal digit character
* - ``\D``
- Match a non-digit character
Starting with version 3.3.0 these zero-width assertions are also recognised:
.. list-table::
:widths: 3 10
* - ``\b``
- Match a word boundary
* - ``\B``
- Match except at a word boundary
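
The constructs listed in the tables above can be combined in a single rule. The
following is a sketch only; the rule name, string names, and the patterns
themselves are assumptions chosen for illustration, and the ``\b`` assertions
assume YARA 3.3.0 or later:

.. code-block:: yara

    rule RegExpSyntaxExample
    {
        strings:
            // "cmd.exe" delimited by word boundaries, in any letter case
            $re1 = /\bcmd\.exe\b/ nocase
            // A version tag such as "v1.0" or "v12.345"
            $re2 = /v\d{1,3}\.\d{1,3}/
            // Non-greedy: as few characters as possible (1 to 64, no
            // new-lines) between the two markers
            $re3 = /BEGIN.{1,64}?END/
        condition:
            any of them
    }
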
Private strings
---------------
All strings in YARA can be marked as ``private`` which means they will never be
included in the output of YARA. They are treated as normal strings everywhere
else, so you can still use them as you wish in the condition, but they will
never be shown with the ``-s`` flag or seen in the YARA callback if you're using
the C API.

.. code-block:: yara

    rule PrivateStringExample
    {
        strings:
            $text_string = "foobar" private
        condition:
            $text_string
    }

Unreferenced strings
--------------------
YARA 4.5.0 allows strings that are never referenced in the condition. If a string
identifier starts with an underscore (for example ``$_unreferenced``), it does not
have to be referenced in the
condition. Any other string must be referenced in the condition. This is useful
if you want to search for particular strings and handle them in a custom
callback but don't need them for your condition logic.

.. code-block:: yara

    rule UnreferencedStringExample
    {
        strings:
            $_unreferenced = "AXSERS"
        condition:
            true
    }

String Modifier Summary
-----------------------
String modifiers are processed in the order listed below, and each is applicable only
to the string types listed.
.. list-table:: Text string modifiers
:widths: 3 5 10 10
:header-rows: 1
* - Keyword
- String Types
- Summary
- Restrictions
* - ``nocase``
- Text, Regex
- Ignore case
- Cannot use with ``xor``, ``base64``, or ``base64wide``
* - ``wide``
- Text, Regex
- Emulate UTF-16 by interleaving null (0x00) characters
- None
* - ``ascii``
- Text, Regex
- Also match ASCII characters, only required if ``wide`` is used
- None
* - ``xor``
- Text
- XOR text string with single byte keys
- Cannot use with ``nocase``, ``base64``, or ``base64wide``
* - ``base64``
- Text
- Convert to 3 base64 encoded strings
- Cannot use with ``nocase``, ``xor``, or ``fullword``
* - ``base64wide``
- Text
- Convert to 3 base64 encoded strings, then interleave null characters like ``wide``
- Cannot use with ``nocase``, ``xor``, or ``fullword``
* - ``fullword``
- Text, Regex
- Match is not preceded or followed by an alphanumeric character
- Cannot use with ``base64`` or ``base64wide``
* - ``private``
- Hex, Text, Regex
- Match never included in output
- None
Conditions
==========
Conditions are nothing more than Boolean expressions like those found
in all programming languages, for example in an *if* statement. They can contain
the typical Boolean operators ``and``, ``or``, and ``not``, and the relational operators
``>=``, ``<=``, ``<``, ``>``, ``==`` and ``!=``. Also, the arithmetic operators
(``+``, ``-``, ``*``, ``\``, ``%``) and bitwise operators
(``&``, ``|``, ``<<``, ``>>``, ``~``, ``^``) can be used on numerical expressions.
Integers are always 64 bits long; even the results of functions like ``uint8``,
``uint16`` and ``uint32`` are promoted to 64 bits. This is something you must take
into account, especially when using bitwise operators (for example, ``~0x01`` is not
``0xFE`` but ``0xFFFFFFFFFFFFFFFE``).
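
To make the 64-bit promotion concrete, here is a small sketch (the rule name is
made up for the example). It inverts the first byte of the file and masks the
result back down to 8 bits before comparing, because without the mask the upper
56 bits would all be set:

.. code-block:: yara

    rule BitwiseNotExample
    {
        condition:
            // If the first byte is 0x4D, ~0x4D is 0xFFFFFFFFFFFFFFB2,
            // so the mask with 0xFF is needed to obtain 0xB2.
            (~uint8(0) & 0xFF) == 0xB2
    }
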
The following table lists the precedence and associativity of all operators. The
table is sorted in descending precedence order, which means that operators listed
on a higher row are grouped before operators listed in rows further
below. Operators within the same row have the same precedence; if they appear
together in an expression, the associativity determines how they are grouped.

==========  ===========  =========================================  =============
Precedence  Operator     Description                                Associativity
==========  ===========  =========================================  =============
1           []           Array subscripting                         Left-to-right

            .            Structure member access
----------  -----------  -----------------------------------------  -------------
2           `-`          Unary minus                                Right-to-left

            `~`          Bitwise not
----------  -----------  -----------------------------------------  -------------
3           `*`          Multiplication                             Left-to-right

            \\           Division

            %            Remainder
----------  -----------  -----------------------------------------  -------------
4           `+`          Addition                                   Left-to-right

            `-`          Subtraction
----------  -----------  -----------------------------------------  -------------
5           `<<`         Bitwise left shift                         Left-to-right

            `>>`         Bitwise right shift
----------  -----------  -----------------------------------------  -------------
6           &            Bitwise AND                                Left-to-right
----------  -----------  -----------------------------------------  -------------
7           ^            Bitwise XOR                                Left-to-right
----------  -----------  -----------------------------------------  -------------
8           `|`          Bitwise OR                                 Left-to-right
----------  -----------  -----------------------------------------  -------------
9           <            Less than                                  Left-to-right

            <=           Less than or equal to

            >            Greater than

            >=           Greater than or equal to
----------  -----------  -----------------------------------------  -------------
10          ==           Equal to                                   Left-to-right

            !=           Not equal to

            contains     String contains substring

            icontains    Like contains but case-insensitive

            startswith   String starts with substring

            istartswith  Like startswith but case-insensitive

            endswith     String ends with substring

            iendswith    Like endswith but case-insensitive

            iequals      Case-insensitive string comparison

            matches      String matches regular expression
----------  -----------  -----------------------------------------  -------------
11          not          Logical NOT                                Right-to-left

            defined      Check if an expression is defined
----------  -----------  -----------------------------------------  -------------
12          and          Logical AND                                Left-to-right
----------  -----------  -----------------------------------------  -------------
13          or           Logical OR                                 Left-to-right
==========  ===========  =========================================  =============

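
As a small illustration of how the table is applied, the following sketch (the
rule name is made up for this example) relies on ``~`` binding tighter than
``&``, and ``&`` binding tighter than ``==``. Because ``~0x01`` is
0xFFFFFFFFFFFFFFFE, the result is masked with 0xFF before it is compared with
0xFE, so the condition should evaluate to true:

.. code-block:: yara

    rule PrecedenceExample
    {
        condition:
            // Grouped as ((~0x01) & 0xFF) == 0xFE
            ~0x01 & 0xFF == 0xFE
    }
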
String identifiers can also be used within a condition, acting as Boolean
variables whose value depends on whether the associated string is present
in the file.
.. code-block:: yara
rule Example
{
strings:
$a = "text1"
$b = "text2"
$c = "text3"
$d = "text4"
condition:
($a or $b) and ($c or $d)
}
Counting strings
----------------
Sometimes we need to know not only if a certain string is present or not,
but how many times the string appears in the file or process memory. The number
of occurrences of each string is represented by a variable whose name is the
string identifier but with a # character in place of the $ character.
For example:
.. code-block:: yara
rule CountExample
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
#a == 6 and #b > 10
}
This rule matches any file or process that contains the string $a exactly six
times and the string $b more than ten times.
Starting with YARA 4.2.0 it is possible to express the count of a string in an
integer range, like this:
.. code-block:: yara
#a in (filesize-500..filesize) == 2
In this example the number of 'a' strings in the last 500 bytes of the file must
equal exactly 2.
.. _string-offsets:
String offsets or virtual addresses
-----------------------------------
In the majority of cases, when a string identifier is used in a condition, we
want to know whether the associated string is anywhere within the file or
process memory, but sometimes we need to know whether the string is at some
specific offset in the file or at some virtual address within the process
address space. In such situations the ``at`` operator is what we need. This
operator is used as shown in the following example:
.. code-block:: yara
rule AtExample
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
$a at 100 and $b at 200
}
The expression ``$a at 100`` in the above example is true only if string $a is
found at offset 100 within the file (or at virtual address 100 if applied to
a running process). Likewise, string $b must appear at offset 200. Note
that both offsets are decimal; hexadecimal numbers can be written by
adding the prefix 0x before the number, as in the C language, which comes in
very handy when writing virtual addresses. Also note that the ``at`` operator
has higher precedence than ``and``.
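
For instance, the first part of the condition above could equally be written
with a hexadecimal offset, since 0x64 is 100 in decimal. This is only a sketch,
and the rule name is illustrative:

.. code-block:: yara

    rule AtHexExample
    {
        strings:
            $a = "dummy1"

        condition:
            // Same meaning as: $a at 100
            $a at 0x64
    }
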
While the ``at`` operator allows you to search for a string at some fixed offset
in the file or virtual address in a process memory space, the ``in`` operator
allows you to search for the string within a range of offsets or addresses.
.. code-block:: yara
rule InExample
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
$a in (0..100) and $b in (100..filesize)
}
In the example above the string $a must be found at an offset between 0 and
100, while string $b must be at an offset between 100 and the end of the file.
Again, numbers are decimal by default.
You can also get the offset or virtual address of the i-th occurrence of string
$a by using @a[i]. The indexes are one-based, so the first occurrence would be
@a[1], the second one @a[2], and so on. If you provide an index greater than the
number of occurrences of the string, the result will be a NaN (Not A Number)
value.
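
As a sketch of how these offsets can be combined (the rule name is chosen for
illustration), the following condition first checks that $a appears at least
twice, so that @a[2] refers to an existing occurrence, and then requires the
first occurrence of $b to come after it:

.. code-block:: yara

    rule OffsetOrderExample
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"

        condition:
            #a >= 2 and $b and @b[1] > @a[2]
    }
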
Match length
------------
For many regular expressions and hex strings containing jumps, the length of
the match is variable. If you have the regular expression /fo*/ the strings
"fo", "foo" and "fooo" can be matches, all of them with a different length.
You can use the length of the matches as part of your condition by putting the
character ! in front of the string identifier, in the same way you use the @
character for the offset. !a[1] is the length of the first match of $a, !a[2]
is the length of the second match, and so on. !a is an abbreviated form of
!a[1].
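
A minimal sketch, reusing the regular expression mentioned above with made-up
rule and string names, that requires the first match to be at least three bytes
long (for example "foo" or "fooo", but not "fo"):

.. code-block:: yara

    rule MatchLengthExample
    {
        strings:
            $foo = /fo*/

        condition:
            // !foo[1] is the length of the first match of $foo
            $foo and !foo[1] >= 3
    }
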
File size
---------
String identifiers are not the only variables that can appear in a condition
(in fact, rules can be defined without any string definition, as will be shown
below); there are other special variables that can be used as well. One of
these special variables is ``filesize``, which holds, as its name indicates,
the size of the file being scanned. The size is expressed in bytes.
.. code-block:: yara
rule FileSizeExample
{
condition:
filesize > 200KB
}
The previous example also demonstrates the use of the ``KB`` postfix. This
postfix, when attached to a numerical constant, automatically multiplies the
value of the constant by 1024. The ``MB`` postfix can be used to multiply the
value by 2^20. Both postfixes can be used only with decimal constants.
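
To make the arithmetic concrete, 200KB is 200 * 1024 = 204800 bytes and 2MB is
2 * 2^20 = 2097152 bytes. A sketch combining both postfixes (the rule name is
illustrative) could look like this:

.. code-block:: yara

    rule FileSizeRangeExample
    {
        condition:
            // Larger than 204800 bytes and smaller than 2097152 bytes
            filesize > 200KB and filesize < 2MB
    }
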
The use of ``filesize`` only makes sense when the rule is applied to a file. If
the rule is applied to a running process it won’t ever match because
``filesize`` doesn’t make sense in this context.
Executable entry point
----------------------
Another special variable that can be used in a rule is ``entrypoint``. If the
file is a Portable Executable (PE) or Executable and Linkable Format (ELF),
this variable holds the raw offset of the executable’s entry point in case we
are scanning a file. If we are scanning a running process, the entrypoint will
hold the virtual address of the main executable’s entry point. A typical use of
this variable is to look for some pattern at the entry point to detect packers
or simple file infectors.
.. code-block:: yara
rule EntryPointExample1
{
strings:
$a = { E8 00 00 00 00 }
condition:
$a at entrypoint
}
rule EntryPointExample2
{
strings:
$a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }
condition:
$a in (entrypoint..entrypoint + 10)
}
The presence of the ``entrypoint`` variable in a rule implies that only PE or
ELF files can satisfy that rule. If the file is not a PE or ELF, any rule using
this variable evaluates to false.
.. warning:: The ``entrypoint`` variable is deprecated; you should use the
    equivalent ``pe.entry_point`` from the :ref:`pe-module` instead. Starting
    with YARA 3.0 you'll get a warning if you use ``entrypoint``, and it will be
    completely removed in future versions.
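
A sketch of ``EntryPointExample1`` rewritten with the PE module, as the warning
suggests, might look like the following. The rule name is made up, and the rule
can only match PE files because ``pe.entry_point`` is undefined for anything
else:

.. code-block:: yara

    import "pe"

    rule EntryPointExample3
    {
        strings:
            $a = { E8 00 00 00 00 }

        condition:
            $a at pe.entry_point
    }
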
Accessing data at a given position
----------------------------------
There are many situations in which you may want to write conditions that depend
on data stored at a certain file offset or virtual memory address, depending on
whether we are scanning a file or a running process. In those situations you can
use one of the following functions to read data from the file at the given offset::
int8(<offset or virtual address>)
int16(<offset or virtual address>)
int32(<offset or virtual address>)
uint8(<offset or virtual address>)
uint16(<offset or virtual address>)
uint32(<offset or virtual address>)
int8be(<offset or virtual address>)
int16be(<offset or virtual address>)
int32be(<offset or virtual address>)
uint8be(<offset or virtual address>)
uint16be(<offset or virtual address>)
uint32be(<offset or virtual address>)
The ``intXX`` functions read 8-, 16-, and 32-bit signed integers from
<offset or virtual address>, while the ``uintXX`` functions read unsigned integers.
Both 16- and 32-bit integers are considered to be little-endian. If you
want to read a big-endian integer, use the corresponding function ending
in ``be``. The <offset or virtual address> parameter can be any expression returning
an unsigned integer, including the return value of one of the ``uintXX`` functions
itself. As an example, let's see a rule to distinguish PE files:
.. code-block:: yara
rule IsPE
{
condition:
// MZ signature at offset 0 and ...
uint16(0) == 0x5A4D and
// ... PE signature at offset stored in MZ header at 0x3C
uint32(uint32(0x3C)) == 0x00004550
}
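
To illustrate the difference between the two families of functions, the
following sketch (with an illustrative rule name) reads the same two bytes in
both byte orders. For a file starting with the bytes 4D 5A ("MZ"), the
little-endian read yields 0x5A4D and the big-endian read yields 0x4D5A:

.. code-block:: yara

    rule ByteOrderExample
    {
        condition:
            // Both comparisons test for the same "MZ" bytes at offset 0
            uint16(0) == 0x5A4D and uint16be(0) == 0x4D5A
    }
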
.. _sets-of-strings:
Sets of strings
---------------
There are circumstances in which it is necessary to express that the file should
contain a certain number of strings from a given set. No particular string in
the set is required to be present, but at least some of them should be. In these
situations the ``of`` operator can be used.
.. code-block:: yara
rule OfExample1
{
strings:
$a = "dummy1"
$b = "dummy2"
$c = "dummy3"
condition:
2 of ($a,$b,$c)
}
This rule requires at least two of the strings in the set ($a,$b,$c) to be
present in the file, but it does not matter which two. Of course, when
using this operator, the number before the ``of`` keyword must be less than or
equal to the number of strings in the set.
The elements of the set can be explicitly enumerated like in the previous
example, or can be specified by using wild cards. For example:
.. code-block:: yara
rule OfExample2
{
strings:
$foo1 = "foo1"
$foo2 = "foo2"
$foo3 = "foo3"
condition:
2 of ($foo*) // equivalent to 2 of ($foo1,$foo2,$foo3)
}
rule OfExample3
{
strings:
$foo1 = "foo1"
$foo2 = "foo2"
$bar1 = "bar1"
$bar2 = "bar2"
condition:
3 of ($foo*,$bar1,$bar2)
}
You can even use ``($*)`` to refer to all the strings in your rule, or write
the equivalent keyword ``them`` for more legibility.
.. code-block:: yara
rule OfExample4
{
strings:
$a = "dummy1"
$b = "dummy2"
$c = "dummy3"
condition:
1 of them // equivalent to 1 of ($*)
}
In all the examples above, the number of strings has been specified by a
numeric constant, but any expression returning a numeric value can be used.
The keywords ``any``, ``all`` and ``none`` can be used as well.
.. code-block:: yara

    all of them        // all strings in the rule
    any of them        // any string in the rule
    all of ($a*)       // all strings whose identifier starts with $a
    any of ($a,$b,$c)  // any of $a, $b or $c
    1 of ($*)          // same as "any of them"
    none of ($b*)      // none of the strings whose identifier starts with $b

.. warning:: Due to the way YARA works internally, using "0 of them" is an
ambiguous part of the language which should be avoided in favor of "none
of them". To understand this, consider the meaning of "2 of them", which
is true if 2 or more of the strings match. Historically, "0 of them"
followed this principle and would evaluate to true if at least one of the
strings matched. This ambiguity is resolved in YARA 4.3.0 by making "0 of
them" evaluate to true if exactly 0 of the strings match. To improve on
the situation and make the intent clear, it is encouraged to use "none" in
place of 0. By not using an integer it is easier to reason about the meaning
of "none of them" without the historical understanding of "at least 0"
clouding the issue.
Starting with YARA 4.2.0 it is possible to express a set of strings in an
integer range, like this:
.. code-block:: yara
all of ($a*) in (filesize-500..filesize)
any of ($a*, $b*) in (1000..2000)
Starting with YARA 4.3.0 it is possible to express a set of strings at a
specific offset, like this:
.. code-block:: yara
any of ($a*) at 0
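
Put into a complete rule, the range form might be used as in this sketch (rule
and string names are illustrative, and YARA 4.2.0 or later is assumed):

.. code-block:: yara

    rule SetInRangeExample
    {
        strings:
            $a1 = "dummy1"
            $a2 = "dummy2"

        condition:
            // At least one of $a1 and $a2 must appear within the given range
            any of ($a*) in (0..1024)
    }
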
Applying the same condition to many strings
-------------------------------------------
There is another operator very similar to ``of`` but even more powerful, the
``for..of`` operator. The syntax is:
.. code-block:: yara
for expression of string_set : ( boolean_expression )
And its meaning is: from those strings in ``string_set`` at least ``expression``
of them must satisfy ``boolean_expression``.
In other words: ``boolean_expression`` is evaluated for every string in
``string_set`` and there must be at least ``expression`` of them returning
True.
Of course, ``boolean_expression`` can be any boolean expression accepted in
the condition section of a rule, except for one important detail: here you
can (and should) use a dollar sign ($) as a place-holder for the string being
evaluated. Take a look at the following expression:
.. code-block:: yara
for any of ($a,$b,$c) : ( $ at pe.entry_point )
The $ symbol in the boolean expression is not tied to any particular string,
it will be $a, and then $b, and then $c in the three successive evaluations
of the expression.
Maybe you already realised that the ``of`` operator is a special case of
``for..of``. The following expressions are the same:
.. code-block:: yara
any of ($a,$b,$c)
for any of ($a,$b,$c) : ( $ )
You can also employ the symbols #, @, and ! to make reference to the number of
occurrences, the first offset, and the length of each string respectively.
.. code-block:: yara
for all of them : ( # > 3 )
for all of ($a*) : ( @ > @b )
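
As a further sketch combining these symbols (the rule name and thresholds are
arbitrary), the following condition requires at least two of the three strings
to occur three or more times each, with their first occurrence within the first
kilobyte of the file:

.. code-block:: yara

    rule ForOfExample
    {
        strings:
            $a = "dummy1"
            $b = "dummy2"
            $c = "dummy3"

        condition:
            for 2 of ($a,$b,$c) : ( # >= 3 and @ < 1KB )
    }
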
Starting with YARA 4.3.0 you can express conditions over text strings like this:
.. code-block:: yara
for any s in ("71b36345516e076a0663e0bea97759e4", "1e7f7edeb06de02f2c2a9319de99e033") : ( pe.imphash() == s )
It is worth remembering here that the two hashes referenced in the rule are
normal text strings, and have nothing to do with the string section of the rule.
Inside the loop condition the result of the `pe.imphash()` function is compared
to each of the text strings, resulting in a more concise rule.
Using anonymous strings with ``of`` and ``for..of``
---------------------------------------------------
When using the ``of`` and ``for..of`` operators followed by ``them``, the
identifier assigned to each string of the rule is usually superfluous. As
we are not referencing any string individually we do not need to provide
a unique identifier for each of them. In those situations you can declare
anonymous strings with identifiers consisting only of the $ character, as in
the following example:
.. code-block:: yara
rule AnonymousStrings
{
strings:
$ = "dummy1"
$ = "dummy2"
condition:
1 of them
}
Iterating over string occurrences
---------------------------------
As seen in :ref:`string-offsets`, the offsets or virtual addresses where a given
string appears within a file or process address space can be accessed by
using the syntax @a[i], where i is an index indicating which occurrence
of the string $a you are referring to (@a[1], @a[2], ...).
Sometimes you will need to iterate over some of these offsets and guarantee
they satisfy a given condition. In such cases you can use the ``for..in`` syntax,
for example:
.. code-block:: yara
rule Occurrences
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
for all i in (1,2,3) : ( @a[i] + 10 == @b[i] )
}
The previous rule says that the first occurrence of $b should be 10 bytes
after the first occurrence of $a, and the same should happen with the second
and third occurrences of the two strings.
The same condition could also be written as:
.. code-block:: yara
for all i in (1..3) : ( @a[i] + 10 == @b[i] )
Notice that we’re using a range (1..3) instead of enumerating the index
values (1,2,3). Of course, we’re not forced to use constants to specify range
boundaries; we can use expressions as well, as in the following example:
.. code-block:: yara
for all i in (1..#a) : ( @a[i] < 100 )
In this case we’re iterating over every occurrence of $a (remember that #a
represents the number of occurrences of $a). This rule is specifying that every
occurrence of $a should be within the first 100 bytes of the file.
In case you want to express that only some occurrences of the string
should satisfy your condition, the same logic seen in the ``for..of`` operator
applies here:
.. code-block:: yara
for any i in (1..#a) : ( @a[i] < 100 )
for 2 i in (1..#a) : ( @a[i] < 100 )
The ``for..in`` operator is similar to ``for..of``, but the latter iterates over
a set of strings, while the former iterates over ranges, enumerations, arrays and
dictionaries.
Iterators
---------
In YARA 4.0 the ``for..in`` operator was improved, and it can now be used to
iterate not only over integer enumerations and ranges (e.g. 1,2,3,4 and 1..4),
but also over any kind of iterable data type, like arrays and dictionaries
defined by YARA modules. For example, the following expression is valid in
YARA 4.0:
.. code-block:: yara
for any section in pe.sections : ( section.name == ".text" )
This is equivalent to:
.. code-block:: yara
for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )
The new syntax is more natural and easier to understand, and is the recommended
way of expressing this type of condition in newer versions of YARA.
While iterating dictionaries you must provide two variable names that will
hold the key and value for each entry in the dictionary, for example:
.. code-block:: yara
for any k,v in some_dict : ( k == "foo" and v == "bar" )
In general the ``for..in`` operator has the form:
.. code-block:: yara
for <quantifier> <variables> in <iterable> : ( <some condition using the loop variables> )
Where `<quantifier>` is either `any`, `all` or an expression that evaluates to
the number of items in the iterator that must satisfy the condition, `<variables>`
is a comma-separated list of variable names that hold the values for the
current item (the number of variables depends on the type of `<iterable>`) and
`<iterable>` is something that can be iterated.
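
As an illustration of the general form, the sketch below iterates over an array
and over a dictionary exported by the PE module. It assumes that
``pe.version_info`` is available as a dictionary of strings, as described in the
PE module documentation; the rule name and the values being compared are made
up:

.. code-block:: yara

    import "pe"

    rule ForInExample
    {
        condition:
            // One loop variable for arrays, two (key and value) for dictionaries
            for any section in pe.sections : ( section.name == ".text" ) and
            for any k,v in pe.version_info : ( k == "CompanyName" and v contains "Microsoft" )
    }
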
.. _referencing-rules:
Referencing other rules
-----------------------
When writing the condition for a rule you can also make reference to a
previously defined rule in a manner that resembles a function invocation of
traditional programming languages. In this way you can create rules that
depend on others. Let's see an example:
.. code-block:: yara
rule Rule1
{
strings:
$a = "dummy1"
condition:
$a
}
rule Rule2
{
strings:
$a = "dummy2"
condition:
$a and Rule1
}
As can be seen in the example, a file will satisfy Rule2 only if it contains
the string "dummy2" and satisfies Rule1. Note that it is strictly necessary to
define the rule being invoked before the one that will make the invocation.
Another way to reference other rules, introduced in YARA 4.2.0, is sets of
rules, which operate similarly to sets of strings (see
:ref:`sets-of-strings`). For example:
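.. code-block:: yara

    rule Rule1
    {
        strings:
            $a = "dummy1"

        condition:
            $a
    }

    rule Rule2
    {
        strings:
            $a = "dummy2"

        condition:
            $a
    }

    // No strings section here: a string that is declared but never
    // referenced in the condition is rejected by the compiler.
    rule MainRule
    {
        condition:
            any of (Rule*)
    }
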
This example demonstrates how to use rule sets to describe higher order logic
in a way which automatically grows with your rules. If you define another rule
named ``Rule3`` before ``MainRule`` then it will automatically be included in
the expansion of ``Rule*`` in the condition for MainRule.
To use rule sets all of the rules included in the set **must** exist prior to
the rule set being used. For example, the following will produce a compiler
error because ``a2`` is defined after the rule set is used in ``x``:
.. code-block:: yara
rule a1 { condition: true }
rule x { condition: 1 of (a*) }
rule a2 { condition: true }
More about rules
================
There are some aspects of YARA rules that have not been covered yet, but are
still very important. These are: global rules, private rules, tags and
metadata.
Global rules
------------
Global rules give you the possibility of imposing restrictions on all your
rules at once. For example, suppose that you want all your rules to ignore
files that exceed a certain size limit. You could go rule by rule making
the required modifications to their conditions, or just write a global rule
like this one:
.. code-block:: yara
global rule SizeLimit
{
condition:
filesize < 2MB
}
You can define as many global rules as you want. They are evaluated
before the rest of the rules, which in turn are evaluated only if all
global rules are satisfied.
Private rules
-------------
Private rules are a very simple concept. They are just rules that are not
reported by YARA when they match on a given file. Rules that are not reported
at all may seem sterile at first glance, but when mixed with the possibility
offered by YARA of referencing one rule from another (see
:ref:`referencing-rules`) they become useful. Private rules can serve as
building blocks for other rules, and at the same time prevent cluttering
YARA's output with irrelevant information. To declare a rule as private
just add the keyword ``private`` before the rule declaration.
.. code-block:: yara
private rule PrivateRuleExample
{
...
}
You can apply both ``private`` and ``global`` modifiers to a rule, resulting in
a global rule that does not get reported by YARA but must be satisfied.
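
For example, the size limit shown earlier could be enforced silently with a
sketch like this one (the rule name is illustrative):

.. code-block:: yara

    private global rule SilentSizeLimit
    {
        condition:
            filesize < 2MB
    }
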
Rule tags
---------
Another useful feature of YARA is the possibility of adding tags to rules.
Those tags can be used later to filter YARA's output and show only the rules
that you are interested in. You can add as many tags as you want to a rule,
they are declared after the rule identifier as shown below:
.. code-block:: yara
rule TagsExample1 : Foo Bar Baz
{
...
}
rule TagsExample2 : Bar
{
...
}
Tags must follow the same lexical convention as rule identifiers, therefore
only alphanumeric characters and underscores are allowed, and the tag cannot
start with a digit. They are also case sensitive.
When using YARA you can output only those rules which are tagged with the tag
or tags that you provide.
Metadata
--------
Besides the string definition and condition sections, rules can also have a
metadata section where you can put additional information about your rule.
The metadata section is defined with the keyword ``meta`` and contains
identifier/value pairs like in the following example:
.. code-block:: yara
rule MetadataExample
{
meta:
my_identifier_1 = "Some string data"
my_identifier_2 = 24
my_identifier_3 = true
strings:
$my_text_string = "text here"
$my_hex_string = { E2 34 A1 C8 23 FB }
condition:
$my_text_string or $my_hex_string
}
As can be seen in the example, metadata identifiers are always followed by
an equals sign and the value assigned to them. The assigned values can be
strings (valid UTF-8 only), integers, or one of the boolean values true or false.
Note that identifier/value pairs defined in the metadata section cannot be used
in the condition section, their only purpose is to store additional information
about the rule.
.. _using-modules:
Using modules
=============
Modules are extensions to YARA's core functionality. Some modules like
the :ref:`PE module <pe-module>` and the :ref:`Cuckoo module <cuckoo-module>`
are officially distributed with YARA and additional ones can be created by
third-parties or even yourself as described in :ref:`writing-modules`.
The first step to using a module is importing it with the ``import`` statement.
These statements must be placed outside any rule definition and followed by
the module name enclosed in double-quotes. Like this:
.. code-block:: yara
import "pe"
import "cuckoo"
After importing the module you can make use of its features, always using
``<module name>.`` as a prefix to any variable or function exported by the
module. For example:
.. code-block:: yara
pe.entry_point == 0x1000
cuckoo.http_request(/someregexp/)
.. _undefined-values:
Undefined values
================
Modules often leave variables in an undefined state, for example when the
variable doesn't make sense in the current context (think of ``pe.entry_point``
while scanning a non-PE file). YARA handles undefined values in a way that allows
the rule to keep its meaningfulness. Take a look at this rule:
.. code-block:: yara
import "pe"
rule Test
{
strings:
$a = "some string"
condition:
$a and pe.entry_point == 0x1000
}
If the scanned file is not a PE you wouldn't expect this rule to match the file,
even if it contains the string, because **both** conditions (the presence of
the string and the right value for the entry point) must be satisfied. However,
if the condition is changed to:
.. code-block:: yara
$a or pe.entry_point == 0x1000
You would expect the rule to match in this case if the file contains the string,
even if it isn't a PE file. That's exactly how YARA behaves. The logic is as
follows:
* If the expression in the condition is undefined, it is translated to
  ``false`` and the rule won't match.

* Boolean operators ``and`` and ``or`` treat undefined operands as ``false``,
  which means that:

  * ``undefined and true`` is ``false``
  * ``undefined and false`` is ``false``
  * ``undefined or true`` is ``true``
  * ``undefined or false`` is ``false``

* All the remaining operators, including the ``not`` operator, return undefined
  if any of their operands is undefined.

In the expression above, ``pe.entry_point == 0x1000`` will be undefined for non-PE
files, because ``pe.entry_point`` is undefined for those files. This implies that
``$a or pe.entry_point == 0x1000`` will be ``true`` if and only if ``$a`` is ``true``.
If the condition is ``pe.entry_point == 0x1000`` alone, it will evaluate to ``false``
for non-PE files, and so will ``pe.entry_point != 0x1000`` and
``not pe.entry_point == 0x1000``, as none of these expressions make sense for non-PE
files.
To check whether an expression is defined, use the unary operator ``defined``.
For example:
.. code-block:: yara
defined pe.entry_point
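
As a sketch of how ``defined`` can be combined with other operators (the rule
name is made up), the following rule matches files that contain the string and
either are not PE files at all or have the expected entry point:

.. code-block:: yara

    import "pe"

    rule DefinedExample
    {
        strings:
            $a = "some string"

        condition:
            $a and ((not defined pe.entry_point) or pe.entry_point == 0x1000)
    }
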
External variables
==================
External variables allow you to define rules that depend on values provided
from the outside. For example, you can write the following rule:
.. code-block:: yara
rule ExternalVariableExample1
{
condition:
ext_var == 10
}
In this case ``ext_var`` is an external variable whose value is assigned at
run-time (see the ``-d`` option of the command-line tool, and the ``externals``
parameter of the ``compile`` and ``match`` methods in yara-python). External
variables can be of type integer, string or boolean; their type depends on the
value assigned to them. An integer variable can substitute any integer constant
in the condition, and boolean variables can occupy the place of boolean
expressions. For example:
.. code-block:: yara
rule ExternalVariableExample2
{
condition:
bool_ext_var or filesize < int_ext_var
}
External variables of type string can be used with the operators: ``contains``,
``startswith``, ``endswith`` and their case-insensitive counterparts: ``icontains``,
``istartswith`` and ``iendswith``. They can be used also with the ``matches``
operator, which returns true if the string matches a given regular expression.
Case-insensitive string comparison can be done with the special operator
``iequals``, which only works with strings. For case-sensitive comparison use
the regular ``==`` operator.
.. code-block:: yara
rule ContainsExample
{
condition:
string_ext_var contains "text"
}
rule CaseInsensitiveContainsExample
{
condition:
string_ext_var icontains "text"
}
rule StartsWithExample
{
condition:
string_ext_var startswith "prefix"
}
rule EndsWithExample
{
condition:
string_ext_var endswith "suffix"
}
rule IequalsExample
{
condition:
string_ext_var iequals "string"
}
rule MatchesExample
{
condition:
string_ext_var matches /[a-z]+/
}
You can use regular expression modifiers along with the ``matches`` operator.
For example, if you want the regular expression from the previous example
to be case insensitive you can use ``/[a-z]+/i``. Notice the ``i`` following the
regular expression in a Perl-like manner. You can also use the ``s`` modifier
for single-line mode; in this mode the dot matches all characters, including
line breaks. Of course, both modifiers can be used simultaneously, as in the
following example:
.. code-block:: yara
rule ExternalVariableExample5
{
condition:
/* case insensitive single-line mode */
string_ext_var matches /[a-z]+/is
}
Keep in mind that every external variable used in your rules must be defined
at run-time, either by using the ``-d`` option of the command-line tool, or by
providing the ``externals`` parameter to the appropriate method in
``yara-python``.
Including files
===============
In order to allow for more flexible organization of your rules files,
YARA provides the ``include`` directive. This directive works in a similar way
to the *#include* pre-processor directive in C programs, which inserts the
content of the specified source file into the current file during compilation.
The following example will include the content of *other.yar* into the current
file:
.. code-block:: yara
include "other.yar"
The base path when searching for a file in an ``include`` directive will be the
directory where the current file resides. For this reason, the file *other.yar*
in the previous example should be located in the same directory as the current
file. However, you can also specify relative paths like these:
.. code-block:: yara
include "./includes/other.yar"
include "../includes/other.yar"
Or use absolute paths:
.. code-block:: yara
include "/home/plusvic/yara/includes/other.yar"
On Windows, both forward slashes and backslashes are accepted, but don’t forget
to write the drive letter:
.. code-block:: yara
include "c:/yara/includes/other.yar"
include "c:\\yara\\includes\\other.yar"