<pre>Internet Engineering Task Force (IETF) E. Boschi
Request for Comments: 6235 B. Trammell
Category: Experimental ETH Zurich
ISSN: 2070-1721 May 2011
<span class="h1">IP Flow Anonymization Support</span>
Abstract
This document describes anonymization techniques for IP flow data and
the export of anonymized data using the IP Flow Information Export
(IPFIX) protocol. It categorizes common anonymization schemes and
defines the parameters needed to describe them. It provides
guidelines for the implementation of anonymized data export and
storage over IPFIX, and describes an information model and Options-
based method for anonymization metadata export within the IPFIX
protocol or storage in IPFIX Files.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for examination, experimental implementation, and
evaluation.
This document defines an Experimental Protocol for the Internet
community. This document is a product of the Internet Engineering
Task Force (IETF). It represents the consensus of the IETF
community. It has received public review and has been approved for
publication by the Internet Engineering Steering Group (IESG). Not
all documents approved by the IESG are a candidate for any level of
Internet Standard; see <a href="./rfc5741#section-2">Section 2 of RFC 5741</a>.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
<a href="http://www.rfc-editor.org/info/rfc6235">http://www.rfc-editor.org/info/rfc6235</a>.
<span class="grey">Boschi & Trammell Experimental [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc6235">RFC 6235</a> IP Flow Anonymization Support May 2011</span>
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and the IETF Trust's Legal
Provisions Relating to IETF Documents
(<a href="http://trustee.ietf.org/license-info">http://trustee.ietf.org/license-info</a>) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
<a href="#section-1">1</a>. Introduction ....................................................<a href="#page-4">4</a>
<a href="#section-1.1">1.1</a>. IPFIX Protocol Overview ....................................<a href="#page-4">4</a>
<a href="#section-1.2">1.2</a>. IPFIX Documents Overview ...................................<a href="#page-5">5</a>
<a href="#section-1.3">1.3</a>. Anonymization within the IPFIX Architecture ................<a href="#page-5">5</a>
<a href="#section-1.4">1.4</a>. Supporting Experimentation with Anonymization ..............<a href="#page-6">6</a>
<a href="#section-2">2</a>. Terminology .....................................................<a href="#page-6">6</a>
<a href="#section-3">3</a>. Categorization of Anonymization Techniques ......................<a href="#page-7">7</a>
<a href="#section-4">4</a>. Anonymization of IP Flow Data ...................................<a href="#page-8">8</a>
<a href="#section-4.1">4.1</a>. IP Address Anonymization ..................................<a href="#page-10">10</a>
<a href="#section-4.1.1">4.1.1</a>. Truncation .........................................<a href="#page-11">11</a>
<a href="#section-4.1.2">4.1.2</a>. Reverse Truncation .................................<a href="#page-11">11</a>
<a href="#section-4.1.3">4.1.3</a>. Permutation ........................................<a href="#page-11">11</a>
<a href="#section-4.1.4">4.1.4</a>. Prefix-Preserving Pseudonymization .................<a href="#page-12">12</a>
<a href="#section-4.2">4.2</a>. MAC Address Anonymization .................................<a href="#page-12">12</a>
<a href="#section-4.2.1">4.2.1</a>. Truncation .........................................<a href="#page-13">13</a>
<a href="#section-4.2.2">4.2.2</a>. Reverse Truncation .................................<a href="#page-13">13</a>
<a href="#section-4.2.3">4.2.3</a>. Permutation ........................................<a href="#page-14">14</a>
<a href="#section-4.2.4">4.2.4</a>. Structured Pseudonymization ........................<a href="#page-14">14</a>
<a href="#section-4.3">4.3</a>. Timestamp Anonymization ...................................<a href="#page-15">15</a>
<a href="#section-4.3.1">4.3.1</a>. Precision Degradation ..............................<a href="#page-15">15</a>
<a href="#section-4.3.2">4.3.2</a>. Enumeration ........................................<a href="#page-16">16</a>
<a href="#section-4.3.3">4.3.3</a>. Random Shifts ......................................<a href="#page-16">16</a>
<a href="#section-4.4">4.4</a>. Counter Anonymization .....................................<a href="#page-16">16</a>
<a href="#section-4.4.1">4.4.1</a>. Precision Degradation ..............................<a href="#page-17">17</a>
<a href="#section-4.4.2">4.4.2</a>. Binning ............................................<a href="#page-17">17</a>
<a href="#section-4.4.3">4.4.3</a>. Random Noise Addition ..............................<a href="#page-17">17</a>
<a href="#section-4.5">4.5</a>. Anonymization of Other Flow Fields ........................<a href="#page-18">18</a>
<a href="#section-4.5.1">4.5.1</a>. Binning ............................................<a href="#page-18">18</a>
<a href="#section-4.5.2">4.5.2</a>. Permutation ........................................<a href="#page-18">18</a>
<a href="#section-5">5</a>. Parameters for the Description of Anonymization Techniques .....<a href="#page-19">19</a>
<a href="#section-5.1">5.1</a>. Stability .................................................<a href="#page-19">19</a>
<span id="page-3" ></span><a href="#section-5.2">5.2</a>. Truncation Length .........................................<a href="#page-19">19</a>
<a href="#section-5.3">5.3</a>. Bin Map ...................................................<a href="#page-20">20</a>
<a href="#section-5.4">5.4</a>. Permutation ...............................................<a href="#page-20">20</a>
<a href="#section-5.5">5.5</a>. Shift Amount ..............................................<a href="#page-20">20</a>
<a href="#section-6">6</a>. Anonymization Export Support in IPFIX ..........................<a href="#page-20">20</a>
6.1. Anonymization Records and the Anonymization
Options Template ..........................................<a href="#page-21">21</a>
6.2. Recommended Information Elements for Anonymization
Metadata ..................................................<a href="#page-23">23</a>
<a href="#section-6.2.1">6.2.1</a>. informationElementIndex ............................<a href="#page-23">23</a>
<a href="#section-6.2.2">6.2.2</a>. anonymizationTechnique .............................<a href="#page-23">23</a>
<a href="#section-6.2.3">6.2.3</a>. anonymizationFlags .................................<a href="#page-25">25</a>
<a href="#section-7">7</a>. Applying Anonymization Techniques to IPFIX Export and Storage ..27
<a href="#section-7.1">7.1</a>. Arrangement of Processes in IPFIX Anonymization ...........<a href="#page-28">28</a>
<a href="#section-7.2">7.2</a>. IPFIX-Specific Anonymization Guidelines ...................<a href="#page-30">30</a>
7.2.1. Appropriate Use of Information Elements for
Anonymized Data ....................................<a href="#page-30">30</a>
<a href="#section-7.2.2">7.2.2</a>. Export of Perimeter-Based Anonymization Policies ...<a href="#page-31">31</a>
<a href="#section-7.2.3">7.2.3</a>. Anonymization of Header Data .......................<a href="#page-32">32</a>
<a href="#section-7.2.4">7.2.4</a>. Anonymization of Options Data ......................<a href="#page-32">32</a>
<a href="#section-7.2.5">7.2.5</a>. Special-Use Address Space Considerations ...........<a href="#page-34">34</a>
7.2.6. Protecting Out-of-Band Configuration and
Management Data ....................................<a href="#page-34">34</a>
<a href="#section-8">8</a>. Examples .......................................................<a href="#page-34">34</a>
<a href="#section-9">9</a>. Security Considerations ........................................<a href="#page-39">39</a>
<a href="#section-10">10</a>. IANA Considerations ...........................................<a href="#page-41">41</a>
<a href="#section-11">11</a>. Acknowledgments ...............................................<a href="#page-41">41</a>
<a href="#section-12">12</a>. References ....................................................<a href="#page-41">41</a>
<a href="#section-12.1">12.1</a>. Normative References .....................................<a href="#page-41">41</a>
<a href="#section-12.2">12.2</a>. Informative References ...................................<a href="#page-42">42</a>
<span id="page-4" ></span>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
The standardization of an IP Flow Information Export (IPFIX) protocol
[<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>] and associated representations removes a technical barrier
to the sharing of IP flow data across organizational boundaries and
with network operations, security, and research communities for a
wide variety of purposes. However, with wider dissemination come
greater risks to the privacy of the users of networks under
measurement, and to the security of those networks. While it is not
a complete solution to the issues posed by distribution of IP flow
information, anonymization (i.e., the deletion or transformation of
information that is considered sensitive and that could be used to
reveal the identity of subjects involved in a communication) is an
important tool for the protection of privacy within network
measurement infrastructures.
This document presents a mechanism for representing anonymized data
within IPFIX and guidelines for using it. It is not intended as a
general statement on the applicability of specific flow data
anonymization techniques to specific situations or as a
recommendation of any particular application of anonymization to flow
data export. Exporters or publishers of anonymized data must take
care that the applied anonymization technique is appropriate for the
data source, the purpose, and the risk of deanonymization of a given
application.
It begins with a categorization of anonymization techniques. It then
describes the applicability of each technique to commonly
anonymizable fields of IP flow data, organized by information element
data type and semantics as in [<a href="./rfc5102" title="Information Model for IP Flow Information Export">RFC5102</a>]; enumerates the parameters
required by each of the applicable anonymization techniques; and
provides guidelines for the use of each of these techniques in
accordance with current best practices in data protection. Finally,
it specifies a mechanism for exporting anonymized data and binding
anonymization metadata to Templates and Options Templates using IPFIX
Options.
<span class="h3"><a class="selflink" id="section-1.1" href="#section-1.1">1.1</a>. IPFIX Protocol Overview</span>
In the IPFIX protocol, { type, length, value } tuples are expressed
in Templates containing { type, length } pairs, specifying which
{ value } fields are present in data records conforming to the
Template, giving great flexibility as to what data is transmitted.
Since Templates are sent very infrequently compared with Data
Records, this results in significant bandwidth savings. Various
different data formats may be transmitted simply by sending new
Templates specifying the { type, length } pairs for the new data
format. See [<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>] for more information.
<span id="page-5" ></span>
The IPFIX information model [<a href="./rfc5102" title="Information Model for IP Flow Information Export">RFC5102</a>] defines a large number of
standard Information Elements (IEs) that provide the necessary
{ type } information for Templates. The use of standard elements
enables interoperability among different vendors' implementations.
Additionally, non-standard enterprise-specific elements may be
defined for private use.
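As a non-normative illustration of the Template mechanism described
above, the following Python sketch decodes one Data Record using a
list of { type, length } pairs. The Template contents, the function
name decode_record, and the sample values are assumptions made for
this example. The sketch omits Set and Message headers as well as
variable-length Information Elements, and it is not a conforming
implementation of [RFC5101].

```python
import struct

# Hypothetical Template: (Information Element name, length in octets).
# The names are standard IPFIX Information Elements; selecting these
# three for the example is an assumption of this sketch.
TEMPLATE = [
    ("sourceIPv4Address", 4),
    ("destinationIPv4Address", 4),
    ("octetDeltaCount", 8),
]


def decode_record(template, payload):
    """Split one Data Record into values using the Template lengths."""
    record = {}
    offset = 0
    for name, length in template:
        raw = payload[offset:offset + length]
        if len(raw) != length:
            raise ValueError("payload shorter than the Template describes")
        # Values are read as unsigned integers in network byte order.
        record[name] = int.from_bytes(raw, "big")
        offset += length
    return record


# The Data Record carries values only; the types and lengths come
# from the Template. Addresses here are 192.0.2.1 and 198.51.100.2.
payload = struct.pack("!IIQ", 0xC0000201, 0xC6336402, 1500)
record = decode_record(TEMPLATE, payload)
```

Because the record itself holds only values, a different data format
can be described by supplying a different Template to the same
decoding routine.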
<span class="h3"><a class="selflink" id="section-1.2" href="#section-1.2">1.2</a>. IPFIX Documents Overview</span>
"Specification of the IP Flow Information Export (IPFIX) Protocol for
the Exchange of IP Traffic Flow Information" [<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>] and its
associated documents define the IPFIX protocol, which provides
network engineers and administrators with access to IP traffic flow
information.
"Architecture for IP Flow Information Export" [<a href="./rfc5470" title="Architecture for IP Flow Information Export">RFC5470</a>] defines the
architecture for the export of measured IP flow information out of an
IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
terminology used to describe the elements of this architecture, per
the requirements defined in "Requirements for IP Flow Information
Export" [<a href="./rfc3917" title="Requirements for IP Flow Information Export (IPFIX)">RFC3917</a>]. The IPFIX Protocol document [<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>] then covers
the details of the method for transporting IPFIX Data Records and
Templates via a congestion-aware transport protocol from an IPFIX
Exporting Process to an IPFIX Collecting Process.
"Information Model for IP Flow Information Export" [<a href="./rfc5102" title="Information Model for IP Flow Information Export">RFC5102</a>]
describes the Information Elements used by IPFIX, including details
on Information Element naming, numbering, and data type encoding.
Finally, "IP Flow Information Export (IPFIX) Applicability" [<a href="./rfc5472" title="IP Flow Information Export (IPFIX) Applicability">RFC5472</a>]
describes the various applications of the IPFIX protocol and their
use of information exported via IPFIX and relates the IPFIX
architecture to other measurement architectures and frameworks.
Additionally, "Specification of the IP Flow Information Export
(IPFIX) File Format" [<a href="./rfc5655" title="Specification of the IP Flow Information Export (IPFIX) File Format">RFC5655</a>] describes a file format based upon the
IPFIX protocol for the storage of flow data.
This document references the Protocol and Architecture documents for
terminology and extends the IPFIX Information Model to provide new
Information Elements for anonymization metadata. The anonymization
techniques described herein are equally applicable to the IPFIX
protocol and data stored in IPFIX Files.
<span class="h3"><a class="selflink" id="section-1.3" href="#section-1.3">1.3</a>. Anonymization within the IPFIX Architecture</span>
According to [<a href="./rfc5470" title="Architecture for IP Flow Information Export">RFC5470</a>], IPFIX Message anonymization is optionally
performed as the final operation before handing the Message to the
transport protocol for export. While no provision is made in the
<span id="page-6" ></span>architecture for anonymization metadata as in <a href="#section-6">Section 6</a>, this
arrangement does allow for the rewriting necessary for comprehensive
anonymization of IPFIX export as in <a href="#section-7">Section 7</a>. The development of
the IPFIX Mediation [<a href="./rfc6183" title="IP Flow Information Export (IPFIX) Mediation: Framework">RFC6183</a>] framework and the IPFIX File Format
[<a href="./rfc5655" title="Specification of the IP Flow Information Export (IPFIX) File Format">RFC5655</a>] expands upon this initial architectural allowance for
anonymization by adding to the list of places that anonymization may
be applied. The former specifies IPFIX Mediators, which rewrite
existing IPFIX Messages, and the latter specifies a method for
storage of IPFIX data in files.
More detail on the applicable architectural arrangements for
anonymization can be found in <a href="#section-7.1">Section 7.1</a>.
<span class="h3"><a class="selflink" id="section-1.4" href="#section-1.4">1.4</a>. Supporting Experimentation with Anonymization</span>
The status of this document is Experimental, reflecting the
experimental nature of anonymization export support. Research on
network trace anonymization techniques and attacks against them is
ongoing. Indeed, there is increasing evidence that anonymization
applied to network trace or flow data on its own is insufficient for
many data protection applications as in [<a href="#ref-Bur10" title="The Role of Network Trace Anonymization Under Attack">Bur10</a>]. Therefore, this
document explicitly does not recommend any particular technique or
implementation thereof.
The intention of this document is to provide a common basis for
interoperable exchange of anonymized data, furthering research in
this area, both on anonymization techniques themselves and on
the application of anonymized data to network measurement. To that
end, the classification in <a href="#section-3">Section 3</a> and anonymization export support
in <a href="#section-6">Section 6</a> can be used to describe and export information even
about data anonymized using techniques that are unacceptably weak for
general application to production data sets on their own.
While the specification herein is designed to be independent of the
anonymization techniques applied and the implementation thereof, open
research in this area may necessitate future updates to the
specification. Assuming the future successful application of this
specification to anonymized data publication and exchange, it may be
brought back to the IPFIX working group for further development and
publication on the Standards Track.
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Terminology</span>
Terms used in this document that are defined in the Terminology
section of the IPFIX Protocol [<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>] document are to be
interpreted as defined there. In addition, this document defines the
following terms:
<span id="page-7" ></span>
Anonymization Record: A record, defined by the Anonymization
Options Template in <a href="#section-6.1">Section 6.1</a>, that defines the properties of
the anonymization applied to a single Information Element within a
single Template or Options Template.
Anonymized Data Record: A Data Record within a Data Set containing
at least one Information Element with anonymized values. The
Information Element(s) within the Template or Options Template
describing this Data Record SHOULD have a corresponding
Anonymization Record.
Intermediate Anonymization Process: An intermediate process that
takes Data Records and transforms them into Anonymized Data
Records.
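The relationship between a Data Record and an Anonymized Data Record
can be sketched as follows. This Python example is a simplified,
non-normative stand-in for an Intermediate Anonymization Process: it
applies truncation (listed for IP addresses in Section 4.1.1), taken
here to mean zeroing the low-order bits of address fields. The
dictionary representation of a record, the function names, and the
choice of keeping 24 bits are assumptions of this sketch and are not
part of this specification.

```python
import ipaddress


def truncate_ipv4(address, keep_bits):
    """Zero all but the leading keep_bits bits of an IPv4 address."""
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


def anonymize(record, keep_bits=24):
    """Return an Anonymized Data Record; the input is left unmodified."""
    out = dict(record)
    for field in ("sourceIPv4Address", "destinationIPv4Address"):
        if field in out:
            out[field] = truncate_ipv4(out[field], keep_bits)
    return out


# A Data Record, shown as a dictionary for illustration only.
original = {
    "sourceIPv4Address": "192.0.2.77",
    "destinationIPv4Address": "198.51.100.9",
    "octetDeltaCount": 1500,
}
anonymized = anonymize(original)
```

Only the address fields are changed; the counter passes through
unmodified. A complete process would also export an Anonymization
Record describing what was applied to each Information Element.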
Note that there is an explicit difference in this document between a
"Data Set" (which is defined as in [<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>]) and a "data set". When
in lower case, this term refers to any collection of data (usually,
within the context of this document, flow or packet data) that may
contain identifying information and is therefore subject to
anonymization.
Note also that when the term Template is used in this document,
unless otherwise noted, it applies both to Templates and Options
Templates as defined in [<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>]. Specifically, Anonymization
Records may apply to both Templates and Options Templates.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <a href="./rfc2119">RFC 2119</a> [<a href="./rfc2119" title="Key words for use in RFCs to Indicate Requirement Levels">RFC2119</a>].
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Categorization of Anonymization Techniques</span>
Anonymization, as described by this document, is the modification of
a dataset in order to protect the identity of the people or entities
described by the dataset from disclosure. With respect to network
traffic data, anonymization generally attempts to preserve some set
of properties of the network traffic useful for a given application
or applications, while ensuring the data cannot be traced back to the
specific networks, hosts, or users generating the traffic.
Anonymization may be broadly classified according to two properties:
recoverability and countability. All anonymization techniques map
the real space of identifiers or values into a separate, anonymized
space, according to some function. A technique is said to be
recoverable when the function used is invertible or can otherwise be
reversed and a real identifier can be recovered from a given
replacement identifier. "Recoverability" as used within this
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
categorization does not refer to recoverability under attack; that
is, techniques wherein the function used can only be reversed using
additional information, such as an encryption key, or knowledge of
injected traffic within the dataset, are not considered to be
recoverable.
Countability compares the dimension of the anonymized space (N) to
the dimension of the real space (M), and denotes how the count of
unique values is preserved by the anonymization function. If the
anonymized space is smaller than the real space, then the function is
said to generalize the input, mapping more than one input point to
each anonymous value (e.g., as with aggregation). By definition,
generalization is not recoverable.
If the dimensions of the anonymized and real spaces are the same,
such that the count of unique values is preserved, then the function
is said to be a direct substitution function. If the dimension of
the anonymized space is larger, such that each real value maps to a
set of anonymized values, then the function is said to be a set
substitution function. Note that with set substitution functions,
the sets of anonymized values are not necessarily disjoint. Either
direct or set substitution functions are said to be one-way if there
exists no non-brute force method for recovering the real data point
from an anonymized one in isolation (i.e., if the only way to recover
the data point is to attack the anonymized data set as a whole, e.g.,
through fingerprinting or data injection).
This classification is summarized in the table below.
+------------------------+-----------------+------------------------+
| Recoverability / | Recoverable | Non-recoverable |
| Countability | | |
+------------------------+-----------------+------------------------+
| N < M | N.A. | Generalization |
| N = M | Direct | One-way Direct |
| | Substitution | Substitution |
| N > M | Set | One-way Set |
| | Substitution | Substitution |
+------------------------+-----------------+------------------------+
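As a reading aid, the table can be restated as a small Python
function. This is only a sketch: the function name and its parameters
are hypothetical and are not defined by this document.

```python
def classify(n_anonymized: int, m_real: int, recoverable: bool) -> str:
    """Return the table category for space sizes N (anonymized) and M (real)."""
    if m_real > n_anonymized:
        # Many-to-one mappings cannot be inverted: the "N.A." cell.
        if recoverable:
            raise ValueError("generalization is not recoverable by definition")
        return "Generalization"
    kind = "Direct Substitution" if n_anonymized == m_real else "Set Substitution"
    return kind if recoverable else "One-way " + kind
```

For example, truncating IPv4 addresses to /24 networks corresponds to
classify(2**24, 2**32, False).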
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. Anonymization of IP Flow Data</span>
In anonymizing IP flow data as treated by this document, the goal is
generally two-way address untraceability: to remove the ability to
assert that endpoint X contacted endpoint Y at time T. Address
untraceability is important as IP addresses are the most suitable
field in IP flow records to identify real-world entities. Each IP
address is associated with an interface on a network host and can
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-9" ></span>
potentially be identified with a single user. Additionally, IP
addresses are structured identifiers; that is, partial IP address
prefixes may be used to identify networks just as full IP addresses
identify hosts. This leads IP flow data anonymization to be
concerned first and foremost with IP address anonymization.
Any form of aggregation that combines flows from multiple endpoints
into a single record (e.g., aggregation by subnetwork, aggregation
removing addressing completely) may also provide address
untraceability; however, anonymization by aggregation is out of scope
for this document. Additionally, of potential interest in this
problem space but out of scope are anonymization techniques that are
applied over multiple fields or multiple records in a way that
introduces dependencies among anonymized fields or records. This
document is concerned solely with anonymization techniques applied at
the resolution of single fields within a flow record.
Even so, attacks against these anonymization techniques use entire
flows and relationships between hosts and flows within a given
dataset. Therefore, fields that may not necessarily be identifying
by themselves may be anonymized in order to increase the anonymity of
the dataset as a whole.
Due to the restricted semantics of IP flow data, there is a
relatively limited set of specific anonymization techniques available
on flow data, though each falls into the broad categories discussed
in the previous section. Each type of field that may commonly appear
in a flow record may have its own applicable specific techniques.
As with IP addresses, Media Access Control (MAC) addresses uniquely
identify devices on the network; while they are not often available
in traffic data collected at Layer 3, and cannot be used to locate
devices within the network, some traces may contain sub-IP data
including MAC address data. Hardware addresses may be mappable to
device serial numbers, and to the entities or individuals who
purchased the devices, when combined with external databases. MAC
addresses are also often used in constructing IPv6 addresses (see
<a href="./rfc4291#section-2.5.1">Section 2.5.1 of [RFC4291]</a>) and as such may be used to reconstruct
the low-order bits of anonymized IPv6 addresses in certain
circumstances. Therefore, MAC address anonymization is also
important.
Port numbers identify abstract entities (applications) as opposed to
real-world entities, but they can be used to classify hosts and user
behavior. Passive port fingerprinting, both of well-known and
ephemeral ports, can be used to determine the operating system
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-10" ></span>
running on a host. Relative data volumes by port can also be used to
determine the host's function (workstation, web server, etc.); this
information can be used to identify hosts and users.
While not identifiers in and of themselves, timestamps and counters
can reveal the behavior of the hosts and users on a network. Any
given network activity is recognizable by a pattern of relative time
differences and data volumes in the associated sequence of flows,
even without host address information. Therefore, they can be used
to identify hosts and users. Timestamps and counters are also
vulnerable to traffic injection attacks, where traffic with a known
pattern is injected into a network under measurement, and this
pattern is later identified in the anonymized dataset.
The simplest and most extreme form of anonymization, which can be
applied to any field of a flow record, is black-marker anonymization,
or complete deletion of a given field. Note that black-marker
anonymization is equivalent to simply not exporting the field(s) in
question.
While black-marker anonymization completely protects the data in the
deleted fields from the risk of disclosure, it also reduces the
utility of the anonymized dataset as a whole. Techniques that retain
some information while reducing (though not eliminating) the
disclosure risk will be extensively discussed in the following
sections; note that the techniques specifically applicable to IP
addresses, timestamps, ports, and counters will be discussed in
separate sections.
<span class="h3"><a class="selflink" id="section-4.1" href="#section-4.1">4.1</a>. IP Address Anonymization</span>
Since IP addresses are the most common identifiers within flow data
that can be used to directly identify a person, organization, or
host, most of the work on flow and trace data anonymization has gone
into IP address anonymization techniques. Indeed, the aim of most
attacks against anonymization is to recover the map from anonymized
IP addresses to original IP addresses, thereby identifying the hosts
behind them. Therefore, there is a wide range of IP address
anonymization schemes that fit into the following categories.
+------------------------------------+---------------------+
| Scheme | Action |
+------------------------------------+---------------------+
| Truncation | Generalization |
| Reverse Truncation | Generalization |
| Permutation | Direct Substitution |
| Prefix-preserving Pseudonymization | Direct Substitution |
+------------------------------------+---------------------+
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-11" ></span>
<span class="h4"><a class="selflink" id="section-4.1.1" href="#section-4.1.1">4.1.1</a>. Truncation</span>
Truncation removes "n" of the least significant bits from an IP
address, replacing them with zeroes. In effect, it replaces a host
address with a network address for some fixed netblock; for IPv4
addresses, 8-bit truncation corresponds to replacement with a /24
network address. Truncation is a non-reversible generalization
scheme. Note that while truncation is effective for making hosts
non-identifiable, it preserves information that can be used to
identify an organization, a geographic region, a country, or a
continent.
Truncation to an address length of 0 is equivalent to black-marker
anonymization. Complete removal of IP address information is only
recommended for analysis tasks that have no need to separate flow
data by host or network; e.g., as a first stage to per-application
(port) or time-series total volume analyses.
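Truncation as described above might be sketched as follows, using
Python's standard ipaddress module. The function name truncate_ip is
hypothetical; this is an illustration, not a reference implementation.

```python
import ipaddress

def truncate_ip(address: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if not 0 <= n <= ip.max_prefixlen:
        raise ValueError("n must be between 0 and the address length")
    # Shifting right and back left clears the low-order n bits.
    return str(type(ip)((int(ip) >> n) << n))
```

With n = 8, the IPv4 address 192.0.2.77 becomes 192.0.2.0, the /24
network address; with n = 32, every IPv4 address becomes 0.0.0.0.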
<span class="h4"><a class="selflink" id="section-4.1.2" href="#section-4.1.2">4.1.2</a>. Reverse Truncation</span>
Reverse truncation removes "n" of the most significant bits from an
IP address, replacing them with zeroes. Reverse truncation is a non-
reversible generalization scheme. Reverse truncation is effective
for making networks unidentifiable, partially or completely removing
information that can be used to identify an organization, a
geographic region, a country, or a continent (or Regional Internet
Registry (RIR) region of responsibility). However, it may cause
ambiguity when applied to data collected from more than one network,
since it treats all the hosts with the same address on different
networks as if they are the same host. It is not particularly useful
when publishing data where the network of origin is known or can be
easily guessed by virtue of the identity of the publisher.
Like truncation, reverse truncation to an address length of 0 is
equivalent to black-marker anonymization.
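The ambiguity noted above can be made concrete with a short sketch.
The function name reverse_truncate_ip is hypothetical, and the
addresses are taken from documentation ranges.

```python
import ipaddress

def reverse_truncate_ip(address: str, n: int) -> str:
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if not 0 <= n <= ip.max_prefixlen:
        raise ValueError("n must be between 0 and the address length")
    # Keep only the low-order (length - n) bits.
    keep_mask = (1 << (ip.max_prefixlen - n)) - 1
    return str(type(ip)(int(ip) & keep_mask))
```

With n = 24, both 192.0.2.77 and 198.51.100.77 map to 0.0.0.77, so
hosts on different networks become indistinguishable.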
<span class="h4"><a class="selflink" id="section-4.1.3" href="#section-4.1.3">4.1.3</a>. Permutation</span>
Permutation is a direct substitution technique, replacing each IP
address with an address selected from the set of possible IP
addresses, such that each anonymized address represents a unique
original address. The selection function is often random, though it
is not necessarily so. Permutation does not preserve any structural
information about a network, but it does preserve the unique count of
IP addresses. Any application that requires more structure than
host-uniqueness will not be able to use permuted IP addresses.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-12" ></span>
There are many variations of permutation functions, each of which has
trade-offs in performance, security, and guarantees of non-collision;
evaluating these trade-offs is implementation independent. However,
in general, permutation functions applied to anonymization SHOULD be
difficult to reverse without knowing the parameters (e.g., a secret
key for a Hashed Message Authentication Code (HMAC)). Given the
relatively small space of IPv4 addresses in particular, hash
functions applied without additional parameters could be reversed
through brute force if the hash function is known, and SHOULD NOT be
used as permutation functions. Permutation functions may guarantee
non-collision (i.e., that each anonymized address represents a unique
original address), but need not; however, the probability of
collision SHOULD be low. Nevertheless, we treat even permutations
with low but nonzero collision probability as a direct substitution.
Beyond these guidelines, recommendations for specific permutation
functions are out of scope for this document.
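One possible keyed substitution, in line with the guidance above, is
sketched here using an HMAC truncated to 32 bits. The function name,
the choice of SHA-256, and the key handling are assumptions made for
illustration only. Because the output is truncated, this is not a
strict permutation: collisions become more likely as the number of
distinct addresses grows, and a stateful table would be needed to
guarantee non-collision.

```python
import hashlib
import hmac
import ipaddress

def pseudonymize_ipv4(address: str, key: bytes) -> str:
    """Replace an IPv4 address with a keyed 32-bit pseudonym."""
    raw = ipaddress.IPv4Address(address).packed
    # Without the secret key, the mapping cannot be rebuilt by
    # enumerating the IPv4 space and hashing each address.
    digest = hmac.new(key, raw, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))
```

The same key always yields the same pseudonym for a given address, so
the mapping is stable for as long as the key is retained.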
<span class="h4"><a class="selflink" id="section-4.1.4" href="#section-4.1.4">4.1.4</a>. Prefix-Preserving Pseudonymization</span>
Prefix-preserving pseudonymization is a direct substitution
technique, like permutation but further restricted such that the
structure of subnets is preserved at each level while anonymizing IP
addresses. If two real IP addresses match on a prefix of "n" bits,
the two anonymized IP addresses will match on a prefix of "n" bits as
well. This is useful when relationships among networks must be
preserved for a given analysis task, but introduces structure into
the anonymized data that can be exploited in attacks against the
anonymization technique.
Scanning in Internet background traffic can cause particular problems
with this technique: if a scanner uses a predictable and known
sequence of addresses, this information can be used to reverse the
substitution. The low-order portion of the address can be left
unanonymized as a partial defense against this attack.
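The prefix-preserving property can be illustrated with a bitwise
construction in which each output bit is the input bit XORed with a
keyed pseudorandom bit derived from the preceding input bits. This is
a sketch under stated assumptions: the function names and the use of
HMAC-SHA-256 as the pseudorandom function are illustrative choices,
not something this document specifies, and the code favors clarity
over speed.

```python
import hashlib
import hmac
import ipaddress

def prefix_preserving_ipv4(address: str, key: bytes) -> str:
    """Anonymize an IPv4 address while preserving shared prefix lengths."""
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i, bit in enumerate(bits):
        # The flip decision for bit i depends only on the key and bits[:i],
        # so addresses sharing an n-bit prefix share n anonymized bits.
        prf = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        out.append(str(int(bit) ^ (prf[0] & 1)))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits on which two IPv4 addresses agree."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

For any two inputs, common_prefix_len gives the same result before
and after anonymization with the same key, which is exactly the
structure an attacker can also exploit.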
<span class="h3"><a class="selflink" id="section-4.2" href="#section-4.2">4.2</a>. MAC Address Anonymization</span>
Flow data containing sub-IP information can also contain identifying
information in the form of the hardware (MAC) address. While MAC
address information cannot be used to locate a node within a network,
it can be used to directly and uniquely identify a specific device.
Vendors or organizations within the supply chain may then have the
information necessary to identify the entity or individual that
purchased the device.
MAC address information is not as structured as IP address
information. EUI-48 and EUI-64 MAC addresses contain an
Organizationally Unique Identifier (OUI) in the three most significant
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-13" ></span>
bytes of the address; this OUI additionally contains bits noting
whether the address is locally or globally administered. Beyond
this, there is no standard relationship among the OUIs assigned to a
given vendor.
Note that MAC address information also appears within IPv6 addresses,
as the EUI-64 address, or EUI-48 address encoded as an EUI-64
address, is used as the least significant 64 bits of the IPv6 address
in the case of link-local addressing or stateless autoconfiguration;
the considerations and techniques in this section may then apply to
such IPv6 addresses as well.
+-----------------------------+---------------------+
| Scheme | Action |
+-----------------------------+---------------------+
| Truncation | Generalization |
| Reverse Truncation | Generalization |
| Permutation | Direct Substitution |
| Structured Pseudonymization | Direct Substitution |
+-----------------------------+---------------------+
<span class="h4"><a class="selflink" id="section-4.2.1" href="#section-4.2.1">4.2.1</a>. Truncation</span>
Truncation removes "n" of the least significant bits from a MAC
address, replacing them with zeroes. In effect, it retains bits of
the OUI, which identifies the manufacturer, while removing the least
significant bits identifying the particular device. Truncation of 24
bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the
device identifier while retaining the OUI.
Truncation keeps device manufacturers partially or completely
identifiable within a dataset while deleting unique host
identifiers; this can be used to retain and aggregate MAC-layer
behavior by vendor.
Truncation to an address length of 0 is equivalent to black-marker
anonymization.
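A minimal sketch of MAC address truncation follows. The function name
truncate_mac and the colon-separated output format are assumptions
made for this example.

```python
def truncate_mac(mac: str, n: int) -> str:
    """Zero the n least significant bits of an EUI-48 or EUI-64 address."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    # Clear the low-order n bits, then re-encode at the original length.
    value = (int.from_bytes(octets, "big") >> n) << n
    return ":".join(f"{b:02x}" for b in value.to_bytes(len(octets), "big"))
```

With n = 24, the EUI-48 address 00:1b:21:3c:4d:5e becomes
00:1b:21:00:00:00: the OUI is retained and the device identifier is
zeroed.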
<span class="h4"><a class="selflink" id="section-4.2.2" href="#section-4.2.2">4.2.2</a>. Reverse Truncation</span>
Reverse truncation removes "n" of the most significant bits from a
MAC address, replacing them with zeroes. Reverse truncation is a
non-reversible generalization scheme. This has the effect of
removing bits of the OUI, which identify manufacturers, before
removing the least significant bits. Reverse truncation of 24 bits
zeroes out the OUI.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-14" ></span>
Reverse truncation is effective for making device manufacturers
partially or completely unidentifiable within a dataset. However, it
may cause ambiguity by introducing the possibility of truncated MAC
address collision. Also, note that the utility of removing
manufacturer information is not particularly well covered by the
literature.
Reverse truncation to an address length of 0 is equivalent to black-
marker anonymization.
<span class="h4"><a class="selflink" id="section-4.2.3" href="#section-4.2.3">4.2.3</a>. Permutation</span>
Permutation is a direct substitution technique, replacing each MAC
address with an address selected from the set of possible MAC
addresses, such that each anonymized address represents a unique
original address. The selection function is often random, though it
is not necessarily so. Permutation does not preserve any structural
information about a network, but it does preserve the unique count of
devices on the network. Any application that requires more structure
than host-uniqueness will not be able to use permuted MAC addresses.
There are many variations of permutation functions, each of which has
trade-offs in performance, security, and guarantees of non-collision;
evaluating these trade-offs is implementation independent. However,
in general, permutation functions applied to anonymization SHOULD be
difficult to reverse without knowing the parameters (e.g., a secret
key for HMAC). While the EUI-48 space is larger than the IPv4
address space, hash functions applied without additional parameters
could be reversed through brute force if the hash function is known,
and SHOULD NOT be used as permutation functions. Permutation
functions may guarantee non-collision (i.e., that each anonymized
address represents a unique original address), but need not; however,
the probability of collision SHOULD be low. Nevertheless, we treat
even permutations with low but nonzero collision probability as a
direct substitution. Beyond these guidelines, recommendations for
specific permutation functions are out of scope for this document.
<span class="h4"><a class="selflink" id="section-4.2.4" href="#section-4.2.4">4.2.4</a>. Structured Pseudonymization</span>
Structured pseudonymization for MAC addresses is a direct
substitution technique, like permutation, but restricted such that
the OUI (the most significant three bytes) is permuted separately
from the node identifier, the remainder. This is useful when the
uniqueness of OUIs must be preserved for a given analysis task, but
introduces structure into the anonymized data that can be exploited
in attacks against the anonymization technique.
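Structured pseudonymization might be sketched with two independent
substitution tables, one for OUIs and one for node identifiers. The
class and method names are hypothetical. This sketch keeps state so
that non-collision is guaranteed; it does not preserve the locally or
globally administered bits of the OUI.

```python
import secrets

class StructuredMacPseudonymizer:
    """Substitute the OUI and the node identifier from separate tables."""

    def __init__(self):
        self._oui_map = {}
        self._node_map = {}

    @staticmethod
    def _substitute(table, value, nbytes):
        if value not in table:
            used = set(table.values())
            candidate = secrets.token_bytes(nbytes)
            while candidate in used:  # redraw until the value is unused
                candidate = secrets.token_bytes(nbytes)
            table[value] = candidate
        return table[value]

    def anonymize(self, mac: str) -> str:
        octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
        oui = self._substitute(self._oui_map, octets[:3], 3)
        node = self._substitute(self._node_map, octets[3:], len(octets) - 3)
        return ":".join(f"{b:02x}" for b in oui + node)
```

Two addresses from the same vendor receive the same anonymized OUI,
which is the structure that both the analysis task and an attacker
can use.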
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-15" ></span>
<span class="h3"><a class="selflink" id="section-4.3" href="#section-4.3">4.3</a>. Timestamp Anonymization</span>
The time at which a flow began or ended is not in itself
particularly identifying information, but it can be used as part of
attacks against other anonymization techniques or for user profiling,
e.g., as in [<a href="#ref-Mur07" title="Sampled Traffic Analysis by Internet-Exchange-Level Adversaries">Mur07</a>]. Timestamps can be used in traffic injection
attacks, which use known information about a set of traffic generated
or otherwise known by an attacker to recover mappings of other
anonymized fields, as well as to identify certain activity by
response delay and size fingerprinting, which compares response sizes
and inter-flow times in anonymized data to known values. Note that
these attacks have been shown to be relatively robust against
timestamp anonymization techniques (see [<a href="#ref-Bur10" title="The Role of Network Trace Anonymization Under Attack">Bur10</a>]), so the techniques
presented in this section are relatively weak and should be used with
care.
+-----------------------+----------------------------+
| Scheme | Action |
+-----------------------+----------------------------+
| Precision Degradation | Generalization |
| Enumeration | Direct or Set Substitution |
| Random Shifts | Direct Substitution |
+-----------------------+----------------------------+
<span class="h4"><a class="selflink" id="section-4.3.1" href="#section-4.3.1">4.3.1</a>. Precision Degradation</span>
Precision Degradation is a generalization technique that removes the
most precise components of a timestamp, accounting for all events
occurring in each given interval (e.g., one millisecond for
millisecond level degradation) as simultaneous. This has the effect
of potentially collapsing many timestamps into one. With this
technique, time precision is reduced and sequencing may be lost, but
information about when each event occurred is
preserved. The anonymized data may not be generally useful for
applications that require strict sequencing of flows.
Note that flow meters with low time precision (e.g., second
precision, or millisecond precision on high-capacity networks)
perform the equivalent of precision degradation anonymization by
their design.
Also, note that degradation to a very low precision (e.g., on the
order of minutes, hours, or days) is commonly used in analyses
operating on time-series aggregated data, and may also be described
as binning; though the time scales are longer and applicability more
restricted, in principle, this is the same operation.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-16" ></span>
Precision degradation to infinitely low precision is equivalent to
black-marker anonymization. Removal of timestamp information is only
recommended for analysis tasks that have no need to separate flows in
time, for example, for counting total volumes or unique occurrences
of other flow keys in an entire dataset.
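Precision degradation amounts to rounding each timestamp down to the
start of its interval. In this sketch, the function name and the use
of integer milliseconds are assumptions.

```python
def degrade_timestamp(ts_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return ts_ms - ts_ms % interval_ms
```

With interval_ms = 1000, all events within the same second receive
the same timestamp, so their relative order can no longer be
recovered.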
<span class="h4"><a class="selflink" id="section-4.3.2" href="#section-4.3.2">4.3.2</a>. Enumeration</span>
Enumeration is a substitution function that retains the chronological
order in which events occurred while eliminating time information.
Timestamps are substituted by equidistant timestamps (or numbers)
starting from a randomly chosen start value. The resulting data is
useful for applications requiring strict sequencing, but not for
those requiring good timing information (e.g., delay- or jitter-
measurement for quality-of-service (QoS) applications or service-
level agreement (SLA) validation).
Note that enumeration is functionally equivalent to precision
degradation in any environment into which traffic can be regularly
injected to serve as a clock at the precision of the frequency of the
injected flows.
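Enumeration might be sketched as follows. The function name, the
range of the random start value, and the decision to give equal
timestamps equal replacements (a direct substitution) are assumptions
made for this example.

```python
import secrets

def enumerate_timestamps(timestamps, step=1):
    """Replace timestamps with equidistant values that keep their order."""
    start = secrets.randbelow(2**31)  # randomly chosen start value
    # Rank each distinct timestamp by its chronological position.
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [start + step * rank[t] for t in timestamps]
```

Gaps of 30 and 4870 time units in the input both become a gap of one
step in the output, which is why timing measurements are not possible
on the result.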
<span class="h4"><a class="selflink" id="section-4.3.3" href="#section-4.3.3">4.3.3</a>. Random Shifts</span>
Random time shifts add a random offset to every timestamp within a
dataset. Therefore, this reversible substitution technique retains
duration and inter-event interval information as well as the
chronological order of flows. Random time shifts are quite weak and
relatively easy to reverse in the presence of external knowledge
about traffic on the measured network.
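A random shift applies one offset to the whole dataset. The function
name and the default bound of one day are hypothetical.

```python
import secrets

def shift_timestamps(timestamps_ms, max_shift_ms=86_400_000):
    """Add a single random offset to every timestamp in a dataset."""
    # Draw the offset uniformly from [-max_shift_ms, +max_shift_ms].
    offset = secrets.randbelow(2 * max_shift_ms + 1) - max_shift_ms
    return [t + offset for t in timestamps_ms]
```

Because every difference between two timestamps is unchanged, a
single flow whose true start time is known suffices to recover the
offset.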
<span class="h3"><a class="selflink" id="section-4.4" href="#section-4.4">4.4</a>. Counter Anonymization</span>
Counters (such as packet and octet volumes per flow) are, like
timestamps, subject to fingerprinting and injection attacks against
anonymization, and can likewise be used for user profiling. Data sets
with anonymized counters
are useful only for analysis tasks for which relative or imprecise
magnitudes of activity are useful. Counter information can also be
completely removed, but this is only recommended for analysis tasks
that have no need to evaluate the removed counter, for example, for
counting only unique occurrences of other flow keys.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-17" ></span>
+-----------------------+----------------------------+
| Scheme | Action |
+-----------------------+----------------------------+
| Precision Degradation | Generalization |
| Binning | Generalization |
| Random noise addition | Direct or Set Substitution |
+-----------------------+----------------------------+
<span class="h4"><a class="selflink" id="section-4.4.1" href="#section-4.4.1">4.4.1</a>. Precision Degradation</span>
As with precision degradation in timestamps, precision degradation of
counters removes lower-order bits of the counters, treating all the
counters in a given range as having the same value. Depending on the
precision reduction, this loses information about the relationships
between sizes of similarly sized flows, but keeps relative magnitude
information. Precision degradation to an infinitely low precision is
equivalent to black-marker anonymization.
<span class="h4"><a class="selflink" id="section-4.4.2" href="#section-4.4.2">4.4.2</a>. Binning</span>
Binning can be seen as a special case of precision degradation; the
operation is identical, except that in precision degradation the
counter ranges are uniform, and in binning, they need not be. For
example, consider separating unopened TCP connections from
potentially opened TCP connections. Here, packet counters per flow
would be binned into two bins, one for 1-2 packet flows, and one for
flows with 3 or more packets. Binning schemes are generally chosen
to keep precisely the amount of information required in a counter for
a given analysis task. Note that, also unlike precision degradation,
the bin label need not be within the bin's range. Binning counters
to a single bin is equivalent to black-marker anonymization.
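The TCP example above can be written as a lookup over bin edges. The
function name, the constants, and the labels 0 and 1 are hypothetical;
note that the label 0 lies outside the 1-2 packet range it stands for.

```python
import bisect

def bin_counter(value: int, edges: list, labels: list):
    """Map value to its bin label; bin i covers [edges[i-1], edges[i])."""
    return labels[bisect.bisect_right(edges, value)]

# One edge at 3 packets separates 1-2 packet flows from longer flows.
TCP_EDGES = [3]
TCP_LABELS = [0, 1]  # 0: unopened, 1: potentially opened
```

Here bin_counter(2, TCP_EDGES, TCP_LABELS) yields 0, while any flow
with 3 or more packets yields 1.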
<span class="h4"><a class="selflink" id="section-4.4.3" href="#section-4.4.3">4.4.3</a>. Random Noise Addition</span>
Random noise addition adds a random amount to a counter in each flow;
this is used to keep relative magnitude information and minimize the
disruption to size relationship information while avoiding
fingerprinting attacks against anonymization. Note that there is no
guarantee that random noise addition will maintain ranking order by a
counter among members of a set. Random noise addition is
particularly useful when the derived analysis data will not be
presented in such a way as to require the lower-order bits of the
counters.
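A sketch of random noise addition follows. The uniform distribution,
the clamping at zero, and all names are assumptions; this document
does not prescribe a noise distribution.

```python
import random

def add_noise(count: int, max_noise: int, rng=None) -> int:
    """Add uniform noise in [-max_noise, +max_noise] to a counter."""
    rng = rng or random.SystemRandom()
    # Clamp at zero so the result remains a valid unsigned counter.
    return max(0, count + rng.randint(-max_noise, max_noise))
```

Two flows whose true counters differ by less than 2 * max_noise may
swap rank after noise is added, which is the loss of ranking order
noted above.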
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-18" ></span>
<span class="h3"><a class="selflink" id="section-4.5" href="#section-4.5">4.5</a>. Anonymization of Other Flow Fields</span>
Other fields, particularly port numbers and protocol numbers, can be
used to partially identify the applications that generated the
traffic in a given flow trace. This information can be used in
fingerprinting attacks, and may be of interest on its own (e.g., to
reveal that a certain application with suspected vulnerabilities is
running on a given network). These fields are generally anonymized
using one of two techniques.
+-------------+---------------------+
| Scheme | Action |
+-------------+---------------------+
| Binning | Generalization |
| Permutation | Direct Substitution |
+-------------+---------------------+
<span class="h4"><a class="selflink" id="section-4.5.1" href="#section-4.5.1">4.5.1</a>. Binning</span>
Binning is a generalization technique mapping a set of potentially
non-uniform ranges into a set of arbitrarily labeled bins. Common
bin arrangements depend on the field type and the analysis
application. For example, an IP protocol bin arrangement may
preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
other protocols into a single bin, to mitigate the use of uncommon
protocols in fingerprinting attacks. Another example arrangement may
bin source and destination ports into low (0-1023) and high (1024-
65535) bins in order to tell service from ephemeral ports without
identifying individual applications.
Binning other flow key fields to a single bin is equivalent to black-
marker anonymization. Removal of other flow key information is only
recommended for analysis tasks that have no need to differentiate
flows on the removed keys, for example, for total traffic counts or
unique counts of other flow keys.
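The two example arrangements might look as follows. The bin labels
(255 for other protocols, 0 and 1024 for the port bins) and the
function names are hypothetical choices for this sketch.

```python
OTHER_PROTOCOL = 255  # hypothetical label for the catch-all bin

def bin_protocol(proto: int) -> int:
    """Preserve ICMP (1), TCP (6), and UDP (17); bin everything else."""
    return proto if proto in (1, 6, 17) else OTHER_PROTOCOL

def bin_port(port: int) -> int:
    """Label ports 0-1023 as 0 and ports 1024-65535 as 1024."""
    return 1024 if port >= 1024 else 0
```

Under this arrangement, ports 80 and 443 both map to 0, so a service
port can still be told from an ephemeral port without revealing which
service was used.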
<span class="h4"><a class="selflink" id="section-4.5.2" href="#section-4.5.2">4.5.2</a>. Permutation</span>
Permutation is a direct substitution technique, replacing each value
with a value selected from the set of possible values, such that each
anonymized value represents a unique original value. This is used to
preserve the count of unique values without preserving information
about, or the ordering of, the values themselves.
While permutation ideally guarantees that each anonymized value
represents a unique original value, this may require significant
state in the Intermediate Anonymization Process. Therefore,
permutation may be implemented by hashing for performance reasons,
<span class="grey">Boschi & Trammell Experimental [Page 18]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-19" ></span>
<span class="grey"><a href="./rfc6235">RFC 6235</a> IP Flow Anonymization Support May 2011</span>
with hash functions that may have relatively small collision
probabilities. Such techniques are still essentially direct
substitution techniques, despite the nonzero error probability.
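The trade-off between a stateful permutation and a hashed
substitution can be sketched as below. Both the class and function
names are hypothetical, and HMAC-SHA256 truncated to 16 bits is only
one possible choice of keyed hash; it is an assumption of this
sketch, not a technique named by this document.

```python
import hashlib
import hmac
import random


class TablePermutation:
    """Stateful permutation over a fixed-width value space.

    Collision-free, but the process must hold one table entry for
    every original value it has seen.
    """

    def __init__(self, bits: int = 16):
        rng = random.SystemRandom()
        self._unused = list(range(2 ** bits))
        rng.shuffle(self._unused)
        self._table = {}

    def anonymize(self, value: int) -> int:
        # Assign a fresh, never-used output on first sight of a value.
        if value not in self._table:
            self._table[value] = self._unused.pop()
        return self._table[value]


def hashed_substitute(value: int, key: bytes, bits: int = 16) -> int:
    """Stateless substitution by keyed hash, reduced to `bits` bits.

    Distinct inputs may collide, so this only approximates a
    permutation.
    """
    message = value.to_bytes(4, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)
```

The table variant preserves unique-value counts exactly; the hashed
variant needs no per-value state, at the cost of a small chance that
two original values map to the same anonymized value.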
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Parameters for the Description of Anonymization Techniques</span>
This section details the abstract parameters used to describe the
anonymization techniques examined in the previous section, on a per-
parameter basis. These parameters and their export safety inform the
design of the IPFIX anonymization metadata export specified in the
following section.
<span class="h3"><a class="selflink" id="section-5.1" href="#section-5.1">5.1</a>. Stability</span>
A stable anonymization will always map a given value in the real
space to a given value in the anonymized space, while an unstable
anonymization will change this mapping over time; a completely
unstable anonymization is essentially indistinguishable from black-
marker anonymization. Any given anonymization technique may be
applied with a varying range of stability. Stability is important
for assessing the comparability of anonymized information in
different datasets, or in the same dataset over different time
periods. In practice, an anonymization may also be stable for every
dataset published by a particular producer to a particular consumer,
stable for a stated time period within a dataset or across datasets,
or stable only for a single dataset.
If no information about stability is available, users of anonymized
data MAY assume that the techniques used are stable across the entire
dataset, but unstable across datasets. Note that stability presents
a risk-utility trade-off, as completely stable anonymization can be
used for longer-term trend analysis tasks but also presents more risk
of attack given the stable mapping. Information about the stability
of a mapping SHOULD be exported along with the anonymized data.
<span class="h3"><a class="selflink" id="section-5.2" href="#section-5.2">5.2</a>. Truncation Length</span>
Truncation and precision degradation are described by the truncation
length, i.e., the amount of data still remaining in the anonymized
field after anonymization.
Truncation length can generally be inferred from a given dataset, and
need not be specially exported or protected. For bit-level
truncation, the truncated bits are generally inferable by the least
significant bit set for an instance of an Information Element
described by a given Template (or the most significant bit set, in
the case of reverse truncation). For precision degradation, the
truncation is inferable from the maximum precision given. Note that
while this inference method is generally applicable, it is data
dependent: there is no guarantee that it will recover the exact
truncation length used to prepare the data.
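The inference described above, and its data dependence, can be
illustrated for bit-level truncation of IPv4 addresses. This is a
sketch under the assumption that truncated bits are exported as
zeros; the function names are hypothetical.

```python
import ipaddress


def trailing_zero_bits(value: int, width: int) -> int:
    """Count low-order zero bits; an all-zero value counts as `width`."""
    if value == 0:
        return width
    # value & -value isolates the least significant set bit.
    return (value & -value).bit_length() - 1


def infer_truncated_bits(addresses: list[str]) -> int:
    """Estimate how many low-order bits were zeroed by truncation.

    The estimate is the smallest trailing-zero run seen in the data,
    so it can overstate the truncation actually applied.
    """
    values = [int(ipaddress.IPv4Address(a)) for a in addresses]
    return min(trailing_zero_bits(v, 32) for v in values)
```

For addresses truncated to 24 bits, the sample 192.0.2.0 and
198.51.100.0 yields an estimate of 9 truncated bits rather than 8,
because both third octets happen to be even; adding 203.0.113.0 to
the sample brings the estimate down to the true value of 8.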
In the special case of IP address export with variable (per-record)
truncation, the truncation MAY be expressed by exporting the prefix
length alongside the address.
<span class="h3"><a class="selflink" id="section-5.3" href="#section-5.3">5.3</a>. Bin Map</span>
Binning is described by the specification of a bin mapping function.
This function can be generally expressed in terms of an associative
array that maps each point in the original space to a bin, although
from an implementation standpoint most bin functions are much simpler
and more efficient.
Since the bin map for a bin mapping function is in essence the bin
mapping key, and can be used to partially deanonymize binned data,
depending on the degree of generalization, information about the bin
mapping function SHOULD NOT be exported.
<span class="h3"><a class="selflink" id="section-5.4" href="#section-5.4">5.4</a>. Permutation</span>
Like binning, permutation is described by the specification of a
permutation function. In the general case, this can be expressed in
terms of an associative array that maps each point in the original
space to a point in the anonymized space. Unlike binning, each point
in the anonymized space corresponds to a single, unique point in the
original space.
Since the parameters of the permutation function are in essence key-
like (indeed, for cryptographic permutation functions, they are the
keys themselves), information about the permutation function or its
parameters SHOULD NOT be exported.
<span class="h3"><a class="selflink" id="section-5.5" href="#section-5.5">5.5</a>. Shift Amount</span>
Shifting requires an amount by which to shift each value. Since the
shift amount is the only key to a shift function, and can be used to
trivially deanonymize data protected by shifting, information about
the shift amount SHOULD NOT be exported.
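A short sketch shows why the shift amount is key-like. The function
names and the use of integer timestamps in seconds are assumptions
made for the example.

```python
def shift_values(values: list[int], shift_amount: int) -> list[int]:
    """Anonymize by adding one constant shift to every value."""
    return [v + shift_amount for v in values]


def recover_values(shifted: list[int], shift_amount: int) -> list[int]:
    """Anyone who learns the shift amount can undo the shifting."""
    return [v - shift_amount for v in shifted]
```

Intervals between values survive shifting unchanged, which is the
utility of the technique; the original values are recovered by a
single subtraction once the shift amount is known.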
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Anonymization Export Support in IPFIX</span>
Anonymized data exported via IPFIX SHOULD be annotated with
anonymization metadata, which details which fields described by which
Templates are anonymized, and provides appropriate information on the
anonymization techniques used. This metadata SHOULD be exported in
Data Records described by the recommended Options Templates described
in this section; these Options Templates use the additional
Information Elements described in the following subsection.
Note that fields anonymized using the black-marker (removal)
technique do not require any special metadata support: black-marker
anonymized fields SHOULD NOT be exported at all, by omitting the
corresponding Information Elements from the Template describing the Data
Set. In the case where application requirements dictate that a
black-marker anonymized field must remain in a Template, then an
Exporting Process MAY export black-marker anonymized fields with
their native length as all-zeros, but only in cases where enough
contextual information exists within the record to differentiate a
black-marker anonymized field exported in this way from a real zero
value.
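The preferred handling of black-marker anonymized fields, omitting
them from the Template, can be sketched as follows. Representing a
Template as a list of Information Element names and a record as a
dictionary is an assumption of this example, not an IPFIX encoding.

```python
def remove_black_marker_fields(
    template: list[str],
    records: list[dict],
    removed: set[str],
) -> tuple[list[str], list[dict]]:
    """Drop black-marker anonymized fields from template and records."""
    new_template = [ie for ie in template if ie not in removed]
    new_records = [
        {ie: record[ie] for ie in new_template} for record in records
    ]
    return new_template, new_records
```

Because the removed Information Elements no longer appear in the
Template, no anonymization metadata is needed for them.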
<span class="h3"><a class="selflink" id="section-6.1" href="#section-6.1">6.1</a>. Anonymization Records and the Anonymization Options Template</span>
The Anonymization Options Template describes Anonymization Records,
which allow anonymization metadata to be exported inline over IPFIX
or stored in an IPFIX File, by binding information about
anonymization techniques to Information Elements within defined
Templates or Options Templates. IPFIX Exporting Processes SHOULD
export anonymization records for any Template describing exported
anonymized Data Records; IPFIX Collecting Processes and processes
downstream from them MAY use anonymization records to treat
anonymized data differently depending on the applied technique.
Anonymization Records contain ancillary information bound to a
Template, so many of the considerations for Templates apply to
Anonymization Records as well. First, reliability is important: an
Exporting Process SHOULD export Anonymization Records after the
Templates they describe have been exported, and SHOULD export
anonymization records reliably if supported by the underlying
transport (i.e., without partial reliability when using Stream
Control Transmission Protocol (SCTP)).
Anonymization Records MUST be handled by Collecting Processes as
scoped to the Template to which they apply within the Transport
Session in which they are sent. When a Template is withdrawn via a
Template Withdrawal Message or expires during a UDP transport
session, the accompanying Anonymization Records are withdrawn or
expire as well and do not apply to subsequent Templates with the same
Template ID within the Session unless re-exported.
The Stability Class within the anonymizationFlags IE can be used to
declare that a given anonymization technique's mapping will remain
stable across multiple sessions, but this does not mean that
anonymization technique information given in the Anonymization
Records themselves persist across Sessions. Each new Transport
Session MUST contain new Anonymization Records for each Template
describing anonymized Data Sets.
SCTP per-stream export [<a href="#ref-IPFIX-PERSTREAM">IPFIX-PERSTREAM</a>] may be used to ease
management of Anonymization Records if appropriate for the
application.
The fields of the Anonymization Options Template are as follows:
+-------------------------+-----------------------------------------+
| IE | Description |
+-------------------------+-----------------------------------------+
| templateId [scope] | The Template ID of the Template or |
| | Options Template containing the |
| | Information Element described by this |
| | anonymization record. This Information |
| | Element MUST be defined as a Scope |
| | Field. |
| informationElementId | The Information Element identifier of |
| [scope] | the Information Element described by |
| | this anonymization record. This |
| | Information Element MUST be defined as |
| | a Scope Field. Exporting Processes |
| | MUST clear the Enterprise bit of the |
| | informationElementId and Collecting |
| | Processes SHOULD ignore it; information |
| | about enterprise-specific Information |
| | Elements is exported via the |
| | privateEnterpriseNumber Information |
| | Element. |
| privateEnterpriseNumber | The Private Enterprise Number of the |
| [scope] [optional] | enterprise-specific Information Element |
| | described by this anonymization record. |
| | This Information Element MUST be |
| | defined as a Scope Field if present. A |
| | privateEnterpriseNumber of 0 signifies |
| | that the Information Element is |
| | IANA-registered. |
| informationElementIndex | The Information Element index of the |
| [scope] [optional] | instance of the Information Element |
| | described by this anonymization record |
| | identified by the informationElementId |
| | within the Template. Optional; need |
| | only be present when describing |
| | Templates that have multiple instances |
| | of the same Information Element. This |
| | Information Element MUST be defined as |
| | a Scope Field if present. This |
| | Information Element is defined in |
| | <a href="#section-6.2">Section 6.2</a>. |
| anonymizationFlags | Flags describing the mapping stability |
| | and specialized modifications to the |
| | Anonymization Technique in use. SHOULD |
| | be present. This Information Element |
| | is defined in <a href="#section-6.2.3">Section 6.2.3</a>. |
| anonymizationTechnique | The technique used to anonymize the |
| | data. MUST be present. This |
| | Information Element is defined in |
| | <a href="#section-6.2.2">Section 6.2.2</a>. |
+-------------------------+-----------------------------------------+
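The fields in the table above can be modeled in memory as in the
following sketch. The class and attribute names are hypothetical;
only the field semantics (which fields are scope, which are
optional, and the clearing of the Enterprise bit) come from the
table. The 0x7FFF mask assumes the usual IPFIX layout in which the
Enterprise bit is the most significant bit of the 16-bit identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnonymizationRecord:
    template_id: int                 # templateId [scope]
    information_element_id: int      # informationElementId [scope]
    anonymization_technique: int     # MUST be present
    anonymization_flags: int = 0     # SHOULD be present
    private_enterprise_number: Optional[int] = None  # optional scope
    information_element_index: Optional[int] = None  # optional scope

    def __post_init__(self):
        # Clear the Enterprise bit; enterprise-specific elements are
        # identified by private_enterprise_number instead.
        self.information_element_id = self.information_element_id & 0x7FFF

    def scope_key(self) -> tuple:
        """Key under which a collector could store this record."""
        return (
            self.template_id,
            self.private_enterprise_number or 0,
            self.information_element_id,
            self.information_element_index or 0,
        )
```

A collector following this sketch would keep such records per
Transport Session and discard them when the Template they refer to
is withdrawn or expires.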
<span class="h3"><a class="selflink" id="section-6.2" href="#section-6.2">6.2</a>. Recommended Information Elements for Anonymization Metadata</span>
<span class="h4"><a class="selflink" id="section-6.2.1" href="#section-6.2.1">6.2.1</a>. informationElementIndex</span>
Description: A zero-based index of an Information Element
referenced by informationElementId within a Template referenced by
templateId; used to disambiguate scope for templates containing
multiple identical Information Elements.
Abstract Data Type: unsigned16
Data Type Semantics: identifier
ElementId: 287
Status: Current
<span class="h4"><a class="selflink" id="section-6.2.2" href="#section-6.2.2">6.2.2</a>. anonymizationTechnique</span>
Description: A description of the anonymization technique applied
to a referenced Information Element within a referenced Template.
Each technique may be applicable only to certain Information
Elements and recommended only for certain Information Elements;
these restrictions are noted in the table below.
+-------+---------------------------+-----------------+-------------+
| Value | Description | Applicable to | Recommended |
| | | | for |
+-------+---------------------------+-----------------+-------------+
| 0 | Undefined: the Exporting | all | all |
| | Process makes no | | |
| | representation as to | | |
| | whether or not the | | |
| | defined field is | | |
| | anonymized. While the | | |
| | Collecting Process MAY | | |
| | assume that the field is | | |
| | not anonymized, it is not | | |
| | guaranteed not to be. | | |
| | This is the default | | |
| | anonymization technique. | | |
| 1 | None: the values exported | all | all |
| | are real. | | |
| 2 | Precision | all | all |
| | Degradation/Truncation: | | |
| | the values exported are | | |
| | anonymized using simple | | |
| | precision degradation or | | |
| | truncation. The new | | |
| | precision or number of | | |
| | truncated bits is | | |
| | implicit in the exported | | |
| | data and can be deduced | | |
| | by the Collecting | | |
| | Process. | | |
| 3 | Binning: the values | all | all |
| | exported are anonymized | | |
| | into bins. | | |
| 4 | Enumeration: the values | all | timestamps |
| | exported are anonymized | | |
| | by enumeration. | | |
| 5 | Permutation: the values | all | identifiers |
| | exported are anonymized | | |
| | by permutation. | | |
| 6 | Structured Permutation: | addresses | |
| | the values exported are | | |
| | anonymized by | | |
| | permutation, preserving | | |
| | bit-level structure as | | |
| | appropriate; this | | |
| | represents | | |
| | prefix-preserving IP | | |
| | address anonymization or | | |
| | structured MAC address | | |
| | anonymization. | | |
| 7 | Reverse Truncation: the | addresses | |
| | values exported are | | |
| | anonymized using reverse | | |
| | truncation. The number | | |
| | of truncated bits is | | |
| | implicit in the exported | | |
| | data, and can be deduced | | |
| | by the Collecting | | |
| | Process. | | |
| 8 | Noise: the values | non-identifiers | counters |
| | exported are anonymized | | |
| | by adding random noise to | | |
| | each value. | | |
| 9 | Offset: the values | all | timestamps |
| | exported are anonymized | | |
| | by adding a single offset | | |
| | to all values. | | |
+-------+---------------------------+-----------------+-------------+
Abstract Data Type: unsigned16
Data Type Semantics: identifier
ElementId: 286
Status: Current
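The values in the table above map naturally onto an enumeration. In
the sketch below the member names and the parse_technique helper are
hypothetical; treating an unrecognized value as Undefined is a
design assumption of the example, not a requirement stated here.

```python
from enum import IntEnum


class AnonymizationTechnique(IntEnum):
    UNDEFINED = 0
    NONE = 1
    PRECISION_DEGRADATION_TRUNCATION = 2
    BINNING = 3
    ENUMERATION = 4
    PERMUTATION = 5
    STRUCTURED_PERMUTATION = 6
    REVERSE_TRUNCATION = 7
    NOISE = 8
    OFFSET = 9


def parse_technique(raw: int) -> AnonymizationTechnique:
    """Decode an exported value, falling back to UNDEFINED."""
    try:
        return AnonymizationTechnique(raw)
    except ValueError:
        # Assumption: unknown values carry no usable information.
        return AnonymizationTechnique.UNDEFINED
```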
<span class="h4"><a class="selflink" id="section-6.2.3" href="#section-6.2.3">6.2.3</a>. anonymizationFlags</span>
Description: A flag word describing specialized modifications to
the anonymization policy in effect for the anonymization technique
applied to a referenced Information Element within a referenced
Template. When flags are clear (0), the normal policy (as
described by anonymizationTechnique) applies without modification.
 MSB  14  13  12  11  10   9   8   7   6   5   4   3   2   1 LSB
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|                   Reserved                    |LOR|PmA|   SC  |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
                      anonymizationFlags IE
+--------+----------+-----------------------------------------------+
| bit(s) | name | description |
| (LSB = | | |
| 0) | | |
+--------+----------+-----------------------------------------------+
| 0-1 | SC | Stability Class: see the Stability Class |
| | | table below, and <a href="#section-5.1">Section 5.1</a>. |
| 2 | PmA | Perimeter Anonymization: when set (1), source |
| | | Information Elements as described in |
| | | [<a href="./rfc5103" title="Bidirectional Flow Export Using IP Flow Information Export (IPFIX)">RFC5103</a>] are interpreted as external |
| | | addresses, and destination Information |
| | | Elements as described in [<a href="./rfc5103" title="Bidirectional Flow Export Using IP Flow Information Export (IPFIX)">RFC5103</a>] are |
| | | interpreted as internal addresses, for the |
| | | purposes of associating |
| | | anonymizationTechnique to Information |
| | | Elements only; see <a href="#section-7.2.2">Section 7.2.2</a> for details. |
| | | This bit MUST NOT be set when associated with |
| | | a non-endpoint (i.e., source or destination) |
| | | Information Element. SHOULD be consistent |
| | | within a record (i.e., if a source |
| | | Information Element has this flag set, the |
| | | corresponding destination element SHOULD have |
| | | this flag set, and vice versa.) |
| 3 | LOR | Low-Order Unchanged: when set (1), the |
| | | low-order bits of the anonymized Information |
| | | Element contain real data. This modification |
| | | is intended for the anonymization of |
| | | network-level addresses while leaving |
| | | host-level addresses intact in order to |
| | | preserve host-level structure, which could |
| | | otherwise be used to reverse anonymization. |
| | | MUST NOT be set when associated with a |
| | | truncation-based anonymizationTechnique. |
| 4-15 | Reserved | Reserved for future use: SHOULD be cleared |
| | | (0) by the Exporting Process and MUST be |
| | | ignored by the Collecting Process. |
+--------+----------+-----------------------------------------------+
The Stability Class portion of this flags word describes the
stability class of the anonymization technique applied to a
referenced Information Element within a referenced Template.
Stability classes refer to the stability of the parameters of the
anonymization technique, and therefore the comparability of the
mapping between the real and anonymized values over time. This
determines which anonymized datasets may be compared with each
other. Values are as follows:
+-----+-----+-------------------------------------------------------+
| Bit | Bit | Description |
| 1 | 0 | |
+-----+-----+-------------------------------------------------------+
| 0 | 0 | Undefined: the Exporting Process makes no |
| | | representation as to how stable the mapping is, or |
| | | over what time period values of this field will |
| | | remain comparable; while the Collecting Process MAY |
| | | assume Session level stability, Session level |
| | | stability is not guaranteed. Processes SHOULD assume |
| | | this is the case in the absence of stability class |
| | | information; this is the default stability class. |
| 0 | 1 | Session: the Exporting Process will ensure that the |
| | | parameters of the anonymization technique are stable |
| | | during the Transport Session. All the values of the |
| | | described Information Element for each Record |
| | | described by the referenced Template within the |
| | | Transport Session are comparable. The Exporting |
| | | Process SHOULD endeavor to ensure at least this |
| | | stability class. |
| 1 | 0 | Exporter-Collector Pair: the Exporting Process will |
| | | ensure that the parameters of the anonymization |
| | | technique are stable across Transport Sessions over |
| | | time with the given Collecting Process, but may use |
| | | different parameters for different Collecting |
| | | Processes. Data exported to different Collecting |
| | | Processes are not comparable. |
| 1 | 1 | Stable: the Exporting Process will ensure that the |
| | | parameters of the anonymization technique are stable |
| | | across Transport Sessions over time, regardless of |
| | | the Collecting Process to which it is sent. |
+-----+-----+-------------------------------------------------------+
Abstract Data Type: unsigned16
Data Type Semantics: flags
ElementId: 285
Status: Current
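Decoding and encoding the flag word follows directly from the bit
assignments above (bits 0-1 SC, bit 2 PmA, bit 3 LOR, bits 4-15
reserved). The type and function names in this sketch are
hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple


class StabilityClass(IntEnum):
    UNDEFINED = 0
    SESSION = 1
    EXPORTER_COLLECTOR_PAIR = 2
    STABLE = 3


class AnonymizationFlags(NamedTuple):
    stability: StabilityClass
    perimeter: bool            # PmA, bit 2
    low_order_unchanged: bool  # LOR, bit 3


def decode_flags(word: int) -> AnonymizationFlags:
    """Decode a 16-bit flag word; reserved bits 4-15 are ignored."""
    return AnonymizationFlags(
        stability=StabilityClass(word & 0x3),
        perimeter=bool(word & 0x4),
        low_order_unchanged=bool(word & 0x8),
    )


def encode_flags(flags: AnonymizationFlags) -> int:
    """Encode a flag word with the reserved bits cleared."""
    word = int(flags.stability)
    if flags.perimeter:
        word |= 0x4
    if flags.low_order_unchanged:
        word |= 0x8
    return word
```

For example, a flag word of 0x000B decodes to the Stable stability
class with LOR set and PmA clear.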
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Applying Anonymization Techniques to IPFIX Export and Storage</span>
When exporting or storing anonymized flow data using IPFIX, certain
interactions between the IPFIX protocol and the anonymization
techniques in use must be considered; these are treated in the
subsections below.
<span class="h3"><a class="selflink" id="section-7.1" href="#section-7.1">7.1</a>. Arrangement of Processes in IPFIX Anonymization</span>
Anonymization may be applied to IPFIX data at three stages within the
collection infrastructure: on initial export, at a mediator, or after
collection, as shown in Figure 1. Each of these locations has
specific considerations and applicability.
+==========================================+
| Exporting Process                        |
+==========================================+
  |                                      |
  |  (Anonymized at Original Exporter)   |
  V                                      |
+=============================+          |
| Mediator                    |          |
+=============================+          |
  |                                      |
  | (Anonymizing Mediator)               |
  V                                      V
+==========================================+
| Collecting Process                       |
+==========================================+
        |
        | (Anonymizing CP/File Writer)
        V
+--------------------+
| IPFIX File Storage |
+--------------------+
Figure 1: Potential Anonymization Locations
Anonymization is generally performed before the wider dissemination
or repurposing of a dataset, e.g., adapting operational measurement
data for research. Therefore, direct anonymization of flow data on
initial export is only applicable in certain restricted
circumstances: when the Exporting Process (EP) is "publishing" data
to a Collecting Process (CP) directly, and the Exporting Process and
Collecting Process are operated by different entities. Note that
certain guidelines in <a href="#section-7.2.3">Section 7.2.3</a> with respect to timestamp
anonymization may not apply in this case, as the Collecting Process
may be able to deduce certain timing information from the time at
which each Message is received.
A much more flexible arrangement is to anonymize data within a
Mediator [<a href="./rfc6183" title="IP Flow Information Export (IPFIX) Mediation: Framework">RFC6183</a>]. Here, original data is sent to a Mediator, which
performs the anonymization function and re-exports the anonymized
data. Such a Mediator could be located at the administrative domain
boundary of the initial Exporting Process operator, exporting
anonymized data to other consumers outside the organization. In this
case, the original Exporter SHOULD use TLS [<a href="./rfc5246" title="The Transport Layer Security (TLS) Protocol Version 1.2">RFC5246</a>] as specified in
[<a href="./rfc5101" title="Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information">RFC5101</a>] to secure the channel to the Mediator, and the Mediator
should follow the guidelines in <a href="#section-7.2">Section 7.2</a>, to mitigate the risk of
original data disclosure.
When data is to be published as an anonymized dataset in an IPFIX
File [<a href="./rfc5655" title="Specification of the IP Flow Information Export (IPFIX) File Format">RFC5655</a>], the anonymization may be done at the final Collecting
Process before storage and dissemination, as well. In this case, the
Collector should follow the guidelines in <a href="#section-7.2">Section 7.2</a>, especially as
regards File-specific Options in <a href="#section-7.2.4">Section 7.2.4</a>.
In each of these data flows, the anonymization of records is
undertaken by an Intermediate Anonymization Process (IAP); the data
flows into and out of this IAP are shown in Figure 2 below.
packets --+                    +- IPFIX Messages -+
          |                    |                  |
          V                    V                  V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
+==================+ +====================+ +=============+
          |  Non-anonymized    |  Records         |
          V                    V                  V
+=========================================================+
|        Intermediate Anonymization Process (IAP)         |
+=========================================================+
          | Anonymized          ^ Anonymized       |
          | Records             | Records          |
          V                     |                  V
+===================+     Anonymization     +=============+
| Exporting Process |<--- Parameters ------>| File Writer |
+===================+                       +=============+
          |                                        |
          +------------> IPFIX Messages <----------+
Figure 2: Data Flows through the Anonymization Process
Anonymization parameters must also be available to the Exporting
Process and/or File Writer in order to ensure header data is also
appropriately anonymized as in <a href="#section-7.2.3">Section 7.2.3</a>.
Following each of the data flows through the IAP, we describe five
basic types of anonymization arrangements within this framework in
Figure 3. In addition to the three arrangements described in detail
above, anonymization can also be done at a collocated Metering
Process (MP) and File Writer (FW) (see <a href="./rfc5655#section-7.3.2">Section 7.3.2 of [RFC5655]</a>),
or at a file manipulator, which combines a File Writer with a File
Reader (FR) (see <a href="./rfc5655#section-7.3.7">Section 7.3.7 of [RFC5655]</a>).
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| EP |-> Anonymization on Original Exporter
         +----+  +-----+  +----+
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| FW |-> Anonymizing collocated MP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymizing Mediator (Masq. Proxy)
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymizing collocated CP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymizing file manipulator
File     +----+  +-----+  +----+
Figure 3: Possible Anonymization Arrangements in the IPFIX
Architecture
Note that anonymization may occur at more than one location within a
given collection infrastructure, to provide varying levels of
anonymization, disclosure risk, or data utility for specific
purposes.
<span class="h3"><a class="selflink" id="section-7.2" href="#section-7.2">7.2</a>. IPFIX-Specific Anonymization Guidelines</span>
In implementing and deploying the anonymization techniques described
in this document, implementors should note that IPFIX already
provides features that support anonymized data export, and use these
where appropriate. Care must also be taken that data structures
supporting the operation of the protocol itself do not leak data that
could be used to reverse the anonymization applied to the flow data.
Such data structures may appear in the header, or within the data
stream itself, especially as options data. Each of these and their
impact on specific anonymization techniques is noted in a separate
subsection below.
<span class="h4"><a class="selflink" id="section-7.2.1" href="#section-7.2.1">7.2.1</a>. Appropriate Use of Information Elements for Anonymized Data</span>
Note, as in <a href="#section-6">Section 6</a> above, that black-marker anonymized fields
SHOULD NOT be exported at all; the absence of the field in a given
Data Set is implicitly declared by not including the corresponding
Information Element in the Template describing that Data Set.
When using precision degradation of timestamps, Exporting Processes
SHOULD export timing information using Information Elements of an
appropriate precision, as explained in <a href="./rfc5153#section-4.5">Section 4.5 of [RFC5153]</a>. For
example, timestamps measured in millisecond-level precision and
degraded to second-level precision should use flowStartSeconds and
flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
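The example above can be sketched as a record transformation. The
Information Element names are the real ones mentioned in the text;
representing a flow record as a dictionary keyed by those names, and
the function name, are assumptions of this sketch.

```python
MILLISECOND_FIELDS = {
    "flowStartMilliseconds": "flowStartSeconds",
    "flowEndMilliseconds": "flowEndSeconds",
}


def degrade_to_seconds(record: dict) -> dict:
    """Replace millisecond timestamps with second-precision elements."""
    degraded = {}
    for name, value in record.items():
        if name in MILLISECOND_FIELDS:
            # Drop the sub-second part and switch to the matching
            # second-precision Information Element.
            degraded[MILLISECOND_FIELDS[name]] = value // 1000
        else:
            degraded[name] = value
    return degraded
```

Exporting the degraded values under the second-precision elements
lets a collector see the available precision from the Template
itself, rather than having to infer it from trailing zeros.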
When exporting anonymized data and anonymization metadata, Exporting
Processes SHOULD ensure that the combination of Information Element
and declared anonymization technique are compatible. Specifically,
the applicable and recommended Information Element types and
semantics for each technique are noted in the description of the
anonymizationTechnique Information Element in <a href="#section-6.2.2">Section 6.2.2</a>. In this
description, a timestamp is an Information Element with the data type
dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
dateTimeNanoseconds; an address is an Information Element with the
data type ipv4Address, ipv6Address, or macAddress; and an identifier
is an Information Element with identifier data type semantics.
Exporting Processes MUST NOT export Anonymization Options records
binding techniques to Information Elements to which they are not
applicable, and SHOULD NOT export Anonymization Options records
binding techniques to Information Elements for which they are not
recommended.
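An Exporting Process could guard against inapplicable bindings with
a check like the one below. It is a sketch covering only the
"Applicable to" column of the anonymizationTechnique table; the
function name and the string representation of data types and
semantics are assumptions, and the "Recommended for" column is
deliberately left out.

```python
ADDRESS_TYPES = {"ipv4Address", "ipv6Address", "macAddress"}

# anonymizationTechnique values restricted to address elements:
# 6 = Structured Permutation, 7 = Reverse Truncation.
ADDRESS_ONLY_TECHNIQUES = {6, 7}
# 8 = Noise, applicable to non-identifiers only.
NOISE = 8


def is_applicable(technique: int, data_type: str, semantics: str) -> bool:
    """Return True if the technique may be bound to the element."""
    if technique in ADDRESS_ONLY_TECHNIQUES:
        return data_type in ADDRESS_TYPES
    if technique == NOISE:
        return semantics != "identifier"
    # Techniques 0-5 and 9 are applicable to all elements.
    return technique in range(10)
```

A record for which is_applicable returns False falls under the MUST
NOT above and would not be exported.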
<span class="h4"><a class="selflink" id="section-7.2.2" href="#section-7.2.2">7.2.2</a>. Export of Perimeter-Based Anonymization Policies</span>
Data collected from a single network may require different
anonymization policies for addresses internal and external to the
network. For example, internal addresses could be subject to simple
permutation, while external addresses could be aggregated into
networks by truncation. When exporting anonymized perimeter
bidirectional flow (biflow) data as in <a href="./rfc5103#section-5.2">Section 5.2 of [RFC5103]</a>, this
arrangement may be easily represented by specifying one technique for
source endpoint information (which represents the external endpoint
in a perimeter biflow) and one technique for destination endpoint
information (which represents the internal address in a perimeter
biflow).
However, it can also be useful to represent perimeter-based
anonymization policies with unidirectional flow (uniflow), or non-
perimeter biflow data. In this case, the Perimeter Anonymization bit
(bit 2) in the anonymizationFlags Information Element describing the
anonymized address Information Elements can be set to change the
meaning of "source" and "destination" of Information Elements to mean
"external" and "internal" as with perimeter biflows, but only with
respect to anonymization policies.
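
As a hedged sketch of how a Collecting Process might interpret these
flags: the Perimeter Anonymization bit position (bit 2) is taken from
the text above, and the "Session" and "Stable" stability values match
the flag values used in the examples of Section 8. The remaining two
stability names and all function names are assumptions; consult the
anonymizationFlags description for the authoritative values.

```python
# Bits 0-1 carry the stability class; bit 2 (0x0004) is the
# Perimeter Anonymization flag.
STABILITY_MASK = 0x0003
PERIMETER_BIT = 0x0004

# Values 1 and 3 appear in the examples of this document; the names
# for 0 and 2 are assumptions made for this sketch.
STABILITY_NAMES = {
    0: "Undefined",
    1: "Session",
    2: "Exporter-Collector Pair",
    3: "Stable",
}


def decode_anonymization_flags(flags):
    """Split an anonymizationFlags value into (stability, perimeter)."""
    stability = STABILITY_NAMES[flags & STABILITY_MASK]
    perimeter = bool(flags & PERIMETER_BIT)
    return stability, perimeter


def endpoint_roles(perimeter):
    """Meaning of source/destination for anonymization policy only."""
    if perimeter:
        return {"source": "external", "destination": "internal"}
    return {"source": "source", "destination": "destination"}
```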
<span class="h4"><a class="selflink" id="section-7.2.3" href="#section-7.2.3">7.2.3</a>. Anonymization of Header Data</span>
Each IPFIX Message contains a Message Header; within this Message
Header are contained two fields which may be used to break certain
anonymization techniques: the Export Time, and the Observation Domain
ID.
When exporting IPFIX Messages containing anonymized timestamp data
where the original Export Time Message header has some relationship
to the anonymized timestamps, Exporting Processes SHOULD anonymize
the Export Time header field so that the Export Time is consistent
with the anonymized timestamp data. Otherwise, relationships between
export and flow time could be used to partially or totally reverse
timestamp anonymization. When anonymizing timestamps and the Export
Time header field, Exporting Processes SHOULD avoid
times too far in the past or future; while [<a href="./rfc5101" title=""Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information"">RFC5101</a>] does not make
any allowance for Export Time error detection, it is sensible that
Collecting Processes may interpret Messages with seemingly
nonsensical Export Times as erroneous. Specific limits are
implementation dependent, but implausible Export Times may cause
interoperability problems when the Export Time header field is
anonymized.
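
One way to keep the Export Time consistent, assuming the flow
timestamps were anonymized by adding a constant offset, is to apply
the same offset to the header field. The sketch below rewrites only
the 16-byte Message Header; the function name and the choice of a
constant offset are assumptions for illustration, and other timestamp
techniques would need a different treatment.

```python
import struct

# IPFIX Message Header: version, length, export time, sequence
# number, Observation Domain ID (16 bytes, network byte order).
HEADER = struct.Struct("!HHIII")


def shift_export_time(message, offset_seconds):
    """Return a copy of an IPFIX Message with Export Time shifted."""
    version, length, export_time, seq, domain = HEADER.unpack_from(message, 0)
    shifted = (export_time + offset_seconds) % 2**32  # stay in 32 bits
    header = HEADER.pack(version, length, shifted, seq, domain)
    return header + message[HEADER.size:]
```

Keeping the offset small relative to the present time also addresses
the concern about Export Times that look erroneous to a Collecting
Process.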
The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation
Domain ID. If this address bears some relation to the IP addresses
in the flow data (e.g., shares a network prefix with internal
addresses) and the IP addresses in the flow data are anonymized in a
structure-preserving way, then the Observation Domain ID may be used
to break the IP address anonymization. Use of an IPv4 interface
address on the Metering or Exporting Process as the Observation
Domain ID is NOT RECOMMENDED in this case.
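
An Exporting Process could check for this condition before export.
The following is a minimal sketch under the assumption that the
internal network is known as a single prefix; the function name is
hypothetical.

```python
import ipaddress


def odid_leaks_prefix(observation_domain_id, internal_network):
    """True if the ODID, read as an IPv4 address, lies in the network."""
    as_address = ipaddress.IPv4Address(observation_domain_id)
    return as_address in ipaddress.IPv4Network(internal_network)
```

An Observation Domain ID equal to the integer form of 198.51.100.1
would be flagged for the internal network 198.51.100.0/24, while a
small arbitrary identifier such as 1 would not.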
<span class="h4"><a class="selflink" id="section-7.2.4" href="#section-7.2.4">7.2.4</a>. Anonymization of Options Data</span>
IPFIX uses the Options mechanism to export, among other things,
metadata about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in
[<a href="./rfc5101" title=""Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information"">RFC5101</a>] and [<a href="./rfc5655" title=""Specification of the IP Flow Information Export (IPFIX) File Format"">RFC5655</a>] containing flow timestamps and network
addresses of Exporting and Collecting Processes may be used to break
certain anonymization techniques. When using these Options along with
anonymized data export and storage, values within the Options that
could be used to break the anonymization SHOULD themselves be
anonymized or omitted.
The Exporting Process Reliability Statistics Options Template,
recommended in [<a href="./rfc5101" title=""Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information"">RFC5101</a>], contains an Exporting Process ID field,
which may be an exportingProcessIPv4Address Information Element or an
exportingProcessIPv6Address Information Element. If the Exporting
Process address bears some relation to the IP addresses in the flow
data (e.g., shares a network prefix with internal addresses) and the
IP addresses in the flow data are anonymized in a structure-
preserving way, then the Exporting Process address may be used to
break the IP address anonymization. Exporting Processes exporting
anonymized data in this situation SHOULD mitigate the risk of attack
either by omitting Options described by the Exporting Process
Reliability Statistics Options Template or by anonymizing the
Exporting Process address using a similar technique to that used to
anonymize the IP addresses in the exported data.
Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the IPFIX File Format
[<a href="./rfc5655" title=""Specification of the IP Flow Information Export (IPFIX) File Format"">RFC5655</a>] may contain the exportingProcessIPv4Address Information
Element or the exportingProcessIPv6Address Information Element to
identify an Exporting Process from which a flow record was received,
and the collectingProcessIPv4Address Information Element or the
collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it. If the Exporting Process or
Collecting Process address bears some relation to the IP addresses in
the dataset (e.g., shares a network prefix with internal addresses)
and the IP addresses in the dataset are anonymized in a structure-
preserving way, then the Exporting Process or Collecting Process
address may be used to break the IP address anonymization. Since
these Options Templates are primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes, it
is NOT RECOMMENDED that storage of anonymized data include these
Options Templates in order to mitigate the risk of attack.
The Message Details Options Template specified for the IPFIX File
Format [<a href="./rfc5655" title=""Specification of the IP Flow Information Export (IPFIX) File Format"">RFC5655</a>] also contains the collectionTimeMilliseconds
Information Element. As with the Export Time Message Header field,
if the exported dataset contains anonymized timestamp information,
and the collectionTimeMilliseconds Information Element in a given
Message has some relationship to the anonymized timestamp
information, then this relationship can be exploited to reverse the
timestamp anonymization. Since this Options Template is primarily
intended for storing IPFIX Transport Session data for auditing,
replay, and testing purposes, it is NOT RECOMMENDED that storage of
anonymized data include this Options Template in order to mitigate
the risk of attack.
Since the Time Window Options Template specified for the IPFIX File
Format [<a href="./rfc5655" title=""Specification of the IP Flow Information Export (IPFIX) File Format"">RFC5655</a>] refers to the timestamps within the dataset to
provide partial table of contents information for an IPFIX File,
Options described by this Template SHOULD be written using the
anonymized timestamps instead of the original ones.
<span class="h4"><a class="selflink" id="section-7.2.5" href="#section-7.2.5">7.2.5</a>. Special-Use Address Space Considerations</span>
When anonymizing data for transport or storage using IPFIX containing
anonymized IP addresses, and the analysis purpose permits doing so,
it is RECOMMENDED to filter out or leave unanonymized data containing
the special-use IPv4 addresses enumerated in [<a href="./rfc5735" title=""Special Use IPv4 Addresses"">RFC5735</a>] or the
special-use IPv6 addresses enumerated in [<a href="./rfc5156" title=""Special-Use IPv6 Addresses"">RFC5156</a>]. Data containing
these addresses (e.g., 0.0.0.0 and 169.254.0.0/16 for link-local
autoconfiguration in IPv4 space) are often associated with specific,
well-known behavioral patterns. Detection of these patterns in
anonymized data can lead to deanonymization of these special-use
addresses, which increases the chance of a complete reversal of
anonymization by an attacker, especially of prefix-preserving
techniques.
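
The filtering option can be sketched as follows. The prefix list is
deliberately partial and for illustration only (for example, it omits
the documentation prefixes); the record layout with "sip" and "dip"
keys and the function names are assumptions. A real implementation
should take the full list from the cited special-use address
references.

```python
import ipaddress

# Partial list for illustration only.
SPECIAL_USE_V4 = [ipaddress.IPv4Network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4", "240.0.0.0/4",
)]


def is_special_use(address):
    """True if the IPv4 address falls in one of the listed prefixes."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in net for net in SPECIAL_USE_V4)


def filter_records(records):
    """Drop records whose source or destination is special-use."""
    return [r for r in records
            if not (is_special_use(r["sip"]) or is_special_use(r["dip"]))]
```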
<span class="h4"><a class="selflink" id="section-7.2.6" href="#section-7.2.6">7.2.6</a>. Protecting Out-of-Band Configuration and Management Data</span>
Special care should be taken when exporting or sharing anonymized
data to avoid information leakage via the configuration or management
planes of the IPFIX Device containing the Exporting Process or the
File Writer. For example, adding noise to counters is useless if the
receiver can deduce the values in the counters from Simple Network
Management Protocol (SNMP) information, and concealing the network
under test is similarly useless if such information is available in a
configuration document. As the specifics of these concerns are
largely implementation and deployment dependent, specific mitigation
is out of scope for this document. The general ground rule is that
information of similar type to that anonymized SHOULD NOT be made
available to the receiver by any means, whether in the Data Records,
in IPFIX protocol structures such as Message Headers, or out of band.
<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. Examples</span>
In this example, consider the export or storage of an anonymized IPv4
dataset from a single network described by a simple Template
containing a timestamp in seconds, a five-tuple, and packet and octet
counters. The Template describing each record in this Data Set is
shown in Figure 4.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Set ID = 2           |          Length = 40          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Template ID = 256       |        Field Count = 8        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| flowStartSeconds        150 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceIPv4Address         8 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationIPv4Address   12 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceTransportPort       7 |       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationTransportPort 11 |       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| packetDeltaCount          2 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| octetDeltaCount           1 |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| protocolIdentifier        4 |       Field Length = 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Example Flow Template
Suppose that this Data Set is anonymized according to the following
policy:
o IP addresses within the network are protected by reverse
truncation.
o IP addresses outside the network are protected by prefix-
preserving anonymization.
o Octet counts are exported using degraded precision in order to
provide minimal protection against fingerprinting attacks.
o All other fields are exported unanonymized.
In order to export Anonymization Records for this Template and
policy, first, the Anonymization Options Template shown in Figure 5
is exported. For this example, the optional privateEnterpriseNumber
and informationElementIndex Information Elements are omitted, because
they are not used.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Set ID = 3           |          Length = 26          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Template ID = 257       |        Field Count = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Scope Field Count = 2     |0| templateID              145 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| informationElementId    303 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| anonymizationFlags      285 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| anonymizationTechnique  286 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: Example Anonymization Options Template
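
The layout in Figure 5 can be reproduced byte for byte with a short
encoder. This is a sketch rather than a reference implementation: the
function name is hypothetical, the enterprise bit is assumed clear on
every field, and no Set padding is added.

```python
import struct


def options_template_set(template_id, scope_fields, option_fields):
    """Encode an Options Template Set (Set ID 3) without padding.

    Each field is an (information_element_id, field_length) pair.
    """
    fields = list(scope_fields) + list(option_fields)
    record = struct.pack("!HHH", template_id, len(fields), len(scope_fields))
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # Set header: Set ID, then total length including this header.
    return struct.pack("!HH", 3, 4 + len(record)) + record


anon_options_template = options_template_set(
    257,
    scope_fields=[(145, 2), (303, 2)],   # templateID, informationElementId
    option_fields=[(285, 2), (286, 2)],  # flags, technique
)
```

The encoded Set is 26 octets long, matching the Length field shown in
Figure 5.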
Following the Anonymization Options Template comes a Data Set
containing Anonymization Records. This dataset has an entry for each
Information Element Specifier in Template 256 describing the flow
records. This Data Set is shown in Figure 6. Note that
sourceIPv4Address and destinationIPv4Address have the Perimeter
Anonymization (0x0004) flag set in anonymizationFlags, meaning that
the source address should be treated as network-external, and the
destination address as network-internal.
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Set ID = 257          |          Length = 68          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | flowStartSeconds       IE 150 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags               0x0000 | Not Anonymized              1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | sourceIPv4Address      IE   8 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Session SC  0x0005 | Structured Permutation      6 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | destinationIPv4Address IE  12 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Stable      0x0007 | Reverse Truncation          7 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | sourceTransportPort    IE   7 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags               0x0000 | Not Anonymized              1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | dest.TransportPort     IE  11 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags               0x0000 | Not Anonymized              1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | packetDeltaCount       IE   2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags               0x0000 | Not Anonymized              1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | octetDeltaCount        IE   1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stable                 0x0003 | Precision Degradation       2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | protocolIdentifier     IE   4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags               0x0000 | Not Anonymized              1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: Example Anonymization Records
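
The Data Set in Figure 6 can likewise be built from a table of
(Information Element, flags, technique) rows. The sketch below copies
the values from the figure; the function and variable names are
hypothetical.

```python
import struct

# (informationElementId, anonymizationFlags, anonymizationTechnique)
# copied from Figure 6; every record describes Template 256.
FIGURE_6_ROWS = [
    (150, 0x0000, 1), (8, 0x0005, 6), (12, 0x0007, 7), (7, 0x0000, 1),
    (11, 0x0000, 1), (2, 0x0000, 1), (1, 0x0003, 2), (4, 0x0000, 1),
]


def anonymization_records_set(set_id, template_id, rows):
    """Encode one Anonymization Record (8 octets) per row."""
    body = b"".join(struct.pack("!HHHH", template_id, ie, flags, tech)
                    for ie, flags, tech in rows)
    return struct.pack("!HH", set_id, 4 + len(body)) + body


data_set = anonymization_records_set(257, 256, FIGURE_6_ROWS)
```

Eight records of 8 octets plus the 4-octet Set header give the
Length of 68 shown in the figure.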
Following the Anonymization Records come the Data Sets containing the
anonymized data, exported according to the Template in Figure 4.
Bringing it all together, consider an IPFIX Message containing three
real data records and the necessary templates to export them, shown
in Figure 7. (Note that the scale of this message is 8 bytes per
line, for compactness; lines of dots '. . . . . ' represent shifting
of the example bit structure for clarity.)
           1         2         3         4         5         6
 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    0x000a     | length    135 | export time        1271227717 | msg
| sequence                    0 | domain                      1 | hdr
| SetID       2 | length     40 | tid       256 | fields      8 | tmpl
| IE        150 | length      4 | IE          8 | length      4 | set
| IE         12 | length      4 | IE          7 | length      2 |
| IE         11 | length      2 | IE          2 | length      4 |
| IE          1 | length      4 | IE          4 | length      1 |
| SetID     256 | length     79 | time               1271227681 | data
| sip                 192.0.2.3 | dip              198.51.100.7 | set
| sp         53 | dp         53 | packets                     1 |
| bytes                      74 | prt  17 | . . . . . . . . . . .
| time               1271227682 | sip              198.51.100.7 |
| dip                192.0.2.88 | sp       5091 | dp         80 |
| packets                    60 | bytes                    2896 |
| prt   6 | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time               1271227683 | sip              198.51.100.7 |
| dip               203.0.113.9 | sp       5092 | dp         80 |
| packets                    44 | bytes                    2037 |
| prt   6 |
+---------+
Figure 7: Example Real Message
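
The length fields in Figure 7 follow from the Template in Figure 4.
As a quick check of the arithmetic (variable names are illustrative
only):

```python
MESSAGE_HEADER_LEN = 16   # fixed IPFIX Message Header size
SET_HEADER_LEN = 4        # Set ID + Length

# Field lengths from the Template in Figure 4, in template order.
FIELD_LENGTHS = [4, 4, 4, 2, 2, 4, 4, 1]

# Template record: Template ID + Field Count, then 4 octets per field.
template_set_len = SET_HEADER_LEN + 4 + 4 * len(FIELD_LENGTHS)
record_len = sum(FIELD_LENGTHS)
data_set_len = SET_HEADER_LEN + 3 * record_len   # three data records
message_len = MESSAGE_HEADER_LEN + template_set_len + data_set_len
```

This gives a Template Set of 40 octets, a record of 25 octets, a
Data Set of 79 octets, and a Message of 135 octets, as shown.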
The corresponding anonymized message is then shown in Figure 8. The
Options Template Set describing Anonymization Records and the
Anonymization Records themselves are added; IP addresses and byte
counts are anonymized as declared.
           1         2         3         4         5         6
 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    0x000a     | length    233 | export time        1271227717 | msg
| sequence                    0 | domain                      1 | hdr
| SetID       2 | length     40 | tid       256 | fields      8 | tmpl
| IE        150 | length      4 | IE          8 | length      4 | set
| IE         12 | length      4 | IE          7 | length      2 |
| IE         11 | length      2 | IE          2 | length      4 |
| IE          1 | length      4 | IE          4 | length      1 |
| SetID       3 | length     30 | tid       257 | fields      4 | opt
| scope       2 | . . . . . . . . . . . . . . . . . . . . . . . . tmpl
| IE        145 | length      2 | IE        303 | length      2 | set
| IE        285 | length      2 | IE        286 | length      2 |
| SetID     257 | length     68 | . . . . . . . . . . . . . . . . anon
| tid       256 | IE        150 | flags       0 | tech        1 | recs
| tid       256 | IE          8 | flags       5 | tech        6 |
| tid       256 | IE         12 | flags       7 | tech        7 |
| tid       256 | IE          7 | flags       0 | tech        1 |
| tid       256 | IE         11 | flags       0 | tech        1 |
| tid       256 | IE          2 | flags       0 | tech        1 |
| tid       256 | IE          1 | flags       3 | tech        2 |
| tid       256 | IE          4 | flags       0 | tech        1 |
| SetID     256 | length     79 | time               1271227681 | data
| sip           254.202.119.209 | dip                   0.0.0.7 | set
| sp         53 | dp         53 | packets                     1 |
| bytes                     100 | prt  17 | . . . . . . . . . . .
| time               1271227682 | sip                   0.0.0.7 |
| dip             254.202.119.6 | sp       5091 | dp         80 |
| packets                    60 | bytes                    2900 |
| prt   6 | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time               1271227683 | sip                   0.0.0.7 |
| dip              2.19.199.176 | sp       5092 | dp         80 |
| packets                    44 | bytes                    2000 |
| prt   6 |
+---------+
Figure 8: Corresponding Anonymized Message
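
Two of the transformations visible in Figure 8 can be sketched
directly. The parameters are assumptions chosen because they
reproduce the values in the figure: reverse truncation keeping the
low-order 8 bits, and octet counts rounded to the nearest multiple of
100. The document does not state these parameters, and the
prefix-preserving mapping applied to the external addresses is not
reproduced here.

```python
import ipaddress


def reverse_truncate(address, keep_bits=8):
    """Zero the high-order bits, keeping only the low keep_bits bits."""
    mask = 2 ** keep_bits - 1
    value = int(ipaddress.IPv4Address(address)) & mask
    return str(ipaddress.IPv4Address(value))


def degrade_count(value, step=100):
    """Round a counter to the nearest multiple of step (halves up)."""
    return (value + step // 2) // step * step
```

Applied to the records in Figure 7, reverse_truncate("198.51.100.7")
yields "0.0.0.7", and the octet counts 74, 2896, and 2037 become 100,
2900, and 2000.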
<span class="h2"><a class="selflink" id="section-9" href="#section-9">9</a>. Security Considerations</span>
This document provides guidelines for exporting metadata about
anonymized data in IPFIX, or storing metadata about anonymized data
in IPFIX Files. It is not intended as a general statement on the
applicability of specific flow data anonymization techniques.
Exporters or publishers of anonymized data must take care that the
applied anonymization technique is appropriate for the data source,
the purpose, and the risk of deanonymization of a given application.
Research in anonymization techniques, and techniques for
deanonymization, is ongoing, and currently "safe" anonymization
techniques may be rendered unsafe by future developments.
We note specifically that anonymization is not a replacement for
encryption for confidentiality. It is only appropriate for
protecting identifying information in data to be used for purposes in
which the protected data is irrelevant. Confidentiality in export is
best served by using TLS [<a href="./rfc5246" title=""The Transport Layer Security (TLS) Protocol Version 1.2"">RFC5246</a>] or Datagram Transport Layer
Security (DTLS) [<a href="./rfc4347" title=""Datagram Transport Layer Security"">RFC4347</a>] as in the Security Considerations section
of [<a href="./rfc5101" title=""Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information"">RFC5101</a>], and in long-term storage by implementation-specific
protection applied as in the Security Considerations section of
[<a href="./rfc5655" title=""Specification of the IP Flow Information Export (IPFIX) File Format"">RFC5655</a>]. Indeed, confidentiality and anonymization are not
mutually exclusive, as encryption for confidentiality may be applied
to anonymized data export or storage, as well, when the anonymized
data is not intended for public release.
We note as well that care should be taken even with well-anonymized
data, and anonymized data should still be treated as privacy
sensitive. Anonymization reduces the risk of misuse, but is not a
complete solution to the problem of protecting end-user privacy in
network flow trace analysis.
When using pseudonymization techniques that have a mutable mapping,
there is an inherent trade-off in the stability of the map between
long-term comparability and security of the dataset against
deanonymization. In general, deanonymization attacks are more
effective given more information, so the longer a given mapping is
valid, the more information can be applied to deanonymization. The
specific details of this are technique-dependent and therefore out of
the scope of this document.
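
The trade-off can be illustrated with a keyed mapping whose key is
replaced from time to time. The HMAC construction below is not a
technique defined by this document; it stands in for any
pseudonymization with a replaceable mapping, and the function name
and key handling are assumptions for the example.

```python
import hashlib
import hmac


def pseudonym(key, address):
    """Keyed pseudonym for an address string (truncated HMAC-SHA-256)."""
    digest = hmac.new(key, address.encode(), hashlib.sha256).hexdigest()
    return digest[:8]
```

While one key stays in use, an address always maps to the same
pseudonym, so datasets remain comparable but an attacker can
accumulate observations against a fixed mapping. Replacing the key
changes the mapping, which limits that accumulation at the cost of
comparability across the change.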
When releasing anonymized data, publishers need to ensure that data
that could be used in deanonymization is not leaked through a side
channel. The entire workflow (hardware, software, operational
policies and procedures, etc.) for handling anonymized data must be
evaluated for risk of data leakage. While most of these possible
side channels are out of scope for this document, guidelines for
reducing the risk of information leakage specific to the IPFIX export
protocol are provided in <a href="#section-7.2">Section 7.2</a>.
Note as well that the Security Considerations section of [<a href="./rfc5101" title=""Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information"">RFC5101</a>]
applies as well to the export of anonymized data, and the Security
Considerations section of [<a href="./rfc5655" title=""Specification of the IP Flow Information Export (IPFIX) File Format"">RFC5655</a>] to the storage of anonymized
data, or the publication of anonymized traces.
<span class="h2"><a class="selflink" id="section-10" href="#section-10">10</a>. IANA Considerations</span>
This document specifies the creation of several new IPFIX Information
Elements in the IPFIX Information Element registry available from the
IANA site (<a href="http://www.iana.org">http://www.iana.org</a>), as defined in <a href="#section-6.2">Section 6.2</a>. IANA has
assigned the following Information Element numbers for their
respective Information Elements as specified below:
o Information Element number 285 for the anonymizationFlags
Information Element.
o Information Element number 286 for the anonymizationTechnique
Information Element.
o Information Element number 287 for the informationElementIndex
Information Element.
<span class="h2"><a class="selflink" id="section-11" href="#section-11">11</a>. Acknowledgments</span>
We thank Paul Aitken and John McHugh for their comments and insight,
and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu,
Stewart Bryant, and Sean Turner for their reviews. Special thanks to
the FP7 PRISM and DEMONS projects for their material support of this
work.
<span class="h2"><a class="selflink" id="section-12" href="#section-12">12</a>. References</span>
<span class="h3"><a class="selflink" id="section-12.1" href="#section-12.1">12.1</a>. Normative References</span>
[<a id="ref-RFC5101">RFC5101</a>] Claise, B., "Specification of the IP Flow Information
Export (IPFIX) Protocol for the Exchange of IP Traffic
Flow Information", <a href="./rfc5101">RFC 5101</a>, January 2008.
[<a id="ref-RFC5102">RFC5102</a>] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
Meyer, "Information Model for IP Flow Information Export",
<a href="./rfc5102">RFC 5102</a>, January 2008.
[<a id="ref-RFC5103">RFC5103</a>] Trammell, B. and E. Boschi, "Bidirectional Flow Export
Using IP Flow Information Export (IPFIX)", <a href="./rfc5103">RFC 5103</a>,
January 2008.
[<a id="ref-RFC5655">RFC5655</a>] Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
Wagner, "Specification of the IP Flow Information Export
(IPFIX) File Format", <a href="./rfc5655">RFC 5655</a>, October 2009.
[<a id="ref-RFC2119">RFC2119</a>] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
[<a id="ref-RFC5735">RFC5735</a>] Cotton, M. and L. Vegoda, "Special Use IPv4 Addresses",
<a href="https://www.rfc-editor.org/bcp/bcp153">BCP 153</a>, <a href="./rfc5735">RFC 5735</a>, January 2010.
[<a id="ref-RFC5156">RFC5156</a>] Blanchet, M., "Special-Use IPv6 Addresses", <a href="./rfc5156">RFC 5156</a>,
April 2008.
<span class="h3"><a class="selflink" id="section-12.2" href="#section-12.2">12.2</a>. Informative References</span>
[<a id="ref-RFC5470">RFC5470</a>] Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
"Architecture for IP Flow Information Export", <a href="./rfc5470">RFC 5470</a>,
March 2009.
[<a id="ref-RFC5472">RFC5472</a>] Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
Flow Information Export (IPFIX) Applicability", <a href="./rfc5472">RFC 5472</a>,
March 2009.
[<a id="ref-RFC6183">RFC6183</a>] Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi,
"IP Flow Information Export (IPFIX) Mediation: Framework",
<a href="./rfc6183">RFC 6183</a>, April 2011.
[<a id="ref-IPFIX-PERSTREAM">IPFIX-PERSTREAM</a>]
Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX
Export per SCTP Stream", Work in Progress, May 2010.
[<a id="ref-RFC5153">RFC5153</a>] Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
Aitken, "IP Flow Information Export (IPFIX) Implementation
Guidelines", <a href="./rfc5153">RFC 5153</a>, April 2008.
[<a id="ref-RFC3917">RFC3917</a>] Quittek, J., Zseby, T., Claise, B., and S. Zander,
"Requirements for IP Flow Information Export (IPFIX)",
<a href="./rfc3917">RFC 3917</a>, October 2004.
[<a id="ref-RFC4291">RFC4291</a>] Hinden, R. and S. Deering, "IP Version 6 Addressing
Architecture", <a href="./rfc4291">RFC 4291</a>, February 2006.
[<a id="ref-RFC4347">RFC4347</a>] Rescorla, E. and N. Modadugu, "Datagram Transport Layer
Security", <a href="./rfc4347">RFC 4347</a>, April 2006.
[<a id="ref-RFC5246">RFC5246</a>] Dierks, T. and E. Rescorla, "The Transport Layer Security
(TLS) Protocol Version 1.2", <a href="./rfc5246">RFC 5246</a>, August 2008.
[<a id="ref-Bur10">Bur10</a>] Burkhart, M., Schatzmann, D., Trammell, B., and E. Boschi,
"The Role of Network Trace Anonymization Under Attack",
ACM Computer Communications Review, vol. 40, no. 1, pp.
6-11, January 2010.
[<a id="ref-Mur07">Mur07</a>] Murdoch, S. and P. Zielinski, "Sampled Traffic Analysis by
Internet-Exchange-Level Adversaries", Proceedings of the
7th Workshop on Privacy Enhancing Technologies, Ottawa,
Canada, June 2007.
Authors' Addresses
Elisa Boschi
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland
EMail: boschie@tik.ee.ethz.ch
Brian Trammell
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland
Phone: +41 44 632 70 13
EMail: trammell@tik.ee.ethz.ch
</pre>