<pre>Network Working Group V. Sharma, Ed.
Request for Comments: 3469 Metanoia, Inc.
Category: Informational F. Hellstrand, Ed.
Nortel Networks
February 2003
<span class="h1">Framework for Multi-Protocol Label Switching (MPLS)-based Recovery</span>
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
Multi-protocol label switching (MPLS) integrates the label swapping
forwarding paradigm with network layer routing. To deliver reliable
service, MPLS requires a set of procedures to provide protection of
the traffic carried on different paths. This requires that the label
switching routers (LSRs) support fault detection, fault notification,
and fault recovery mechanisms, and that MPLS signaling support the
configuration of recovery. With these objectives in mind, this
document specifies a framework for MPLS-based recovery. Restart
issues are not included in this framework.
Table of Contents
<a href="#section-1">1</a>. Introduction................................................<a href="#page-2">2</a>
<a href="#section-1.1">1.1</a>. Background............................................<a href="#page-3">3</a>
<a href="#section-1.2">1.2</a>. Motivation for MPLS-Based Recovery....................<a href="#page-4">4</a>
<a href="#section-1.3">1.3</a>. Objectives/Goals......................................<a href="#page-5">5</a>
<a href="#section-2">2</a>. Overview....................................................<a href="#page-6">6</a>
<a href="#section-2.1">2.1</a>. Recovery Models.......................................<a href="#page-7">7</a>
<a href="#section-2.1.1">2.1.1</a> Rerouting.....................................<a href="#page-7">7</a>
<a href="#section-2.1.2">2.1.2</a> Protection Switching..........................<a href="#page-8">8</a>
<a href="#section-2.2">2.2</a>. The Recovery Cycles...................................<a href="#page-8">8</a>
<a href="#section-2.2.1">2.2.1</a> MPLS Recovery Cycle Model.....................<a href="#page-8">8</a>
<a href="#section-2.2.2">2.2.2</a> MPLS Reversion Cycle Model...................<a href="#page-10">10</a>
<a href="#section-2.2.3">2.2.3</a> Dynamic Re-routing Cycle Model...............<a href="#page-12">12</a>
<a href="#section-2.2.4">2.2.4</a> Example Recovery Cycle.......................<a href="#page-13">13</a>
<a href="#section-2.3">2.3</a>. Definitions and Terminology..........................<a href="#page-14">14</a>
<a href="#section-2.3.1">2.3.1</a> General Recovery Terminology.................<a href="#page-14">14</a>
<span class="grey">Sharma & Hellstrand Informational [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
<a href="#section-2.3.2">2.3.2</a> Failure Terminology..........................<a href="#page-17">17</a>
<a href="#section-2.4">2.4</a>. Abbreviations........................................<a href="#page-18">18</a>
<a href="#section-3">3</a>. MPLS-based Recovery Principles.............................<a href="#page-18">18</a>
<a href="#section-3.1">3.1</a>. Configuration of Recovery............................<a href="#page-19">19</a>
<a href="#section-3.2">3.2</a>. Initiation of Path Setup.............................<a href="#page-19">19</a>
<a href="#section-3.3">3.3</a>. Initiation of Resource Allocation....................<a href="#page-20">20</a>
<a href="#section-3.3.1">3.3.1</a> Subtypes of Protection Switching.............<a href="#page-21">21</a>
<a href="#section-3.4">3.4</a>. Scope of Recovery....................................<a href="#page-21">21</a>
<a href="#section-3.4.1">3.4.1</a> Topology.....................................<a href="#page-21">21</a>
<a href="#section-3.4.2">3.4.2</a> Path Mapping.................................<a href="#page-24">24</a>
<a href="#section-3.4.3">3.4.3</a> Bypass Tunnels...............................<a href="#page-25">25</a>
<a href="#section-3.4.4">3.4.4</a> Recovery Granularity.........................<a href="#page-25">25</a>
<a href="#section-3.4.5">3.4.5</a> Recovery Path Resource Use...................<a href="#page-26">26</a>
<a href="#section-3.5">3.5</a>. Fault Detection......................................<a href="#page-26">26</a>
<a href="#section-3.6">3.6</a>. Fault Notification...................................<a href="#page-27">27</a>
<a href="#section-3.7">3.7</a>. Switch-Over Operation................................<a href="#page-28">28</a>
<a href="#section-3.7.1">3.7.1</a> Recovery Trigger.............................<a href="#page-28">28</a>
<a href="#section-3.7.2">3.7.2</a> Recovery Action..............................<a href="#page-29">29</a>
<a href="#section-3.8">3.8</a>. Post Recovery Operation..............................<a href="#page-29">29</a>
<a href="#section-3.8.1">3.8.1</a> Fixed Protection Counterparts................<a href="#page-29">29</a>
<a href="#section-3.8.2">3.8.2</a> Dynamic Protection Counterparts..............<a href="#page-30">30</a>
<a href="#section-3.8.3">3.8.3</a> Restoration and Notification.................<a href="#page-31">31</a>
3.8.4 Reverting to Preferred Path
(or Controlled Rearrangement)................<a href="#page-31">31</a>
<a href="#section-3.9">3.9</a>. Performance..........................................<a href="#page-32">32</a>
<a href="#section-4">4</a>. MPLS Recovery Features.....................................<a href="#page-32">32</a>
<a href="#section-5">5</a>. Comparison Criteria........................................<a href="#page-33">33</a>
<a href="#section-6">6</a>. Security Considerations....................................<a href="#page-35">35</a>
<a href="#section-7">7</a>. Intellectual Property Considerations.......................<a href="#page-36">36</a>
<a href="#section-8">8</a>. Acknowledgements...........................................<a href="#page-36">36</a>
<a href="#section-9">9</a>. References.................................................<a href="#page-36">36</a>
<a href="#section-9.1">9.1</a> Normative References.................................<a href="#page-36">36</a>
<a href="#section-9.2">9.2</a> Informative References...............................<a href="#page-37">37</a>
<a href="#section-10">10</a>. Contributing Authors.......................................<a href="#page-37">37</a>
<a href="#section-11">11</a>. Authors' Addresses.........................................<a href="#page-39">39</a>
<a href="#section-12">12</a>. Full Copyright Statement...................................<a href="#page-40">40</a>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
This memo describes a framework for MPLS-based recovery. We provide
a detailed taxonomy of recovery terminology, and discuss the
motivation for, the objectives of, and the requirements for MPLS-
based recovery. We outline principles for MPLS-based recovery, and
also provide comparison criteria that may serve as a basis for
comparing and evaluating different recovery schemes.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
At points in the document, we provide some thoughts about the
operation or viability of certain recovery objectives. These should
be viewed as the opinions of the authors, and not the consolidated
views of the IETF. The document is informational, and it is expected
that a standards track document will be developed in the future to
describe a subset of this document so as to meet the needs currently
specified by the TE WG.
<span class="h3"><a class="selflink" id="section-1.1" href="#section-1.1">1.1</a>. Background</span>
Network routing deployed today is focused primarily on connectivity,
and typically supports only one class of service, the best effort
class. Multi-protocol label switching [<a href="./rfc3031" title=""Multiprotocol Label Switching Architecture"">RFC3031</a>], on the other hand,
by integrating forwarding based on label swapping of a link-local
label with network layer routing, allows flexibility in the delivery
of new routing services. MPLS allows for using such media-specific
forwarding mechanisms as label swapping. This enables some
sophisticated features such as quality-of-service (QoS) and traffic
engineering [<a href="./rfc2702" title=""Requirements for Traffic Engineering Over MPLS"">RFC2702</a>] to be implemented more effectively. An
important component of providing QoS, however, is the ability to
transport data reliably and efficiently. Although the current
routing algorithms are robust and survivable, the amount of time they
take to recover from a fault can be significant, on the order of
several seconds (for interior gateway protocols (IGPs)) or minutes
(for exterior gateway protocols, such as the Border Gateway Protocol
(BGP)), causing disruption of service for some applications in the
interim. This is unacceptable in situations where the aim is to
provide a highly reliable service, with recovery times that are on
the order of seconds down to tens of milliseconds. IP routing may
also not be able to provide bandwidth recovery, where the objective
is to provide not only an alternative path, but also bandwidth
equivalent to that available on the original path. (For some recent
work on bandwidth recovery schemes, the reader is referred to
[MPLS-BACKUP].) Examples of applications that require such reliable
service are Virtual Leased Line services, Stock Exchange data
services, voice traffic, video services, etc., i.e., any application
for which a service disruption long enough to violate service
agreements or the required level of quality is unacceptable.
MPLS recovery may be motivated by the notion that there are
limitations to improving the recovery times of current routing
algorithms. Additional improvement can be obtained by augmenting
these algorithms with MPLS recovery mechanisms [<a href="#ref-MPLS-PATH" title=""Building Reliable MPLS Networks Using a Path Protection Mechanism"">MPLS-PATH</a>]. Since
MPLS is a possible technology of choice in future IP-based transport
networks, it is useful that MPLS be able to provide protection and
restoration of traffic. MPLS may facilitate the convergence of
network functionality on a common control and management plane.
Further, a protection priority could be used as a differentiating
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
mechanism for premium services that require high reliability, such as
Virtual Leased Line services, and high priority voice and video
traffic. The remainder of this document provides a framework for
MPLS-based recovery. It is focused on a conceptual level and is
meant to address motivation, objectives and requirements. Issues of
mechanism, policy, routing plans and characteristics of traffic
carried by recovery paths are beyond the scope of this document.
<span class="h3"><a class="selflink" id="section-1.2" href="#section-1.2">1.2</a>. Motivation for MPLS-Based Recovery</span>
MPLS-based protection of traffic (called MPLS-based Recovery) is
useful for a number of reasons. The most important is its ability to
increase network reliability by enabling a faster response to faults
than is possible with traditional Layer 3 (or IP layer) approaches
alone, while still providing the visibility of the network afforded by
Layer 3. Furthermore, a protection mechanism using MPLS could enable
IP traffic to be put directly over WDM optical channels and provide a
recovery option without an intervening SONET layer or optical
protection. This would facilitate the construction of IP-over-WDM
networks that require fast recovery. (Note that what is meant here is
the transport of IP traffic over WDM links, not the Generalized MPLS,
or GMPLS, control of a WDM link.)
The need for MPLS-based recovery arises because of the following:
I. Layer 3 or IP rerouting may be too slow for a core MPLS network
that needs to support recovery times that are smaller than the
convergence times of IP routing protocols.
II. Layer 3 or IP rerouting cannot provide bandwidth protection
to specific flows (e.g., voice over IP, virtual leased line
services).
III. Layer 0 (for example, optical layer) or Layer 1 (for example,
SONET) mechanisms may make wasteful use of resources.
IV. The granularity at which the lower layers may be able to protect
traffic may be too coarse for traffic that is switched using
MPLS-based mechanisms.
V. Layer 0 or Layer 1 mechanisms may have no visibility into higher
layer operations. Thus, while they may provide, for example,
link protection, they cannot easily provide node protection or
protection of traffic transported at layer 3. Further, this may
prevent the lower layers from providing restoration based on the
traffic's needs. For example, fast restoration for traffic that
needs it, and slower restoration (with possibly more optimal use
of resources) for traffic that does not require fast
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
restoration. In networks where the latter class of traffic is
dominant, providing fast restoration to all classes of traffic
may not be cost effective from a service provider's perspective.
VI. MPLS has desirable attributes when applied to the purpose of
recovery for connectionless networks. Specifically, an LSP
is source routed, and a forwarding path for recovery can be
"pinned" so that it is not affected by transient instability in SPF
routing brought on by failure scenarios.
VII. Establishing interoperability of protection mechanisms between
routers/LSRs from different vendors in IP or MPLS networks is
desired to enable recovery mechanisms to work in a multivendor
environment, and to enable the transition of certain protected
services to an MPLS core.
<span class="h3"><a class="selflink" id="section-1.3" href="#section-1.3">1.3</a>. Objectives/Goals</span>
The following are some important goals for MPLS-based recovery.
I. MPLS-based recovery mechanisms may be subject to the traffic
engineering goal of optimal use of resources.
II. MPLS-based recovery mechanisms should aim to facilitate
restoration times that are sufficiently fast for the end user
application, that is, times that better match the end-user's
application requirements. In some cases, this may be as short
as tens of milliseconds.
We observe that I and II may be conflicting objectives, and a
trade-off may exist between them. The optimal choice depends on the
end-user application's sensitivity to restoration time and the cost
impact of introducing restoration in the network, as well as the
end-user application's sensitivity to cost.
III. MPLS-based recovery should aim to maximize network reliability
and availability. MPLS-based recovery of traffic should aim to
minimize the number of single points of failure in the MPLS
protected domain.
IV. MPLS-based recovery should aim to enhance the reliability of
the protected traffic while minimally or predictably degrading
the traffic carried by the diverted resources.
V. MPLS-based recovery techniques should aim to be applicable for
protection of traffic at various granularities. For example,
it should be possible to specify MPLS-based recovery for a
portion of the traffic on an individual path, for all traffic
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
on an individual path, or for all traffic on a group of paths.
Note that a path is used as a general term and includes the
notion of a link, IP route or LSP.
VI. MPLS-based recovery techniques may be applicable for an entire
end-to-end path or for segments of an end-to-end path.
VII. MPLS-based recovery mechanisms should aim to take into
consideration the recovery actions of lower layers. MPLS-based
mechanisms should not trigger lower layer protection switching
nor should MPLS-based mechanisms be triggered when lower layer
switching has occurred or may imminently occur.
VIII. MPLS-based recovery mechanisms should aim to minimize the loss
of data and packet reordering during recovery operations. (The
current MPLS specification itself has no explicit requirement
on reordering.)
IX. MPLS-based recovery mechanisms should aim to minimize the state
overhead incurred for each recovery path maintained.
X. MPLS-based recovery mechanisms should aim to minimize the
signaling overhead to set up and maintain recovery paths and to
notify failures.
XI. MPLS-based recovery mechanisms should aim to preserve the
constraints on traffic after switchover, if desired. That is,
if desired, the recovery path should meet the resource
requirements of, and achieve the same performance
characteristics as, the working path.
We observe that some of the above are conflicting goals, and real
deployment will often involve engineering compromises based on a
variety of factors such as cost, end-user application requirements,
network efficiency, complexity involved, and revenue considerations.
Thus, these goals are subject to tradeoffs based on the above
considerations.
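Goal XI (and the bandwidth recovery mentioned in the Background) can be made concrete with a small check. The Python sketch below is illustrative only and is not defined by this framework: it assumes a recovery path given as a list of hypothetical LSR names and a table of available bandwidth per undirected link (the unit, Mb/s, is also an assumption), and tests whether the path's bottleneck link can carry the bandwidth reserved on the working path.

```python
from __future__ import annotations


def bottleneck(path: list[str], available_bw: dict[frozenset[str], float]) -> float:
    """Smallest available bandwidth over the links of a path."""
    if len(path) < 2:
        raise ValueError("a path needs at least two nodes")
    # Each consecutive node pair is one undirected link.
    return min(available_bw[frozenset(hop)] for hop in zip(path, path[1:]))


def preserves_bandwidth(recovery: list[str], required_bw: float,
                        available_bw: dict[frozenset[str], float]) -> bool:
    """True if every link on the recovery path can carry the bandwidth
    reserved on the working path (required_bw)."""
    return bottleneck(recovery, available_bw) >= required_bw


# Hypothetical link capacities (Mb/s) and a candidate recovery path.
available = {frozenset(("A", "C")): 100.0, frozenset(("C", "D")): 40.0}
print(preserves_bandwidth(["A", "C", "D"], 50.0, available))  # False
print(preserves_bandwidth(["A", "C", "D"], 40.0, available))  # True
```

A real deployment would also weigh delay, jitter, and shared-risk constraints; bandwidth is used here only because it is the constraint the text names explicitly.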
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Overview</span>
There are several options for providing protection of traffic. The
most generic requirement is the specification of whether recovery
should be via Layer 3 (or IP) rerouting or via MPLS protection
switching or rerouting actions.
Generally, network operators aim to provide the fastest, most stable,
and best protection mechanism that can be provided at a
reasonable cost. The higher the level of protection, the more
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
resources are consumed. Therefore, it is expected that network operators
will offer a spectrum of service levels. MPLS-based recovery should
give operators the flexibility to select the recovery mechanism, to
choose the granularity at which traffic is protected, and to choose
the specific types of traffic that are protected, giving them more
control over that tradeoff. With MPLS-based recovery,
it is possible to provide different levels of protection for
different classes of service, based on their service requirements.
For example, using approaches outlined below, a Virtual Leased Line
(VLL) service or real-time applications like Voice over IP (VoIP) may
be supported using link/node protection together with pre-
established, pre-reserved path protection. Best effort traffic, on
the other hand, may use path protection that is established on demand
or may simply rely on IP re-route or higher layer recovery
mechanisms. As another example of their range of application, MPLS-
based recovery strategies may be used to protect traffic not
originally flowing on label switched paths, such as IP traffic that
is normally routed hop-by-hop, as well as traffic forwarded on label
switched paths.
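The examples in the preceding paragraph amount to a per-service recovery policy. As a hedged illustration only, the Python sketch below records those examples as data; the service names, the `RecoveryPolicy` fields, and the choice of best effort as the default for unknown classes are assumptions made for this sketch, not anything defined by this framework or by a particular implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPolicy:
    # Hypothetical fields mirroring the options named in the text.
    link_node_protection: bool  # local link/node protection in use
    path_protection: str        # how the recovery path is provided
    alternative: str            # other recovery the traffic may rely on


PREMIUM = RecoveryPolicy(
    link_node_protection=True,
    path_protection="pre-established, pre-reserved",
    alternative="none",
)
BEST_EFFORT = RecoveryPolicy(
    link_node_protection=False,
    path_protection="established on demand",
    alternative="IP re-route or higher layer recovery",
)

# Hypothetical service classes, following the examples in the text.
POLICIES: dict[str, RecoveryPolicy] = {
    "VLL": PREMIUM,
    "VoIP": PREMIUM,
    "best-effort": BEST_EFFORT,
}


def policy_for(service: str) -> RecoveryPolicy:
    """Look up the recovery policy for a service class; unknown classes
    get best-effort treatment (an assumption of this sketch)."""
    return POLICIES.get(service, BEST_EFFORT)
```

Keeping the policy as data rather than code mirrors the point of the paragraph: the recovery mechanism is a per-class choice an operator makes, not a fixed property of the network.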
<span class="h3"><a class="selflink" id="section-2.1" href="#section-2.1">2.1</a>. Recovery Models</span>
There are two basic models for path recovery: rerouting and
protection switching.
Protection switching and rerouting, as defined below, may be used
together. For example, protection switching to a recovery path may
be used for rapid restoration of connectivity while rerouting
determines a new optimal network configuration, rearranging paths, as
needed, at a later time.
<span class="h4"><a class="selflink" id="section-2.1.1" href="#section-2.1.1">2.1.1</a> Rerouting</span>
Recovery by rerouting is defined as establishing new paths or path
segments on demand for restoring traffic after the occurrence of a
fault. The new paths may be based upon fault information, network
routing policies, pre-defined configurations and network topology
information. Thus, upon detecting a fault, paths or path segments to
bypass the fault are established using signaling.
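Recovery by rerouting can be sketched as an on-demand path computation that ignores failed links. The Python below is an illustration under stated assumptions, not the mechanism of any particular protocol: the topology is a hypothetical undirected adjacency map of LSR names, breadth-first search (fewest hops) stands in for whatever constrained path computation an operator would actually use, and the signaling that would set up the new path is not modeled.

```python
from __future__ import annotations

from collections import deque


def reroute(topology: dict[str, set[str]], src: str, dst: str,
            failed_links: set[frozenset[str]]) -> list[str] | None:
    """Compute a new path from src to dst after a fault, never using a
    link in failed_links. Breadth-first search returns a fewest-hop
    path, or None if the fault leaves src and dst disconnected."""
    prev: dict[str, str | None] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Follow predecessors back to src to rebuild the path.
            path = []
            hop: str | None = node
            while hop is not None:
                path.append(hop)
                hop = prev[hop]
            return path[::-1]
        for nbr in sorted(topology.get(node, set())):  # sorted: deterministic
            if nbr not in prev and frozenset((node, nbr)) not in failed_links:
                prev[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical four-LSR topology: A-B-D and A-C-D.
topo = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(reroute(topo, "A", "D", set()))                    # ['A', 'B', 'D']
print(reroute(topo, "A", "D", {frozenset(("B", "D"))}))  # ['A', 'C', 'D']
```

Because nothing is computed until the fault is known, this model has no standing state for the recovery path, which is the trade-off against protection switching described next.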
Once the network routing algorithms have converged after a fault, it
may be preferable, in some cases, to reoptimize the network by
performing a reroute based on the current state of the network and
network policies. This is discussed further in <a href="#section-3.8">Section 3.8</a>.
In terms of the principles defined in <a href="#section-3">section 3</a>, reroute recovery
employs paths established-on-demand with resources reserved-on-
demand.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
<span class="h4"><a class="selflink" id="section-2.1.2" href="#section-2.1.2">2.1.2</a> Protection Switching</span>
Protection switching recovery mechanisms pre-establish a recovery
path or path segment, based upon network routing policies, the
restoration requirements of the traffic on the working path, and
administrative considerations. The recovery path may or may not be
link- and node-disjoint from the working path. However, if the
recovery path shares sources of failure with the working path, the
overall reliability of the construct is degraded. When a fault is
detected, the protected traffic is switched over to the recovery
path(s) and restored.
In terms of the principles in <a href="#section-3">Section 3</a>, protection switching employs
pre-established recovery paths, and, if resource reservation is
required on the recovery path, pre-reserved resources. The various
sub-types of protection switching are detailed in <a href="#section-4.4">Section 4.4</a> of this
document.
<span class="h3"><a class="selflink" id="section-2.2" href="#section-2.2">2.2</a>. The Recovery Cycles</span>
There are three defined recovery cycles: the MPLS Recovery Cycle, the
MPLS Reversion Cycle and the Dynamic Re-routing Cycle. The first
cycle detects a fault and restores traffic onto MPLS-based recovery
paths. If the recovery path is non-optimal, the cycle may be followed
by either of the two latter cycles to achieve an optimized network
again. The reversion cycle applies to explicitly routed traffic
that does not rely on any dynamic routing protocols to converge. The
dynamic re-routing cycle applies to traffic that is forwarded based
on hop-by-hop routing.
<span class="h4"><a class="selflink" id="section-2.2.1" href="#section-2.2.1">2.2.1</a> MPLS Recovery Cycle Model</span>
The MPLS recovery cycle model is illustrated in Figure 1. Definitions
and a key to abbreviations follow.
--Network Impairment
|     --Fault Detected
|     |     --Start of Notification
|     |     |     --Start of Recovery Operation
|     |     |     |     --Recovery Operation Complete
|     |     |     |     |     --Path Traffic Recovered
|     |     |     |     |     |
|     |     |     |     |     |
v     v     v     v     v     v
----------------------------------------------------------------
|  T1 |  T2 |  T3 |  T4 |  T5 |
Figure 1. MPLS Recovery Cycle Model
<span class="grey">Sharma & Hellstrand Informational [Page 8]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-9" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
The various timing measures used in the model are described below.
T1 Fault Detection Time
T2 Fault Hold-Off Time
T3 Fault Notification Time
T4 Recovery Operation Time
T5 Traffic Recovery Time
Definitions of the recovery cycle times are as follows:
Fault Detection Time
The time between the occurrence of a network impairment and the
moment the fault is detected by MPLS-based recovery mechanisms.
This time may be highly dependent on lower layer protocols.
Fault Hold-Off Time
The configured waiting time between the detection of a fault and
taking MPLS-based recovery action, to allow time for lower layer
protection to take effect. The Fault Hold-off Time may be zero.
Note: The Fault Hold-Off Time may occur after the Fault
Notification Time interval if the node responsible for the
switchover, the Path Switch LSR (PSL), rather than the detecting
LSR, is configured to wait.
Fault Notification Time
The time between initiation of a Fault Indication Signal (FIS) by
the LSR detecting the fault and the time at which the Path Switch
LSR (PSL) begins the recovery operation. This is zero if the PSL
detects the fault itself or infers a fault from such events as an
adjacency failure.
Note: If the PSL detects the fault itself, there still may be a
Fault Hold-Off Time period between detection and the start of the
recovery operation.
Recovery Operation Time
The time between the first and last recovery actions. This may
include message exchanges between the PSL and PML (Path Merge LSR)
to coordinate recovery actions.
<span class="grey">Sharma & Hellstrand Informational [Page 9]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-10" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Traffic Recovery Time
The time between the last recovery action and the time that the
traffic (if present) is completely recovered. This interval is
intended to account for the time required for traffic to once
again arrive at the point in the network that experienced
disrupted or degraded service due to the occurrence of the fault
(e.g., the PML). This time may depend on the location of the
fault, the recovery mechanism, and the propagation delay along the
recovery path.
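Because the five intervals in Figure 1 follow one another without
gaps, the outage seen by the protected traffic is simply their sum.
The Python sketch below illustrates only that arithmetic; the class
and field names, and the millisecond values in the example, are
hypothetical and are not taken from this framework.

```python
from dataclasses import dataclass


@dataclass
class RecoveryCycle:
    """Durations (ms) of the intervals T1..T5 of the MPLS recovery cycle."""

    fault_detection: float     # T1
    fault_hold_off: float      # T2, may be zero
    fault_notification: float  # T3, zero if the PSL detects the fault itself
    recovery_operation: float  # T4
    traffic_recovery: float    # T5

    def total(self) -> float:
        """Time from the network impairment until path traffic is recovered.

        The intervals are contiguous in Figure 1, so the total is their sum.
        """
        return (self.fault_detection + self.fault_hold_off
                + self.fault_notification + self.recovery_operation
                + self.traffic_recovery)


# Hypothetical values: the PSL detects the fault itself, so T3 is zero.
local_detection = RecoveryCycle(fault_detection=10, fault_hold_off=0,
                                fault_notification=0, recovery_operation=5,
                                traffic_recovery=2)
print(local_detection.total())  # 17
```

Moving the hold-off from the detecting LSR to the PSL, as described in
the note on the Fault Hold-Off Time, changes the order of T2 and T3
but not the total.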
<span class="h4"><a class="selflink" id="section-2.2.2" href="#section-2.2.2">2.2.2</a> MPLS Reversion Cycle Model</span>
Protection switching in revertive mode requires the traffic to be
switched back to a preferred path when the fault on that path is
cleared. The MPLS reversion cycle model is illustrated in Figure 2.
Note that the cycle shown below comes after the recovery cycle shown
in Figure 1.
--Network Impairment Repaired
|     --Fault Cleared
|     |     --Path Available
|     |     |     --Start of Reversion Operation
|     |     |     |     --Reversion Operation Complete
|     |     |     |     |     --Traffic Restored on Preferred Path
|     |     |     |     |     |
|     |     |     |     |     |
v     v     v     v     v     v
-----------------------------------------------------------------
|  T7 |  T8 |  T9 | T10 | T11 |
Figure 2. MPLS Reversion Cycle Model
The various timing measures used in the model are described below.
T7 Fault Clearing Time
T8 Clear Hold-Off Time
T9 Clear Notification Time
T10 Reversion Operation Time
T11 Traffic Reversion Time
Note that time T6 (not shown above) is the interval during which the
network impairment remains unrepaired and traffic flows on the
recovery path.
<span class="grey">Sharma & Hellstrand Informational [Page 10]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-11" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Definitions of the reversion cycle times are as follows:
Fault Clearing Time
The time between the repair of a network impairment and the time
that MPLS-based mechanisms learn that the fault has been cleared.
This time may be highly dependent on lower layer protocols.
Clear Hold-Off Time
The configured waiting time between the clearing of a fault and
MPLS-based recovery action(s). Waiting time may be needed to
ensure that the path is stable and to avoid flapping in cases
where a fault is intermittent. The Clear Hold-Off Time may be
zero.
Note: The Clear Hold-Off Time may occur after the Clear
Notification Time interval if the PSL is configured to wait.
Clear Notification Time
The time between initiation of a Fault Recovery Signal (FRS) by
the LSR clearing the fault and the time at which the PSL begins
the reversion operation. This is zero if the PSL clears the fault
itself.
Note: If the PSL clears the fault itself, there still may be a
Clear Hold-Off Time period between fault clearing and the start of
the reversion operation.
Reversion Operation Time
The time between the first and last reversion actions. This may
include message exchanges between the PSL and PML to coordinate
reversion actions.
Traffic Reversion Time
The time between the last reversion action and the time that
traffic (if present) is completely restored on the preferred path.
This interval is expected to be quite small since both paths are
working and care may be taken to limit the traffic disruption
(e.g., using "make before break" techniques and synchronous
switch-over).
In practice, the most interesting times in the reversion cycle are
the Clear Hold-Off Time and the Reversion Operation Time, together
with the Traffic Reversion Time (or some other measure of traffic
<span class="grey">Sharma & Hellstrand Informational [Page 11]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-12" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
disruption). The first interval ensures the stability of the
repaired path, and the latter two should be kept short to minimize
disruption during the reversion action.
Given that both paths are available, it is better to wait for a
well-controlled switch-back with minimal disruption than to perform
an immediate operation that may introduce new faults (except,
perhaps, when the recovery path is unable to offer a quality of
service comparable to the preferred path).
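The role of the Clear Hold-Off Time in suppressing flapping can be
illustrated with a small Python sketch. It assumes a hypothetical,
time-ordered list of "fault" and "clear" events for the preferred
path and reports when reversion may begin; the function name and
event encoding are illustrative only, and a real PSL would typically
drive this from a timer rather than by replaying events.

```python
from typing import List, Optional, Tuple


def reversion_start(events: List[Tuple[float, str]], hold_off: float,
                    now: float) -> Optional[float]:
    """Return the time at which reversion may start, or None if not yet.

    events: time-ordered (time, state) pairs for the preferred path, where
            state is "fault" or "clear" (a hypothetical encoding).
    hold_off: the Clear Hold-Off Time (T8); may be zero.
    now: current time; only a hold-off that has expired by now counts.
    """
    clear_since: Optional[float] = None
    for i, (t, state) in enumerate(events):
        if state == "fault":
            clear_since = None   # an intermittent fault restarts the wait
        elif clear_since is None:
            clear_since = t      # the path has been clear since time t
        # The state holds until the next event (or until now, for the last).
        until = events[i + 1][0] if i + 1 < len(events) else now
        if clear_since is not None and clear_since + hold_off <= until:
            return clear_since + hold_off
    return None
```

For example, with a hold-off of 5 time units, a path that clears at
t=0, fails again at t=3 and clears at t=4 is first eligible for
reversion at t=9, because the intermittent fault restarts the wait.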
<span class="h4"><a class="selflink" id="section-2.2.3" href="#section-2.2.3">2.2.3</a> Dynamic Re-routing Cycle Model</span>
Dynamic rerouting aims to bring the IP network to a stable state
after a network impairment has occurred. A re-optimized network is
achieved after the routing protocols have converged, and the traffic
is moved from a recovery path to a (possibly) new working path. The
steps involved in this mode are illustrated in Figure 3.
Note that the cycle shown below may be overlaid on the recovery cycle
shown in Figure 1, on the reversion cycle shown in Figure 2, or on
both (in the event that both the recovery cycle and the reversion
cycle take place before the routing protocols converge). It occurs
if, after the routing protocols have converged, it is determined
(based on online algorithms or offline traffic engineering tools,
network configuration, or a variety of other possible criteria) that
there is a better route for the working path.
--Network Enters a Semi-stable State after an Impairment
|     --Dynamic Routing Protocols Converge
|     |     --Initiate Setup of New Working Path between PSL
|     |     | and PML
|     |     |     --Switchover Operation Complete
|     |     |     |     --Traffic Moved to New Working Path
|     |     |     |     |
|     |     |     |     |
v     v     v     v     v
-----------------------------------------------------------------
| T12 | T13 | T14 | T15 |
Figure 3. Dynamic Rerouting Cycle Model
The various timing measures used in the model are described below.
T12 Network Route Convergence Time
T13 Hold-down Time (optional)
T14 Switchover Operation Time
T15 Traffic Restoration Time
<span class="grey">Sharma & Hellstrand Informational [Page 12]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-13" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Network Route Convergence Time
We define the network route convergence time as the time taken for
the network routing protocols to converge and for the network to
reach a stable state.
Hold-down Time
We define the hold-down period as a bounded time for which a
recovery path must be used. In some scenarios it may be difficult
to determine if the working path is stable. In these cases, a
hold-down time may be used to prevent excessive flapping of
traffic between a working and a recovery path.
Switchover Operation Time
The time between the first and last switchover actions. This may
include message exchanges between the PSL and PML to coordinate
the switchover actions.
Traffic Restoration Time
The time between the last restoration action and the time that
traffic (if present) is completely restored on the new preferred
path.
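Assuming the intervals of Figure 3 run back to back, with T13 present
only when a hold-down is configured, the event times can be derived
from the interval durations. The Python function below is a
hypothetical illustration of that bookkeeping; its name, parameters
and returned keys are not defined by this framework.

```python
from typing import Dict


def rerouting_timeline(start: float, convergence: float, switchover: float,
                       restoration: float,
                       hold_down: float = 0.0) -> Dict[str, float]:
    """Event times of the dynamic rerouting cycle (Figure 3).

    start: time at which the network enters a semi-stable state.
    convergence, switchover, restoration: durations of T12, T14 and T15.
    hold_down: duration of the optional T13; zero when not configured.
    """
    converged = start + convergence          # end of T12
    setup_initiated = converged + hold_down  # end of T13
    switched = setup_initiated + switchover  # end of T14
    moved = switched + restoration           # end of T15
    return {
        "routing_converged": converged,
        "setup_initiated": setup_initiated,
        "switchover_complete": switched,
        "traffic_moved": moved,
    }
```

With a 500-unit hold-down, the setup of the new working path is
simply deferred by that amount; the switchover and restoration
intervals are unchanged.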
<span class="h4"><a class="selflink" id="section-2.2.4" href="#section-2.2.4">2.2.4</a> Example Recovery Cycle</span>
As an example of the recovery cycle, we present the sequence of
events that occurs after a network impairment when a protection
switch is followed by dynamic rerouting.
I. Link or path fault occurs
II. Signaling initiated (FIS) for the detected fault
III. FIS arrives at the PSL
IV. The PSL initiates a protection switch to a pre-configured
recovery path
V. The PSL switches over the traffic from the working path to the
recovery path
VI. The network enters a semi-stable state
VII. Dynamic routing protocols converge after the fault, and a new
working path is calculated (based, for example, on some of the
criteria mentioned in <a href="#section-2.1.1">Section 2.1.1</a>).
VIII. A new working path is established between the PSL and the PML
(assuming that the PSL and PML have not changed)
IX. Traffic is switched over to the new working path.
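From the PSL's point of view, steps I to IX can be summarized as a
three-state machine. The Python sketch below is a hypothetical
illustration, not a state machine defined by this framework: the
state and event names are invented labels for the steps above, and
events that do not change where the PSL sends traffic (such as
routing convergence) leave the state unchanged.

```python
from enum import Enum, auto


class PslState(Enum):
    ON_WORKING = auto()      # before the fault (step I)
    ON_RECOVERY = auto()     # after the protection switch (steps IV-V)
    ON_NEW_WORKING = auto()  # after the switch to the new path (step IX)


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (PslState.ON_WORKING, "fis_received"): PslState.ON_RECOVERY,
    (PslState.ON_RECOVERY, "traffic_switched"): PslState.ON_NEW_WORKING,
}


def run(events):
    """Replay events at the PSL and return the list of states visited."""
    state = PslState.ON_WORKING
    history = [state]
    for event in events:
        # Events with no entry (e.g. routing convergence) keep the state.
        state = TRANSITIONS.get((state, event), state)
        history.append(state)
    return history


steps = ["fis_received", "network_semi_stable", "routing_converged",
         "new_path_established", "traffic_switched"]
print(run(steps)[-1].name)  # ON_NEW_WORKING
```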
<span class="grey">Sharma & Hellstrand Informational [Page 13]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-14" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
<span class="h3"><a class="selflink" id="section-2.3" href="#section-2.3">2.3</a>. Definitions and Terminology</span>
This document assumes the terminology given in [<a href="./rfc3031" title=""Multiprotocol Label Switching Architecture"">RFC3031</a>], and, in
addition, introduces the following new terms.
<span class="h4"><a class="selflink" id="section-2.3.1" href="#section-2.3.1">2.3.1</a> General Recovery Terminology</span>
Re-routing
A recovery mechanism in which the recovery path or path segments
are created dynamically after the detection of a fault on the
working path. In other words, a recovery mechanism in which the
recovery path is not pre-established.
Protection Switching
A recovery mechanism in which the recovery path or path segments
are created prior to the detection of a fault on the working path.
In other words, a recovery mechanism in which the recovery path is
pre-established.
Working Path
The protected path that carries traffic before the occurrence of a
fault. The working path can be of different kinds: a hop-by-hop
routed path, a trunk, a link, an LSP, or part of a
multipoint-to-point LSP.
Synonyms for a working path are primary path and active path.
Recovery Path
The path by which traffic is restored after the occurrence of a
fault. In other words, the path on which the traffic is directed
by the recovery mechanism. The recovery path is established by
MPLS means. The recovery path can either be an equivalent
recovery path and ensure no reduction in quality of service, or be
a limited recovery path and thereby not guarantee the same quality
of service (or some other criteria of performance) as the working
path. A limited recovery path is not expected to be used for an
extended period of time.
Synonyms for a recovery path are: back-up path, alternative path,
and protection path.
<span class="grey">Sharma & Hellstrand Informational [Page 14]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-15" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Protection Counterpart
The "other" path when discussing pre-planned protection switching
schemes. The protection counterpart for the working path is the
recovery path and vice-versa.
Path Switch LSR (PSL)
An LSR that is responsible for switching or replicating the
traffic between the working path and the recovery path.
Path Merge LSR (PML)
An LSR that is responsible for receiving the recovery path
traffic, and either merging the traffic back onto the working
path, or, if it is itself the destination, passing the traffic on
to the higher layer protocols.
Point of Repair (POR)
An LSR that is set up to perform MPLS recovery. In other
words, an LSR that is responsible for effecting the repair of an
LSP. The POR, for example, can be a PSL or a PML, depending on
the type of recovery scheme employed.
Intermediate LSR
An LSR on a working or recovery path that is neither a PSL nor a
PML for that path.
Path Group (PG)
A logical bundling of multiple working paths, each of which is
routed identically between a Path Switch LSR and a Path Merge LSR.
Protected Path Group (PPG)
A path group that requires protection.
Protected Traffic Portion (PTP)
The portion of the traffic on an individual path that requires
protection. For example, code points in the EXP bits of the shim
header may identify a protected portion.
<span class="grey">Sharma & Hellstrand Informational [Page 15]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-16" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Bypass Tunnel
A path that serves to back up a set of working paths using the
label stacking approach [<a href="./rfc3031" title=""Multiprotocol Label Switching Architecture"">RFC3031</a>]. The working paths and the
bypass tunnel must all share the same path switch LSR (PSL) and
path merge LSR (PML).
Switch-Over
The process of switching the traffic from the path that the
traffic is flowing on onto one or more alternate path(s). This
may involve moving traffic from a working path onto one or more
recovery paths, or from one or more recovery paths onto a more
optimal working path(s).
Switch-Back
The process of returning the traffic from one or more recovery
paths back to the working path(s).
Revertive Mode
A recovery mode in which traffic is automatically switched back
from the recovery path to the original working path upon the
restoration of the working path to a fault-free condition. This
assumes a failed working path does not automatically surrender
resources to the network.
Non-revertive Mode
A recovery mode in which traffic is not automatically switched
back to the original working path after this path is restored to a
fault-free condition. (Depending on the configuration, the
original working path may, upon moving to a fault-free condition,
become the recovery path, or it may be used for new working
traffic and no longer be associated with its original recovery
path, i.e., it is surrendered to the network.)
MPLS Protection Domain
The set of LSRs over which a working path and its corresponding
recovery path are routed.
MPLS Protection Plan
The set of all LSP protection paths and the mapping from working
to protection paths deployed in an MPLS protection domain at a
given time.
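A protection plan, as defined above, is essentially a mapping from
working paths to protection paths, and the protection counterpart is
a lookup in either direction. The Python class below is a minimal
sketch under that reading; the class name, method names and the LSP
identifiers in the example are hypothetical.

```python
from typing import Dict, Optional


class ProtectionPlan:
    """Working-to-protection LSP mapping within one MPLS protection domain.

    LSPs are identified by plain strings in this sketch.
    """

    def __init__(self) -> None:
        self._protection_of: Dict[str, str] = {}

    def protect(self, working: str, protection: str) -> None:
        """Record that protection is the recovery path for working."""
        self._protection_of[working] = protection

    def counterpart(self, lsp: str) -> Optional[str]:
        """Return the protection counterpart of lsp, or None if unprotected."""
        if lsp in self._protection_of:
            return self._protection_of[lsp]
        for working, protection in self._protection_of.items():
            if protection == lsp:
                return working
        return None


plan = ProtectionPlan()
plan.protect("lsp-1", "lsp-1-backup")
print(plan.counterpart("lsp-1-backup"))  # lsp-1
```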
<span class="grey">Sharma & Hellstrand Informational [Page 16]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-17" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Liveness Message
A message exchanged periodically between two adjacent LSRs that
serves as a link probing mechanism. It provides an integrity
check of the forward and the backward directions of the link
between the two LSRs as well as a check of neighbor aliveness.
Path Continuity Test
A test that verifies the integrity and continuity of a path or
path segment. The details of such a test are beyond the scope of
this document. (This could be accomplished, for example, by
transmitting a control message along the same links and nodes as
the data traffic, or, similarly, by measuring the absence of
traffic and providing feedback.)
<span class="h4"><a class="selflink" id="section-2.3.2" href="#section-2.3.2">2.3.2</a> Failure Terminology</span>
Path Failure (PF)
Path failure is a fault detected by MPLS-based recovery
mechanisms, defined as the failure of the liveness message test
or of a path continuity test, indicating that path connectivity
is lost.
Path Degraded (PD)
Path degraded is a fault detected by MPLS-based recovery
mechanisms that indicates that the quality of the path is
unacceptable.
Link Failure (LF)
A lower layer fault indicating that link continuity is lost. This
may be communicated to the MPLS-based recovery mechanisms by the
lower layer.
Link Degraded (LD)
A lower layer indication to MPLS-based recovery mechanisms that
the link is performing below an acceptable level.
Fault Indication Signal (FIS)
A signal that indicates that a fault along a path has occurred.
It is relayed by each intermediate LSR to its upstream or
downstream neighbor, until it reaches an LSR that is set up to
perform MPLS recovery (the POR). The FIS is transmitted
<span class="grey">Sharma & Hellstrand Informational [Page 17]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-18" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
periodically by the node/nodes closest to the point of failure,
for some configurable length of time or until the transmitting
node receives an acknowledgement from its neighbor.
Fault Recovery Signal (FRS)
A signal that indicates a fault along a working path has been
repaired. Again, like the FIS, it is relayed by each intermediate
LSR to its upstream or downstream neighbor, until it reaches the
LSR that performs recovery of the original path. The FRS is
transmitted periodically by the node/nodes closest to the point of
failure, for some configurable length of time or until the
transmitting node receives an acknowledgement from its neighbor.
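Two behaviors described above, hop-by-hop relay of the FIS towards
the POR and periodic retransmission until acknowledgement or timeout,
can be sketched in Python as follows. The path representation,
function names and parameters are assumptions made for illustration;
the framework does not specify how LSRs represent paths or timers.

```python
from typing import List, Optional, Set


def fis_relay_hops(upstream_lsrs: List[str],
                   points_of_repair: Set[str]) -> Optional[int]:
    """Number of hops an FIS is relayed before it reaches a POR.

    upstream_lsrs lists the LSRs from the node closest to the failure
    towards the ingress. Returns 0 if that node is itself the POR,
    or None if no POR lies on the list.
    """
    for hops, lsr in enumerate(upstream_lsrs):
        if lsr in points_of_repair:
            return hops
    return None


def transmissions_until_ack(interval: float, ack_time: Optional[float],
                            max_duration: float) -> int:
    """Count periodic FIS (or FRS) transmissions by the originating node.

    The signal is sent at t = 0, interval, 2 * interval, ... and stops once
    an acknowledgement has arrived (at ack_time; None means never) or once
    the configurable max_duration has elapsed.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    count = 0
    t = 0.0
    while t < max_duration:
        if ack_time is not None and t >= ack_time:
            break  # acknowledged: stop retransmitting
        count += 1
        t += interval
    return count
```

For example, with a 10-unit interval and an acknowledgement arriving
at t=25, the signal is sent three times (at t=0, 10 and 20).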
<span class="h3"><a class="selflink" id="section-2.4" href="#section-2.4">2.4</a>. Abbreviations</span>
FIS: Fault Indication Signal.
FRS: Fault Recovery Signal.
LD: Link Degraded.
LF: Link Failure.
PD: Path Degraded.
PF: Path Failure.
PG: Path Group.
PML: Path Merge LSR.
POR: Point of Repair.
PPG: Protected Path Group.
PSL: Path Switch LSR.
PTP: Protected Traffic Portion.
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. MPLS-based Recovery Principles</span>
MPLS-based recovery refers to the ability to effect quick and
complete restoration of traffic affected by a fault in an MPLS-
enabled network. The fault may be detected on the IP layer or in
lower layers over which IP traffic is transported. The fastest MPLS
recovery is assumed to be achieved with protection switching, whose
LSR switch completion time may be comparable, or equivalent, to the
50 ms switch-over completion time of the SONET layer. Further,
MPLS-based recovery may provide bandwidth
protection for paths that require it. This section provides a
discussion of the concepts and principles of MPLS-based recovery.
The concepts are presented in terms of atomic or primitive terms that
may be combined to specify recovery approaches. We do not make any
assumptions about the underlying layer 1 or layer 2 transport
mechanisms or their recovery mechanisms.
<span class="grey">Sharma & Hellstrand Informational [Page 18]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-19" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
<span class="h3"><a class="selflink" id="section-3.1" href="#section-3.1">3.1</a>. Configuration of Recovery</span>
An LSR may support any or all of the following recovery options on a
per-path basis:
Default-recovery (No MPLS-based recovery enabled): Traffic on the
working path is recovered only via Layer 3 or IP rerouting or by some
lower layer mechanism such as SONET APS. This is equivalent to
having no MPLS-based recovery. This option may be used for
low-priority traffic or for traffic that is recovered in another way
(for example, load-shared traffic on parallel working paths may be
automatically recovered upon a fault along one of the working paths
by distributing it among the remaining working paths).
Recoverable (MPLS-based recovery enabled): This working path is
recovered using one or more recovery paths, either via rerouting or
via protection switching.
<span class="h3"><a class="selflink" id="section-3.2" href="#section-3.2">3.2</a>. Initiation of Path Setup</span>
There are three options for the initiation of the recovery path
setup. The active and recovery paths may be established by using
either RSVP-TE [<a href="./rfc2205" title=""Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification"">RFC2205</a>][RFC3209] or CR-LDP [<a href="./rfc3212" title=""Constraint-Based LSP Setup using LDP"">RFC3212</a>], or by any
other means including SNMP.
Pre-established:
This is the same as the protection switching option. Here, one or
more recovery paths are established prior to any failure on the
working path. The path selection can either be determined by a
centralized administrative tool, or chosen based on some algorithm
implemented at the PSL and possibly intermediate nodes. To guard
against the situation when the pre-established recovery path fails
before or at the same time as the working path, the recovery path
should have secondary configuration options as explained in
<a href="#section-3.3">Section 3.3</a> below.
Pre-Qualified:
A pre-established path need not be created; it may be pre-qualified.
A pre-qualified recovery path is not created expressly
for protecting the working path, but instead is a path created for
other purposes that is designated as a recovery path after
determining that it is an acceptable alternative for carrying the
working path traffic. Variants include the case where an optical
path or trail is configured, but no switches are set.
<span class="grey">Sharma & Hellstrand Informational [Page 19]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-20" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Established-on-Demand:
This is the same as the rerouting option. Here, a recovery path
is established after a failure on its working path has been
detected and notified to the PSL. The recovery path may be pre-
computed or computed on demand, which influences recovery times.
<span class="h3"><a class="selflink" id="section-3.3" href="#section-3.3">3.3</a>. Initiation of Resource Allocation</span>
A recovery path may or may not support the same traffic contract as
the working path. We distinguish these two situations by using
different qualifying terms. If the recovery path is capable of
replacing the working path without degrading service, it will be
called an equivalent recovery path. If the recovery path lacks the
resources (or resource reservations) to replace the working path
without degrading service, it will be called a limited recovery path.
Based on this, there are two options for the initiation of resource
allocation:
Pre-reserved:
This option applies only to protection switching. Here a pre-
established recovery path reserves required resources on all hops
along its route during its establishment. Although the reserved
resources (e.g., bandwidth and/or buffers) at each node cannot be
used to admit more working paths, they are available to be used by
all traffic that is present at the node before a failure occurs.
The resources held by a set of recovery paths may be shared if
they protect resources that are not simultaneously subject to
failure.
Reserved-on-Demand:
This option may apply either to rerouting or to protection
switching. Here a recovery path reserves the required resources
after a failure on the working path has been detected and notified
to the PSL and before the traffic on the working path is switched
over to the recovery path.
Note that under both the options above, depending on the amount of
resources reserved on the recovery path, it could either be an
equivalent recovery path or a limited recovery path.
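The setup and allocation options above can be treated as two
independent choices with one constraint: pre-reserved resources apply
only to protection switching. The Python sketch below encodes that
constraint and a simplified equivalent/limited classification. The
enum and function names are hypothetical, comparing bandwidth alone
is a simplification of the full traffic contract, and treating
pre-qualified paths like pre-established ones is an assumption.

```python
from enum import Enum


class PathSetup(Enum):
    PRE_ESTABLISHED = "pre-established"
    PRE_QUALIFIED = "pre-qualified"
    ESTABLISHED_ON_DEMAND = "established-on-demand"


class ResourceAllocation(Enum):
    PRE_RESERVED = "pre-reserved"
    RESERVED_ON_DEMAND = "reserved-on-demand"


def check_options(setup: PathSetup, allocation: ResourceAllocation) -> None:
    """Reject the combination ruled out above.

    Pre-reserved resources require a recovery path that exists before the
    fault; a path established on demand (rerouting) cannot pre-reserve.
    """
    if (allocation is ResourceAllocation.PRE_RESERVED
            and setup is PathSetup.ESTABLISHED_ON_DEMAND):
        raise ValueError("pre-reserved resources require protection switching")


def classify(recovery_bw: float, working_bw: float) -> str:
    """Equivalent if the recovery path can carry the working traffic without
    degradation, limited otherwise (bandwidth only, in this sketch)."""
    return "equivalent" if recovery_bw >= working_bw else "limited"
```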
<span class="grey">Sharma & Hellstrand Informational [Page 20]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-21" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
<span class="h4"><a class="selflink" id="section-3.3.1" href="#section-3.3.1">3.3.1</a> Subtypes of Protection Switching</span>
The resources (bandwidth, buffers, processing) on the recovery path
may be used to carry either a copy of the working path traffic or
extra traffic that is displaced when a protection switch occurs. This
leads to two subtypes of protection switching.
In 1+1 ("one plus one") protection, the resources (bandwidth,
buffers, processing capacity) on the recovery path are fully
reserved, and carry the same traffic as the working path. Selection
between the traffic on the working and recovery paths is made at the
path merge LSR (PML). In effect, the PSL function is reduced to
establishing the working and recovery paths and performing a simple
replication function. The recovery intelligence is delegated to the
PML.
In 1:1 ("one for one") protection, the resources (if any) allocated
on the recovery path are fully available to preemptible low priority
traffic except when the recovery path is in use due to a fault on the
working path. In other words, in 1:1 protection, the protected
traffic normally travels only on the working path, and is switched to
the recovery path only when the working path has a fault. Once the
protection switch is initiated, the low priority traffic being
carried on the recovery path may be displaced by the protected
traffic. This method affords a way to make efficient use of the
recovery path resources.
This concept can be extended to 1:n (one for n) and m:n (m for n)
protection.
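The difference between the two subtypes lies in what the recovery
path carries when there is no fault. The Python sketch below
illustrates that difference under simplifying assumptions: traffic is
represented by hypothetical string labels, the function names are
invented, and in the 1:1 case all low-priority traffic is assumed to
be displaced, whereas the text only says it may be.

```python
from typing import List, Tuple


def pml_select_1plus1(working_ok: bool, working_copy: str,
                      recovery_copy: str) -> str:
    """1+1: both paths carry the same traffic; the PML selects one copy."""
    return working_copy if working_ok else recovery_copy


def recovery_load_1to1(fault_on_working: bool, protected: List[str],
                       preemptible: List[str]) -> Tuple[List[str], List[str]]:
    """1:1: return (carried, displaced) traffic on the recovery path.

    Without a fault, the recovery path carries only preemptible
    low-priority traffic; after a protection switch, the protected
    traffic displaces it (fully, in this sketch).
    """
    if fault_on_working:
        return list(protected), list(preemptible)
    return list(preemptible), []
```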
<span class="h3"><a class="selflink" id="section-3.4" href="#section-3.4">3.4</a>. Scope of Recovery</span>
<span class="h4"><a class="selflink" id="section-3.4.1" href="#section-3.4.1">3.4.1</a> Topology</span>
<span class="h5"><a class="selflink" id="section-3.4.1.1" href="#section-3.4.1.1">3.4.1.1</a> Local Repair</span>
The intent of local repair is to protect against a link or neighbor
node fault and to minimize the amount of time required for failure
propagation. In local repair (also known as local recovery), the
node immediately upstream of the fault is the one to initiate
recovery (either rerouting or protection switching). Local repair
can be of two types:
<span class="grey">Sharma & Hellstrand Informational [Page 21]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-22" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
Link Recovery/Restoration
In this case, the recovery path may be configured to route around
a certain link deemed to be unreliable. If protection switching
is used, several recovery paths may be configured for one working
path, depending on the specific faulty link that each protects
against.
Alternatively, if rerouting is used, upon the occurrence of a
fault on the specified link, each path is rebuilt such that it
detours around the faulty link.
In this case, the recovery path need only be disjoint from its
working path at a particular link on the working path, and may
have overlapping segments with the working path. Traffic on the
working path is switched over to an alternate path at the upstream
LSR that connects to the failed link. Link recovery potentially
performs the switchover fastest and can be effective in situations
where certain path components are much less reliable than others.
Node Recovery/Restoration
In this case, the recovery path may be configured to route around
a neighbor node deemed to be unreliable. Thus the recovery path
is disjoint from the working path only at a particular node and at
links associated with the working path at that node. Once again,
the traffic on the primary path is switched over to the recovery
path at the upstream LSR that directly connects to the failed
node, and the recovery path shares overlapping portions with the
working path.
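The disjointness requirements of link and node recovery can be
checked mechanically. The Python sketch below represents paths as
hypothetical lists of LSR names, starting at the upstream LSR that
performs the repair, and treats links as undirected; these
representations and function names are assumptions for illustration.

```python
from typing import FrozenSet, List, Set


def links(path: List[str]) -> Set[FrozenSet[str]]:
    """Links of a path as undirected pairs of adjacent LSRs."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def bypasses_link(recovery: List[str], a: str, b: str) -> bool:
    """Link recovery: only the protected link a-b must be avoided."""
    return frozenset((a, b)) not in links(recovery)


def bypasses_node(recovery: List[str], node: str) -> bool:
    """Node recovery: the protected node, and so every link attached to
    it, must be avoided."""
    return node not in recovery


# Working path A-B-C-D; B is the upstream LSR performing local repair.
print(bypasses_link(["B", "E", "C"], "B", "C"))  # True: detours around B-C
print(bypasses_node(["B", "E", "D"], "C"))       # True: detours around C
```

Note that the link-recovery path in the example still passes through
C, so it would not protect against a failure of node C itself.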
<span class="h5"><a class="selflink" id="section-3.4.1.2" href="#section-3.4.1.2">3.4.1.2</a> Global Repair</span>
The intent of global repair is to protect against any link or node
fault on a path or on a segment of a path, with the obvious exception
of the faults occurring at the ingress node of the protected path
segment. In global repair, the POR is usually distant from the
failure and needs to be notified by a FIS.
Global repair also encompasses end-to-end path recovery/restoration.
In many cases, the recovery path can be made completely link and node
disjoint with its working path. This has the advantage of protecting
against all link and node fault(s) on the working path (end-to-end
path or path segment).
<span class="grey">Sharma & Hellstrand Informational [Page 22]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-23" ></span>
<span class="grey"><a href="./rfc3469">RFC 3469</a> Framework for MPLS-based Recovery February 2003</span>
However, it may, in some cases, be slower than local repair since the
fault notification message must now travel to the POR to trigger the
recovery action.
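Global repair covers any single link or node fault between the PSL
and PML only when the recovery path is completely link and node
disjoint from the working path. That condition can be checked
mechanically. A minimal Python sketch, assuming both paths are given as
lists of hypothetical LSR names that start at the same PSL and end at
the same PML:

```python
def links(path):
    """Undirected links traversed by a path given as a list of LSRs."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def fully_disjoint(working, recovery):
    """True if the paths share no link and no node other than PSL and PML."""
    if (working[0], working[-1]) != (recovery[0], recovery[-1]):
        raise ValueError("paths must share the same PSL and PML")
    shared_nodes = set(working[1:-1]) & set(recovery[1:-1])
    shared_links = links(working) & links(recovery)
    return not shared_nodes and not shared_links


print(fully_disjoint(["A", "B", "C", "D"], ["A", "E", "F", "D"]))  # True
print(fully_disjoint(["A", "B", "C", "D"], ["A", "E", "C", "D"]))  # False
```

The second candidate fails because it shares node C and link C-D with
the working path.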
<span class="h5"><a class="selflink" id="section-3.4.1.3" href="#section-3.4.1.3">3.4.1.3</a> Alternate Egress Repair</span>
It is possible to restore service without specifically recovering the
faulted path.
For example, for best effort IP service it is possible to select a
recovery path that has a different egress point from the working path
(i.e., there is no PML). The recovery path egress must simply be a
router that is acceptable for forwarding the FEC carried by the
working path (without creating looping). In an engineering context,
specific alternative FEC/LSP mappings with alternate egresses can be
formed.
This may simplify enhancing the reliability of implicitly constructed
MPLS topologies. A PSL may qualify an LSP/FEC binding as a candidate
recovery path simply by checking that it is link and node disjoint
from the immediate downstream LSR of the working path.
<span class="h5"><a class="selflink" id="section-3.4.1.4" href="#section-3.4.1.4">3.4.1.4</a> Multi-Layer Repair</span>
Multi-layer repair broadens the network designer's tool set for those
cases where multiple network layers can be managed together to
achieve overall network goals. Specific criteria for determining
when multi-layer repair is appropriate are beyond the scope of this
document.
<span class="h5"><a class="selflink" id="section-3.4.1.5" href="#section-3.4.1.5">3.4.1.5</a> Concatenated Protection Domains</span>
A given service may cross multiple networks, and these may employ
different recovery mechanisms. It is possible to concatenate
protection domains so that service recovery can be provided
end-to-end. The recovery mechanisms in different domains may operate
autonomously, and multiple points of attachment may be used between
domains (to ensure there is no single point of failure). Alternate
egress repair requires management of concatenated domains, in that
an explicit MPLS point of failure (the
PML) is by definition excluded. Details of concatenated protection
domains are beyond the scope of this document.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-24" ></span>
<span class="h4"><a class="selflink" id="section-3.4.2" href="#section-3.4.2">3.4.2</a> Path Mapping</span>
Path mapping refers to the methods of mapping traffic from a faulty
working path onto the recovery path. There are several options for
this, as described below. Note that the options below should be
viewed as atomic terms that only describe how the working and
protection paths are mapped to each other. The issues of resource
reservation along these paths, and of how switchover is actually
performed, lead to the more commonly used composite terms, such as 1+1
and 1:1 protection, which were described in <a href="#section-3.3.1">Section 3.3.1</a>.
1-to-1 Protection
In 1-to-1 protection the working path has a designated recovery
path that is only to be used to recover that specific working
path.
n-to-1 Protection
In n-to-1 protection, up to n working paths are protected using
only one recovery path. If the intent is to protect against any
single fault on any of the working paths, the n working paths
should be diversely routed between the same PSL and PML. In some
cases, handshaking between PSL and PML may be required to complete
the recovery, the details of which are beyond the scope of this
document.
n-to-m Protection
In n-to-m protection, up to n working paths are protected using m
recovery paths. Once again, if the intent is to protect against
any single fault on any of the n working paths, the n working
paths and the m recovery paths should be diversely routed between
the same PSL and PML. In some cases, handshaking between PSL and
PML may be required to complete the recovery, the details of which
are beyond the scope of this document. n-to-m protection is for
further study.
Split Path Protection
In split path protection, multiple recovery paths are allowed to
carry the traffic of a working path based on a certain
configurable load splitting ratio. This is especially useful when
no single recovery path can be found that can carry the entire
traffic of the working path in case of a fault. Split path
protection may require handshaking between the PSL and the PML(s),
and may require the PML(s) to correlate the traffic arriving on
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-25" ></span>
multiple recovery paths with the working path. Although this is
an attractive option, the details of split path protection are
beyond the scope of this document.
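The effect of a load splitting ratio can be shown with a short Python
sketch. It assumes that the demand of the working path and the spare
capacity of each recovery path are known in the same units (Gb/s
here). The function names and figures are hypothetical. A split is
usable only if every recovery path can carry its share.

```python
def split_shares(demand, ratios):
    """Divide the working path's demand across recovery paths by ratio."""
    total = sum(ratios)
    return [demand * r / total for r in ratios]


def split_is_feasible(demand, ratios, capacities):
    """True if every recovery path has room for its configured share."""
    shares = split_shares(demand, ratios)
    return all(cap >= share for share, cap in zip(shares, capacities))


demand_gbps = 10.0
capacities_gbps = [7.0, 5.0]  # spare capacity on two hypothetical recovery paths
print(max(capacities_gbps) >= demand_gbps)                      # False: no single path suffices
print(split_is_feasible(demand_gbps, [3, 2], capacities_gbps))  # True: 6 and 4 Gb/s fit
```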
<span class="h4"><a class="selflink" id="section-3.4.3" href="#section-3.4.3">3.4.3</a> Bypass Tunnels</span>
It may be convenient, in some cases, to create a "bypass tunnel" for
a PPG between a PSL and PML, thereby allowing multiple recovery paths
to be transparent to intervening LSRs [<a href="./rfc2702" title=""Requirements for Traffic Engineering Over MPLS"">RFC2702</a>]. In this case, one
LSP (the tunnel) is established between the PSL and PML following an
acceptable route and a number of recovery paths can be supported
through the tunnel via label stacking. It is not necessary to apply
label stacking when using a bypass tunnel. A bypass tunnel can be
used with any of the path mapping options discussed in the previous
section.
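The label stacking arrangement can be sketched in Python. The label
values, the per-LSR table transit_lfib, and the function names are
hypothetical. TTL handling, the bottom-of-stack bit, and penultimate
hop popping are ignored. The sketch shows that intermediate LSRs along
the tunnel hold forwarding state only for the tunnel label, however
many recovery paths are carried inside it.

```python
# Hypothetical model: a label stack is a list whose last element is the top.


def psl_enter_tunnel(stack, recovery_label, tunnel_label):
    """PSL: rewrite the top label to the recovery path label expected at
    the PML, then push the bypass tunnel label on top of it."""
    return stack[:-1] + [recovery_label, tunnel_label]


def transit_swap(stack, lfib):
    """Intermediate LSR: only the top (tunnel) label is looked up and swapped."""
    return stack[:-1] + [lfib[stack[-1]]]


def pml_exit_tunnel(stack):
    """PML: pop the tunnel label, exposing the recovery path label."""
    return stack[:-1]


# Two recovery paths (inner labels 101 and 102) share one tunnel (label 900).
transit_lfib = {900: 901}  # a single entry serves both recovery paths
p1 = transit_swap(psl_enter_tunnel([17], 101, 900), transit_lfib)
p2 = transit_swap(psl_enter_tunnel([18], 102, 900), transit_lfib)
print(p1, p2)                                    # [101, 901] [102, 901]
print(pml_exit_tunnel(p1), pml_exit_tunnel(p2))  # [101] [102]
```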
As with recovery paths, the bypass tunnel may or may not have
resource reservations sufficient to provide recovery without service
degradation. It is possible that the bypass tunnel may have
sufficient resources to recover some number of working paths, but not
all at the same time. If the number of recovery paths carrying
traffic in the tunnel at any given time is restricted, this is
similar to the n-to-1 or n-to-m protection cases mentioned in <a href="#section-3.4.2">Section</a>
<a href="#section-3.4.2">3.4.2</a>.
<span class="h4"><a class="selflink" id="section-3.4.4" href="#section-3.4.4">3.4.4</a> Recovery Granularity</span>
Another dimension of recovery considers the amount of traffic
requiring protection. This may range from a fraction of a path to a
bundle of paths.
<span class="h5"><a class="selflink" id="section-3.4.4.1" href="#section-3.4.4.1">3.4.4.1</a> Selective Traffic Recovery</span>
This option allows for the protection of a fraction of traffic within
the same path. The portion of the traffic on an individual path that
requires protection is called a protected traffic portion (PTP). A
single path may carry different classes of traffic, with different
protection requirements. The protected portion of this traffic may
be identified by its class, for example, via the EXP bits in the
MPLS shim header or via the priority bit in the ATM header.
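As an illustration of identifying the protected portion by class, the
Python sketch below extracts the 3-bit EXP field from a 32-bit MPLS
shim header (one label stack entry) and checks it against a set of
protected EXP values. The set PROTECTED_EXP and the function names are
hypothetical. Which classes are protected is a provisioning decision.

```python
# Layout of a 32-bit MPLS shim header (one label stack entry):
#   Label (20 bits) | EXP (3 bits) | S, bottom of stack (1 bit) | TTL (8 bits)

# Hypothetical provisioning: EXP values whose traffic forms the PTP.
PROTECTED_EXP = frozenset({5, 6})


def decode_shim(word):
    """Split a 32-bit shim header into its four fields."""
    return {
        "label": (word >> 12) & 0xFFFFF,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


def in_protected_portion(word):
    """True if the packet's EXP value marks it as protected traffic."""
    return decode_shim(word)["exp"] in PROTECTED_EXP


print(decode_shim(0x00064B40))  # {'label': 100, 'exp': 5, 's': 1, 'ttl': 64}
print(in_protected_portion(0x00064B40))  # True under the assumed policy
```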
<span class="h5"><a class="selflink" id="section-3.4.4.2" href="#section-3.4.4.2">3.4.4.2</a> Bundling</span>
Bundling is a technique used to group multiple working paths together
in order to recover them simultaneously. The logical bundling of
multiple working paths requiring protection, each of which is routed
identically between a PSL and a PML, is called a protected path group
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-26" ></span>
(PPG). When a fault occurs on the working path carrying the PPG, the
PPG as a whole can be protected either by being switched to a bypass
tunnel or by being switched to a recovery path.
<span class="h4"><a class="selflink" id="section-3.4.5" href="#section-3.4.5">3.4.5</a> Recovery Path Resource Use</span>
In the case of pre-reserved recovery paths, there is the question of
what use these resources may be put to when the recovery path is not
in use. There are three options:
Dedicated-resource: If the recovery path resources are dedicated,
they may not be used for anything except carrying the working
traffic. For example, in the case of 1+1 protection, the working
traffic is always carried on the recovery path. Even if the recovery
path is not always carrying the working traffic, it may not be
possible or desirable to allow other traffic to use these resources.
Extra-traffic-allowed: If the recovery path only carries the working
traffic when the working path fails, then it is possible to allow
extra traffic to use the reserved resources at other times. Extra
traffic is, by definition, traffic that can be displaced (without
violating service agreements) whenever the recovery path resources
are needed for carrying the working path traffic.
Shared-resource: A shared recovery resource is dedicated for use by
multiple primary resources that (according to SRLGs) are not expected
to fail simultaneously.
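Whether a group of primary resources may share one recovery resource
can be checked against their SRLG memberships. A minimal Python
sketch, assuming each working path is tagged with a set of
hypothetical integer SRLG identifiers:

```python
from itertools import combinations


def srlg_conflicts(srlgs_by_path):
    """Pairs of working paths that share at least one SRLG.

    Paths that appear together in a pair should not share a recovery
    resource, since one SRLG failure could take both down at once.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(srlgs_by_path), 2)
        if srlgs_by_path[a] & srlgs_by_path[b]
    ]


# Hypothetical SRLG membership of three working paths.
print(srlg_conflicts({"w1": {1, 2}, "w2": {3}, "w3": {2, 4}}))  # [('w1', 'w3')]
```

Here w1 and w3 both belong to SRLG 2, so they should not share a
recovery resource with each other, although either may share one with
w2.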
<span class="h3"><a class="selflink" id="section-3.5" href="#section-3.5">3.5</a>. Fault Detection</span>
MPLS recovery is initiated after the detection of either a lower
layer fault or a fault at the IP layer or in the operation of MPLS-
based mechanisms. We consider four classes of impairments: Path
Failure, Path Degraded, Link Failure, and Link Degraded.
Path Failure (PF) is a fault that indicates to an MPLS-based recovery
scheme that the connectivity of the path is lost. This may be
detected by a path continuity test between the PSL and PML. Some,
and perhaps the most common, path failures may be detected using a
link probing mechanism between neighbor LSRs. An example of a
probing mechanism is a liveness message that is exchanged
periodically along the working path between peer LSRs [<a href="#ref-MPLS-PATH" title=""Building Reliable MPLS Networks Using a Path Protection Mechanism"">MPLS-PATH</a>].
For either a link probing mechanism or path continuity test to be
effective, the test message must be guaranteed to follow the same
route as the working or recovery path, over the segment being tested.
In addition, the path continuity test must take the path merge points
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-27" ></span>
into consideration. In the case of a bi-directional link implemented
as two unidirectional links, path failure could mean that either one
or both unidirectional links are damaged.
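A liveness-based probe of this kind can be driven by a simple timer.
The Python sketch below declares PF when no liveness message has been
received for a provisioned number of intervals. The class name and
its parameters are hypothetical. The sketch models only the detection
decision, not the message exchange itself.

```python
class LivenessMonitor:
    """Declare a Path Failure (PF) after several missed liveness messages.

    Both the message interval and the number of misses tolerated are
    hypothetical, provisioned values; times are in milliseconds.
    """

    def __init__(self, interval_ms=100, allowed_misses=3, start_ms=0):
        self.interval_ms = interval_ms
        self.allowed_misses = allowed_misses
        self.last_rx_ms = start_ms

    def on_liveness_message(self, now_ms):
        """Record the arrival of a liveness message from the peer LSR."""
        self.last_rx_ms = now_ms

    def path_failed(self, now_ms):
        """True once no message has arrived for allowed_misses intervals."""
        silence_ms = now_ms - self.last_rx_ms
        return silence_ms >= self.interval_ms * self.allowed_misses


monitor = LivenessMonitor()
monitor.on_liveness_message(1000)
print(monitor.path_failed(1250))  # False: only 250 ms of silence
print(monitor.path_failed(1300))  # True: three 100 ms intervals missed
```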
Path Degraded (PD) is a fault that indicates to MPLS-based recovery
schemes/mechanisms that the path has connectivity, but that the
quality of the connection is unacceptable. This may be detected by a
path performance monitoring mechanism, or some other mechanism for
determining the error rate on the path or some portion of the path.
Such a degradation may be local to an LSR and consist, for example, of
excessive discarding of packets at an interface due to label
mismatches or TTL errors.
Link Failure (LF) is an indication from a lower layer that the link
over which the path is carried has failed. If the lower layer
supports detection and reporting of this fault (that is, any fault
that indicates link failure, e.g., SONET LOS (Loss of Signal)), this
may be used by the MPLS recovery mechanism. In some cases, using LF
indications may provide faster fault detection than using only MPLS-
based fault detection mechanisms.
Link Degraded (LD) is an indication from a lower layer that the link
over which the path is carried is performing below an acceptable
level. If the lower layer supports detection and reporting of this
fault, it may be used by the MPLS recovery mechanism. In some cases,
using LD indications may provide faster fault detection than using
only MPLS-based fault detection mechanisms.
<span class="h3"><a class="selflink" id="section-3.6" href="#section-3.6">3.6</a>. Fault Notification</span>
MPLS-based recovery relies on rapid and reliable notification of
faults. Once a fault is detected, the node that detected the fault
must determine if the fault is severe enough to require path
recovery. If the node is not capable of initiating direct action
(e.g., as a point of repair, POR), the node should send out a
notification of the fault by transmitting a FIS to the POR. This can
take several forms:
(i) control plane messaging: relayed hop-by-hop along the path
upstream of the failed LSP until a POR is reached.
(ii) user plane messaging: sent downstream to the PML, which may take
corrective action (as a POR for 1+1) or communicate with a POR
upstream (for 1:n) by any of several means:
- control plane messaging
- user plane return path (either through a bi-directional LSP or
via other means)
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-28" ></span>
Since the FIS is a control message, it should be transmitted with
high priority to ensure that it propagates rapidly towards the
affected POR(s). Depending on how fault notification is configured
in the LSRs of an MPLS domain, the FIS could be sent either as a
Layer 2 or Layer 3 packet [<a href="#ref-MPLS-PATH" title=""Building Reliable MPLS Networks Using a Path Protection Mechanism"">MPLS-PATH</a>]. The use of a Layer 2-based
notification requires a Layer 2 path direct to the POR. An example
of a FIS could be the liveness message sent by a downstream LSR to
its upstream neighbor, with an optional fault notification field set,
or the fault can be implicitly denoted by a teardown message.
Alternatively, the FIS could be a separate fault notification packet. The
intermediate LSR should identify which of its incoming links to
propagate the FIS on.
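The hop-by-hop control plane relay described in (i) above can be
sketched in Python. The sketch walks upstream along the working path
from the LSR that detected the fault until it reaches an LSR able to
act as a POR. The LSR names, the set of POR-capable LSRs, and
fis_route() are hypothetical.

```python
def fis_route(working, detecting_lsr, pors):
    """LSRs that handle a FIS relayed hop by hop upstream, from the LSR that
    detected the fault to the first LSR able to act as a POR.

    working lists the LSRs from PSL to PML; pors is a set of LSR names.
    """
    route = [detecting_lsr]
    if detecting_lsr in pors:
        return route  # the detecting LSR can act directly; no FIS needed
    for lsr in reversed(working[:working.index(detecting_lsr)]):
        route.append(lsr)  # the FIS reaches the next LSR upstream
        if lsr in pors:
            return route
    raise LookupError("no POR upstream of " + detecting_lsr)


path = ["PSL", "R1", "R2", "R3", "PML"]
print(fis_route(path, "R3", {"PSL"}))        # ['R3', 'R2', 'R1', 'PSL']
print(fis_route(path, "R3", {"PSL", "R1"}))  # ['R3', 'R2', 'R1']
```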
<span class="h3"><a class="selflink" id="section-3.7" href="#section-3.7">3.7</a>. Switch-Over Operation</span>
<span class="h4"><a class="selflink" id="section-3.7.1" href="#section-3.7.1">3.7.1</a> Recovery Trigger</span>
The activation of an MPLS protection switch following the detection
or notification of a fault requires a trigger mechanism at the PSL.
MPLS protection switching may be initiated due to automatic inputs or
external commands. The automatic activation of an MPLS protection
switch results from a response to defect or fault conditions
detected at the PSL or to fault notifications received at the PSL.
It is possible that the fault detection and trigger mechanisms may be
combined, as is the case when a PF, PD, LF, or LD is detected at a
PSL and triggers a protection switch to the recovery path. In most
cases, however, the detection and trigger mechanisms are distinct,
involving the detection of a fault at some intermediate LSR followed by
the propagation of a fault notification to the POR via the FIS, which
serves as the protection switch trigger at the POR. MPLS protection
switching in response to external commands results when the operator
initiates a protection switch by a command to a POR (or alternatively
by a configuration command to an intermediate LSR, which transmits
the FIS towards the POR).
Note that the PF fault applies to hard failures (fiber cuts,
transmitter failures, or LSR fabric failures), as does the LF fault,
with the difference that the LF is a lower layer impairment that may
be communicated to MPLS-based recovery mechanisms. The PD (or LD)
fault, on the other hand, applies to soft defects (excessive errors
due to noise on the link, for instance). The PD (or LD) results in a
fault declaration only when the percentage of lost packets exceeds a
given threshold, which is provisioned and may be set based on the
service level agreement(s) in effect between a service provider and a
customer.
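The threshold test can be sketched in Python as follows. How the sent
and received counts for a measurement window are obtained is outside
this sketch. The function name and the 0.5% threshold are
hypothetical.

```python
def degraded(sent, received, threshold_pct):
    """Declare a PD (or LD) fault when the percentage of packets lost in a
    measurement window exceeds a provisioned threshold."""
    if sent == 0:
        return False  # nothing was sent, so nothing can be judged lost
    lost_pct = 100.0 * (sent - received) / sent
    return lost_pct > threshold_pct


# Hypothetical threshold of 0.5% loss, e.g. derived from an SLA.
print(degraded(10_000, 9_990, 0.5))  # False: 0.1% lost
print(degraded(10_000, 9_900, 0.5))  # True: 1.0% lost
```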
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-29" ></span>
<span class="h4"><a class="selflink" id="section-3.7.2" href="#section-3.7.2">3.7.2</a> Recovery Action</span>
After a fault is detected or FIS is received by the POR, the recovery
action involves either a rerouting or protection switching operation.
In both scenarios, the next hop label forwarding entry for a recovery
path is bound to the working path.
<span class="h3"><a class="selflink" id="section-3.8" href="#section-3.8">3.8</a>. Post Recovery Operation</span>
When traffic is flowing on the recovery path, decisions can be made
as to whether to let the traffic remain on the recovery path and
consider it as a new working path or to do a switch back to the old
or to a new working path. This post recovery operation has two
styles, one where the protection counterparts, i.e., the working and
recovery path, are fixed or "pinned" to their routes, and one in
which the PSL or other network entity with real-time knowledge of
failure dynamically performs re-establishment or controlled
rearrangement of the paths comprising the protected service.
<span class="h4"><a class="selflink" id="section-3.8.1" href="#section-3.8.1">3.8.1</a> Fixed Protection Counterparts</span>
For fixed protection counterparts the PSL will be pre-configured with
the appropriate behavior to take when the original fixed path is
restored to service. The choices are revertive and non-revertive
mode. The choice will typically be dependent on relative costs of
the working and protection paths, and the tolerance of the service to
the effects of switching paths yet again. These protection modes
indicate whether or not there is a preferred path for the protected
traffic.
<span class="h5"><a class="selflink" id="section-3.8.1.1" href="#section-3.8.1.1">3.8.1.1</a> Revertive Mode</span>
If the working path is always the preferred path, this path will be
used whenever it is available. Thus, in the event of a fault on this
path, its unused resources will not be reclaimed by the network.
Resources here may include assigned labels, links, bandwidth, etc.
If the working path has a fault, traffic is switched
to the recovery path. In the revertive mode of operation, when the
preferred path is restored the traffic is automatically switched back
to it.
There are a number of implications of pinned working and recovery
paths:
- upon failure and after traffic has been moved to the recovery
path, the traffic is unprotected until such time as the path
defect in the original working path is repaired and that path
restored to service.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-30" ></span>
- upon failure and after traffic has been moved to the recovery
path, the resources associated with the original path remain
reserved.
<span class="h5"><a class="selflink" id="section-3.8.1.2" href="#section-3.8.1.2">3.8.1.2</a> Non-revertive Mode</span>
In the non-revertive mode of operation, there is no preferred path or
it may be desirable to minimize further disruption of the service
brought on by a revertive switching operation. A switch-back to the
original working path is not desired or not possible since the
original path may no longer exist after the occurrence of a fault on
that path. If there is a fault on the working path, traffic is
switched to the recovery path. When or if the faulty path (the
originally working path) is restored, it may become the recovery path
(either by configuration, or, if desired, by management actions).
In the non-revertive mode of operation, the working traffic may or
may not be restored to a new optimal working path or to the original
working path anyway. This is because, in some cases, it might be
useful to (a) administratively perform a protection switch back to
the original working path after gaining further assurances about the
integrity of the path, (b) continue operation on the recovery path,
or (c) move the traffic to a new optimal working path that is
calculated based on network topology and network policies. Once a
new working path has been defined, an associated recovery path may
be set up.
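The difference between the two modes can be captured in a small state
machine. In the Python sketch below, the class name and path
identifiers are hypothetical. The non-revertive branch models only the
configured choice in which the repaired path becomes the new recovery
path.

```python
class ProtectedService:
    """Fixed protection counterparts under a revertive or non-revertive PSL."""

    def __init__(self, working, recovery, revertive):
        self.working, self.recovery = working, recovery
        self.revertive = revertive
        self.active = working  # path currently carrying the traffic

    def on_fault(self):
        """Working path failed: move the traffic to the recovery path."""
        self.active = self.recovery

    def on_working_repaired(self):
        """The failed working path is back in service (e.g., FRS received)."""
        if self.revertive:
            self.active = self.working  # preferred path: switch back
        else:
            # No preferred path: traffic stays put and the repaired path
            # takes over the recovery role.
            self.working, self.recovery = self.recovery, self.working


rev = ProtectedService("W", "R", revertive=True)
rev.on_fault()
rev.on_working_repaired()
print(rev.active)  # W: traffic reverts to the preferred path

non = ProtectedService("W", "R", revertive=False)
non.on_fault()
non.on_working_repaired()
print(non.active, non.working, non.recovery)  # R R W: roles exchanged
```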
<span class="h4"><a class="selflink" id="section-3.8.2" href="#section-3.8.2">3.8.2</a> Dynamic Protection Counterparts</span>
For dynamic protection counterparts, when the traffic is switched over
to a recovery path, the association between the original working path
and the recovery path may no longer exist, since the original path
itself may no longer exist after the fault. Instead, when the
network reaches a stable state following routing convergence, the
recovery path may be switched over to a different preferred path,
either based on optimization using the new network topology and
associated information, or based on pre-configured information.
Dynamic protection counterparts assume that, upon failure, the PSL or
other network entity will establish new working paths if another
switch-over is to be performed.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-31" ></span>
<span class="h4"><a class="selflink" id="section-3.8.3" href="#section-3.8.3">3.8.3</a> Restoration and Notification</span>
MPLS restoration deals with returning the working traffic from the
recovery path to the original or a new working path. Restoration is
performed by the PSL either upon receiving notification, via FRS,
that the working path is repaired, or upon receiving notification
that a new working path is established.
For fixed counterparts in revertive mode, an LSR that detected the
fault on the working path also detects the restoration of the working
path. If the working path had experienced an LF defect, the LSR
detects a return to normal operation via the receipt of a liveness
message from its peer. If the working path had experienced an LD
defect at an LSR interface, the LSR could detect a return to normal
operation via the resumption of error-free packet reception on that
interface. Alternatively, a lower layer that no longer detects an LF
defect may inform the MPLS-based recovery mechanisms at the LSR that
the link to its peer LSR is operational. The LSR then transmits FRS
to its upstream LSR(s) that were transmitting traffic on the working
path. When the PSL receives the FRS, it switches the working
traffic back to the original working path.
A similar scheme is used for dynamic counterparts, where, e.g., an
update of topology and/or network convergence may trigger the
installation or setup of new working paths and the sending of a
notification to the PSL to perform a switch-over.
We note that if there is a way to transmit fault information back
along a recovery path towards a PSL and if the recovery path is an
equivalent working path, it is possible for the working path and its
recovery path to exchange roles once the original working path is
repaired following a fault. This is because, in that case, the
recovery path effectively becomes the working path, and the restored
working path functions as a recovery path for the original recovery
path. This is important, since it affords the benefits of
non-revertive switch operation outlined in <a href="#section-3.8.1">Section 3.8.1</a>, without
leaving the recovery path unprotected.
<span class="h4"><a class="selflink" id="section-3.8.4" href="#section-3.8.4">3.8.4</a> Reverting to Preferred Path (or Controlled Rearrangement)</span>
In the revertive mode, "make before break" restoration switching can
be used, which is less disruptive than performing protection
switching upon the occurrence of network impairments. This will
minimize both packet loss and packet reordering. The controlled
rearrangement of paths can also be used to satisfy traffic
engineering requirements for load balancing across an MPLS domain.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-32" ></span>
<span class="h3"><a class="selflink" id="section-3.9" href="#section-3.9">3.9</a>. Performance</span>
Resource/performance requirements for recovery paths should be
specified in terms of the following attributes:
I. Resource Class Attribute:
Equivalent Recovery Class: The recovery path has the same
performance guarantees as the working path. In other words, the
recovery path meets the same SLAs as the working path.
Limited Recovery Class: The recovery path does not have the same
performance guarantees as the working path.
A. Lower Class:
The recovery path has lower resource requirements or less
stringent performance requirements than the working path.
B. Best Effort Class:
The recovery path is best effort.
II. Priority Attribute:
The recovery path has a priority attribute just like the working
path (i.e., the priority attribute of the associated traffic
trunks). It can have the same priority as the working path or
lower priority.
III. Preemption Attribute:
The recovery path can have the same preemption attribute as the
working path or a lower one.
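The three attributes can be recorded per path and checked against
those of the working path. The following is a minimal Python sketch.
The type and field names are hypothetical, and it assumes numeric
priority and preemption values in which 0 is the highest and a larger
number means lower.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceClass(Enum):
    EQUIVALENT = "equivalent"    # same performance guarantees (same SLAs)
    LOWER = "lower"              # limited recovery: less stringent
    BEST_EFFORT = "best effort"  # limited recovery: best effort


@dataclass(frozen=True)
class PathAttributes:
    resource_class: ResourceClass
    priority: int    # assumed convention: 0 is the highest priority
    preemption: int  # same convention as priority


def acceptable(working, recovery):
    """Recovery priority and preemption may match the working path's or be
    lower (a larger number), but never higher."""
    return (recovery.priority >= working.priority
            and recovery.preemption >= working.preemption)


working = PathAttributes(ResourceClass.EQUIVALENT, priority=2, preemption=2)
print(acceptable(working, PathAttributes(ResourceClass.BEST_EFFORT, 4, 2)))  # True
print(acceptable(working, PathAttributes(ResourceClass.EQUIVALENT, 1, 2)))   # False
```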
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. MPLS Recovery Features</span>
The following features are desirable from an operational point of
view:
I. It is desirable that MPLS recovery provides an option to
identify protected path groups (PPGs) and protected traffic
portions (PTPs).
II. Each PSL should be capable of performing MPLS recovery upon the
detection of the impairments or upon receipt of notifications of
impairments.
III. An MPLS recovery method should not preclude manual protection
switching commands. This implies that it would be possible
under administrative commands to transfer traffic from a working
path to a recovery path, or to transfer traffic from a recovery
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-33" ></span>
path to a working path, once the working path becomes
operational following a fault.
IV. A PSL may be capable of performing either a switch back to the
original working path after the fault is corrected or a
switchover to a new working path, upon the discovery or
establishment of a more optimal working path.
V. The recovery model should take into consideration path merging
at intermediate LSRs. If a fault affects the merged segment,
all the paths sharing that merged segment should be able to
recover. Similarly, if a fault affects a non-merged segment,
only the path that is affected by the fault should be recovered.
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Comparison Criteria</span>
Possible criteria to use for comparison of MPLS-based recovery
schemes are as follows:
Recovery Time
We define recovery time as the time required for a recovery path
to be activated (and traffic flowing) after a fault. Recovery
Time is the sum of the Fault Detection Time, Hold-off Time,
Notification Time, Recovery Operation Time, and the Traffic
Restoration Time. In other words, it is the time between the
failure of a node or link in the network and the time at which a
recovery path is installed and the traffic starts flowing on it.
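Written as a formula, with symbol names introduced here only as
shorthand for the five components listed above:

```latex
T_{\mathrm{recovery}} = T_{\mathrm{detect}} + T_{\mathrm{holdoff}}
  + T_{\mathrm{notify}} + T_{\mathrm{op}} + T_{\mathrm{restore}}
```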
Full Restoration Time
We define full restoration time as the time required for a
permanent restoration. This is the time required for traffic to
be routed onto links that are capable of, or have been engineered
sufficiently to, handle traffic in recovery scenarios. Note that
this time may or may not be different from the "Recovery Time"
depending on whether equivalent or limited recovery paths are
used.
Setup Vulnerability
The amount of time that a working path or a set of working paths
is left unprotected during such tasks as recovery path computation
and recovery path setup may be used to compare schemes. The
nature of this vulnerability should be taken into account, e.g.,
End to End schemes correlate the vulnerability with working paths,
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-34" ></span>
Local Repair schemes have a topological correlation that cuts
across working paths and Network Plan approaches have a
correlation that impacts the entire network.
Backup Capacity
Recovery schemes may require differing amounts of "backup
capacity" in the event of a fault. This capacity will be
dependent on the traffic characteristics of the network. However,
it may also be dependent on the particular protection plan
selection algorithms as well as the signaling and re-routing
methods.
Additive Latency
Recovery schemes may introduce additive latency for traffic. For
example, a recovery path may take many more hops than the working
path. This may be dependent on the recovery path selection
algorithms.
Quality of Protection
Recovery schemes can be considered to encompass a spectrum of
"packet survivability" which may range from "relative" to
"absolute". Relative survivability may mean that the packet is on
an equal footing with other traffic of, as an example, the same
diff-serv code point (DSCP) in contending for the resources of the
portion of the network that survives the failure. Absolute
survivability may mean that the survivability of the protected
traffic has explicit guarantees.
Re-ordering
Recovery schemes may introduce re-ordering of packets. Also the
action of putting traffic back on preferred paths might cause
packet re-ordering.
State Overhead
As the number of recovery paths in a protection plan grows, the
state required to maintain them also grows. Schemes may require
differing numbers of paths to maintain certain levels of coverage,
etc. The state required may also depend on the particular scheme
used for recovery. The state overhead may be a function of
several parameters. For example, the number of recovery paths
and the number of the protected facilities (links, nodes, or
shared risk link groups (SRLGs)).
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-35" ></span>
Loss
Recovery schemes may introduce a certain amount of packet loss
during switchover to a recovery path. Schemes that introduce loss
during recovery can estimate this loss by multiplying the recovery
time by the speed of the affected link.
In the case of a link or node failure, some packet loss is
inevitable.
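As a worked example, the Python sketch below bounds the loss by
assuming that traffic toward the failure arrives at the full link
speed for the whole recovery time. The function name and the figures
are hypothetical. Integer arithmetic keeps the result exact.

```python
def max_loss_bytes(rate_bps, recovery_ms):
    """Upper bound on bytes lost during switchover: everything sent at
    rate_bps toward the failure during the recovery time is assumed lost."""
    lost_bits = rate_bps * recovery_ms // 1000
    return lost_bits // 8


print(max_loss_bytes(10_000_000_000, 50))  # 62500000 bytes for 50 ms at 10 Gb/s
```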
Coverage
Recovery schemes may offer various types of failover coverage.
The total coverage may be defined in terms of several metrics:
I. Fault Types: Recovery schemes may account for only link faults
or both node and link faults or also degraded service. For
example, a scheme may require more recovery paths to take node
faults into account.
II. Number of concurrent faults: depending on the layout of recovery
paths in the protection plan, it may be possible to recover from
multiple concurrent faults.
III. Number of recovery paths: for a given fault, there may be one or
more recovery paths.
IV. Percentage of coverage: depending on the scheme and its
implementation, a certain percentage of faults may be covered.
This may be subdivided into the percentage of link faults and
the percentage of node faults covered.
V. Number of protected paths: the number of protected paths may
affect how fast the total set of paths affected by a fault can
be recovered. The ratio of protection is n/N, where n is the
number of protected paths and N is the total number of paths.
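Metrics IV and V can be computed from an evaluation of a scheme against a set of fault scenarios. The sketch below is illustrative only. It assumes the evaluation produces a list of (fault type, covered) pairs. The function name `coverage_metrics`, the dictionary keys, and the sample data are all hypothetical. It reports the overall, per-link-fault, and per-node-fault coverage percentages, together with the protection ratio n/N.

```python
def coverage_metrics(faults, n_protected, n_total):
    """Compute coverage percentages and the protection ratio n/N.

    faults: list of (fault_type, covered) pairs (hypothetical format),
    where fault_type is "link" or "node" and covered is True when the
    scheme restores traffic for that fault.
    A percentage is None when no faults of that type were evaluated.
    """
    def pct(kind=None):
        selected = [ok for t, ok in faults if kind is None or t == kind]
        return 100.0 * sum(selected) / len(selected) if selected else None

    return {
        "all": pct(),
        "link": pct("link"),
        "node": pct("node"),
        "protection_ratio": n_protected / n_total,  # n/N
    }

# Hypothetical evaluation: three link faults (two covered), one node
# fault (covered), and 30 of 40 paths protected.
m = coverage_metrics(
    [("link", True), ("link", True), ("link", False), ("node", True)],
    n_protected=30, n_total=40)
```

For this data the call gives 75% overall coverage and a protection ratio of 0.75.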
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Security Considerations</span>
The MPLS recovery that is specified herein does not raise any
security issues that are not already present in the MPLS
architecture.
Confidentiality or encryption of information on the recovery path is
outside the scope of this document, but any method designed to do
this in other contexts may be used with the methods described in this
document.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-36" ></span>
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Intellectual Property Considerations</span>
The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this
document. For more information consult the online list of claimed
rights.
<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. Acknowledgements</span>
We would like to thank members of the MPLS WG mailing list for their
suggestions on the earlier versions of this document. In particular,
we thank Bora Akyol, Dave Allan, Dave Danenberg, Sharam Davari, and
Neil Harrison, whose suggestions and comments were very helpful in
revising the document.
The editors would like to give very special thanks to Curtis
Villamizar for his careful and extremely thorough reading of the
document and for taking the time to provide numerous suggestions,
which were very helpful in the last couple of revisions of the
document. Thanks are also due to Adrian Farrel for a thorough reading
of the last version of the document, and to Jean-Philippe Vasseur and
Anna Charny for several useful editorial comments and suggestions,
and for input on bandwidth recovery.
<span class="h2"><a class="selflink" id="section-9" href="#section-9">9</a>. References</span>
<span class="h3"><a class="selflink" id="section-9.1" href="#section-9.1">9.1</a> Normative</span>
[<a id="ref-RFC3031">RFC3031</a>] Rosen, E., Viswanathan, A. and R. Callon,
"Multiprotocol Label Switching Architecture", <a href="./rfc3031">RFC 3031</a>,
January 2001.
[<a id="ref-RFC2702">RFC2702</a>] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and
J. McManus, "Requirements for Traffic Engineering Over
MPLS", <a href="./rfc2702">RFC 2702</a>, September 1999.
[<a id="ref-RFC3209">RFC3209</a>] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
V. and G. Swallow, "RSVP-TE Extensions to RSVP for LSP
Tunnels", <a href="./rfc3209">RFC 3209</a>, December 2001.
[<a id="ref-RFC3212">RFC3212</a>] Jamoussi, B. (Ed.), Andersson, L., Callon, R., Dantu,
R., Wu, L., Doolan, P., Worster, T., Feldman, N.,
Fredette, A., Girish, M., Gray, E., Heinanen, J.,
Kilty, T. and A. Malis, "Constraint-Based LSP Setup
using LDP", <a href="./rfc3212">RFC 3212</a>, January 2002.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-37" ></span>
<span class="h3"><a class="selflink" id="section-9.2" href="#section-9.2">9.2</a> Informative</span>
[<a id="ref-MPLS-BACKUP">MPLS-BACKUP</a>] Vasseur, J. P., Charny, A., LeFaucheur, F., and
Achirica, "MPLS Traffic Engineering Fast reroute:
backup tunnel path computation for bandwidth
protection", Work in Progress.
[<a id="ref-MPLS-PATH">MPLS-PATH</a>] Huang, C., Sharma, V., Owens, K. and S. Makam, "Building
Reliable MPLS Networks Using a Path Protection
Mechanism", IEEE Commun. Mag., Vol. 40, Issue 3, March
2002, pp. 156-162.
[<a id="ref-RFC2205">RFC2205</a>] Braden, R., Zhang, L., Berson, S., Herzog, S. and S. Jamin,
"Resource ReSerVation Protocol (RSVP) -- Version 1
Functional Specification", <a href="./rfc2205">RFC 2205</a>, September 1997.
<span class="h2"><a class="selflink" id="section-10" href="#section-10">10</a>. Contributing Authors</span>
This document was the collective work of several individuals over a
period of three years. The text and content of this document were
contributed by the editors and the co-authors listed below. (The
contact information for the editors appears in <a href="#section-11">Section 11</a>, and is not
repeated below.)
Ben Mack-Crane
Tellabs Operations, Inc.
1415 West Diehl Road
Naperville, IL 60563
Phone: (630) 798-6197
EMail: Ben.Mack-Crane@tellabs.com
Srinivas Makam
Eshernet, Inc.
1712 Ada Ct.
Naperville, IL 60540
Phone: (630) 308-3213
EMail: Smakam60540@yahoo.com
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-38" ></span>
Ken Owens
Edward Jones Investments
201 Progress Parkway
St. Louis, MO 63146
Phone: (314) 515-3431
EMail: ken.owens@edwardjones.com
Changcheng Huang
Carleton University
Minto Center, Rm. 3082
1125 Colonel By Drive
Ottawa, Ont. K1S 5B6 Canada
Phone: (613) 520-2600 x2477
EMail: Changcheng.Huang@sce.carleton.ca
Jon Weil
Brad Cain
Storigen Systems
650 Suffolk Street
Lowell, MA 01854
Phone: (978) 323-4454
EMail: bcain@storigen.com
Loa Andersson
EMail: loa@pi.se
Bilel Jamoussi
Nortel Networks
3 Federal Street, BL3-03
Billerica, MA 01821, USA
Phone: (978) 288-4506
EMail: jamoussi@nortelnetworks.com
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-39" ></span>
Angela Chiu
AT&T Labs-Research
200 Laurel Ave. Rm A5-1F13
Middletown, NJ 07748
Phone: (732) 420-9061
EMail: chiu@research.att.com
Seyhan Civanlar
Lemur Networks, Inc.
135 West 20th Street, 5th Floor
New York, NY 10011
Phone: (212) 367-7676
EMail: scivanlar@lemurnetworks.com
<span class="h2"><a class="selflink" id="section-11" href="#section-11">11</a>. Editors' Addresses</span>
Vishal Sharma (Editor)
Metanoia, Inc.
1600 Villa Street, Unit 352
Mountain View, CA 94041-1174
Phone: (650) 386-6723
EMail: v.sharma@ieee.org
Fiffi Hellstrand (Editor)
Nortel Networks
St Eriksgatan 115
PO Box 6701
113 85 Stockholm, Sweden
Phone: +46 8 5088 3687
EMail: fiffi@nortelnetworks.com
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-40" ></span>
<span class="h2"><a class="selflink" id="section-12" href="#section-12">12</a>. Full Copyright Statement</span>
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
</pre>