Network Working Group R. Braden
Request for Comments: 1379 ISI
November 1992
Extending TCP for Transactions -- Concepts
Status of This Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Abstract
This memo discusses extension of TCP to provide transaction-oriented
service, without altering its virtual-circuit operation. This
extension would fill the large gap between connection-oriented TCP
and datagram-based UDP, allowing TCP to efficiently perform many
applications for which UDP is currently used. A separate memo
contains a detailed functional specification for this proposed
extension.
This work was supported in part by the National Science Foundation
under Grant Number NCR-8922231.
TABLE OF CONTENTS
1. INTRODUCTION .................................................. 2
2. TRANSACTIONS USING STANDARD TCP ............................... 3
3. BYPASSING THE 3-WAY HANDSHAKE ................................. 6
3.1 Concept of TAO ........................................... 6
3.2 Cache Initialization ..................................... 10
3.3 Accepting <SYN,ACK> Segments ............................. 11
4. SHORTENING TIME-WAIT STATE .................................... 13
5. CHOOSING A MONOTONIC SEQUENCE ................................. 15
5.1 Cached Timestamps ........................................ 16
5.2 Current TCP Sequence Numbers ............................. 18
5.3 64-bit Sequence Numbers .................................. 20
5.4 Connection Counts ........................................ 20
5.5 Conclusions .............................................. 21
6. CONNECTION STATES ............................................. 24
7. CONCLUSIONS AND ACKNOWLEDGMENTS ............................... 32
APPENDIX A: TIME-WAIT STATE AND THE 2-PACKET EXCHANGE ............ 34
REFERENCES ....................................................... 37
Security Considerations .......................................... 38
Author's Address ................................................. 38
Braden [Page 1]
RFC 1379 Transaction TCP -- Concepts November 1992
1. INTRODUCTION
The TCP protocol [STD-007] implements a virtual-circuit transport
service that provides reliable and ordered data delivery over a
full-duplex connection. Under the virtual circuit model, the life of
a connection is divided into three distinct phases: (1) opening the
connection to create a full-duplex byte stream; (2) transferring data
in one or both directions over this stream; and (3) closing the
connection. Remote login and file transfer are examples of
applications that are well suited to virtual-circuit service.
Distributed applications, which are becoming increasingly numerous
and sophisticated in the Internet, tend to use a transaction-oriented
rather than a virtual circuit style of communication. Currently, a
transaction-oriented Internet application must choose to suffer the
overhead of opening and closing TCP connections or else build an
application-specific transport mechanism on top of the connectionless
transport protocol UDP. Greater convenience, uniformity, and
efficiency would result from widely-available kernel implementations
of a transport protocol supporting a transaction service model
[RFC-955].
The transaction service model has the following features:
* The fundamental interaction is a request followed by a response.
* An explicit open or close phase would impose excessive overhead.
* At-most-once semantics is required; that is, a transaction must
not be "replayed" by a duplicate request packet.
* In favorable circumstances, a reliable request/response
handshake can be performed with exactly one packet in each
direction.
* The minimum transaction latency for a client is RTT + SPT, where
RTT is the round-trip time and SPT is the server processing
time.
We use the term "transaction transport protocol" for a transport-
layer protocol that follows this model [RFC-955].
The Internet architecture allows an arbitrary collection of transport
protocols to be defined on top of the minimal end-to-end datagram
service provided by IP [Clark88]. In practice, however, production
systems implement only TCP and UDP at the transport layer. It has
proven difficult to get a new transport protocol deployed widely
enough to be useful to application builders.
This memo explores an alternative approach to providing a transaction
transport protocol: extending TCP to implement the transaction
service model, while continuing to support the virtual circuit model.
Each transaction will then be a single instance of a TCP connection.
The proposed transaction extension is effectively implementable
within current TCPs and operating systems, and it should also scale
to the much faster networks, interfaces, and CPUs of the future.
The present memo explains the theory behind the extension, in
somewhat exquisite detail. Despite the length and complexity of this
memo, the TCP extensions required for transactions are in fact quite
limited and simple. Another memo [TTCP-FS] provides a self-contained
functional specification of the extensions.
Section 2 of this memo describes the limitations of standard TCP for
transaction processing, to motivate the extensions. Sections 3, 4,
and 5 explore the fundamental extensions that are required for
transactions. Section 6 discusses the changes required in the TCP
connection state diagram. Finally, Section 7 presents conclusions
and acknowledgments. Familiarity with the standard TCP protocol
[STD-007] is assumed.
2. TRANSACTIONS USING STANDARD TCP
Reliable transfer of data depends upon sequence numbers. Before data
transfer can begin, both parties must "synchronize" the connection,
i.e., agree on common sequence numbers. The synchronization procedure
must preserve at-most-once semantics, i.e., be free from replay
hazards due to duplicate packets. The TCP developers adopted a
synchronization mechanism known as the 3-way handshake.
Consider a simple transaction in which client host A sends a single-
segment request to server host B, and B returns a single-segment
response. Many current TCP implementations use at least ten segments
(i.e., packets) for this sequence: three for the 3-way handshake
opening the connection, four to send and acknowledge the request and
response data, and three for TCP's full-duplex data-conserving close
sequence. These ten segments represent a high relative overhead for
two data-bearing segments. However, a more important consideration
is the transaction latency seen by the client: 2*RTT + SPT, larger
than the minimum by one RTT. As CPU and network speeds increase, the
relative significance of this extra transaction latency also
increases.
Proposed transaction transport protocols have typically used a
"timer-based" approach to connection synchronization [Birrell84]. In
this approach, once end-to-end connection state is established in the
client and server hosts, a subset of this state is maintained for
some period of time. A new request before the expiration of this
timeout period can then reestablish the full state without an
explicit handshake. Watson pointed out that the timer-based approach
of his Delta-T protocol [Watson81] would encompass both virtual
circuits and transactions. However, the TCP group adopted the 3-way
handshake (because of uncertainty about the robustness of enforcing
the packet lifetime bounds required by Delta-T, within a general
Internet environment). More recently, Liskov, Shrira, and Wroclawski
[Liskov90] have proposed a different timer-based approach to
connection synchronization, requiring loosely-synchronized clocks in
the hosts.
The technique proposed in this memo, suggested by Clark [Clark89],
depends upon caching of connection state but not upon clocks or
timers; it is described in Section 3 below. Garlick, Rom, and Postel
also proposed a connection synchronization mechanism using cached
state [Garlick77]. Their scheme required each host to maintain
connection records containing the highest sequence number on each
connection. The technique suggested here retains only per-host
state, not per-connection state.
During TCP development, it was suggested that TCP could support
transactions with data segments containing both SYN and FIN bits.
(These "Kamikaze" segments were not supported as a service; they were
used mainly to crash other experimental TCPs!) To illustrate this
idea, Figure 1 shows a plausible application of the current TCP rules
to create a minimal transaction. (In fact, some minor adjustments in
the standard TCP spec would be required to make Figure 1 fully legal
[STD-007]).
Figure 1, like many of the examples shown in this memo, uses an
abbreviated form to illustrate segment sequences. For clarity and
brevity, it omits explicit sequence and acknowledgment numbers,
assuming that these will follow the well-known TCP rules. The
notation "ACK(x)" implies a cumulative acknowledgment for the control
bit or data "x" and everything preceding "x" in the sequence space.
The referent of "x" should be clear from the context. Also, host A
will always be the client and host B will be the server in these
diagrams.
The first three segments in Figure 1 implement the standard TCP
three-way handshake. If segment #1 had been an old duplicate, the
client side would have sent an RST (Reset) bit in segment #3,
terminating the sequence. The request data included on the initial
SYN segment cannot be delivered to user B until segment #3 completes
the 3-way handshake. Loading control bits onto the segments has
reduced the total number of segments to 5, but the client still
observes a transaction latency of 2*RTT + SPT. The 3-way handshake
thus precludes high-performance transaction processing.
TCP A (Client) TCP B (Server)
_______________ ______________
CLOSED LISTEN
(Client sends request)
1. SYN-SENT --> <SYN,data1,FIN> --> SYN-RCVD
(data1 queued)
2. ESTABLISHED <-- <SYN,ACK(SYN)> <-- SYN-RCVD
3. FIN-WAIT-1 --> <ACK(SYN),FIN> --> CLOSE-WAIT
(data1 to server)
(Server sends reply)
4. TIME-WAIT <-- <ACK(FIN),data2,FIN> <-- LAST-ACK
(data2 to client)
5. TIME-WAIT --> <ACK(FIN)> --> CLOSED
(timeout)
CLOSED
Figure 1: Transaction Sequence: RFC-793 TCP
The TCP close sequence also poses a performance problem for
transactions: one or both end(s) of a closed connection must remain
in "TIME-WAIT" state until a 4-minute timeout has expired [STD-007].
The same connection (defined by the host and port numbers at both
ends) cannot be reopened until this delay has expired. Because of
TIME-WAIT state, a client program should choose a new local port
number (i.e., a different connection) for each successive
transaction. However, the TCP port field of 16 bits (less the
"well-known" port space) provides only 64512 available user ports.
This limits the total rate of transactions between any pair of hosts
to a maximum of 64512/240 = 268 per second, 240 seconds being the
4-minute TIME-WAIT delay. This is much too low a
rate for low-delay paths, e.g., high-speed LANs. A high rate of
short connections (i.e., transactions) could also lead to excessive
consumption of kernel memory by connection control blocks in TIME-
WAIT state.
In summary, to perform efficient transaction processing in TCP, we
need to suppress the 3-way handshake and to shorten TIME-WAIT state.
Protocol mechanisms to accomplish these two goals are discussed in
Sections 3 and 4, respectively. Both require the choice of a
monotonic sequence-like space; Section 5 analyzes the choices and
makes a selection for this space. Finally, the TCP connection state
machine must be extended as described in Section 6.
Transaction processing in TCP raises some other protocol issues,
which are discussed in the functional specification memo [TTCP-FS].
These include:
(1) augmenting the user interface for transactions,
(2) delaying acknowledgment segments to allow maximum piggy-backing
of control bits with data,
(3) measuring the retransmission timeout time (RTO) on very short
connections, and
(4) providing an initial server window.
A recently proposed set of enhancements [RFC-1323] defines a TCP
Timestamps option that carries two 32-bit timestamp values. The
Timestamps option is used to accurately measure round-trip time
(RTT). The same option is also used in a procedure known as "PAWS"
(Protect Against Wrapped Sequences) to prevent erroneous data
delivery due to a combination of old duplicate segments and sequence
number reuse at very high bandwidths. The particular approach to
transactions chosen in this memo does not require the RFC-1323
enhancements; however, they are important and should be implemented
in every TCP, with or without the transaction extensions described
here.
3. BYPASSING THE 3-WAY HANDSHAKE
To avoid 3-way handshakes for transactions, we introduce a new
mechanism for validating initial SYN segments, i.e., for enforcing
at-most-once semantics without a 3-way handshake. We refer to this
as the TCP Accelerated Open, or TAO, mechanism.
3.1 Concept of TAO
The basis of TAO is this: a TCP uses cached per-host information
to immediately validate new SYNs [Clark89]. If this validation
fails, e.g., because there is no current cached state or the
segment is an old duplicate, the procedure falls back to a normal
3-way handshake to validate the SYN. Thus, bypassing a 3-way
handshake is considered to be an optional optimization.
The proposed TAO mechanism uses a finite sequence-like space of
values that increase monotonically with successive transactions
(connections) between a given (client, server) host pair. Call
this monotonic space M, and let each initial SYN segment carry an
M value SEG.M. If M is not simply the existing sequence number
field (SEG.SEQ), SEG.M may be carried in a TCP option.
When host B receives from host A an initial SYN segment containing
a new value SEG.M, host B compares this against cache.M[A], the
latest M value that B has cached for host A. This comparison is
the "TAO test". Because the M values are monotonically
increasing, SEG.M > cache.M[A] implies that the SYN must be new
and can be accepted immediately. If not, a normal 3-way handshake
is performed to validate the initial SYN segment. Figure 2
illustrates the TAO mechanism; cached M values are shown enclosed
in square brackets. The M values generated by host A satisfy
x0 < x1, and the M values generated by host B satisfy y0 < y1.
An appropriate choice for the M value space is discussed in
Section 5. M values are drawn from a finite number space, so
inequalities must be defined in the usual way for sequence numbers
[STD-007]. The M space must not wrap so quickly that an old
duplicate SYN will be erroneously accepted. We assume that some
maximum segment lifetime (MSL) is enforced by the IP layer.
TCP A TCP B
cache.M[B] cache.M[A]
V V
[ y0 ] [ x0 ]
1. --> <SYN,data1,M=x1> --> ( (x1 > x0) =>
data1 -> user_B;
cache.M[A]= x1)
[ y0 ] [ x1 ]
2. <-- <SYN,ACK(data1),data2,M=y1> <--
(data2 -> user_A,
cache.M[B]= y1)
[ y1 ] [ x1 ]
... (etc.) ...
Figure 2. TAO: Three-Way Handshake is Bypassed
Figure 2 shows the simplest case: each side has cached the latest
M value of the other, and the SEG.M value in the client's SYN
segment is greater than the value in the cache at the server host.
As a result, B can accept the client A's request data1 immediately
and pass it to the server application. B's reply data2 is shown
piggybacked on the <SYN,ACK> segment. As a result of this 2-way
exchange, the cached M values are updated at both sites; the
update at the client side becomes relevant only if the
client/server roles reverse. Validation of the <SYN,ACK> segment
at host A is
discussed later.
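The TAO test compares SEG.M against cache.M[A] using the usual modular comparison for sequence numbers. Below is a minimal Python sketch. It assumes, purely for illustration, a 32-bit M space and represents a missing cache entry as None. The function names are hypothetical and not part of any specification.

```python
from typing import Optional

M_MOD = 2 ** 32  # assumed size of the M space (32 bits, like SEG.SEQ)


def seq_gt(a: int, b: int) -> bool:
    """True if a follows b in the circular space (RFC 793-style comparison)."""
    diff = (a - b) % M_MOD
    return 0 < diff < M_MOD // 2


def tao_test(seg_m: int, cached_m: Optional[int]) -> bool:
    """Server-side TAO test: True means the initial SYN may be accepted
    immediately; False means fall back to a normal 3-way handshake."""
    if cached_m is None:  # no cached M value for this client host
        return False
    return seq_gt(seg_m, cached_m)


print(tao_test(1001, 1000))       # newer M value: accept at once
print(tao_test(1000, 1000))       # not newer (e.g. old duplicate): handshake
print(tao_test(5, 2 ** 32 - 10))  # newer value after wraparound: accept
print(tao_test(1001, None))       # no cache entry: handshake
```

A failed test here never rejects the SYN outright. It only means that the handshake path must validate it.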
Figure 3 shows the TAO test failing but the consequent 3-way
handshake succeeding. B updates its cache with the value x2 >= x1
when the initial SYN is known to be valid.
TCP A TCP B
cache.M[B] cache.M[A]
V V
[ y0 ] [ x0 ]
1. --> <SYN,data1,M=x1> --> ( (x1 <= x0) =>
data1 queued;
3-way handshake)
[ y0 ] [ x0 ]
2. <-- <SYN,ACK(SYN),M=y1> <--
(cache.M[B]= y1)
[ y1 ] [ x0 ]
3. --> <ACK(SYN),M=x2> --> (Handshake OK =>
data1->user_B,
cache.M[A]= x2)
[ y1 ] [ x2 ]
... (etc.) ...
Figure 3. TAO Test Fails but 3-Way Handshake Succeeds.
There are several possible causes for a TAO test failure on a
legitimate new SYN segment (not an old duplicate).
(1) There may be no cached M value for this particular client
host.
(2) The SYN may be one of a set of nearly-simultaneous SYNs
for different connections but from the same host, which
arrived out of order.
(3) The finite M space may have wrapped around between successive
transactions from the same client.
(4) The M values may advance too slowly for closely-spaced
transactions.
None of these TAO failures will cause a lockout, because the
resulting 3-way handshake will succeed. Note that the first
transaction between a given host pair will always require a 3-way
handshake; subsequent transactions can take advantage of TAO.
The per-host cache required by TAO is highly desirable for other
reasons, e.g., to retain the measured round trip time and MTU for
a given remote host. Furthermore, a host should already have a
per-host routing cache [HR-COMM] that should be easily extensible
for this purpose.
Figure 4 illustrates a complete TCP transaction sequence using the
TAO mechanism. Bypassing the 3-way handshake leads to new
connection states; Figure 4 shows three of them, "SYN-SENT*",
"CLOSE-WAIT*", and "LAST-ACK*". Explanation of these states is
deferred to Section 6.
TCP A (Client) TCP B (Server)
_______________ ______________
CLOSED LISTEN
1. SYN-SENT* --> <SYN,data1,FIN,M=x1> --> CLOSE-WAIT*
(TAO test OK=>
data1->user_B)
<-- <SYN,ACK(FIN),data2,FIN,M=y1> <-- LAST-ACK*
2. TIME-WAIT
(data2->user_A)
3. TIME-WAIT --> <ACK(FIN),M=x2> --> CLOSED
(timeout)
CLOSED
Figure 4: Minimal Transaction Sequence Using TAO
3.2 Cache Initialization
The first connection between hosts A and B will find no cached
state at one or both ends, so both M caches must be initialized.
This requires that the first transaction carry a specially marked
SEG.M value, which we call SEG.M.NEW. Receiving a SEG.M.NEW value
in an initial SYN segment, B will cache this value and send its
own M back to initialize A's cache. When a host crashes and
restarts, all its cached M values cache.M[*] must be invalidated
in order to force a re-synchronization of the caches at both ends.
This cache synchronization procedure is illustrated in Figure 5,
where client host A has crashed and restarted with its cache
entries undefined, as indicated by "??". Since cache.M[B] is
undefined, A sends a SEG.M.NEW value instead of SEG.M in the <SYN>
segment of its first transaction request to B. Receiving this
SEG.M.NEW, the server host B invalidates cache.M[A] and performs
a 3-way handshake. SEG.M in segment #2 updates A's cache, and
when the handshake completes successfully, B updates its cached M
value to x2 >= x1.
TCP A TCP B
cache.M[B] cache.M[A]
V V
[ ?? ] [ x0 ]
1. --> <SYN,data1,M.NEW=x1> --> (invalidate cache;
queue data1;
[ ?? ] 3-way handshake)
[ ?? ]
2. <-- <SYN,ACK(SYN),M=y1> <--
(cache.M[B]= y1)
[ y1 ] [ ?? ]
3. --> <ACK(SYN),M=x2> --> (data1->user_B,
cache.M[A]= x2)
[ y1 ] [ x2 ]
... (etc.) ...
Figure 5. Client Host Crashed
Suppose that the 3-way handshake failed, presumably because
segment #1 was an old duplicate. Then segment #3 from host A
would be an RST segment, with the result that both sides' caches
would be left undefined.
Figure 6 shows the procedure when the server crashes and restarts.
Upon receiving a <SYN> segment from a host for which it has no
cached M value, B initiates a 3-way handshake to validate the
request and sends its own M value to A. Again the result is to
update cached M values on both sides.
TCP A TCP B
cache.M[B] cache.M[A]
V V
[ y0 ] [ ?? ]
1. --> <SYN,data1,M=x1> --> (data1 queued;
3-way handshake)
[ y0 ] [ ?? ]
2. <-- <SYN,ACK(SYN),M=y1> <--
(cache.M[B]= y1)
[ y1 ] [ ?? ]
3. --> <ACK(SYN),M=x2> --> (data1->user_B,
cache.M[A]= x2)
[ y1 ] [ x2 ]
... (etc.) ...
Figure 6. Server Host Crashed
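The server-side cache behavior described in Sections 3.1 and 3.2 can be modeled as a small per-host table. This is a hypothetical Python sketch, not the functional specification [TTCP-FS]. The class name TaoCache, its method names, the string results, and the 32-bit M space are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

M_MOD = 2 ** 32  # assumed 32-bit M space


def seq_gt(a: int, b: int) -> bool:
    """Circular 'greater than', as for TCP sequence numbers."""
    diff = (a - b) % M_MOD
    return 0 < diff < M_MOD // 2


@dataclass
class TaoCache:
    """Hypothetical server-side cache.M[*]: latest M value per client host."""
    m: Dict[str, int] = field(default_factory=dict)

    def crash_restart(self) -> None:
        # After a crash and restart, every cached M value is invalid.
        self.m.clear()

    def on_initial_syn(self, host: str, seg_m: int, m_new: bool) -> str:
        """Return 'accept' (TAO test passed) or 'handshake' (data queued)."""
        if m_new:
            # SEG.M.NEW: the client has lost its cache, so invalidate ours.
            self.m.pop(host, None)
            return "handshake"
        cached = self.m.get(host)
        if cached is not None and seq_gt(seg_m, cached):
            self.m[host] = seg_m  # TAO test OK: update the cache at once
            return "accept"
        return "handshake"  # missing entry or failed TAO test

    def on_handshake_ok(self, host: str, seg_m: int) -> None:
        # The initial SYN is now known to be valid; cache the M value
        # carried on the final ACK of the 3-way handshake.
        self.m[host] = seg_m


# Figure 5: client A crashed and sends M.NEW, so B falls back to a handshake.
b = TaoCache({"A": 100})
print(b.on_initial_syn("A", 200, m_new=True))   # handshake
b.on_handshake_ok("A", 201)
print(b.on_initial_syn("A", 202, m_new=False))  # accept
# Figure 6: server B crashed. Its cache is empty, so it falls back again.
b.crash_restart()
print(b.on_initial_syn("A", 300, m_new=False))  # handshake
```

On the fallback path the cached value is left untouched until the handshake completes, as in Figure 3. An old duplicate SYN therefore cannot advance the cache.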
3.3 Accepting <SYN,ACK> Segments
Transactions introduce a new hazard of erroneously accepting an
old duplicate <SYN,ACK> segment. To be acceptable, a <SYN,ACK>
segment must arrive in SYN-SENT state, and its ACK field must
acknowledge something that was sent. In current TCPs the
effective send window in SYN-SENT state is exactly one octet, and
an acceptable <SYN,ACK> must exactly ACK this one octet. The
clock-driven selection of Initial Sequence Number (ISN) makes an
erroneous acceptance exceedingly unlikely. An old duplicate SYN
could be accepted erroneously only if successive connection
attempts occurred more often than once every 4 microseconds, or if
the segment lifetime exceeded the 4 hour wraparound time for ISN
selection.
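The wraparound time follows from the size of the sequence space and the ISN clock rate. Here is a quick Python check, assuming a tick of exactly 4 microseconds; real ISN generators may differ.

```python
SEQ_SPACE = 2 ** 32  # number of distinct 32-bit sequence numbers
ISN_TICK_S = 4e-6    # assumed ISN clock period: one increment per 4 us

wrap_seconds = SEQ_SPACE * ISN_TICK_S
wrap_hours = wrap_seconds / 3600
print(f"ISN clock wraps after {wrap_seconds:.0f} s, about {wrap_hours:.1f} hours")
```

The result is a little under 5 hours, the same order as the figure quoted above. For the ISN alone to fail to separate incarnations, a duplicate must survive that long, or connections must be opened faster than once per tick.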
However, when TCP is used for transactions, data sent with the
initial SYN increases the range of sequence numbers that have been
sent. This increases the danger of accepting an old duplicate
<SYN,ACK> segment, and the consequences are more serious. In the
example in Figure 7, segments 1-3 form a normal transaction
sequence, and segment 4 begins a new transaction (incarnation) for
the same connection. Segment #5 is a duplicate of segment #2 from
the preceding transaction. Although the new transaction has a
larger ISN, the previous ACK value 402 falls into the new range
[200,700) of sequence numbers that have been sent, so segment #5
could be erroneously accepted and passed to the client as the
response to the new request.
TCP A TCP B
CLOSED LISTEN
1. --> <seq=100,SYN,data=300,FIN,M=x1> --> (TAO test OK)
2. <-- <seq=800,ack=402,SYN,data=350,FIN,M=y1> <--
3. TIME-WAIT --> <ACK(FIN)> --> CLOSED
(short timeout)
CLOSED
(New Request)
4. --> <seq=200,SYN,data=500,FIN,M=x2> --> ...
(Duplicate of segment #2)
5. <-- <seq=800,ack=402,SYN,data=350,FIN,M=y1> <--...
(Acceptable!!)
Figure 7: Old Duplicate <SYN,ACK> Causing Error
Unfortunately, we cannot simply use TAO on the client side to
detect and reject old duplicate <SYN,ACK> segments. A TAO test at
the client might fail for a valid <SYN,ACK> segment, due to out-
of-order delivery, and this could result in permanent non-delivery
of a valid transaction reply.
Instead, we include a second M value, an echo of the client's M
value from the initial <SYN> segment, in the <SYN,ACK> segment. A
specially-marked M value, SEG.M.ECHO, is used for this purpose.
The client knows the value it sent in the initial <SYN> and can
therefore positively validate the <SYN,ACK> using the echoed
value. This is illustrated in Figure 12, which is the same as
Figure 4 with the addition of the echoed value on the <SYN,ACK>
segment #2.
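The client-side validation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the specified algorithm: M values are modeled as plain integers, sequence-number wrap-around is ignored, and the names `SynAck`, `Client`, and `accept_syn_ack` are hypothetical. The numbers reproduce Figure 7, taking x1 = 1 and x2 = 2; the SYN and FIN each consume one sequence number.

```python
from dataclasses import dataclass


@dataclass
class SynAck:
    ack: int     # acknowledgment field of the <SYN,ACK>
    m_echo: int  # SEG.M.ECHO: the client's M value, echoed by the server


class Client:
    """Hypothetical client state for one connection in SYN-SENT."""

    def __init__(self, iss: int, sent_len: int, m_sent: int) -> None:
        self.iss = iss                 # sequence number of our initial SYN
        self.snd_nxt = iss + sent_len  # one past the last sequence number sent
        self.m_sent = m_sent           # M value carried on our initial <SYN>

    def accept_syn_ack(self, seg: SynAck) -> bool:
        # The ACK must acknowledge something that was actually sent ...
        if not (self.iss < seg.ack <= self.snd_nxt):
            return False
        # ... and the echoed M value must be the one we sent (no client TAO test).
        return seg.m_echo == self.m_sent


# Segment #4 of Figure 7: seq=200, SYN + 500 data octets + FIN, M = x2.
client = Client(iss=200, sent_len=1 + 500 + 1, m_sent=2)
print(client.accept_syn_ack(SynAck(ack=402, m_echo=1)))  # False: old duplicate #5
print(client.accept_syn_ack(SynAck(ack=702, m_echo=2)))  # True: valid reply
```

The ACK value 402 still falls inside the new send range, so only the echo comparison rejects the old duplicate.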
It should be noted that TCP allows a simultaneous open sequence in
which both sides send and receive an initial <SYN> (see Figure 8
of [STD-007]). In this case, the TAO test must be performed on
both sides to preserve the symmetry. See [TTCP-FS] for an
example.
4. SHORTENING TIME-WAIT STATE
Once a transaction has been initiated for a particular connection
(pair of ports) between a given host pair, a new transaction for the
same connection cannot take place for a time that is at least:
RTT + SPT + TIME-WAIT_Delay
Since the client host can cycle among the 64512 available port
numbers, an upper bound on the transaction rate between a particular
host pair is:
[1] TRmax = 64512 /(RTT + TIME-WAIT_Delay)
in transactions per second (Tps), where we have assumed SPT is negligible.
We must reduce TIME-WAIT_Delay to support high-rate TCP transaction
processing.
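Formula [1] is simple enough to evaluate directly. A minimal Python sketch (the function name `tr_max` is ours), letting RTT approach zero and using the standard TIME-WAIT delay of 240 seconds, i.e., twice the 120-second MSL of the TCP specification:

```python
def tr_max(rtt: float, time_wait_delay: float, ports: int = 64512) -> float:
    """Upper bound [1] on the transaction rate (Tps) between one host pair.

    Each of the `ports` client ports is busy for RTT + TIME-WAIT_Delay per
    transaction; server processing time (SPT) is neglected.
    """
    return ports / (rtt + time_wait_delay)


print(int(tr_max(rtt=0.0, time_wait_delay=240.0)))  # 268 Tps
print(int(tr_max(rtt=0.0, time_wait_delay=2.0)))    # 32256 Tps
```

Even with zero RTT, the standard delay caps a host pair at a few hundred transactions per second.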
TIME-WAIT state performs two functions: (1) supporting the full-
duplex reliable close of TCP, and (2) allowing old duplicate segments
from an earlier connection incarnation to expire before they can
cause an error (see Appendix to [RFC-1185]). The first function
impacts the application model of a TCP connection, which we would not
want to change. The second is part of the fundamental machinery of
TCP reliable delivery; to safely truncate TIME-WAIT state, we must
provide another means to exclude duplicate packets from earlier
incarnations of the connection.
To minimize the delay in TIME-WAIT state while performing both
functions, we propose to set the TIME-WAIT delay to:
[2] TIME-WAIT_Delay = max( K*RTO, U )
where U and K are constants and RTO is the dynamically-determined
retransmission timeout, the measured RTT plus an allowance for the
RTT variance [Jacobson88]. We choose K large enough so that there is
high probability of the close completing successfully if at all
possible; K = 8 seems reasonable. This takes care of the first
function of TIME-WAIT state.
In a real implementation, there may be a minimum RTO value Tr,
corresponding to the precision of RTO calculation. For example, in
the popular BSD implementation of TCP, the minimum RTO is Tr = 0.5
second. Assuming K = 8 and U = 0, Eqns [1] and [2] impose an upper
limit of TRmax = 16K Tps on the transaction rate of these
implementations.
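Combining [2] with [1] reproduces the 16K Tps figure. In this sketch, `rto_min` stands for Tr and defaults to the BSD value of 0.5 second quoted above; RTT is taken as negligible next to the TIME-WAIT delay, and the function names are hypothetical.

```python
def time_wait_delay(rto: float, k: float = 8.0, u: float = 0.0,
                    rto_min: float = 0.5) -> float:
    """Proposed TIME-WAIT delay [2], max(K*RTO, U), with RTO floored at Tr."""
    return max(k * max(rto, rto_min), u)


def tr_max(rtt: float, delay: float, ports: int = 64512) -> float:
    """Transaction-rate bound [1], neglecting server processing time."""
    return ports / (rtt + delay)


delay = time_wait_delay(rto=0.001)   # fast LAN: RTO is clamped to Tr = 0.5 s
print(delay)                         # 4.0 seconds
print(int(tr_max(0.0, delay)))       # 16128, i.e. about 16K Tps
```

Once RTO reaches its floor, only a smaller Tr (or K) can raise the bound.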
It is possible to have many short connections only if RTO is very
small, in which case the TIME-WAIT delay [2] reduces to U. To
accelerate the close sequence, we need to reduce U below the MSL
enforced by the IP layer, without introducing a hazard from old
duplicate segments. For this purpose, we introduce another monotonic
number sequence; call it X. X values are required to be monotonic
between successive connection incarnations; depending upon the choice
of the X space (see Section 5), X values may also increase during a
connection. A value from the X space is to be carried in every
segment, and a segment is rejected if it is received with an X value
smaller than the largest X value received. This mechanism does not
use a cache; the largest X value is maintained in the TCP connection
control block (TCB) for each connection.
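The per-connection X check needs only one variable in the TCB. A minimal sketch, assuming X values compare as unbounded integers (the sign-bit wrap discussed next is not handled) and using a hypothetical `Tcb` class:

```python
from typing import Optional


class Tcb:
    """Hypothetical connection control block tracking the largest X received."""

    def __init__(self) -> None:
        self.max_x: Optional[int] = None   # nothing received yet

    def accept(self, seg_x: int) -> bool:
        """Reject a segment whose X is smaller than the largest X received."""
        if self.max_x is not None and seg_x < self.max_x:
            return False                   # old duplicate: discard it
        self.max_x = seg_x if self.max_x is None else max(self.max_x, seg_x)
        return True


tcb = Tcb()
print([tcb.accept(x) for x in (7, 7, 9, 5, 9)])  # [True, True, True, False, True]
```

Equal X values are accepted, since X may stay constant within one incarnation.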
The value of U depends upon the choice for the X space, discussed in
the next section. If X is time-like, U can be set to twice the time
granularity (i.e., twice the minimum "tick" time) of X. The TIME-WAIT
delay will then ensure that current X values do not overlap the X
values of earlier incarnations of the same connection. Another
consequence of time-like X values is the possibility that an open but
idle connection might allow the X value to wrap its sign bit,
resulting in a lockup of the connection. To prevent this, a 24-day
idle timer on each open connection could bypass the X check on the
first segment following the idle period, for example. In practice,
many implementations have keep-alive mechanisms that prevent such
long idle periods [RFC-1323].
Referring back to Figure 4, our proposed transaction extension
results in a minimum exchange of 3 packets. Segment #3, the final
ACK segment, does not increase transaction latency, but in
combination with the TIME-WAIT delay of K*RTO it ensures that the
server side of the connection will be closed before a new transaction
is issued for this same pair of ports. It also provides an RTT
measurement for the server.
We may ask whether it would be possible to further reduce the
TIME-WAIT delay. We might set K to zero; alternatively, we might allow
the client TCP to start a new transaction request while the
connection was still in TIME-WAIT state, with the new initial SYN
acting as an implied acknowledgment of the previous FIN. Appendix A
summarizes the issues raised by these alternatives, which we call
"truncating" TIME-WAIT state, and suggests some possible solutions.
Further study would be required, but these solutions appear to bend
the theory and/or implementations of the TCP protocol farther than we
wish to bend them.
We therefore propose using formula [2] with K=8 and retaining the
final ACK(FIN) transmission. To raise the transaction rate,
therefore, we require small values of RTO and U.
5. CHOOSING A MONOTONIC SEQUENCE
For simplicity, we want the monotonic sequence X used for shortening
TIME-WAIT state to be identical to the monotonic sequence M for
bypassing the 3-way handshake. Calling the common space M, we will
send an M value SEG.M in each TCP segment. Upon receipt of an
initial SYN segment, SEG.M will be compared with a per-host cached
value to authenticate the SYN without a 3-way handshake; this is the
TAO mechanism. Upon receipt of a non-SYN segment, SEG.M will be
compared with the current value in the connection control block and
used to discard old duplicates.
Note that the situation with TIME-WAIT state differs from that of
bypassing 3-way handshakes in two ways: (a) TIME-WAIT requires
duplicate detection on every segment vs. only on SYN segments, and
(b) TIME-WAIT applies to a single connection vs. being global across
all connections. This section discusses possible choices for the
common monotonic sequence.
The SEG.M values must satisfy the following requirements.
* The values must be monotonic; this requirement is defined more
precisely below.
* Their granularity must be fine-grained enough to support a high
rate of transaction processing; the M clock must "tick" at least
once between successive transactions.
* Their range (wrap-around time) must be great enough to allow a
realistic MSL to be enforced by the network.
The TCP spec calls for an MSL of 120 secs. Since much of the
Internet does not carefully enforce this limit, it would be safer to
have an MSL at least an order of magnitude larger. We set as an
objective an MSL of at least 2000 seconds. If there were no TIME-
WAIT delay, the ultimate limit on transaction rate would be set by
speed-of-light delays in the network and by the latency of host
operating systems. As the bottleneck problems with interfacing CPUs
to gigabit LANs are solved, we can imagine transaction durations as
short as 1 microsecond. Therefore, we set an ultimate performance
goal of TRmax at least 10**6 Tps.
A particular connection between hosts A and B is identified by the
local and remote TCP "sockets", i.e., by the quadruplet: {A, B,
Port.A, Port.B}. Imagine that each host keeps a count CC of the
number of TCP connections it has initiated. We can use this CC
number to distinguish different incarnations of the same connection.
Then a particular SEG.M value may be labeled implicitly by 6
quantities: {A, B, Port.A, Port.B, CC, n}, where n is the byte offset
of that segment within the connection incarnation.
To bypass the 3-way handshake, we require that SEG.M values on
successive SYN segments from a host A to a host B be monotone
increasing. If CC' > CC, then we require that:
SEG.M(A,B,Port.A,Port.B,CC',0) > SEG.M(A,B,Port.A,Port.B,CC,0)
for any legal values of Port.A and Port.B.
To delete old duplicates (allowing TIME-WAIT state to be shortened),
we require that SEG.M values be disjoint across different
incarnations of the same connection. If CC' > CC then
SEG.M(A,B,Port.A,Port.B,CC',n') > SEG.M(A,B,Port.A,Port.B,CC,n),
for any non-negative integers n and n'.
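Both requirements can be tested on a finite sample for a fixed quadruplet {A, B, Port.A, Port.B}. In this sketch, `check_requirements` and the two M functions are hypothetical: `by_count` uses the connection count directly, while `by_sequence` is a toy sequence-number space whose ISN advances by 1000 per connection, so it stays disjoint only while incarnations send fewer than 1000 octets.

```python
from typing import Callable, Tuple


def check_requirements(m: Callable[[int, int], int], num_cc: int,
                       num_n: int) -> Tuple[bool, bool]:
    """Test both SEG.M requirements on a finite sample.

    m(cc, n) is the SEG.M value at byte offset n of incarnation cc.
    Returns (SYN values monotone increasing, incarnations disjoint).
    """
    syn_monotone = all(m(cc + 1, 0) > m(cc, 0) for cc in range(num_cc - 1))
    # Disjointness for all CC' > CC reduces to: the smallest value of a later
    # incarnation exceeds the largest value of every earlier one.
    lo = [min(m(cc, n) for n in range(num_n)) for cc in range(num_cc)]
    hi = [max(m(cc, n) for n in range(num_n)) for cc in range(num_cc)]
    disjoint = all(lo[c2] > hi[c1]
                   for c1 in range(num_cc) for c2 in range(c1 + 1, num_cc))
    return syn_monotone, disjoint


def by_count(cc: int, n: int) -> int:
    return cc                    # M is the connection count itself


def by_sequence(cc: int, n: int) -> int:
    return 1000 * cc + n         # toy ISN that advances 1000 per connection


print(check_requirements(by_count, 5, 1500))     # (True, True)
print(check_requirements(by_sequence, 5, 1000))  # (True, True)
print(check_requirements(by_sequence, 5, 1500))  # (True, False)
```

The last case meets the SYN requirement but not the disjointness requirement, which is exactly the gap between bypassing the handshake and shortening TIME-WAIT.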
We now consider four different choices for the common monotonic
space: RFC-1323 timestamps, TCP sequence numbers, the connection
count, and 64-bit TCP sequence numbers. The results are summarized
in Table I.
5.1 Cached Timestamps
The PAWS mechanism [RFC-1323] uses TCP "timestamps" as
monotonically increasing integers in order to throw out old
duplicate segments within the same incarnation. Jacobson
suggested the caching of these timestamps for bypassing 3-way
handshakes [Jacobson90], i.e., that TCP timestamps be used for our
common monotonic space M. This idea is attractive since it would
allow the same timestamp options to be used for RTTM, PAWS, and
transactions.
To obtain at-most-once service, the criterion for immediate
acceptance of a SYN must be that SEG.M is strictly greater than
the cached M value. That is, to be useful for bypassing 3-way
handshakes, the timestamp clock must tick at least once between
any two successive transactions between the same pair of hosts
(even if different ports are used). Hence, the timestamp clock
rate would determine TRmax, the maximum possible transaction rate.
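The strict-inequality TAO test, and the resulting dependence on the clock rate, can be sketched as below. Assumptions, all hypothetical: a `TaoCache` class keyed by host name, a missing cache entry forcing the 3-way handshake, and a timestamp clock that ticks once per second (the slow end of the RFC-1323 range).

```python
from typing import Dict


class TaoCache:
    """Hypothetical per-host cache of the latest M value seen from each client."""

    def __init__(self) -> None:
        self.m: Dict[str, int] = {}

    def remember(self, host: str, seg_m: int) -> None:
        """Record M after a connection was validated by a 3-way handshake."""
        self.m[host] = max(seg_m, self.m.get(host, seg_m))

    def tao_test(self, host: str, seg_m: int) -> bool:
        """Accept an initial SYN at once only if SEG.M > cache.M[host]."""
        cached = self.m.get(host)
        if cached is not None and seg_m > cached:
            self.m[host] = seg_m
            return True
        return False   # no entry, or not newer: fall back to the 3-way handshake


def timestamp(t: float, period: float = 1.0) -> int:
    """Timestamp clock that ticks once per `period` seconds."""
    return int(t // period)


cache = TaoCache()
cache.remember("A", timestamp(10.0))
print(cache.tao_test("A", timestamp(11.2)))  # True: the clock has ticked
print(cache.tao_test("A", timestamp(11.7)))  # False: same tick, so 3-way handshake
```

With a 1-second tick, at most one transaction per second per host pair can bypass the handshake.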
Unfortunately, the timestamp clock frequency called for by RFC-
1323, in the range 1 sec to 1 ms, is much too slow for
transactions. The TCP timestamp period was chosen to be
comparable to the fundamental interval for computing and
scheduling retransmission timeouts; this is generally in the range
of 1 sec. to 1 ms., and in many operating systems, much closer to
1 second. Although it would be possible to increase the timestamp
clock frequency by several orders of magnitude, to do so would
make implementation more difficult, and on some systems
excessively expensive.
The wraparound time for TCP timestamps, at least 24 days, causes
no problem for transactions.
The PAWS mechanism uses TCP timestamps to protect against old
duplicate non-SYN segments from the same incarnation [RFC-1323].
It can also be used to protect against old duplicate data segments
from earlier incarnations (and therefore allow shortening of
TIME-WAIT state) if we can ensure that the timestamp clock ticks
at least once between the end of one incarnation and the beginning
of the next. This can be achieved by setting U = 2 seconds, i.e.,
to twice the maximum timestamp clock period. This value in
formula [2] leads to an upper bound TRmax = 32K Tps between a host
pair. However, as pointed out above, old duplicate SYN detection
using timestamps leads to a smaller transaction rate bound, 1 Tps,
which is unacceptable. In addition, the timestamp approach is
imperfect; it allows old ACK segments to enter the new connection
where they can cause a disconnect. This happens because old
duplicate ACKs that arrive during TIME-WAIT state generate new
ACKs with the current timestamp [RFC-1337].
We therefore conclude that timestamps are not adequate as the
monotonic space M; see Table I. However, they may still be useful
to effectively extend some other monotonic number space, just as
they are used in PAWS to extend the TCP sequence number space.
This is discussed below.
5.2 Current TCP Sequence Numbers
It is useful to understand why the existing 32-bit TCP sequence
numbers do not form an appropriate monotonic space for
transactions.
The sequence number sent in an initial SYN is called the Initial
Sequence Number or ISN. According to the TCP specification, an
ISN is to be selected using:
[3] ISN = (R*T) mod 2**32
where T is the real time in seconds (from an arbitrary origin,
fixed when the system is started) and R is a constant, currently
250 KBps. These ISN values form a monotonic time sequence that
wraps in 2**32/R, about 17180 seconds (4.77 hours), and has a
granularity of 1/R = 4 usecs. For transaction rates up to roughly
250K Tps, the ISN
value calculated by formula [3] will be monotonic and could be
used for bypassing the 3-way handshake.
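The granularity and wrap time follow directly from [3]. A minimal sketch (the names `isn`, `granularity`, and `wrap_seconds` are ours) that evaluates the formula with R = 250K:

```python
R = 250_000      # ISN clock rate R from the TCP specification, per second
MOD = 2 ** 32    # TCP sequence numbers are 32 bits


def isn(t: float) -> int:
    """Clock-driven ISN, formula [3]: (R*T) mod 2**32."""
    return int(R * t) % MOD


granularity = 1 / R          # seconds per ISN increment
wrap_seconds = MOD / R       # time for the ISN to cycle through 2**32
print(granularity)           # 4e-06 seconds
print(wrap_seconds, wrap_seconds / 3600)  # about 17180 s, about 4.77 hours
```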
However, TCP sequence numbers (alone) could not be used to shorten
TIME-WAIT state, because there are several ways that overlap of
the sequence space of successive incarnations can occur (as
described in Appendix to [RFC-1185]). One way is a "fast
connection", with a transfer rate greater than R; another is a
"long" connection, with a duration of approximately 4.77 hours.
TIME-WAIT delay is necessary to protect against these cases. With
the official delay of 240 seconds, formula [1] implies an upper
bound (as RTT -> 0) of TRmax = 268 Tps; with our target MSL of
2000 sec, TRmax = 32 Tps. These values are unacceptably low.
To improve this transaction rate, we could use TCP timestamps to
effectively extend the range of the TCP sequence numbers.
Timestamps would guard against sequence number wrap-around and
thereby allow us to increase R in [3] to exceed the maximum
possible transfer rate. Then sequence numbers for successive
incarnations could not overlap. Timestamps would also provide
safety with an MSL as large as 24 days. We could then set U = 0
in the TIME-WAIT delay calculation [2]. For example, R = 10**9
Bps leads to TRmax <= 10**9 Tps. See 2(b) in Table I. These
values would more than satisfy our objectives.
We should make clear how this proposal, sequence numbers plus
timestamps, differs from the timestamps alone discussed (and
rejected) in the previous section. The difference lies in what is
cached and tested for TAO; the proposal here is to cache and test
BOTH the latest TCP sequence number and the latest TCP timestamp.
In effect, we are proposing to use timestamps to logically extend
the sequence space to 64 bits. Another alternative, presented in
the next section, is to directly expand the TCP sequence space to
64 bits.
Unfortunately, the proposed solution (TCP sequence numbers plus
timestamps) based on equation [3] would be difficult or impossible
to implement on many systems, which base their TCP implementation
upon a very low granularity software clock, typically O(1 sec).
To adapt the procedure to a system with a low granularity software
clock, suppose that we calculate the ISN as:
[4] ISN = ( R*Ts*floor(T/Ts) + q*CC) mod 2**32
where Ts is the time per tick of the software clock, CC is the
connection count, and q is a constant. That is, the ISN is
incremented by the constant R*Ts once every clock tick and by the
constant q for every new connection. We need to choose q to
obtain the required monotonicity.
For monotonicity of the ISN's themselves, q=1 suffices. However,
monotonicity during the entire connection requires q = R*Ts. This
value of q can be deduced as follows. Let S(T, CC, n) be the
sequence number for byte offset n in a connection with number CC
at time T:
S(T, CC, n) = (R*Ts*floor(T/Ts) + q*CC + n) mod 2**32.
For any T2 > T1, we require that: S(T2, CC+1, 0) - S(T1, CC, n) >
0 for all n. Since R is assumed to be an upper bound on the
transfer rate, we can write down:
R > n/(T2 - T1), or T2/Ts - T1/Ts > n/(R*Ts)
Using the relationship floor(x) - floor(y) > x - y - 1 and ignoring
wrap-around, we obtain:
S(T2, CC+1, 0) - S(T1, CC, n) > R*(T2 - T1) - R*Ts + q - n > q - R*Ts,
so using q = R*Ts creates the required monotonic number sequence.
Therefore, we consider:
[5] ISN = R*Ts*(floor(T/Ts) + CC) mod 2**32
(which is the algorithm used for ISN selection by BSD TCP).
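The role of q can be checked numerically. This sketch uses deliberately small, hypothetical values (R = 1000 octets per second, Ts = 0.5 second), ignores the mod 2**32 wrap, and takes the largest byte offset n permitted by R > n/(T2 - T1). The worst case is two ISNs chosen within the same clock tick.

```python
import math

R = 1000    # hypothetical upper bound on transfer rate, octets per second
TS = 0.5    # software clock tick Ts, seconds


def s(t: float, cc: int, n: int, q: int) -> int:
    """S(T, CC, n) = R*Ts*floor(T/Ts) + q*CC + n, ignoring the 2**32 wrap."""
    return int(R * TS) * math.floor(t / TS) + q * cc + n


def monotone(q: int, t1: float, t2: float) -> bool:
    """Does incarnation CC+1 (ISN chosen at t2 > t1) start above every byte
    that incarnation CC (ISN chosen at t1) can have sent by then?"""
    max_n = math.ceil(R * (t2 - t1)) - 1   # largest n with R > n/(t2 - t1)
    return s(t2, 1, 0, q) - s(t1, 0, max_n, q) > 0


# Worst case: both ISNs are chosen within the same clock tick.
print(monotone(q=1, t1=0.0, t2=0.25))              # False
print(monotone(q=int(R * TS), t1=0.0, t2=0.25))    # True
```

With q = 1 the second incarnation starts below bytes the first may already have sent; q = R*Ts closes that gap.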
For error-free operation, the sequence numbers generated by [5]
must not wrap the sign bit in less than MSL seconds. Since CC
cannot increase faster than TRmax, the safe condition is:
R*(1 + Ts*TRmax)*MSL < 2**31.
We are interested in the case: Ts*TRmax >> 1, so this relationship
reduces to:
[6] R * Ts * TRmax * MSL < 2**31.
This shows a direct trade-off among the maximum effective
bandwidth R, the maximum transaction rate TRmax, and the maximum
segment lifetime MSL. For reasonable limiting values of R, Ts,
and MSL, formula [6] leads to a very low value of TRmax. For
example, with MSL= 2000 secs, R=10**9 Bps, and Ts = 0.5 sec, TRmax
< 2*10**-3 Tps.
To ease the situation, we could supplement sequence numbers with
timestamps. This would allow an effective MSL of 2 seconds in
[6], since longer times would be protected by differing
timestamps. Then TRmax < 2**30/(R*Ts). The actual enforced MSL
would be increased to 24 days. Unfortunately, TRmax would still
be too small, since we want to support transfer rates up to R ~
10**9 Bps. Ts = 0.5 sec would imply TRmax ~ 2 Tps. On many
systems, it appears infeasible to decrease Ts enough to obtain an
acceptable TRmax using this approach.
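Both numerical cases above follow from [6]. A minimal sketch (the function name is ours); the timestamp case simply substitutes the effective MSL of 2 seconds, as the text describes.

```python
def tr_max_32(r: float, ts: float, msl: float) -> float:
    """Largest TRmax allowed by [6]: R * Ts * TRmax * MSL < 2**31."""
    return 2 ** 31 / (r * ts * msl)


# R = 10**9 octets/s and Ts = 0.5 s, as in the text.
print(tr_max_32(r=1e9, ts=0.5, msl=2000.0))  # about 2.1e-3 Tps
print(tr_max_32(r=1e9, ts=0.5, msl=2.0))     # about 2.1 Tps, with timestamps
```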
5.3 64-bit TCP Sequence Numbers
Another possibility would be to simply increase the TCP sequence
space to 64 bits as suggested in [RFC-1263]. We would also
increase the R value for clock-driven ISN selection, beyond the
fastest transfer rate of which the host is capable. A reasonable
upper limit might be R = 10**9 Bps. As noted above, in a
practical implementation we would use:
ISN = R*Ts*( floor(T/Ts) + CC) mod 2**64
leading to:
R*(1 + Ts * TRmax) * MSL < 2**63
For example, suppose that R = 10**9 Bps, Ts = 0.5, and MSL = 16K
secs (4.4 hrs); then this result implies that TRmax < 10**6 Tps.
We see that adding 32 bits to the sequence space has provided
feasible values for transaction processing.
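The 64-bit bound can be solved exactly for TRmax. In this sketch, "16K secs" is read as 16*1024 seconds (an assumption), and the same inequality is evaluated with a 32-bit space for comparison; the function name is hypothetical.

```python
from fractions import Fraction


def tr_max_bound(r: int, ts: Fraction, msl: int, bits: int) -> Fraction:
    """Solve R*(1 + Ts*TRmax)*MSL < 2**(bits-1) for TRmax, exactly."""
    return (Fraction(2 ** (bits - 1), r * msl) - 1) / ts


args = dict(r=10**9, ts=Fraction(1, 2), msl=16 * 1024)   # values from the text
print(float(tr_max_bound(bits=64, **args)))  # roughly 1.1e6 Tps
print(float(tr_max_bound(bits=32, **args)))  # negative: no feasible TRmax
```

A negative result means the clock alone consumes more than half the 32-bit space within one MSL.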
5.4 Connection Counts
The Connection Count CC is well suited to be the monotonic
sequence M, since it "ticks" exactly once for each new connection
incarnation and is constant within a single incarnation. Thus, it
perfectly separates segments from different incarnations of the
same connection and would allow U = 0 in the TIME-WAIT state delay
formula [2]. (Strictly, U cannot be reduced below 1/R = 4 usec,
as noted in Section 4. However, this is of little practical
consequence until the ultimate limits on TRmax are approached).
Assume that CC is a 32-bit number. To prevent wrap-around in the
sign bit of CC in less than MSL seconds requires that:
TRmax * MSL < 2**31
For example, if MSL = 2000 seconds then TRmax < 10**6 Tps. These
are acceptable limits for transaction processing. However, if
they are not, we could augment CC with TCP timestamps to obtain
very far-out limits, as discussed below.
It would be an implementation choice at the client whether CC is
global for all destinations or private to each destination host
(and maintained in the per-host cache). In the latter case, the
last CC value assigned for each remote host could also be
maintained in the per-host cache. Since there is not typically a
large amount of parallelism in the network connection of a host,
there should be little difference in the performance of these two
different approaches, and the single global CC value is certainly
simpler.
To augment CC with TCP timestamps, we would bypass a 3-way
handshake if both SEG.CC > cache.CC[A] and SEG.TSval >=
cache.TS[A]. The timestamp check would detect a SYN older than 2
seconds, so that the effective wrap-around requirement would be:
TRmax * 2 < 2**31
i.e., TRmax < 10**9 Tps. The required MSL would be raised to 24
days. Using timestamps in this way, we could reduce the size of
CC. For example, suppose CC were 16 bits. Then the wrap-around
condition TRmax * 2 < 2**15 implies that TRmax < 2**14 = 16K Tps.
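The augmented TAO test can be sketched with modular comparisons, so that a counter which has wrapped its sign bit still compares correctly over short distances. The comparison is modeled on the sign-bit style of the BSD SEQ_GT macros; the names `mod_gt`, `mod_ge`, and `tao_ok`, and the 16-bit CC case, are illustrative assumptions.

```python
def mod_gt(a: int, b: int, bits: int = 32) -> bool:
    """a > b in modular arithmetic: the difference mod 2**bits has its sign bit clear."""
    d = (a - b) % (1 << bits)
    return 0 < d < (1 << (bits - 1))


def mod_ge(a: int, b: int, bits: int = 32) -> bool:
    """a >= b in the same modular sense."""
    return (a - b) % (1 << bits) < (1 << (bits - 1))


def tao_ok(seg_cc: int, seg_tsval: int, cache_cc: int, cache_ts: int,
           cc_bits: int = 32) -> bool:
    """Bypass the 3-way handshake only if SEG.CC > cache.CC[A]
    and SEG.TSval >= cache.TS[A]."""
    return mod_gt(seg_cc, cache_cc, cc_bits) and mod_ge(seg_tsval, cache_ts)


print(tao_ok(seg_cc=10, seg_tsval=500, cache_cc=9, cache_ts=500))   # True
print(tao_ok(seg_cc=10, seg_tsval=400, cache_cc=9, cache_ts=500))   # False: SYN too old
print(tao_ok(seg_cc=3, seg_tsval=500, cache_cc=65534, cache_ts=500,
             cc_bits=16))                                           # True: CC wrapped
```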
Finally, note that using CC to delete old duplicates from earlier
incarnations would not obviate the need for the time-stamp-based
PAWS mechanism to prevent errors within a single incarnation due
to wrapping the 32-bit TCP sequence space at very high transfer
rates.
5.5 Conclusions
The alternatives for monotonic sequence are summarized in Table I.
We see that there are two feasible choices for the monotonic
space: the connection count and 64-bit sequence numbers. Of these
two, we believe that the simpler is the connection count.
Implementation of 64-bit sequence numbers would require
negotiation of a new header format and expansion of all variables
and calculations on the sequence space. CC can be carried in an
option and need be examined only once per packet.
We propose to use a simple 32-bit connection count CC, without
augmentation with timestamps, for the transaction extension. This
choice has the advantages of simplicity and directness. Its
drawback is that it adds a third sequence-like space (in addition
to the TCP sequence number and the TCP timestamp) to each TCP
header and to the main line of packet processing. However, the
additional code is in fact very modest.
We now have a general outline of the proposed TCP extensions for
transactions.
o A host maintains a 32-bit global connection counter variable CC.
o The sender's current CC value is carried in an option in every
TCP segment.
o CC values are cached per host, and the TAO mechanism is used to
bypass the 3-way handshake when possible.
o In non-SYN segments, the CC value is used to reject duplicates
from earlier incarnations. This allows TIME-WAIT state delay to
be reduced to K*RTO (i.e., U=0 in Eq. [2]).
TABLE I: Summary of Monotonic Sequences
APPROACH                  TRmax (Tps)       Required MSL  COMMENTS
__________________________________________________________________
1. Timestamp & PAWS       1                 24 days       TRmax is
                                                          too small
__________________________________________________________________
2. Current TCP Sequence Numbers
 (a) clock-driven
     ISN: eq. [3]         268               240 secs      TRmax & MSL
                                                          too small
 (b) Timestamps & clock-
     driven ISN [3] &     10**9             24 days       Hard to
     R=10**9                                              implement
 (c) Timestamps & c-dr
     ISN: eq. [4]         2**30/(R*Ts)      24 days       TRmax too
                                                          small.
__________________________________________________________________
3. 64-bit TCP Sequence Numbers
                          2**63/(MSL*R*Ts)  MSL           Significant
                                                          TCP change
                          e.g., R=10**9 Bps,
                          MSL = 4.4 hrs,
                          Ts = 0.5 sec =>
                          TRmax = 10**6
__________________________________________________________________
4. Connection Counts
 (a) no timestamps        2**31/MSL         MSL           3rd sequence
                          e.g., MSL=2000 sec              space
                          TRmax = 10**6
 (b) with timestamps      2**30             24 days       (ditto)
     and PAWS
__________________________________________________________________
6. CONNECTION STATES
TCP has always allowed a connection to be half-closed. TAO makes a
significant addition to TCP semantics by allowing a connection to be
half-synchronized, i.e., to be open for data transfer in one
direction before the other direction has been opened. Thus, the
passive end of a connection (which receives an initial SYN) can
accept data and even a FIN bit before its own SYN has been
acknowledged. This SYN, data, and FIN may arrive on a single segment
(as in Figure 4), or on multiple segments; packetization makes no
difference to the logic of the finite-state machine (FSM) defining
transitions among connection states.
Half-synchronized connections have several consequences.
(a) The passive end must provide an implied initial data window in
order to accept data. The minimum size of this implied window
is a parameter in the specification; we suggest 4K bytes.
(b) New connection states and transitions are introduced into the
TCP FSM at both ends of the connection. At the active end, new
states are required to piggy-back the FIN on the initial SYN
segment. At the passive end, new states are required for a
half-synchronized connection.
This section develops the resulting FSM description of a TCP
connection as a conventional state/transition diagram. To develop a
complete FSM, we take a constructive approach, as follows: (1) write
down all possible events; (2) write down the precedence rules that
govern the order in which events may occur; (3) construct the
resulting FSM; and (4) augment it to support TAO. In principle, we
do this separately for the active and passive ends; however, the
symmetry of TCP results in the two FSMs being almost entirely
coincident.
Figure 8 lists all possible state transitions for a TCP connection in
the absence of TAO, as elementary events and corresponding actions.
Each transition is labeled with a letter. Transitions a-g are used
by the active side, and c-i are used by the passive side. Without
TAO, transition "c" (event "rcv ACK(SYN)") synchronizes the
connection, allowing data to be accepted for the user.
By definition, the first transition for an active (or passive) side
must be "a" (or "i", respectively). During a single instance of a
connection, the active side will progress through some permutation of
the complete sequence of transitions {a b c d e f} or the sequence
{a b c d e f g}. The set of possible permutations is determined by
precedence rules governing the order in which transitions can occur.
Label Event / Action
_____ ________________________
a OPEN / snd SYN
b rcv SYN [No TAO]/ snd ACK(SYN)
c rcv ACK(SYN) /
d CLOSE / snd FIN
e rcv FIN / snd ACK(FIN)
f rcv ACK(FIN) /
g timeout=2MSL / delete TCB
___________________________________________________
h passive OPEN / create TCB
i rcv SYN [No TAO]/ snd SYN, ACK(SYN)
___________________________________________________
Figure 8. Basic TCP Connection Transitions
Using the notation "<." to mean "must precede", the precedence rules
are:
(1) Logical ordering: must open connection before closing it:
b <. e
(2) Causality -- cannot receive ACK(x) before x has been sent:
a <. c and i <. c and d <. f
(3) Acknowledgments are cumulative:
c <. f
(4) First packet in each direction must contain a SYN:
b <. c and b <. f
(5) TIME-WAIT state
Whenever d precedes e in the sequence, g must be the last
transition.
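Rules (1)-(5) are mechanical enough to apply by brute force. The sketch below (helper names are ours) enumerates the active side only: "a" first, then every ordering of b through f that satisfies the rules, with "g" appended whenever d precedes e. Under these rules it finds 11 event sequences, 8 of which end in TIME-WAIT.

```python
from itertools import permutations


def before(seq: str, x: str, y: str) -> bool:
    """True if transition x occurs earlier than transition y in seq."""
    return seq.index(x) < seq.index(y)


def active_side_sequences() -> list:
    sequences = []
    for rest in permutations("bcdef"):
        seq = "a" + "".join(rest)              # OPEN ("a") is always first
        if not (before(seq, "b", "e")          # (1) open before close
                and before(seq, "a", "c")      # (2) causality
                and before(seq, "d", "f")      # (2) causality
                and before(seq, "c", "f")      # (3) cumulative ACKs
                and before(seq, "b", "c")      # (4) first packet carries SYN
                and before(seq, "b", "f")):    # (4)
            continue
        if before(seq, "d", "e"):              # (5) active close: TIME-WAIT,
            seq += "g"                         #     so g comes last
        sequences.append(seq)
    return sequences


seqs = active_side_sequences()
print(len(seqs))   # 11
print(seqs)
```

Sequences such as "adbecfg", where FIN arrives before ACK(SYN), are permitted by these rules; they correspond to states that standard TCP excludes.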
Applying these rules, we can enumerate all possible permutations of
the events and summarize them in a state transition diagram. Figure
9 shows the result, with boxes representing the states and directed
arcs representing the transitions.
________ ________
| | h | |
| CLOSED |--------->| LISTEN |
|________| |________|
| |
| a | i
____V____ ____V___ ________
| | b | | e | |
| |--------->| |-------------->| |
|________| |________| |________|
/ / | / |
/ / | c d / | c
/ / __V_____ | ____V___
/ / | | e | | |
d | d / | |------------>| |
| | |________| | |________|
| | | | |
| | | ___V____ |
| | | | | |
| | | | | |
| | | |________| |
| | | | |
____V___ ______V_ | ________ | |
| | b | | e | | | | |
| |------->| |--------->| | | |
|________| |________| | |________| | |
| / | | |
c | / d c | c | d |
| / | | |
_V___V__ ____V___ V_____V_
| | e | | | |
| |---->| | | |
|________| |________| |________|
| | |
| f | f | f
____V___ ____V___ ___V____
| | e | TIME- | g | |
| |---->| WAIT |-->| CLOSED |
|________| |________| |________|
Figure 9: Basic State Diagram
Although Figure 9 gives a correct representation of the possible
event sequences, it is not quite correct for the actions, which do
not compose as shown. In particular, once a control bit X has been
sent, it must continue to be sent until ACK(X) is received. This
requires new transitions with modified actions, shown in the
following list. We use the labeling convention that transitions with
the same event part all have the same letter, with different numbers
of primes to indicate different actions.
Label Event / Action
_____ _______________________________________
b' (=i) rcv SYN [No TAO] / snd SYN,ACK(SYN)
b'' rcv SYN [No TAO] / snd SYN,FIN,ACK(SYN)
d' CLOSE / snd SYN,FIN
e' rcv FIN / snd SYN,ACK(FIN)
e'' rcv FIN / snd FIN,ACK(FIN)
e''' rcv FIN / snd SYN,FIN,ACK(FIN)
Figure 10 shows the state diagram of Figure 9, with the modified
transitions and with the states used by standard TCP [STD-007]
identified. Those states that do not occur in standard TCP are
numbered 1-5.
Standard TCP has another implied restriction: a FIN bit cannot be
recognized before the connection has been synchronized, i.e., c <. e.
This eliminates from standard TCP the states 1, 2, and 5 shown in
Figure 10. States 3 and 4 are needed if a FIN is to be piggy-backed
on a SYN segment (note that the states shown in Figure 1 are actually
wrong; the states shown as SYN-SENT and ESTABLISHED are really states
3 and 4). In the absence of piggybacking the FIN bit, Figure 10
reduces to the standard TCP state diagram [STD-007].
The FSM described in Figure 10 is intended to be applied
cumulatively; that is, parsing a single packet header may lead to
more than one transition. For example, the standard TCP state
diagram includes a direct transition from SYN-SENT to ESTABLISHED:
rcv SYN,ACK(SYN) / snd ACK(SYN).
This is transition b followed immediately by c.
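Cumulative application can be sketched as a lookup table folded over the transitions that one segment triggers. The table is a hypothetical, partial encoding of Figure 10 as we read it: it covers only the standard TCP states (numbered states 1-5 are omitted) and records next states, not actions.

```python
from functools import reduce

# (state, transition label) -> next state, for the standard-TCP part of Figure 10.
FSM = {
    ("CLOSED", "a"): "SYN-SENT",
    ("CLOSED", "h"): "LISTEN",
    ("LISTEN", "i"): "SYN-RECEIVED",
    ("SYN-SENT", "b'"): "SYN-RECEIVED",
    ("SYN-RECEIVED", "c"): "ESTABLISHED",
    ("ESTABLISHED", "d"): "FIN-WAIT-1",
    ("ESTABLISHED", "e"): "CLOSE-WAIT",
    ("FIN-WAIT-1", "e''"): "CLOSING",
    ("FIN-WAIT-1", "f"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "e"): "TIME-WAIT",
    ("CLOSING", "f"): "TIME-WAIT",
    ("CLOSE-WAIT", "d"): "LAST-ACK",
    ("LAST-ACK", "f"): "CLOSED",
    ("TIME-WAIT", "g"): "CLOSED",
}


def on_segment(state: str, transitions: list) -> str:
    """Apply every transition triggered by one segment, in order."""
    return reduce(lambda st, label: FSM[(st, label)], transitions, state)


# <SYN,ACK(SYN)> arriving in SYN-SENT: transition b' followed by c.
print(on_segment("SYN-SENT", ["b'", "c"]))      # ESTABLISHED
# <FIN,ACK(FIN)> arriving in FIN-WAIT-1: f then e, or e'' then f.
print(on_segment("FIN-WAIT-1", ["f", "e"]))     # TIME-WAIT
print(on_segment("FIN-WAIT-1", ["e''", "f"]))   # TIME-WAIT
```

Either order of applying the FIN and the ACK(FIN) from a single segment reaches the same final state.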
________ ________
| | h | |
| CLOSED |--------->| LISTEN |
|________| |________|
| |
| a | i
____V____ ____V___ ________
| SYN- | b' | SYN- | e' | |
| SENT |--------->|RECEIVED|-------------->| 1 |
|________| |________| |________|
/ / | | |
d'/ d'/ | c d' | c |
/ / __V_____ | _V______
/ / |ESTAB- | e | | CLOSE- |
| / | LISHED|------------|-->| WAIT |
| | |________| | |________|
| | | | |
| | | _____V__ |
| | | | | |
| | | | 2 | |
| | | |________| |
| | | | |
____V___ ______V_ | ________ | |
| | b'' | |e''' | | | | |
| 3 |------->| 4 |--------->| 5 | | |
|________| |________| | |________| | |
| / | | |
c | / d c | c | d |
| / | | |
_V___V__ ____V___ V_____V_
| FIN- | e'' | | | LAST- |
| WAIT-1|---->|CLOSING | | ACK |
|________| |________| |________|
| | |
| f | f | f
____V___ ____V___ ___V____
| FIN- | e | TIME- | g | |
| WAIT-2|---->| WAIT |-->| CLOSED |
|________| |________| |________|
Figure 10: Basic State Diagram -- Correct Actions
Next we introduce TAO. If the TAO test succeeds, the connection
becomes half-synchronized. This requires a new set of states,
mirroring the states of Figure 10, beginning with acceptance of a SYN
(transition "b" or "i"), and ending when ACK(SYN) arrives (transition
"c"). Figure 11 shows the result of augmenting Figure 10 with the
additional states for TAO. The transitions are defined in the
following table:
Key for Figure 11: Complete State Diagram with TAO
Label Event / Action
_____ ________________________
a OPEN / create TCB, snd SYN
b' rcv SYN [no TAO]/ snd SYN,ACK(SYN)
b'' rcv SYN [no TAO]/ snd SYN,FIN,ACK(SYN)
c rcv ACK(SYN) /
d CLOSE / snd FIN
d' CLOSE / snd SYN,FIN
e rcv FIN / snd ACK(FIN)
e' rcv FIN / snd SYN,ACK(FIN)
e'' rcv FIN / snd FIN,ACK(FIN)
e''' rcv FIN / snd SYN,FIN,ACK(FIN)
f rcv ACK(FIN) /
g timeout=2MSL / delete TCB
h passive OPEN / create TCB
i (= b') rcv SYN [no TAO]/ snd SYN,ACK(SYN)
j rcv SYN [TAO OK] / snd SYN,ACK(SYN)
k rcv SYN [TAO OK] / snd SYN,FIN,ACK(SYN)
Each new state in Figure 11 bears a very simple relationship to a
standard TCP state. We indicate this by naming the new state with
the standard state name followed by a star. States SYN-SENT* and
SYN-RECEIVED* differ from the corresponding unstarred states in
recording the fact that a FIN has been sent. The other new states
with starred names differ from the corresponding unstarred states in
being half-synchronized (hence, a SYN bit needs to be transmitted).
The state diagram of Figure 11 is more general than required for
transaction processing. In particular, it handles simultaneous
connection synchronization from both sides, allowing one or both
sides to bypass the 3-way handshake. It includes other transitions
that are unlikely in normal transaction processing, for example, the
server sending a FIN before it receives a FIN from the client
(ESTABLISHED* -> FIN-WAIT-1* in Figure 11).
From state       Label   To state
______________   _____   ______________
CLOSED           h       LISTEN
CLOSED           a       SYN-SENT
LISTEN           i       SYN-RECEIVED
LISTEN           j       ESTABLISHED*
SYN-SENT         b'      SYN-RECEIVED
SYN-SENT         j       ESTABLISHED*
SYN-SENT         d'      SYN-SENT*
SYN-RECEIVED     c       ESTABLISHED
SYN-RECEIVED     d'      SYN-RECEIVED*
ESTABLISHED*     c       ESTABLISHED
ESTABLISHED*     e'      CLOSE-WAIT*
ESTABLISHED*     d'      FIN-WAIT-1*
CLOSE-WAIT*      c       CLOSE-WAIT
CLOSE-WAIT*      d'      LAST-ACK*
ESTABLISHED      e       CLOSE-WAIT
ESTABLISHED      d       FIN-WAIT-1
CLOSE-WAIT       d       LAST-ACK
LAST-ACK*        c       LAST-ACK
SYN-SENT*        b''     SYN-RECEIVED*
SYN-SENT*        k       FIN-WAIT-1*
SYN-RECEIVED*    c       FIN-WAIT-1
FIN-WAIT-1*      c       FIN-WAIT-1
FIN-WAIT-1*      e'''    CLOSING*
CLOSING*         c       CLOSING
FIN-WAIT-1       e''     CLOSING
FIN-WAIT-1       f       FIN-WAIT-2
CLOSING          f       TIME-WAIT
LAST-ACK         f       CLOSED
FIN-WAIT-2       e       TIME-WAIT
TIME-WAIT        g       CLOSED
Figure 11: Complete State Diagram with TAO
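The transitions of Figure 11 can also be written down as a lookup table and replayed against the minimal transaction of Figure 12 as a consistency check. The Python below is a sketch under stated assumptions: `FIG11` and `run()` are hypothetical names, not part of [TTCP-FS], and the label sequence for each end is read off the discussion of Figure 12.

```python
# Hypothetical encoding of the Figure 11 transitions (not from [TTCP-FS]).
# Keys are (current state, label from the key for Figure 11); values are
# the next state.
FIG11 = {
    ("CLOSED", "h"): "LISTEN",
    ("CLOSED", "a"): "SYN-SENT",
    ("LISTEN", "i"): "SYN-RECEIVED",
    ("LISTEN", "j"): "ESTABLISHED*",
    ("SYN-SENT", "b'"): "SYN-RECEIVED",
    ("SYN-SENT", "j"): "ESTABLISHED*",
    ("SYN-SENT", "d'"): "SYN-SENT*",
    ("SYN-RECEIVED", "c"): "ESTABLISHED",
    ("SYN-RECEIVED", "d'"): "SYN-RECEIVED*",
    ("ESTABLISHED*", "c"): "ESTABLISHED",
    ("ESTABLISHED*", "e'"): "CLOSE-WAIT*",
    ("ESTABLISHED*", "d'"): "FIN-WAIT-1*",
    ("CLOSE-WAIT*", "c"): "CLOSE-WAIT",
    ("CLOSE-WAIT*", "d'"): "LAST-ACK*",
    ("ESTABLISHED", "e"): "CLOSE-WAIT",
    ("ESTABLISHED", "d"): "FIN-WAIT-1",
    ("CLOSE-WAIT", "d"): "LAST-ACK",
    ("LAST-ACK*", "c"): "LAST-ACK",
    ("SYN-SENT*", "b''"): "SYN-RECEIVED*",
    ("SYN-SENT*", "k"): "FIN-WAIT-1*",
    ("SYN-RECEIVED*", "c"): "FIN-WAIT-1",
    ("FIN-WAIT-1*", "c"): "FIN-WAIT-1",
    ("FIN-WAIT-1*", "e'''"): "CLOSING*",
    ("CLOSING*", "c"): "CLOSING",
    ("FIN-WAIT-1", "e''"): "CLOSING",
    ("FIN-WAIT-1", "f"): "FIN-WAIT-2",
    ("CLOSING", "f"): "TIME-WAIT",
    ("LAST-ACK", "f"): "CLOSED",
    ("FIN-WAIT-2", "e"): "TIME-WAIT",
    ("TIME-WAIT", "g"): "CLOSED",
}


def run(state, labels):
    """Apply a sequence of transition labels; return every state visited."""
    trail = [state]
    for label in labels:
        key = (state, label)
        if key not in FIG11:
            raise ValueError(f"no transition {label!r} from {state}")
        state = FIG11[key]
        trail.append(state)
    return trail
```

For example, `run("LISTEN", ["j", "e'", "d'", "c", "f"])` takes the server through ESTABLISHED*, CLOSE-WAIT*, LAST-ACK*, and LAST-ACK to CLOSED, while the client of Figure 12 follows `["a", "d'", "k", "e'''", "c", "f", "g"]` from CLOSED back to CLOSED.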
The relationship between starred and unstarred states is very
regular. As a result, the state extensions can be implemented very
simply using the standard TCP FSM with the addition of two "hidden"
boolean flags, as described in the functional specification memo
[TTCP-FS].
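The sketch below illustrates the idea of hidden flags by representing each starred state as a standard TCP state plus one boolean, following the description of starred states earlier in this section: one flag records that a FIN has been sent (SYN-SENT*, SYN-RECEIVED*), the other that the connection is half-synchronized (all other starred states). The flag names `fin_sent` and `half_sync` and the `Conn` class are hypothetical; [TTCP-FS] defines the actual mechanism. Only transition "c" is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical connection record: a standard TCP state plus two hidden flags."""
    state: str               # standard (unstarred) TCP state name
    fin_sent: bool = False   # marks SYN-SENT* / SYN-RECEIVED*: a FIN has been sent
    half_sync: bool = False  # marks the other starred states: a SYN must be sent

    def name(self) -> str:
        """Return the Figure 11 state name that this record represents."""
        if self.state in ("SYN-SENT", "SYN-RECEIVED"):
            starred = self.fin_sent
        else:
            starred = self.half_sync
        return self.state + ("*" if starred else "")

    def control_bits(self, bits: set[str]) -> set[str]:
        """Control bits for an outgoing segment: add SYN while half-synchronized."""
        if self.half_sync:
            return set(bits) | {"SYN"}
        return set(bits)

    def rcv_ack_of_syn(self) -> None:
        """Transition "c" (rcv ACK(SYN)) expressed on the standard FSM."""
        if self.state == "SYN-RECEIVED":
            # The FIN flag turns the standard move to ESTABLISHED into FIN-WAIT-1.
            self.state = "FIN-WAIT-1" if self.fin_sent else "ESTABLISHED"
            self.fin_sent = False
        else:
            # ESTABLISHED*, CLOSE-WAIT*, LAST-ACK*, FIN-WAIT-1*, CLOSING*:
            # the underlying state is unchanged; only the star disappears.
            self.half_sync = False
```

In this sketch the unstarred transition logic is reused as-is; the flags only add a SYN bit to outgoing segments or redirect the exit from SYN-RECEIVED.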
As an example of the application of Figure 11, consider the minimal
transaction shown in Figure 12.
TCP A (Client) TCP B (Server)
_______________ ______________
CLOSED LISTEN
1. SYN-SENT* --> <SYN,data1,FIN,CC=x1> --> CLOSE-WAIT*
(TAO test OK=>
data1->user_B)
LAST-ACK*
<-- <SYN,ACK(FIN),data2,FIN,CC=y1,CC.ECHO=x1> <--
2. TIME-WAIT
(TAO test OK,
data2->user_A)
3. TIME-WAIT --> <ACK(FIN),CC=x2> --> CLOSED
(timeout)
CLOSED
Figure 12: Minimal Transaction Sequence
Sending segment #1 leaves the client end in SYN-SENT* state, which
differs from SYN-SENT state in recording the fact that a FIN has been
sent. At the server end, passing the TAO test moves the connection to
ESTABLISHED* state, which passes the data to the user as in ESTABLISHED
state and also records the fact that the connection is
half-synchronized. The server then processes the FIN bit of segment
#1, moving to CLOSE-WAIT* state.
Moving to CLOSE-WAIT* state should cause the server to send a segment
containing SYN and ACK(FIN). However, transmission of this segment
is deferred so the server can piggyback the response data and FIN on
the same segment, unless a timeout occurs first. When the server
does send segment #2 containing the response data2 and a FIN, the
connection advances from CLOSE-WAIT* to LAST-ACK* state; the
connection is still half-synchronized from B's viewpoint.
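A minimal sketch of the deferral decision follows. The deadline and the time arguments are assumptions for illustration (the memo does not say how long B may wait), and `close_wait_star_send()` is a hypothetical helper.

```python
from typing import Optional


def close_wait_star_send(response_at: Optional[float], deadline: float):
    """Choose B's next segment after it enters CLOSE-WAIT* (hypothetical helper).

    response_at: time at which the server application supplies its reply and
    closes, or None if it has not done so; deadline: assumed deferral timeout.
    Returns (control bits, whether reply data is carried, resulting state).
    The "ACK" bit here stands for ACK(FIN), acknowledging A's FIN.
    """
    if response_at is not None and response_at <= deadline:
        # Reply ready in time: piggyback data and FIN on the SYN,ACK(FIN)
        # segment (transition d', CLOSE / snd SYN,FIN).
        return {"SYN", "ACK", "FIN"}, True, "LAST-ACK*"
    # Timer expired first: send SYN,ACK(FIN) alone; B stays half-synchronized.
    return {"SYN", "ACK"}, False, "CLOSE-WAIT*"
```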
Processing segment #2 at the client again results in multiple
transitions:
SYN-SENT* -> FIN-WAIT-1* -> CLOSING* -> CLOSING -> TIME-WAIT
These correspond respectively to receiving a SYN, a FIN, an ACK for
A's SYN, and an ACK for A's FIN.
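The four transitions can be traced mechanically. In this sketch the `CLIENT_STEPS` table copies the corresponding entries of Figure 11, while `process_segment()` is a hypothetical helper that applies the parts of one segment in the order listed above (SYN, FIN, then the two acknowledgments); it is not a statement of the segment-processing order in STD-007 or [TTCP-FS].

```python
# Subset of Figure 11 used by the client when segment #2 arrives,
# keyed by (state, event); comments give the transition labels.
CLIENT_STEPS = {
    ("SYN-SENT*", "SYN"): "FIN-WAIT-1*",   # k
    ("FIN-WAIT-1*", "FIN"): "CLOSING*",    # e'''
    ("CLOSING*", "ACK(SYN)"): "CLOSING",   # c
    ("CLOSING", "ACK(FIN)"): "TIME-WAIT",  # f
}


def process_segment(state, syn, fin, acked):
    """Apply the events carried by one segment, in the order listed in the text.

    acked is the set of A's control bits covered by the segment's ACK field;
    a cumulative ACK(FIN) covers both A's SYN and A's FIN.
    """
    events = []
    if syn:
        events.append("SYN")
    if fin:
        events.append("FIN")
    if "SYN" in acked:
        events.append("ACK(SYN)")
    if "FIN" in acked:
        events.append("ACK(FIN)")
    trail = [state]
    for event in events:
        state = CLIENT_STEPS[(state, event)]
        trail.append(state)
    return trail
```

Calling `process_segment("SYN-SENT*", True, True, {"SYN", "FIN"})` reproduces the chain shown above, ending in TIME-WAIT.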
Figure 13 shows a slightly more complex example, a transaction
sequence in which request and response data each require two
segments. This figure assumes that both client and server TCP are
well-behaved, so that, for example, the client sends the single segment #5 to
acknowledge both data segments #3 and #4. SEG.CC values are omitted
for clarity.
TCP A                                              TCP B
_____                                              _____
1. SYN-SENT* --> <SYN,data1> --> ESTABLISHED*
(TAO OK,
data1-> user)
2. SYN-SENT* --> <data2,FIN> --> CLOSE-WAIT*
(data2-> user)
3. FIN-WAIT-2 <-- <SYN,ACK(FIN),data3> <-- CLOSE-WAIT*
(data3->user)
4. TIME-WAIT <-- <ACK(FIN),data4,FIN> <-- LAST-ACK*
(data4->user)
5. TIME-WAIT --> <ACK(FIN)> --> CLOSED
Figure 13: Multi-Packet Request/Response Transaction
7. CONCLUSIONS AND ACKNOWLEDGMENTS
TCP was designed to be a highly symmetric protocol. This symmetry is
evident in the piggy-backing of acknowledgments on data and in the
common header format for data segments and acknowledgments. On the
other hand, the examples and discussion in this memo are in general
highly unsymmetrical; the actions of a "client" are clearly
distinguished from those of a "server". To explain this apparent
discrepancy, we note the following. Even when TCP is used for
virtual circuit service, the data transfer phase is symmetrical but
the open and close phases are not. A minimal transaction, consisting
of one segment in each direction, compresses the open, data transfer,
and close phases together, making the asymmetry of the open and
close phases dominant. As request and response messages increase in
size, the virtual circuit model becomes increasingly relevant, and
symmetry again dominates.
TCP's 3-way handshake precludes any performance gain from including
data on a SYN segment, while TCP's full-duplex data-conserving close
sequence ties up communication resources to the detriment of high-
speed transactions. Merely loading more control bits onto TCP data
segments does not provide efficient transaction service. To use TCP
as an effective transaction transport protocol requires bypassing the
3-way handshake and shortening the TIME-WAIT delay. This memo has
proposed a backwards-compatible TCP extension to accomplish both
goals. It is our hope that by building upon the current version of
TCP, we can give a boost to community acceptance of the new
facilities. Furthermore, the resulting protocol implementations will
retain the algorithms that have been developed for flow and
congestion control in TCP [Jacobson88].
O'Malley and Peterson have recently recommended against backwards-
compatible extensions to TCP, and suggested instead a mechanism to
allow easy installation of alternative versions of a protocol
[RFC-1263]. While this is an interesting long-term approach, in the
shorter term we suggest that incremental extension of the current TCP
may be a more effective route.
Besides the backward-compatible extension proposed here, there are
two other possible approaches to making efficient transaction
processing widely available in the Internet: (1) a new version of TCP
or (2) a new protocol specifically adapted to transactions. Since
current TCP "almost" supports transactions, we favor (1) over (2). A
new version of TCP that retained the semantics of STD-007 but used
64-bit sequence numbers with the procedures and states described in
Sections 3, 4, and 6 of this memo would support transactions as well
as virtual circuits in a clean, coherent manner.
A potential application of transaction-mode TCP might be SMTP. If
commands and responses are batched, in favorable cases complete SMTP
delivery operations on short messages could be performed with a
single minimal transaction; on the other hand, the body of a message
may be arbitrarily large. Using a TCP extended as in this memo could
significantly reduce the load on large mail hosts.
This work began as an elaboration of the concept of TAO, due to Dave
Clark. I am grateful to him and to Van Jacobson, John Wroclawski,
Dave Borman, and other members of the End-to-End Research group for
helpful ideas and critiques during the long development of this work.
I also thank Liming Wei, who tested the initial implementation in Sun
OS.
APPENDIX A -- TIME-WAIT STATE AND THE 2-PACKET EXCHANGE
This appendix considers the implications of reducing TIME-WAIT state
delay below that given in formula [2].
An immediate consequence of this would be the requirement for the
server host to accept an initial SYN for a connection in LAST-ACK
state. Without the transaction extensions, the arrival of a new
<SYN> in LAST-ACK state looks to TCP like a half-open connection, and
TCP's rules are designed to restore correspondence by destroying the
state (through sending a RST segment) at one end or the other. We
would need to thwart this action in the case of transactions.
There are two different possible ways to further reduce TIME-WAIT
delay.
(1) Explicit Truncation of TIME-WAIT state
TIME-WAIT state could be explicitly truncated by accepting a new
sendto() request for a connection in TIME-WAIT state.
This would allow the ACK(FIN) segment to be delayed and sent
only if a timeout occurs before a new request arrives. This
allows an ideal 2-segment exchange for closely-spaced
transactions, which would restore some symmetry to the
transaction exchange. However, explicit truncation would
represent a significant change in many implementations.
It might be supposed that even greater symmetry would result if
the new request segment were a <SYN,ACK> that explicitly
acknowledges the previous reply, rather than a <SYN> that is
only an implicit acknowledgment. However, the new request
segment might arrive at B to find the server side in either
LAST-ACK or CLOSED state, depending upon whether the ACK(FIN)
had arrived. In CLOSED state, a <SYN,ACK> would not be
acceptable. Hence, if the client sent an initial <SYN,ACK>
instead of a <SYN> segment, there would be a race condition at
the server.
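The segment savings from explicit truncation can be counted with a small model. Everything here is an assumption for illustration: `ack_delay` stands for the (unspecified) time A waits before sending the delayed ACK(FIN), each gap is the time between A receiving a reply and the next sendto() to the same server, and `segments_per_transaction()` is a hypothetical helper.

```python
from typing import List, Optional


def segments_per_transaction(gaps: List[Optional[float]], ack_delay: float) -> List[int]:
    """Count segments for each transaction in a run from client A to server B.

    Every transaction costs a request and a reply. The delayed ACK(FIN) is
    sent only if the next request does not arrive within ack_delay; otherwise
    the new request truncates TIME-WAIT and implicitly acknowledges the reply.
    gaps[i] is None when no further request follows transaction i.
    """
    counts = []
    for gap in gaps:
        truncated = gap is not None and gap < ack_delay
        counts.append(2 if truncated else 3)
    return counts
```

With `segments_per_transaction([1.0, 50.0, None], ack_delay=10.0)`, only the first, closely followed transaction achieves the 2-segment exchange, giving `[2, 3, 3]`.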
(2) No TIME-WAIT delay
TIME-WAIT delay could be removed entirely. This would imply
that the ACK(FIN) would always be sent (which does not of course
guarantee that it will be received). As a result, the arrival
of a new SYN in LAST-ACK state would be rare.
This choice is much simpler to implement. Its drawback is that
the server will get a false failure report if the ACK(FIN) is
lost. This may not matter in practice, but it does represent a
significant change of TCP semantics. Note that reliable delivery
of the reply is not an issue: the client enters TIME-WAIT state
only after the entire reply, including the FIN bit, has been
received successfully.
The server host B must be certain that a new request received in
LAST-ACK state is indeed a new SYN and not an old duplicate;
otherwise, B could falsely acknowledge a previous response that has
not in fact been delivered to A. If the TAO comparison succeeds, the
SYN must be new; however, the server has a dilemma if the TAO test
fails.
In Figure A.1, for example, the reply segment from the first
transaction has been lost; since it has not been acknowledged, it is
still in B's retransmission queue. An old duplicate request, segment
#3, arrives at B and its TAO test fails. B is in the position of
having old state it cannot discard (the retransmission queue) and
needing to build new state to pursue a 3-way handshake to validate
the new SYN. If the 3-way handshake failed, it would need to restore
the earlier LAST-ACK* state. (Compare with Figure 15 "Old Duplicate
SYN Initiates a Reset on Two Passive Sockets" in STD-007). This
would be complex and difficult to accomplish in many implementations.
TCP A (Client) TCP B (Server)
_______________ ______________
CLOSED LISTEN
1. SYN-SENT* --> <SYN,data1,FIN> --> CLOSE-WAIT*
(TAO test OK;
data1->server)
2. (lost) X<-- <SYN,ACK(FIN),data2,FIN> <-- LAST-ACK*
(old duplicate)
3. ... <SYN,data3,FIN> --> LAST-ACK*
(TAO test fail;
3-way handshake?)
Figure A.1: The Server's Dilemma
The only practical action B can take when the TAO test fails on a
new SYN received in LAST-ACK state is to ignore the SYN, assuming it
is really an old duplicate. We must therefore examine the possible
consequences of this action.
Section 3.1 listed four possible reasons for failure of the TAO test
on a legitimate SYN segment: (1) no cached state, (2) out-of-order
delivery of SYNs, (3) wraparound of CCgen relative to the cached
value, or (4) the M values advance too slowly. We are assuming that
there is a cached CC value at B (otherwise, the SYN cannot be
acceptable in LAST-ACK state). Wrapping the CC space is very
unlikely and probably impossible; it is difficult to imagine
circumstances which would allow the new SYN to be delivered but not
the ACK(FIN), especially given the long wraparound time of CCgen.
This leaves the problem of out-of-order delivery of two nearly-
concurrent SYNs for different ports. The second to be delivered may
have a lower CC option and thus be locked out. This can be solved by
using a new CCgen value for every retransmission of an initial SYN.
Truncation of TIME-WAIT state and acceptance of a SYN in LAST-ACK
state should take place only if there is a cached CC value for the
remote host. Otherwise, a SYN arriving in LAST-ACK state is to be
processed by normal TCP rules, which will result in a RST segment
from either A or B.
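The server-side rules from this part of the appendix can be gathered into one decision function. This is a sketch: `syn_in_last_ack()` and its return strings are hypothetical, and CC values are compared as plain integers, ignoring wraparound of the CC space.

```python
from typing import Optional


def syn_in_last_ack(cached_cc: Optional[int], seg_cc: int) -> str:
    """B's action on an initial SYN for a connection in LAST-ACK state (sketch)."""
    if cached_cc is None:
        # No cached CC for the client: no truncation; normal TCP rules apply,
        # which will lead to a RST segment from A or B.
        return "normal TCP rules"
    if seg_cc > cached_cc:
        # TAO test succeeds: the SYN must be new, so it implicitly
        # acknowledges the previous reply and a new transaction can start.
        return "accept"
    # TAO test fails: assume an old duplicate and ignore it.
    return "ignore"
```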
This discussion leads to a paradigm for rejecting old duplicate
segments that is different from TAO. This alternative scheme is
based upon the following:
(a) Each retransmission of an initial SYN will have a new value of
CC, as described above.
This provision takes care of reordered SYNs.
(b) A host maintains a distinct CCgen value for each remote host.
This value could easily be maintained in the same cache used for
the received CC values, e.g., as cache.CCgen[].
Once the caches are primed, it should always be true that
cache.CCgen[B] on host A is equal to cache.CC[A] on host B, and
the next transaction from A will carry a CC value exactly 1
greater. Thus, there is no problem of wraparound of the CC
value.
(c) A new SYN is acceptable if its SEG.CC > cache.CC[client],
otherwise the SYN is ignored as an old duplicate.
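Rules (a) through (c) can be sketched with a pair of per-remote-host caches. The `Host` class and its method names are hypothetical; `ccgen` plays the role of cache.CCgen[] and `cc` the role of cache.CC[], and the demonstration assumes the caches have already been primed to 0.

```python
from typing import Dict, Optional


class Host:
    """Hypothetical host implementing rules (a)-(c) of the alternative scheme."""

    def __init__(self) -> None:
        self.ccgen: Dict[str, int] = {}  # cache.CCgen[remote]: last CC sent to remote
        self.cc: Dict[str, int] = {}     # cache.CC[remote]: last CC accepted from remote

    def next_cc(self, remote: str) -> int:
        """Rules (a), (b): every initial SYN, including a retransmission, gets a new CC."""
        self.ccgen[remote] = self.ccgen.get(remote, 0) + 1
        return self.ccgen[remote]

    def accept_syn(self, client: str, seg_cc: int) -> Optional[bool]:
        """Rule (c); returns None when there is no cached CC (normal TCP rules apply)."""
        if client not in self.cc:
            return None
        if seg_cc > self.cc[client]:
            self.cc[client] = seg_cc
            return True
        return False  # ignored as an old duplicate
```

In a reordering case, A draws CC values 2 and 3 for two nearly concurrent SYNs; if the SYN carrying 3 arrives first it is accepted, the one carrying 2 is ignored, and its retransmission succeeds because rule (a) gives it the fresh value 4. Afterwards `ccgen["B"]` on A again equals `cc["A"]` on B.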
This alternative paradigm was not adopted because it would be a
somewhat greater perturbation of TCP rules, because it may not have
the robustness of TAO, and because its consequences may not all be
understood.
REFERENCES
[Birrell84] Birrell, A. and B. Nelson, "Implementing Remote
Procedure Calls", ACM TOCS, Vol. 2, No. 1, February 1984.
[Clark88] Clark, D., "The Design Philosophy of the Internet
Protocols", ACM SIGCOMM '88, Stanford, CA, August 1988.
[Clark89] Clark, D., Private communication, 1989.
[Garlick77] Garlick, L., R. Rom, and J. Postel, "Issues in Reliable
Host-to-Host Protocols", Proc. Second Berkeley Workshop on
Distributed Data Management and Computer Networks, May 1977.
[HR-COMM] Braden, R., Ed., "Requirements for Internet Hosts --
Communication Layers", STD-003, RFC-1122, October 1989.
[Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
SIGCOMM '88, Stanford, CA, August 1988.
[Jacobson90] Jacobson, V., private communication, 1990.
[Liskov90] Liskov, B., Shrira, L., and J. Wroclawski, "Efficient
At-Most-Once Messages Based on Synchronized Clocks", ACM SIGCOMM
'90, Philadelphia, PA, September 1990.
[RFC-955] Braden, R., "Towards a Transport Service Transaction
Protocol", RFC-955, September 1985.
[RFC-1185] Jacobson, V., Braden, R., and Zhang, L., "TCP Extension
for High-Speed Paths", RFC-1185, October 1990.
[RFC-1263] O'Malley, S. and L. Peterson, "TCP Extensions Considered
Harmful", RFC-1263, University of Arizona, October 1991.
[RFC-1323] Jacobson, V., Braden, R., and Borman, D., "TCP
Extensions for High Performance", RFC-1323, February 1991.
[RFC-1337] Braden, R., "TIME-WAIT Assassination Hazards in TCP",
RFC-1337, May 1992.
[STD-007] Postel, J., "Transmission Control Protocol - DARPA
Internet Program Protocol Specification", STD-007, RFC-793,
September 1981.
[TTCP-FS] Braden, R., "Transaction TCP -- Functional
Specification", Work in Progress, September 1992.
[Watson81] Watson, R., "Timer-based Mechanisms in Reliable
Transport Protocol Connection Management", Computer Networks, Vol.
5, 1981.
Security Considerations
Security issues are not discussed in this memo.
Author's Address
Bob Braden
University of Southern California
Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292
Phone: (310) 822-1511
EMail: Braden@ISI.EDU