<pre>Internet Engineering Task Force (IETF) M. Allman
Request for Comments: 5827 ICSI
Category: Experimental K. Avrachenkov
ISSN: 2070-1721 INRIA
U. Ayesta
BCAM-IKERBASQUE and LAAS-CNRS
J. Blanton
Ohio University
P. Hurtig
Karlstad University
April 2010
<span class="h1">Early Retransmit for TCP</span>
<span class="h1">and Stream Control Transmission Protocol (SCTP)</span>
Abstract
This document proposes a new mechanism for TCP and Stream Control
Transmission Protocol (SCTP) that can be used to recover lost
segments when a connection's congestion window is small. The "Early
Retransmit" mechanism allows the transport to reduce, in certain
special circumstances, the number of duplicate acknowledgments
required to trigger a fast retransmission. This allows the transport
to use fast retransmit to recover segment losses that would otherwise
require a lengthy retransmission timeout.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for examination, experimental implementation, and
evaluation.
This document defines an Experimental Protocol for the Internet
community. This document is a product of the Internet Engineering
Task Force (IETF). It represents the consensus of the IETF
community. It has received public review and has been approved for
publication by the Internet Engineering Steering Group (IESG). Not
all documents approved by the IESG are a candidate for any level of
Internet Standard; see <a href="./rfc5741#section-2">Section 2 of RFC 5741</a>.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
<a href="http://www.rfc-editor.org/info/rfc5827">http://www.rfc-editor.org/info/rfc5827</a>.
<span class="grey">Allman, et al. Experimental [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc5827">RFC 5827</a> Early Retransmit for TCP and SCTP April 2010</span>
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and the IETF Trust's Legal
Provisions Relating to IETF Documents
(<a href="http://trustee.ietf.org/license-info">http://trustee.ietf.org/license-info</a>) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
Many researchers have studied the problems with TCP's loss recovery
[RFC793, <a href="./rfc5681">RFC5681</a>] when the congestion window is small, and they have
outlined possible mechanisms to mitigate these problems
[Mor97, BPS+98, Bal98, LK98, <a href="./rfc3150">RFC3150</a>, AA02]. SCTP's [<a href="./rfc4960" title="Stream Control Transmission Protocol">RFC4960</a>] loss
recovery and congestion control mechanisms are based on TCP, and
therefore the same problems impact the performance of SCTP
connections. When the transport detects a missing segment, the
connection enters a loss recovery phase. There are several variants
of the loss recovery phase depending on the TCP implementation. TCP
can use slow-start-based recovery or fast recovery [<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>], NewReno
[<a href="./rfc3782" title="The NewReno Modification to TCP's Fast Recovery Algorithm">RFC3782</a>], and loss recovery based on selective acknowledgments
(SACKs) [RFC2018, FF96, <a href="./rfc3517">RFC3517</a>]. SCTP's loss recovery is not as
varied due to the built-in selective acknowledgments.
All of the above variants have two methods for invoking loss
recovery. First, if an acknowledgment (ACK) for a given segment is
not received in a certain amount of time, a retransmission timer
fires, and the segment is resent [RFC2988, <a href="./rfc4960">RFC4960</a>]. Second, the
"fast retransmit" algorithm resends a segment when three duplicate
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
ACKs arrive at the sender [Jac88, <a href="./rfc5681">RFC5681</a>]. Duplicate ACKs are
triggered by out-of-order arrivals at the receiver. However, because
duplicate ACKs from the receiver are triggered by both segment loss
and segment reordering in the network path, the sender waits for
three duplicate ACKs in an attempt to disambiguate segment loss from
segment reordering. When the congestion window is small, it may not
be possible to generate the required number of duplicate ACKs to
trigger fast retransmit when a loss does happen.
Small congestion windows can occur in a number of situations, such
as:
(1) The connection is constrained by end-to-end congestion control
when the connection's share of the path is small, the path has a
small bandwidth-delay product, or the transport is ascertaining
the available bandwidth in the first few round-trip times of slow
start.
(2) The connection is "application limited" and has only a limited
amount of data to send. This can happen any time the application
does not produce enough data to fill the congestion window. A
particular case when all connections become application limited
is as the connection ends.
(3) The connection is limited by the receiver's advertised window.
The transport's retransmission timeout (RTO) is based on measured
round-trip times (RTT) between the sender and receiver, as specified
in [<a href="./rfc2988" title="Computing TCP's Retransmission Timer">RFC2988</a>] (for TCP) and [<a href="./rfc4960" title="Stream Control Transmission Protocol">RFC4960</a>] (for SCTP). To prevent spurious
retransmissions of segments that are only delayed and not lost, the
minimum RTO is conservatively chosen to be 1 second. Therefore, it
behooves TCP senders to detect and recover from as many losses as
possible without incurring a lengthy timeout during which the
connection remains idle. However, if not enough duplicate ACKs
arrive from the receiver, the fast retransmit algorithm is never
triggered -- this situation occurs when the congestion window is
small, if a large number of segments in a window are lost, or at the
end of a transfer as data drains from the network. For instance,
consider a congestion window of three segments' worth of data. If
one segment is dropped by the network, then at most two duplicate
ACKs will arrive at the sender. Since three duplicate ACKs are
required to trigger fast retransmit, a timeout will be required to
resend the dropped segment. Note that delayed ACKs [<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>] may
further reduce the number of duplicate ACKs a receiver sends.
However, we assume that receivers send immediate ACKs when there is a
gap in the received sequence space per [<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>].
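The arithmetic behind the three-segment example above can be sketched
as follows. This is an illustrative model only and not part of the
specification: the name max_dupacks and its parameters are
hypothetical, and the model assumes that every segment arriving after
a gap triggers one immediate duplicate ACK, that no ACKs are lost, and
that no new data is sent in response to duplicate ACKs.

```python
DUPACK_THRESHOLD = 3  # standard fast retransmit threshold [RFC5681]


def max_dupacks(flight_segments, lost_segments=1):
    """Upper bound on duplicate ACKs produced by one flight (hypothetical).

    Assumes each segment that arrives after a gap triggers one immediate
    duplicate ACK, no ACKs are lost, and no new data is sent.
    """
    if lost_segments > flight_segments or 0 > lost_segments:
        raise ValueError("lost_segments must be between 0 and flight_segments")
    if lost_segments == 0:
        return 0  # nothing arrives out of order
    return flight_segments - lost_segments


for flight in (3, 4):
    dupacks = max_dupacks(flight)
    # Columns: flight size, duplicate ACKs, fast retransmit possible?
    print(flight, dupacks, dupacks >= DUPACK_THRESHOLD)
```

Under these assumptions, a flight of three segments yields at most two
duplicate ACKs, which is below the standard threshold, while a flight
of four segments can reach it.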
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
[BPS+98] shows that roughly 56% of retransmissions sent by a busy Web
server are sent after the RTO timer expires, while only 44% are
handled by fast retransmit. In addition, only 4% of the RTO timer-
based retransmissions could have been avoided with SACK, which has to
continue to disambiguate reordering from genuine loss. Furthermore,
[<a href="#ref-All00">All00</a>] shows that for one particular Web server, the median number
of bytes carried by a connection is less than four segments,
indicating that more than half of the connections will be forced to
rely on the RTO timer to recover from any losses that occur. Thus,
loss recovery that does not rely on the conservative RTO is likely to
be beneficial for short TCP transfers.
The limited transmit mechanism introduced in [<a href="./rfc3042" title="Enhancing TCP's Loss Recovery Using Limited Transmit">RFC3042</a>] and currently
codified in [<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>] allows a TCP sender to transmit previously
unsent data upon receipt of each of the two duplicate ACKs that
precede a fast retransmit. SCTP [<a href="./rfc4960" title="Stream Control Transmission Protocol">RFC4960</a>] uses SACK information to
calculate the number of outstanding segments in the network. Hence,
when the first two duplicate ACKs arrive at the sender, they will
indicate that data has left the network, and they will allow the
sender to transmit new data (if available), similar to TCP's limited
transmit algorithm. In the remainder of this document, we use
"limited transmit" to include both TCP and SCTP mechanisms for
sending in response to the first two duplicate ACKs. By sending
these two new segments, the sender is attempting to induce additional
duplicate ACKs (if appropriate), so that fast retransmit will be
triggered before the retransmission timeout expires. The sender-side
"Early Retransmit" mechanism outlined in this document covers the
case when previously unsent data is not available for transmission
(case (2) above) or cannot be transmitted due to an advertised window
limitation (case (3) above).
Note: This document is being published as an experimental RFC, as
part of the process for the TCPM working group and the IETF to assess
whether the proposed change is useful and safe in heterogeneous
environments, including which variants of the mechanism are the most
effective. In the future, this specification may be updated and put
on the standards track if its safety and efficacy can be
demonstrated.
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Terminology</span>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <a href="./rfc2119">RFC 2119</a> [<a href="./rfc2119" title="Key words for use in RFCs to Indicate Requirement Levels">RFC2119</a>].
The reader is expected to be familiar with the definitions given in
[<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>].
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Early Retransmit Algorithm</span>
The Early Retransmit algorithm calls for lowering the threshold for
triggering fast retransmit when the amount of outstanding data is
small and when no previously unsent data can be transmitted (such
that limited transmit could be used). Duplicate ACKs are triggered
by each arriving out-of-order segment. Therefore, fast retransmit
will not be invoked when there are fewer than four outstanding
segments (assuming only one segment loss in the window). However,
TCP and SCTP are not required to track the number of outstanding
segments, but rather the number of outstanding bytes or messages.
(Note that SCTP's message boundaries do not necessarily correspond to
segment boundaries.) Therefore, applying the intuitive notion of a
transport with fewer than four segments outstanding is more
complicated than it first appears. In <a href="#section-3.1">Section 3.1</a>, we describe a
"byte-based" variant of Early Retransmit that attempts to roughly map
the number of outstanding bytes to a number of outstanding segments
that is then used when deciding whether to trigger Early Retransmit.
In <a href="#section-3.2">Section 3.2</a>, we describe a "segment-based" variant that represents
a more precise algorithm for triggering Early Retransmit. This
precision comes at the cost of requiring additional state to be kept
by the TCP sender. In both cases, we describe SACK-based and non-
SACK-based versions of the scheme (of course, the non-SACK version
will not apply to SCTP). This document explicitly does not prefer
one variant over the other, but leaves the choice to the implementer.
<span class="h3"><a class="selflink" id="section-3.1" href="#section-3.1">3.1</a>. Byte-Based Early Retransmit</span>
A TCP or SCTP sender MAY use byte-based Early Retransmit.
Upon the arrival of an ACK, a sender employing byte-based Early
Retransmit MUST use the following two conditions to determine when an
Early Retransmit is sent:
(2.a) The amount of outstanding data (ownd) -- data sent but not yet
acknowledged -- is less than 4*SMSS bytes (as defined in
[<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>]).
Note that in the byte-based variant of Early Retransmit, "ownd"
is equivalent to "FlightSize" (defined in [<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>]). We use
different notation, because "ownd" is not consistent with
FlightSize throughout this document.
Also note that in SCTP, messages will have to be converted to
bytes to make this variant of Early Retransmit work.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
(2.b) There is either no unsent data ready for transmission at the
sender, or the advertised receive window does not permit new
segments to be transmitted.
When the above two conditions hold and a TCP connection does not
support SACK, the duplicate ACK threshold used to trigger a
retransmission MUST be reduced to:
ER_thresh = ceiling (ownd/SMSS) - 1                              (1)
duplicate ACKs, where ownd is expressed in terms of bytes. We call
the use of this reduced duplicate ACK threshold "Early Retransmission".
When conditions (2.a) and (2.b) hold and a TCP connection does
support SACK or SCTP is in use, Early Retransmit MUST be used only
when "ownd - SMSS" bytes have been SACKed.
If either or both of conditions (2.a) and (2.b) do not hold, the
transport MUST NOT use Early Retransmit, but rather prefer the
standard mechanisms, including fast retransmit and limited transmit.
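The byte-based rules above can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not a reference
implementation: the function and parameter names are hypothetical,
"new_data_sendable" stands in for condition (2.b), the extra check
that ownd is greater than zero is an addition of this sketch, and
reading the SACK rule as "at least ownd - SMSS bytes SACKed" is an
assumption.

```python
def byte_based_er_thresh(ownd, smss, new_data_sendable):
    """Non-SACK TCP: reduced duplicate ACK threshold, or None.

    None means conditions (2.a)/(2.b) do not hold and the standard
    mechanisms apply. All names here are illustrative.
    """
    # (2.a): ownd is less than 4*SMSS bytes. The "ownd > 0" guard is an
    # assumption of this sketch (nothing outstanding, nothing to resend).
    cond_2a = ownd > 0 and 4 * smss > ownd
    # (2.b): no unsent data is ready, or the receive window forbids it.
    cond_2b = not new_data_sendable
    if not (cond_2a and cond_2b):
        return None
    return (ownd + smss - 1) // smss - 1  # Equation (1), integer ceiling


def byte_based_er_sack_ok(ownd, smss, new_data_sendable, sacked_bytes):
    """SACK TCP or SCTP: True when Early Retransmit may be used.

    Reading "ownd - SMSS bytes have been SACKed" as "at least that many"
    is an assumption of this sketch.
    """
    if byte_based_er_thresh(ownd, smss, new_data_sendable) is None:
        return False
    return sacked_bytes >= ownd - smss
```

For example, with three full-sized 1460-byte segments outstanding and
nothing new to send, byte_based_er_thresh(4380, 1460, False) evaluates
to 2, so the second duplicate ACK would trigger the retransmission.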
As noted above, the drawback of this byte-based variant is precision
[<a href="#ref-HB08" title=" October 2008">HB08</a>]. We illustrate this with two examples:
+ Consider a non-SACK TCP sender that uses an SMSS of 1460 bytes
and transmits three segments, each with 400 bytes of payload.
This is a case where Early Retransmit could aid loss recovery if
one segment is lost. However, in this case, ER_thresh will
become zero, per Equation (1), because the number of outstanding
bytes is a poor estimate of the number of outstanding segments.
A similar problem occurs for senders that employ SACK, as the
expression "ownd - SMSS" will become negative.
+ Next, consider a non-SACK TCP sender that uses an SMSS of
1460 bytes and transmits 10 segments, each with 400 bytes of
payload. In this case, ER_thresh will be 2 per Equation (1).
Thus, even though there are enough segments outstanding to
trigger fast retransmit with the standard duplicate ACK
threshold, Early Retransmit will be triggered. This could cause
or exacerbate performance problems caused by segment reordering
in the network.
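The numbers in the two examples above can be checked directly. The
short sketch below only evaluates condition (2.a), Equation (1), and
the expression "ownd - SMSS" for the payload sizes given in the text;
the helper name er_thresh is hypothetical.

```python
SMSS = 1460     # bytes, as in both examples
PAYLOAD = 400   # bytes of payload per segment, as in both examples


def er_thresh(ownd, smss=SMSS):
    """Equation (1), using integer ceiling division."""
    return (ownd + smss - 1) // smss - 1


for segments in (3, 10):
    ownd = segments * PAYLOAD
    # Columns: segments, ownd, condition (2.a), ER_thresh, ownd - SMSS
    print(segments, ownd, 4 * SMSS > ownd, er_thresh(ownd), ownd - SMSS)
```

With three segments, ownd is 1200 bytes, so ER_thresh is 0 and
"ownd - SMSS" is -260. With ten segments, ownd is 4000 bytes, which
still satisfies condition (2.a) because 4*SMSS is 5840 bytes, and
ER_thresh is 2.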
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
<span class="h3"><a class="selflink" id="section-3.2" href="#section-3.2">3.2</a>. Segment-Based Early Retransmit</span>
A TCP or SCTP sender MAY use segment-based Early Retransmit.
Upon the arrival of an ACK, a sender employing segment-based Early
Retransmit MUST use the following two conditions to determine when an
Early Retransmit is sent:
(3.a) The number of outstanding segments (oseg) -- segments sent but
not yet acknowledged -- is less than four.
(3.b) There is either no unsent data ready for transmission at the
sender, or the advertised receive window does not permit new
segments to be transmitted.
When the above two conditions hold and a TCP connection does not
support SACK, the duplicate ACK threshold used to trigger a
retransmission MUST be reduced to:
ER_thresh = oseg - 1                                             (2)
duplicate ACKs, where oseg represents the number of outstanding
segments. (We discuss tracking the number of outstanding segments
below.) We call the use of this reduced duplicate ACK threshold
"Early Retransmission".
When conditions (3.a) and (3.b) hold and a TCP connection does
support SACK or SCTP is in use, Early Retransmit MUST be used only
when "oseg - 1" segments have been SACKed. A segment is considered
to be SACKed when all of its data bytes (TCP) or data chunks (SCTP)
have been indicated as arrived by the receiver.
If either or both of conditions (3.a) and (3.b) do not hold, the
transport MUST NOT use Early Retransmit, but rather prefer the
standard mechanisms, including fast retransmit and limited transmit.
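The segment-based rules above can be sketched in the same way. Again,
this is an illustration under stated assumptions rather than a
definitive implementation: the names are hypothetical, the check that
at least one segment is outstanding is an addition of this sketch,
and treating "oseg - 1 segments have been SACKed" as "at least that
many" is an assumption.

```python
def segment_based_er_thresh(oseg, new_data_sendable):
    """Non-SACK TCP: reduced duplicate ACK threshold, or None.

    None means conditions (3.a)/(3.b) do not hold and the standard
    mechanisms apply. All names here are illustrative.
    """
    # (3.a): fewer than four segments outstanding. The "oseg > 0" guard
    # is an assumption of this sketch.
    cond_3a = oseg > 0 and 4 > oseg
    # (3.b): no unsent data is ready, or the receive window forbids it.
    cond_3b = not new_data_sendable
    if not (cond_3a and cond_3b):
        return None
    return oseg - 1  # Equation (2)


def segment_based_er_sack_ok(oseg, new_data_sendable, sacked_segments):
    """SACK TCP or SCTP: True when Early Retransmit may be used.

    sacked_segments counts only fully SACKed segments.
    """
    if segment_based_er_thresh(oseg, new_data_sendable) is None:
        return False
    return sacked_segments >= oseg - 1
```

Unlike the byte-based sketch, the result here does not depend on how
much payload each segment carries, which is the precision gain this
variant is meant to provide.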
This version of Early Retransmit solves the precision issues
discussed in the previous section. As noted previously, the cost is
that the implementation will have to track segment boundaries to form
an understanding as to how many actual segments have been
transmitted, but not acknowledged. This can be done by the sender
tracking the boundaries of the three segments on the right side of
the current window (which involves tracking four sequence numbers in
TCP). This could be done by keeping a circular list of the segment
boundaries, for instance. Cumulative ACKs that do not fall within
this region indicate that at least four segments are outstanding, and
therefore Early Retransmit MUST NOT be used. When the outstanding
window becomes small enough that Early Retransmit can be invoked, a
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
full understanding of the number of outstanding segments will be
available from the four sequence numbers retained. (Note: the
implicit sequence number consumed by the TCP FIN bit can also be
included in the tracking of segment boundaries.)
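One possible way to keep the four sequence numbers described above is
a small bounded queue. The sketch below is hypothetical and simplified:
the class and method names are invented for illustration, sequence
number wraparound is ignored, zero-length segments are not handled,
and a cumulative ACK that falls inside a segment rather than on a
boundary is conservatively treated as "not eligible".

```python
from collections import deque


class SegmentBoundaryTracker:
    """Remember the four most recent segment boundaries (sketch only)."""

    def __init__(self, initial_seq):
        # Four boundaries delimit the three right-most segments.
        self.boundaries = deque([initial_seq], maxlen=4)

    def on_send(self, length):
        """Record the end boundary of a newly transmitted segment."""
        self.boundaries.append(self.boundaries[-1] + length)

    def outstanding_segments(self, snd_una):
        """Return oseg if snd_una is a tracked boundary, else None.

        None means the cumulative ACK point lies outside the tracked
        region (at least four segments outstanding) or is not on a
        segment boundary; either way Early Retransmit is not used.
        """
        if snd_una not in self.boundaries:
            return None
        # Count the boundaries strictly to the right of the ACK point.
        return sum(1 for b in self.boundaries if b > snd_una)
```

A sender would call on_send() for each new segment and consult
outstanding_segments() with the current cumulative ACK point when an
ACK arrives; the bounded deque plays the role of the circular list
mentioned in the text.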
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. Discussion</span>
In this section, we discuss a number of issues surrounding the Early
Retransmit algorithm.
<span class="h3"><a class="selflink" id="section-4.1" href="#section-4.1">4.1</a>. SACK vs. Non-SACK</span>
The SACK variant of the Early Retransmit algorithm is preferred to
the non-SACK variant in TCP due to its robustness in the face of ACK
loss (since SACKs are sent redundantly), and due to interactions with
the delayed ACK timer (SCTP does not have a non-SACK mode and
therefore naturally supports SACK-based Early Retransmit). Consider
a flight of three segments, S1...S3, with S2 being dropped by the
network. When S1 arrives, it is in order, and so the receiver may or
may not delay the ACK, leading to two scenarios:
(A) The ACK for S1 is delayed: In this case, the arrival of S3 will
trigger an ACK to be transmitted, covering S1 (which was
previously unacknowledged). In this case, Early Retransmit
without SACK will not prevent an RTO because no duplicate ACKs
will arrive. However, with SACK, the ACK for S1 will also
include SACK information indicating that S3 has arrived at the
receiver. The sender can then invoke Early Retransmit on this
ACK because only one segment remains outstanding.
(B) The ACK for S1 is not delayed: In this case, the arrival of S1
triggers an ACK of previously unacknowledged data. The arrival
of S3 triggers a duplicate ACK (because it is out of order).
Both ACKs will cover the same segment (S1). Therefore,
regardless of whether SACK is used, Early Retransmit can be
performed by the sender (assuming no ACK loss).
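Scenarios (A) and (B) can be reproduced with a toy model of the
receiver and sender. Everything in this sketch is hypothetical and
simplified: segments are numbered from 1, the receiver either delays
in-order ACKs or not, no ACKs are lost, and conditions (3.a) and (3.b)
are assumed to hold for the three-segment flight.

```python
def receiver_acks(arrivals, delay_in_order_acks):
    """Return the (cumulative_ack, sacked_segments) pairs a receiver sends."""
    acks, cum_ack, held = [], 0, set()
    for seg in arrivals:
        if seg == cum_ack + 1:
            cum_ack = seg
            while cum_ack + 1 in held:      # absorb buffered segments
                held.remove(cum_ack + 1)
                cum_ack += 1
            if delay_in_order_acks and not held:
                continue                    # ACK withheld for now
        else:
            held.add(seg)                   # out of order: ACK immediately
        acks.append((cum_ack, tuple(sorted(held))))
    return acks


def sender_view(acks, flight_size):
    """Return (non_sack_er, sack_er): whether each variant can trigger."""
    snd_una, dupacks = 0, 0
    non_sack_er = sack_er = False
    for cum_ack, sacked in acks:
        if cum_ack > snd_una:
            snd_una, dupacks = cum_ack, 0   # new data acknowledged
        else:
            dupacks += 1                    # duplicate ACK
        oseg = flight_size - snd_una
        if dupacks > 0 and dupacks >= oseg - 1:
            non_sack_er = True              # Equation (2) threshold met
        if sacked and len(sacked) == oseg - 1:
            sack_er = True                  # oseg - 1 segments SACKed
    return non_sack_er, sack_er


# S2 is dropped, so only S1 and S3 arrive.
print(sender_view(receiver_acks([1, 3], True), 3))    # scenario (A)
print(sender_view(receiver_acks([1, 3], False), 3))   # scenario (B)
```

In this model, scenario (A) gives (False, True), since only the SACK
variant can trigger Early Retransmit, and scenario (B) gives
(True, True).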
<span class="h3"><a class="selflink" id="section-4.2" href="#section-4.2">4.2</a>. Segment Reordering</span>
Early Retransmit is less robust in the face of reordered segments
than fast retransmit with the standard threshold. Research
shows that a general reduction in the number of duplicate ACKs
required to trigger fast retransmit to two (rather than three) leads
to a reduction in the ratio of good to bad retransmits by a factor of
three [<a href="#ref-Pax97">Pax97</a>]. However, this analysis did not include the additional
conditioning on the event that the ownd was smaller than four
segments and that no new data was available for transmission.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-9" ></span>
A number of studies have shown that network reordering is not a rare
event across some network paths. Various measurement studies have
shown that reordering along most paths is negligible, but along
certain paths can be quite prevalent [<a href="#ref-Pax97">Pax97</a>, <a href="#ref-BPS99" title="Craig Partridge">BPS99</a>, <a href="#ref-BS02" title="ACM/USENIX Internet Measurement Workshop">BS02</a>, <a href="#ref-Pir05" title="A Theoretical Foundation, Metrics and Modeling of Packet Reordering and Methodology of Delay Modeling using Inter-packet Gaps">Pir05</a>].
Evaluating Early Retransmit in the face of real segment reordering is
part of the experiment we hope to instigate with this document.
<span class="h3"><a class="selflink" id="section-4.3" href="#section-4.3">4.3</a>. Worst Case</span>
Next, we note two "worst case" scenarios for Early Retransmit:
(1) Persistent reordering of segments coupled with an application
that does not constantly send data can result in large numbers of
needless retransmissions when using Early Retransmit. For
instance, consider an application that sends data two segments at
a time, followed by an idle period when no data is queued for
delivery. If the network consistently reorders the two segments,
the sender will needlessly retransmit one out of every two unique
segments transmitted when using the above algorithm (meaning that
one-third of all segments sent are needless retransmissions).
However, this would only be a problem for long-lived connections
from applications that transmit in spurts.
(2) Similar to the above, consider the case of transfers that consist
of two segments each and always experience reordering. Just as in (1)
above, one out of every two unique data segments will be
retransmitted needlessly; therefore, one-third of the traffic
will be spurious.
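The one-third figure in both worst cases follows from simple counting,
sketched below with a hypothetical helper: for every two unique
segments, one needless retransmission is added, so three segments are
sent in total.

```python
from fractions import Fraction


def spurious_fraction(unique_segments, needless_retransmits):
    """Share of all transmitted segments that are needless retransmissions."""
    total_sent = unique_segments + needless_retransmits
    return Fraction(needless_retransmits, total_sent)


print(spurious_fraction(2, 1))  # prints 1/3
```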
Currently, this document offers no suggestion on how to mitigate the
above problems. However, the worst cases are likely pathological.
Part of the experiments that this document hopes to trigger would
involve better understanding of whether such theoretical worst-case
scenarios are prevalent in the network, and in general, to explore
the trade-off between spurious fast retransmits and the delay imposed
by the RTO. <a href="#appendix-A">Appendix A</a> does offer a survey of possible mitigations
that call for curtailing the use of Early Retransmit when it is
making poor retransmission decisions.
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Related Work</span>
There are a number of similar proposals in the literature that
attempt to mitigate the same problem that Early Retransmit addresses.
Deployment of Explicit Congestion Notification (ECN) [Flo94, <a href="./rfc3168">RFC3168</a>]
may benefit connections with small congestion window sizes [<a href="./rfc2884" title="Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks">RFC2884</a>].
ECN provides a method for indicating congestion to the end-host
without dropping segments. While some segment drops may still occur,
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-10" ></span>
ECN may allow a transport to perform better with small congestion
window sizes because the sender will be required to detect fewer
segment losses [<a href="./rfc2884" title="Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks">RFC2884</a>].
[<a id="ref-Bal98">Bal98</a>] outlines another solution to the problem of having no new
segments to transmit into the network when the first two duplicate
ACKs arrive. In response to these duplicate ACKs, a TCP sender
transmits zero-byte segments to induce additional duplicate ACKs.
This method preserves the robustness of the standard fast retransmit
algorithm at the cost of injecting segments into the network that do
not deliver any data, and therefore are potentially wasting network
resources (at a time when there is a reasonable chance that the
resources are scarce).
[<a id="ref-RFC4653">RFC4653</a>] also defines an orthogonal method for altering the
duplicate ACK threshold. The mechanisms proposed in this document
decrease the duplicate ACK threshold when a small amount of data is
outstanding. Meanwhile, the mechanisms in [<a href="./rfc4653" title="Improving the Robustness of TCP to Non-Congestion Events">RFC4653</a>] increase the
duplicate ACK threshold (over the standard of 3) when the congestion
window is large in an effort to increase robustness to segment
reordering.
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Security Considerations</span>
The security considerations found in [<a href="./rfc5681" title="TCP Congestion Control">RFC5681</a>] apply to this
document. No additional security problems have been identified with
Early Retransmit at this time.
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Acknowledgments</span>
We thank Sally Floyd for her feedback in discussions about Early
Retransmit. The notion of Early Retransmit was originally sketched
in an Internet-Draft co-authored by Sally Floyd and Hari
Balakrishnan. Armando Caro, Joe Touch, Alexander Zimmermann, and
many members of the TSVWG and TCPM working groups provided good
discussions that helped shape this document. Our thanks to all!
<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. References</span>
<span class="h3"><a class="selflink" id="section-8.1" href="#section-8.1">8.1</a>. Normative References</span>
[<a id="ref-RFC793">RFC793</a>] Postel, J., "Transmission Control Protocol", STD 7,
<a href="./rfc793">RFC 793</a>, September 1981.
[<a id="ref-RFC2018">RFC2018</a>] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", <a href="./rfc2018">RFC 2018</a>,
October 1996.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-11" ></span>
[<a id="ref-RFC2119">RFC2119</a>] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
[<a id="ref-RFC2883">RFC2883</a>] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
Extension to the Selective Acknowledgement (SACK) Option
for TCP", <a href="./rfc2883">RFC 2883</a>, July 2000.
[<a id="ref-RFC2988">RFC2988</a>] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", <a href="./rfc2988">RFC 2988</a>, November 2000.
[<a id="ref-RFC3042">RFC3042</a>] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
TCP's Loss Recovery Using Limited Transmit", <a href="./rfc3042">RFC 3042</a>,
January 2001.
[<a id="ref-RFC4960">RFC4960</a>] Stewart, R., Ed., "Stream Control Transmission Protocol",
<a href="./rfc4960">RFC 4960</a>, September 2007.
[<a id="ref-RFC5681">RFC5681</a>] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", <a href="./rfc5681">RFC 5681</a>, September 2009.
<span class="h3"><a class="selflink" id="section-8.2" href="#section-8.2">8.2</a>. Informative References</span>
[<a id="ref-AA02">AA02</a>] Urtzi Ayesta, Konstantin Avrachenkov, "The Effect of the
Initial Window Size and Limited Transmit Algorithm on the
Transient Behavior of TCP Transfers", In Proc. of the
15th ITC Internet Specialist Seminar, Wurzburg,
July 2002.
[<a id="ref-All00">All00</a>] Mark Allman. A Web Server's View of the Transport Layer.
ACM Computer Communication Review, October 2000.
[<a id="ref-Bal98">Bal98</a>] Hari Balakrishnan. Challenges to Reliable Data Transport
over Heterogeneous Wireless Networks. Ph.D. Thesis,
University of California at Berkeley, August 1998.
[BPS+98] Hari Balakrishnan, Venkata Padmanabhan,
Srinivasan Seshan, Mark Stemm, and Randy Katz. TCP
Behavior of a Busy Web Server: Analysis and Improvements.
Proc. IEEE INFOCOM Conf., San Francisco, CA, March 1998.
[<a id="ref-BPS99">BPS99</a>] Jon Bennett, Craig Partridge, Nicholas Shectman. Packet
Reordering is Not Pathological Network Behavior.
IEEE/ACM Transactions on Networking, December 1999.
[<a id="ref-BS02">BS02</a>] John Bellardo, Stefan Savage. Measuring Packet
Reordering, ACM/USENIX Internet Measurement Workshop,
November 2002.
[<a id="ref-FF96">FF96</a>] Kevin Fall, Sally Floyd. Simulation-based Comparisons of
Tahoe, Reno, and SACK TCP. ACM Computer Communication
Review, July 1996.
[<a id="ref-Flo94">Flo94</a>] Sally Floyd. TCP and Explicit Congestion Notification.
ACM Computer Communication Review, October 1994.
[<a id="ref-HB08">HB08</a>] Per Hurtig, Anna Brunstrom. Enhancing SCTP Loss
Recovery: An Experimental Evaluation of Early Retransmit.
Elsevier Computer Communications, Vol. 31(16),
October 2008, pp. 3778-3788.
[<a id="ref-Jac88">Jac88</a>] Van Jacobson. Congestion Avoidance and Control. ACM
SIGCOMM 1988.
[<a id="ref-LK98">LK98</a>] Dong Lin, H.T. Kung. TCP Fast Recovery Strategies:
Analysis and Improvements. Proc. IEEE INFOCOM Conf.,
San Francisco, CA, March 1998.
[<a id="ref-Mor97">Mor97</a>] Robert Morris. TCP Behavior with Many Flows. Proc.
Fifth IEEE International Conference on Network Protocols,
October 1997.
[<a id="ref-Pax97">Pax97</a>] Vern Paxson. End-to-End Internet Packet Dynamics. ACM
SIGCOMM, September 1997.
[<a id="ref-Pir05">Pir05</a>] N. M. Piratla, "A Theoretical Foundation, Metrics and
Modeling of Packet Reordering and Methodology of Delay
Modeling using Inter-packet Gaps," Ph.D. Dissertation,
Department of Electrical and Computer Engineering,
Colorado State University, Fort Collins, CO, Fall 2005.
[<a id="ref-RFC2884">RFC2884</a>] Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
Explicit Congestion Notification (ECN) in IP Networks",
<a href="./rfc2884">RFC 2884</a>, July 2000.
[<a id="ref-RFC3150">RFC3150</a>] Dawkins, S., Montenegro, G., Kojo, M., and V. Magret,
"End-to-end Performance Implications of Slow Links",
<a href="https://www.rfc-editor.org/bcp/bcp48">BCP 48</a>, <a href="./rfc3150">RFC 3150</a>, July 2001.
[<a id="ref-RFC3168">RFC3168</a>] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
<a href="./rfc3168">RFC 3168</a>, September 2001.
[<a id="ref-RFC3517">RFC3517</a>] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss
Recovery Algorithm for TCP", <a href="./rfc3517">RFC 3517</a>, April 2003.
[<a id="ref-RFC3522">RFC3522</a>] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
for TCP", <a href="./rfc3522">RFC 3522</a>, April 2003.
[<a id="ref-RFC3782">RFC3782</a>] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
Modification to TCP's Fast Recovery Algorithm", <a href="./rfc3782">RFC 3782</a>,
April 2004.
[<a id="ref-RFC4653">RFC4653</a>] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
"Improving the Robustness of TCP to Non-Congestion
Events", <a href="./rfc4653">RFC 4653</a>, August 2006.
<span class="h2"><a class="selflink" id="appendix-A" href="#appendix-A">Appendix A</a>. Research Issues in Adjusting the Duplicate ACK Threshold</span>
Decreasing the number of duplicate ACKs required to trigger fast
retransmit, as suggested in <a href="#section-3">Section 3</a>, has the drawback of making
fast retransmit less robust in the face of minor network reordering.
Two egregious examples of problems caused by reordering are given in
<a href="#section-4">Section 4</a>. This appendix outlines several schemes that have been
suggested to mitigate the problems caused by Early Retransmit in the
face of segment reordering. These methods need further research
before they are suggested for general use (and current consensus is
that the cases that make Early Retransmit unnecessarily retransmit a
large amount of data are pathological, and therefore, these
mitigations are not generally required).
MITIGATION A.1: Allow a connection to use Early Retransmit as long as
the algorithm is not injecting "too much" spurious data into the
network. For instance, using the information provided by TCP's
D-SACK option [<a href="./rfc2883" title="&quot;An Extension to the Selective Acknowledgement (SACK) Option for TCP&quot;">RFC2883</a>] or SCTP's Duplicate Transmission Sequence
Number (Duplicate-TSN) notification, a sender can determine when
segments sent via Early Retransmit are needless. Likewise, using
Eifel [<a href="./rfc3522" title="&quot;The Eifel Detection Algorithm for TCP&quot;">RFC3522</a>], the sender can detect spurious Early Retransmits.
Once spurious Early Retransmits are detected, the sender can
either eliminate the use of Early Retransmit, or limit the use of
the algorithm to ensure that an acceptably small fraction of the
connection's transmissions are spurious. For example, a
connection could stop using Early Retransmit after the first
spurious retransmit is detected.
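The policy in Mitigation A.1 can be sketched as a small per-connection
gate. This is a minimal illustration only, not part of the
specification: the class name, method names, and the
max_spurious_fraction parameter are hypothetical, and the spurious
signal is assumed to come from D-SACK, Duplicate-TSN, or Eifel
processing elsewhere in the stack.

```python
from dataclasses import dataclass


@dataclass
class SpuriousErGate:
    """Hypothetical gate for Mitigation A.1 (all names are illustrative)."""

    # 0.0 means: stop using Early Retransmit after the first spurious one.
    max_spurious_fraction: float = 0.0
    segments_sent: int = 0
    spurious_early_retransmits: int = 0

    def on_segment_sent(self, count: int = 1) -> None:
        # Count every transmission made on the connection.
        self.segments_sent += count

    def on_spurious_early_retransmit(self) -> None:
        # Assumed to be called when D-SACK, Duplicate-TSN, or Eifel reports
        # that an Early-Retransmitted segment was not needed.
        self.spurious_early_retransmits += 1

    def early_retransmit_allowed(self) -> bool:
        if self.spurious_early_retransmits == 0:
            return True
        if self.segments_sent == 0:
            return False
        fraction = self.spurious_early_retransmits / self.segments_sent
        # Allow Early Retransmit only while the spurious share stays
        # within the configured bound.
        return self.max_spurious_fraction >= fraction
```

With the default bound of 0.0, the gate closes as soon as one spurious
Early Retransmit is reported, which matches the example given above. A
non-zero bound models the "acceptably small fraction" alternative; what
bound is acceptable is left open here, as it is in the text.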
MITIGATION A.2: If a sender cannot reliably determine whether an
Early-Retransmitted segment is spurious or not, the sender could
simply limit Early Retransmits, either to some fixed number per
connection (e.g., Early Retransmit is allowed only once per
connection), or to some small percentage of the total traffic
being transmitted.
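One way to picture Mitigation A.2 is as a budget that is charged each
time Early Retransmit is invoked. The sketch below is an assumption-
laden illustration: the class, its parameters (max_count,
max_fraction), and the choice to measure traffic in segments are all
hypothetical and are not taken from this document.

```python
from typing import Optional


class EarlyRetransmitBudget:
    """Hypothetical limiter for Mitigation A.2 (names are illustrative)."""

    def __init__(self, max_count: int = 1,
                 max_fraction: Optional[float] = None) -> None:
        # max_count: fixed number of Early Retransmits per connection.
        # max_fraction: if set, limit Early Retransmits to this share of
        # all segments sent instead of using the fixed count.
        self.max_count = max_count
        self.max_fraction = max_fraction
        self.used = 0

    def try_consume(self, segments_sent: int) -> bool:
        """Return True and charge the budget if one more use is allowed."""
        if self.max_fraction is not None:
            # Percentage mode: the next Early Retransmit must keep the
            # running share within max_fraction of the traffic sent.
            allowed = (segments_sent > 0 and
                       self.max_fraction * segments_sent >= self.used + 1)
        else:
            # Fixed mode: at most max_count uses per connection.
            allowed = self.max_count > self.used
        if allowed:
            self.used += 1
        return allowed
```

The default arguments correspond to the "only once per connection"
example above. The sender would call try_consume() at the point where
the Early Retransmit criteria are met and fall back to the
retransmission timer when it returns False.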
MITIGATION A.3: Allow a connection to trigger Early Retransmit using
the criteria given in <a href="#section-3">Section 3</a>, in addition to a "small" timeout
[<a href="#ref-Pax97">Pax97</a>]. For instance, a sender may have to wait for two
duplicate ACKs and then T msec before Early Retransmit is invoked.
The added time gives reordered acknowledgments time to arrive at
the sender and avoid a needless retransmit. Designing a method
for choosing an appropriate timeout is part of the research that
would need to be involved in this scheme.
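The delayed trigger in Mitigation A.3 can be sketched as a small state
machine driven by an explicit clock. The names below (er_thresh,
delay_ms, and the methods) are hypothetical; er_thresh is assumed to be
the duplicate ACK threshold computed as in Section 3, and the value of
T is deliberately left to the caller because choosing it is an open
research question.

```python
from typing import Optional


class DelayedEarlyRetransmit:
    """Hypothetical timer-guarded trigger for Mitigation A.3."""

    def __init__(self, er_thresh: int, delay_ms: float) -> None:
        self.er_thresh = er_thresh      # duplicate ACKs required
        self.delay_ms = delay_ms        # the "small" timeout T
        self.dup_acks = 0
        self.deadline_ms: Optional[float] = None

    def on_duplicate_ack(self, now_ms: float) -> None:
        self.dup_acks += 1
        # Arm the timer once, when the threshold is first reached.
        if self.dup_acks >= self.er_thresh and self.deadline_ms is None:
            self.deadline_ms = now_ms + self.delay_ms

    def on_cumulative_ack(self) -> None:
        # New data was acknowledged, so the gap was filled (for example
        # by a reordered segment arriving). Cancel the pending trigger.
        self.dup_acks = 0
        self.deadline_ms = None

    def should_retransmit(self, now_ms: float) -> bool:
        # Fire only after the threshold was met and T msec have elapsed.
        return self.deadline_ms is not None and now_ms >= self.deadline_ms
```

If an acknowledgment for new data arrives during the wait, the pending
Early Retransmit is cancelled, which is the needless retransmit the
added delay is meant to avoid.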
Authors' Addresses
Mark Allman
International Computer Science Institute
1947 Center Street, Suite 600
Berkeley, CA 94704-1198
USA
Phone: 440-235-1792
EMail: mallman@icir.org
<a href="http://www.icir.org/mallman/">http://www.icir.org/mallman/</a>
Konstantin Avrachenkov
INRIA
2004 route des Lucioles, B.P.93
06902, Sophia Antipolis
France
Phone: 00 33 492 38 7751
EMail: k.avrachenkov@sophia.inria.fr
<a href="http://www-sop.inria.fr/members/Konstantin.Avratchenkov/me.html">http://www-sop.inria.fr/members/Konstantin.Avratchenkov/me.html</a>
Urtzi Ayesta
BCAM-IKERBASQUE                         LAAS-CNRS
Bizkaia Technology Park, Building 500   7 Avenue Colonel Roche
48160 Derio                             31077, Toulouse
Spain                                   France
EMail: urtzi@laas.fr
<a href="http://www.laas.fr/~urtzi">http://www.laas.fr/~urtzi</a>
Josh Blanton
Ohio University
301 Stocker Center
Athens, OH 45701
USA
EMail: jblanton@irg.cs.ohiou.edu
Per Hurtig
Karlstad University
Department of Computer Science
Universitetsgatan 2
651 88 Karlstad
Sweden
EMail: per.hurtig@kau.se
</pre>