1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451

Network Working Group C. Hopps
Request for Comments: 2992 NextHop Technologies
Category: Informational November 2000
Analysis of an EqualCost MultiPath Algorithm
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2000). All Rights Reserved.
Abstract
Equalcost multipath (ECMP) is a routing technique for routing
packets along multiple paths of equal cost. The forwarding engine
identifies paths by nexthop. When forwarding a packet the router
must decide which nexthop (path) to use. This document gives an
analysis of one method for making that decision. The analysis
includes the performance of the algorithm and the disruption caused
by changes to the set of nexthops.
1. HashThreshold
One method for determining which nexthop to use when routing with
ECMP can be called hashthreshold. The router first selects a key by
performing a hash (e.g., CRC16) over the packet header fields that
identify a flow. The N nexthops have been assigned unique regions
in the key space. The router uses the key to determine which region
and thus which nexthop to use.
As an example of hashthreshold, upon receiving a packet the router
performs a CRC16 on the packet's header fields that define the flow
(e.g., the source and destination fields of the packet), this is the
key. Say for this destination there are 4 nexthops to choose from.
Each nexthop is assigned a region in 16 bit space (the key space).
For equal usage the router may have chosen to divide it up evenly so
each region is 65536/4 or 16k large. The nexthop is chosen by
determining which region contains the key (i.e., the CRC result).
Hopps Informational [Page 1]
RFC 2992 Analysis of ECMP Algorithm November 2000
2. Analysis
There are a few concerns when choosing an algorithm for deciding
which nexthop to use. One is performance, the computational
requirements to run the algorithm. Another is disruption (i.e., the
changing of which path a flow uses). Balancing is a third concern;
however, since the algorithm's balancing characteristics are directly
related to the chosen hash function this analysis does not treat this
concern in depth.
For this analysis we will assume regions of equal size. If the
output of the hash function is uniformly distributed the distribution
of flows amongst paths will also be uniform, and so the algorithm
will properly implement ECMP. One can implement nonequalcost
multipath routing by using regions of unequal size; however, non
equalcost multipath routing is outside the scope of this document.
2.1. Performance
The performance of the hashthreshold algorithm can be broken down
into three parts: selection of regions for the nexthops, obtaining
the key and comparing the key to the regions to decide which nexthop
to use.
The algorithm doesn't specify the hash function used to obtain the
key. Its performance in this area will be exactly the performance of
the hash function. It is presumed that if this calculation proves to
be a concern it can be done in hardware parallel to other operations
that need to complete before deciding which nexthop to use.
Since regions are restricted to be of equal size the calculation of
region boundaries is trivial. Each boundary is exactly regionsize
away from the previous boundary starting from 0 for the first region.
As we will show, for equal sized regions, we don't need to store the
boundary values.
To choose the nexthop we must determine which region contains the
key. Because the regions are of equal size determining which region
contains the key is a simple division operation.
regionsize = keyspace.size / #{nexthops}
region = key / regionsize;
Thus the time required to find the nexthop is dependent on the way
the nexthops are organized in memory. The obvious use of an array
indexed by region yields O(1).
Hopps Informational [Page 2]
RFC 2992 Analysis of ECMP Algorithm November 2000
2.2. Disruption
Protocols such as TCP perform better if the path they flow along does
not change while the stream is connected. Disruption is the
measurement of how many flows have their paths changed due to some
change in the router. We measure disruption as the fraction of total
flows whose path changes in response to some change in the router.
This can become important if one or more of the paths is flapping.
For a description of disruption and how it affects protocols such as
TCP see [1].
Some algorithms such as roundrobin (i.e., upon receiving a packet
the least recently used nexthop is chosen) are disruptive regardless
of any change in the router. Clearly this is not the case with
hashthreshold. As long as the region boundaries remain unchanged
the same nexthop will be chosen for a given flow.
Because we have required regions to be equal in size the only reason
for a change in region boundaries is the addition or removal of a
nexthop. In this case the regions must all grow or shrink to fill
the key space. The analysis begins with some examples of this.
0123456701234567012345670123456701234567
++++++
 1  2  3  4  5 
+++++++++
 1  2  4  5 
+++++
0123456789012345678901234567890123456789
Figure 1. Before and after deletion of region 3
In figure 1. region 3 has been deleted. The remaining regions grow
equally and shift to compensate. In this case 1/4 of region 2 is now
in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is
in region 4 and 1/4 of region 4 is in region 5. Since each of the
original regions represent 1/5 of the flows, the total disruption is
1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10.
Note that the disruption to flows when adding a region is equivalent
to that of removing a region. That is, we are considering the
fraction of total flows that changes regions when moving from N to
N1 regions, and that same fraction of flows will change when moving
from N1 to N regions.
Hopps Informational [Page 3]
RFC 2992 Analysis of ECMP Algorithm November 2000
0123456701234567012345670123456701234567
++++++
 1  2  3  4  5 
+++++++++
 1  2  3  5 
+++++
0123456789012345678901234567890123456789
Figure 2. Before and after deletion of region 4
In figure 2. region 4 has been deleted. Again the remaining regions
grow equally and shift to compensate. 1/4 of region 2 is now in
region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in
region 3 and 1/4 of region 4 is in region 5. Since each of the
original regions represent 1/5 of the flows the, total disruption is
7/20.
To generalize, upon removing a region K the remaining N1 regions
grow to fill the 1/N space. This growth is evenly divided between
the N1 regions and so the change in size for each region is 1/N/(N
1) or 1/(N(N1)). This change in size causes nonend regions to
move. The first region grows and so the second region is shifted
towards K by the change in size of the first region. 1/(N(N1)) of
the flows from region 2 are subsumed by the change in region 1's
size. 2/(N(N1)) of the flows in region 3 are subsumed by region 2.
This is because region 2 has shifted by 1/(N(N1)) and grown by
1/(N(N1)). This continues from both ends until you reach the
regions that bordered K. The calculation for the number of flows
subsumed from the Kth region into the bordering regions accounts for
the removal of the Kth region. Thus we have the following equation.
K1 N
 i  (iK)
disruption = \  + \ 
/ (N)(N1) / (N)(N1)
 
i=1 i=K+1
We can factor 1/((N)(N1)) out as it is constant.
/ K1 N \
1    
=   \ i + \ (iK) 
(N)(N1)  / / 
\   /
1 i=K+1
Hopps Informational [Page 4]
RFC 2992 Analysis of ECMP Algorithm November 2000
We now use the the concrete formulas for the sum of integers. The
first summation is (K)(K1)/2. For the second summation notice that
we are summing the integers from 1 to NK, thus it is (NK)(NK+1)/2.
(K1)(K) + (NK)(NK+1)
= 
2(N)(N1)
Considering the summations, one can see that the least disruption is
when K is as close to half way between 1 and N as possible. This can
be proven by finding the minimum of the concrete formula for K
holding N constant. First break apart the quantities and collect.
2K*K  2K  2NK + N*N + N
= 
2(N)(N1)
K*K  K  NK N + 1
=  + 
(N)(N1) 2(N1)
Since we are minimizing for K the right side (N+1)/2(N1) is constant
as is the denominator (N)(N1) so we can drop them. To minimize we
take the derivative.
d
 (K*K  (N+1)K)
dk
= 2K  (N+1)
Which is zero when K is (N+1)/2.
The last thing to consider is that K must be an integer. When N is
odd (N+1)/2 will yield an integer, however when N is even (N+1)/2
yields an integer + 1/2. In the case, because of symmetry, we get
the least disruption when K is N/2 or N/2 + 1.
Now since the formula is quadratic with a global minimum half way
between 1 and N the maximum possible disruption must occur when edge
regions (1 and N) are removed. If K is 1 or N the formula reduces to
1/2.
The minimum possible disruption is obtained by letting K=(N+1)/2. In
this case the formula reduces to 1/4 + 1/(4*N). So the range of
possible disruption is (1/4, 1/2].
To minimize disruption we recommend adding new regions to the center
rather than the ends.
Hopps Informational [Page 5]
RFC 2992 Analysis of ECMP Algorithm November 2000
3. Comparison to other algorithms
Other algorithms exist to decide which nexthop to use. These
algorithms all have different performance and disruptive
characteristics. Of these algorithms we will only consider ones that
are not disruptive by design (i.e., if no change to the set of next
hops occurs the path a flow takes remains the same). This will
exclude roundrobin and random choice. We will look at moduloN and
highest random weight.
ModuloN is a "simpler" form of hashthreshold. Given N nexthops
the packet header fields which describe the flow are run through a
hash function. A final moduloN is applied to the output of the
hash. This result then directly maps to one of the nexthops.
ModuloN is the most disruptive of the algorithms; if a nexthop is
added or removed the disruption is (N1)/N. The performance of
ModuloN is equivalent to hashthreshold.
Highest random weight (HRW) is a comparative method similar in some
ways to hashthreshold with nonfixed sized regions. For each next
hop, the router seeds a pseudorandom number generator with the
packet header fields which describe the flow and the nexthop to
obtain a weight. The nexthop which receives the highest weight is
selected. The advantage with using HRW is that it has minimal
disruption (i.e., disruption due to adding or removing a nexthop is
always 1/N.) The disadvantage with HRW is that the nexthop
selection is more expensive than hashthreshold. A description of
HRW along with comparisons to other methods can be found in [2].
Although not used for nexthop calculation an example usage of HRW
can be found in [3].
Since each of moduloN, hashthreshold and HRW require a hash on the
packet header fields which define a flow, we can factor the
performance of the hash out of the comparison. If the hash can not
be done inexpensively (e.g., in hardware) it too must be considered
when using any of the above methods.
The lookup performance for hashthreshold, like moduloN is an
optimal O(1). HRW's lookup performance is O(N).
Disruptive behavior is the opposite of performance. HRW is best with
1/N. Hashthreshold is between 1/4 and 1/2. Finally ModuloN is
(N1)/N.
If the complexity of HRW's nexthop selection process is acceptable
we think it should be considered as an alternative to hashthreshold.
This could be the case when, for example, perflow state is kept and
thus the nexthop choice is made infrequently.
Hopps Informational [Page 6]
RFC 2992 Analysis of ECMP Algorithm November 2000
However, when HRW's nexthop selection is seen as too expensive the
obvious choice is hashthreshold as it performs as well as moduloN
and is less disruptive.
4. Security Considerations
This document is an analysis of an algorithm used to implement an
ECMP routing decision. This analysis does not directly affect the
security of the Internet Infrastructure.
5. References
[1] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast", RFC 2991, November 2000.
[2] Thaler, D. and C.V. Ravishankar, "Using NameBased Mappings to
Increase Hit Rates", IEEE/ACM Transactions on Networking,
February 1998.
[3] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S.,
Handley, M., Jacobson, V., Liu, C., Sharma, P. and L. Wei,
"Protocol Independent MulticastSparse Mode (PIMSM): Protocol
Specification", RFC 2362, June 1998.
6. Author's Address
Christian E. Hopps
NextHop Technologies, Inc.
517 W. William Street
Ann Arbor, MI 481034943
U.S.A
Phone: +1 734 936 0291
EMail: chopps@nexthop.com
Hopps Informational [Page 7]
RFC 2992 Analysis of ECMP Algorithm November 2000
7. Full Copyright Statement
Copyright (C) The Internet Society (2000). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
Hopps Informational [Page 8]
