<pre>Independent Submission P. Garg, Ed.
Request for Comments: 7637 Y. Wang, Ed.
Category: Informational Microsoft
ISSN: 2070-1721 September 2015
<span class="h1">NVGRE: Network Virtualization Using Generic Routing Encapsulation</span>
Abstract
This document describes the usage of the Generic Routing
Encapsulation (GRE) header for Network Virtualization (NVGRE) in
multi-tenant data centers. Network Virtualization decouples virtual
networks and addresses from physical network infrastructure,
providing isolation and concurrency between multiple virtual networks
on the same physical network infrastructure. This document also
introduces a Network Virtualization framework to illustrate the use
cases, but the focus is on specifying the data-plane aspect of NVGRE.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This is a contribution to the RFC Series, independently of any other
RFC stream. The RFC Editor has chosen to publish this document at
its discretion and makes no statement about its value for
implementation or deployment. Documents approved for publication by
the RFC Editor are not a candidate for any level of Internet
Standard; see <a href="./rfc5741#section-2">Section 2 of RFC 5741</a>.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
<a href="http://www.rfc-editor.org/info/rfc7637">http://www.rfc-editor.org/info/rfc7637</a>.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and the IETF Trust's Legal
Provisions Relating to IETF Documents
(<a href="http://trustee.ietf.org/license-info">http://trustee.ietf.org/license-info</a>) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
<span class="grey">Garg & Wang Informational [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
Table of Contents
<a href="#section-1">1</a>. Introduction ....................................................<a href="#page-2">2</a>
<a href="#section-1.1">1.1</a>. Terminology ................................................<a href="#page-4">4</a>
<a href="#section-2">2</a>. Conventions Used in This Document ...............................<a href="#page-4">4</a>
<a href="#section-3">3</a>. Network Virtualization Using GRE (NVGRE) ........................<a href="#page-4">4</a>
<a href="#section-3.1">3.1</a>. NVGRE Endpoint .............................................<a href="#page-5">5</a>
<a href="#section-3.2">3.2</a>. NVGRE Frame Format .........................................<a href="#page-5">5</a>
<a href="#section-3.3">3.3</a>. Inner Tag as Defined by IEEE 802.1Q ........................<a href="#page-8">8</a>
<a href="#section-3.4">3.4</a>. Reserved VSID ..............................................<a href="#page-8">8</a>
<a href="#section-4">4</a>. NVGRE Deployment Considerations .................................<a href="#page-9">9</a>
<a href="#section-4.1">4.1</a>. ECMP Support ...............................................<a href="#page-9">9</a>
<a href="#section-4.2">4.2</a>. Broadcast and Multicast Traffic ............................<a href="#page-9">9</a>
<a href="#section-4.3">4.3</a>. Unicast Traffic ............................................<a href="#page-9">9</a>
<a href="#section-4.4">4.4</a>. IP Fragmentation ..........................................<a href="#page-10">10</a>
<a href="#section-4.5">4.5</a>. Address/Policy Management and Routing .....................<a href="#page-10">10</a>
<a href="#section-4.6">4.6</a>. Cross-Subnet, Cross-Premise Communication .................<a href="#page-10">10</a>
<a href="#section-4.7">4.7</a>. Internet Connectivity .....................................<a href="#page-12">12</a>
<a href="#section-4.8">4.8</a>. Management and Control Planes .............................<a href="#page-12">12</a>
<a href="#section-4.9">4.9</a>. NVGRE-Aware Devices .......................................<a href="#page-12">12</a>
<a href="#section-4.10">4.10</a>. Network Scalability with NVGRE ...........................<a href="#page-13">13</a>
<a href="#section-5">5</a>. Security Considerations ........................................<a href="#page-14">14</a>
<a href="#section-6">6</a>. Normative References ...........................................<a href="#page-14">14</a>
Contributors ......................................................<a href="#page-16">16</a>
Authors' Addresses ................................................<a href="#page-17">17</a>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
Conventional data center network designs cater to largely static
workloads and cause fragmentation of network and server capacity [<a href="#ref-6" title=""VL2: A Scalable and Flexible Data Center Network"">6</a>]
[<a href="#ref-7" title=""The Cost of a Cloud: Research Problems in Data Center Networks"">7</a>]. There are several issues that limit dynamic allocation and
consolidation of capacity. Layer 2 networks use the Rapid Spanning
Tree Protocol (RSTP), which is designed to eliminate loops by
blocking redundant paths. These eliminated paths translate to wasted
capacity and a highly oversubscribed network. There are alternative
approaches such as the Transparent Interconnection of Lots of Links
(TRILL) that address this problem [<a href="#ref-13" title=""Transparent Interconnection of Lots of Links (TRILL): Problem and Applicability Statement"">13</a>].
The network utilization inefficiencies are exacerbated by network
fragmentation due to the use of VLANs for broadcast isolation. VLANs
are used for traffic management and also as the mechanism for
providing security and performance isolation among services belonging
to different tenants. The Layer 2 network is carved into smaller-
sized subnets (typically, one subnet per VLAN), with VLAN tags
configured on all the Layer 2 switches connected to server racks that
host a given tenant's services. The current VLAN limits
theoretically allow for 4,000 such subnets to be created. The size
<span class="grey">Garg & Wang Informational [Page 2]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
of the broadcast domain is typically restricted due to the overhead
of broadcast traffic. The 4,000-subnet limit on VLANs is no longer
sufficient in a shared infrastructure servicing multiple tenants.
Data center operators must be able to achieve high utilization of
server and network capacity. In order to achieve efficiency, it
should be possible to assign workloads that operate in a single Layer
2 network to any server in any rack in the network. It should also
be possible to migrate workloads to any server anywhere in the
network while retaining the workloads' addresses. This can be
achieved today by stretching VLANs; however, when workloads migrate,
the network needs to be reconfigured, which is typically error-prone.
By decoupling the workload's location on the LAN from its
network address, the network administrator configures the network
once, not every time a service migrates. This decoupling enables any
server to become part of any server resource pool.
The following are key design objectives for next-generation data
centers:
a) location-independent addressing
b) the ability to scale the number of logical Layer 2 / Layer 3
networks, irrespective of the underlying physical topology or
the number of VLANs
c) preserving Layer 2 semantics for services and allowing them to
retain their addresses as they move within and across data
centers
d) providing broadcast isolation as workloads move around without
burdening the network control plane
This document describes use of the Generic Routing Encapsulation
(GRE) header [<a href="#ref-3" title=""Generic Routing Encapsulation (GRE)"">3</a>] [<a href="#ref-4" title=""Key and Sequence Number Extensions to GRE"">4</a>] for network virtualization. Network
virtualization decouples a virtual network from the underlying
physical network infrastructure by virtualizing network addresses.
Combined with a management and control plane for the virtual-to-
physical mapping, network virtualization can enable flexible virtual
machine placement and movement and provide network isolation for a
multi-tenant data center.
Network virtualization enables customers to bring their own address
spaces into a multi-tenant data center, while the data center
administrators can place the customer virtual machines anywhere in
the data center without reconfiguring their network switches or
routers, irrespective of the customer address spaces.
<span class="grey">Garg & Wang Informational [Page 3]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
<span class="h3"><a class="selflink" id="section-1.1" href="#section-1.1">1.1</a>. Terminology</span>
Please refer to RFCs 7364 [<a href="#ref-10" title=""Problem Statement: Overlays for Network Virtualization"">10</a>] and 7365 [<a href="#ref-11" title=""Framework for Data Center (DC) Network Virtualization"">11</a>] for more formal
definitions of terminology. The following terms are used in this
document.
Customer Address (CA): This is the virtual IP address assigned and
configured on the virtual Network Interface Controller (NIC) within
each VM. This is the only address visible to VMs and applications
running within VMs.
Network Virtualization Edge (NVE): This is an entity that performs
the network virtualization encapsulation and decapsulation.
Provider Address (PA): This is the IP address used in the physical
network. PAs are associated with VM CAs through the network
virtualization mapping policy.
Virtual Machine (VM): This is an instance of an OS running on top of
the hypervisor over a physical machine or server. Multiple VMs can
share the same physical server via the hypervisor, yet are completely
isolated from each other in terms of CPU usage, storage, and other OS
resources.
Virtual Subnet Identifier (VSID): This is a 24-bit ID that uniquely
identifies a virtual subnet or virtual Layer 2 broadcast domain.
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Conventions Used in This Document</span>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <a href="./rfc2119">RFC 2119</a> [<a href="#ref-1" title=""Key words for use in RFCs to Indicate Requirement Levels"">1</a>].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lowercase uses of these words are not to be
interpreted as carrying the significance defined in <a href="./rfc2119">RFC 2119</a>.
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Network Virtualization Using GRE (NVGRE)</span>
This section describes Network Virtualization using GRE (NVGRE).
Network virtualization involves creating virtual Layer 2 topologies
on top of a physical Layer 3 network. Connectivity in the virtual
topology is provided by tunneling Ethernet frames in GRE over IP over
the physical network.
In NVGRE, every virtual Layer 2 network is associated with a 24-bit
identifier, called a Virtual Subnet Identifier (VSID). A VSID is
carried in an outer header as defined in <a href="#section-3.2">Section 3.2</a>. This allows
<span class="grey">Garg & Wang Informational [Page 4]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
unique identification of a tenant's virtual subnet to various devices
in the network. A 24-bit VSID supports up to 16 million virtual
subnets in the same management domain, in contrast to the roughly
4,000 achievable with VLANs. Each VSID represents a virtual Layer 2
broadcast domain, which can be used to identify a virtual subnet of a
given tenant. To support multi-subnet virtual topology, data center
administrators can configure routes to facilitate communication
between virtual subnets of the same tenant.
GRE is a Proposed Standard from the IETF [<a href="#ref-3" title=""Generic Routing Encapsulation (GRE)"">3</a>] [<a href="#ref-4" title=""Key and Sequence Number Extensions to GRE"">4</a>] and provides a way
for encapsulating an arbitrary protocol over IP. NVGRE leverages the
GRE header to carry VSID information in each packet. The VSID
information in each packet can be used to build multi-tenant-aware
tools for traffic analysis, traffic inspection, and monitoring.
The following sections detail the packet format for NVGRE; describe
the functions of an NVGRE endpoint; illustrate typical traffic flow
both within and across data centers; and discuss address/policy
management and deployment considerations.
<span class="h3"><a class="selflink" id="section-3.1" href="#section-3.1">3.1</a>. NVGRE Endpoint</span>
NVGRE endpoints are the ingress/egress points between the virtual and
the physical networks. The NVGRE endpoints are the NVEs as defined
in the Network Virtualization over Layer 3 (NVO3) Framework document
[<a href="#ref-11" title=""Framework for Data Center (DC) Network Virtualization"">11</a>]. Any physical server or network device can be an NVGRE
endpoint. One common deployment is for the endpoint to be part of a
hypervisor. The primary function of this endpoint is to
encapsulate/decapsulate Ethernet data frames to and from the GRE
tunnel, ensure Layer 2 semantics, and apply isolation policy scoped
on VSID. The endpoint can optionally participate in routing and
function as a gateway in the virtual topology. To encapsulate an
Ethernet frame, the endpoint needs to know the location information
for the destination address in the frame. This information can be
provisioned via a management plane or obtained via some combination of
control-plane distribution and data-plane learning approaches. This
document assumes that the location information, including VSID, is
available to the NVGRE endpoint.
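As a purely illustrative sketch, the location information an endpoint consults before encapsulating can be modeled as a table keyed by VSID and inner destination MAC address, mapping to the destination NVE's PA. All names and addresses below are hypothetical; a real deployment would populate such a table from its management or control plane:

```python
# Hypothetical virtualization mapping policy for an NVGRE endpoint.
# Keys: (VSID, inner destination MAC); values: Provider Address of the
# NVE hosting that Customer Address. Entries here are made-up examples.
mapping_policy = {
    (0x005001, "00:1d:aa:bb:cc:01"): "192.0.2.10",
    (0x005001, "00:1d:aa:bb:cc:02"): "192.0.2.20",
}

def lookup_pa(vsid, dest_mac):
    """Return the PA for a (VSID, CA MAC) pair, or None if unknown."""
    return mapping_policy.get((vsid, dest_mac.lower()))
```

A miss in this lookup would typically be handled by the control plane (or, in some designs, by flooding over the subnet's delivery mechanism).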
<span class="h3"><a class="selflink" id="section-3.2" href="#section-3.2">3.2</a>. NVGRE Frame Format</span>
The GRE header format as specified in RFCs 2784 [<a href="#ref-3" title=""Generic Routing Encapsulation (GRE)"">3</a>] and 2890 [<a href="#ref-4" title=""Key and Sequence Number Extensions to GRE"">4</a>] is
used for communication between NVGRE endpoints. NVGRE leverages the
Key extension specified in <a href="./rfc2890">RFC 2890</a> [<a href="#ref-4" title=""Key and Sequence Number Extensions to GRE"">4</a>] to carry the VSID. The
packet format for Layer 2 encapsulation in GRE is shown in Figure 1.
<span class="grey">Garg & Wang Informational [Page 5]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
Outer Ethernet Header:
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Outer) Destination MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|(Outer)Destination MAC Address | (Outer)Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Outer) Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethertype 0x0800 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Outer IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| HL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol 0x2F | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Outer) Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Outer) Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
GRE Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0| Reserved0 | Ver | Protocol Type 0x6558 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Virtual Subnet ID (VSID) | FlowID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Inner Ethernet Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Inner) Destination MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|(Inner)Destination MAC Address | (Inner)Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Inner) Source MAC Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethertype 0x0800 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
<span class="grey">Garg & Wang Informational [Page 6]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
Inner IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| HL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Original IP Payload |
| |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: GRE Encapsulation Frame Format
Note: HL stands for Header Length.
The outer/delivery headers include the outer Ethernet header and the
outer IP header:
o The outer Ethernet header: The source Ethernet address in the
outer frame is set to the MAC address associated with the NVGRE
endpoint. The destination endpoint may or may not be on the same
physical subnet. The destination Ethernet address is set to the
MAC address of the next-hop IP address for the destination NVE.
The outer VLAN tag information is optional and can be used for
traffic management and broadcast scalability on the physical
network.
o The outer IP header: Both IPv4 and IPv6 can be used as the
delivery protocol for GRE. The IPv4 header is shown for
illustrative purposes. Henceforth, the IP address in the outer
frame is referred to as the Provider Address (PA). There can be
one or more PAs associated with an NVGRE endpoint, with policy
controlling the choice of which PA to use for a given Customer
Address (CA) for a customer VM.
In the GRE header:
o The C (Checksum Present) and S (Sequence Number Present) bits in
the GRE header MUST be zero.
<span class="grey">Garg & Wang Informational [Page 7]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
o The K (Key Present) bit in the GRE header MUST be set to one. The
32-bit Key field in the GRE header is used to carry the Virtual
Subnet ID (VSID) and the FlowID:
- Virtual Subnet ID (VSID): This is a 24-bit value that is used
to identify the NVGRE-based Virtual Layer 2 Network.
- FlowID: This is an 8-bit value that is used to provide per-flow
entropy for flows in the same VSID. The FlowID MUST NOT be
modified by transit devices. The encapsulating NVE SHOULD
provide as much entropy as possible in the FlowID. If a FlowID
is not generated, it MUST be set to all zeros.
o The Protocol Type field in the GRE header is set to 0x6558
(Transparent Ethernet Bridging) [<a href="#ref-2" title=""IEEE 802 Numbers"">2</a>].
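The GRE header as constrained above is a fixed eight bytes: a flags word with only the K bit set, the Protocol Type 0x6558, and the 32-bit Key carrying VSID and FlowID. The following Python sketch (illustrative, not normative) packs and validates it:

```python
import struct

GRE_FLAGS_KEY_PRESENT = 0x2000   # C=0, K=1, S=0, Ver=0, per the RFC 2890 bit layout
ETHERTYPE_TEB = 0x6558           # Transparent Ethernet Bridging

def pack_nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Build the 8-byte GRE header NVGRE uses (K set, VSID+FlowID in the Key)."""
    if not 0 <= vsid <= 0xFFFFFF:
        raise ValueError("VSID is a 24-bit value")
    if not 0 <= flow_id <= 0xFF:
        raise ValueError("FlowID is an 8-bit value")
    return struct.pack("!HHI", GRE_FLAGS_KEY_PRESENT, ETHERTYPE_TEB,
                       (vsid << 8) | flow_id)

def unpack_nvgre_header(hdr: bytes) -> tuple:
    """Return (vsid, flow_id), rejecting headers that violate Section 3.2."""
    flags, proto, key = struct.unpack("!HHI", hdr[:8])
    if flags != GRE_FLAGS_KEY_PRESENT or proto != ETHERTYPE_TEB:
        raise ValueError("C and S must be 0, K must be 1, Protocol Type 0x6558")
    return key >> 8, key & 0xFF
```

Note that a zero FlowID is the packed form of "no FlowID generated", matching the MUST in the FlowID description.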
In the inner headers (headers of the GRE payload):
o The inner Ethernet frame comprises an inner Ethernet header,
followed by an optional inner IP header, followed by the IP payload.
The inner frame could be any Ethernet data frame, not just IP.
Note that the inner Ethernet frame's Frame Check Sequence (FCS) is
not encapsulated.
o For illustrative purposes, IPv4 headers are shown as the inner IP
headers, but IPv6 headers may be used. Henceforth, the IP address
contained in the inner frame is referred to as the Customer
Address (CA).
<span class="h3"><a class="selflink" id="section-3.3" href="#section-3.3">3.3</a>. Inner Tag as Defined by IEEE 802.1Q</span>
The inner Ethernet header of NVGRE MUST NOT contain the tag as
defined by IEEE 802.1Q [<a href="#ref-5" title=""IEEE Standard for Local and metropolitan area networks--Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks"">5</a>]. The encapsulating NVE MUST remove any
existing IEEE 802.1Q tag before encapsulation of the frame in NVGRE.
A decapsulating NVE MUST drop the frame if the inner Ethernet frame
contains an IEEE 802.1Q tag.
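Both MUST-level rules above reduce to examining the inner frame's Ethertype field at byte offset 12 for the 802.1Q C-tag value 0x8100; a hedged sketch of each side:

```python
ETHERTYPE_8021Q = 0x8100  # IEEE 802.1Q C-tag TPID

def strip_inner_vlan_tag(frame: bytes) -> bytes:
    """Encapsulation side: remove an 802.1Q tag (bytes 12-15) if present."""
    if len(frame) >= 16 and frame[12:14] == ETHERTYPE_8021Q.to_bytes(2, "big"):
        return frame[:12] + frame[16:]  # drop TPID + TCI, keep the rest
    return frame

def check_inner_frame(frame: bytes) -> bytes:
    """Decapsulation side: a tagged inner frame MUST be dropped."""
    if len(frame) >= 14 and frame[12:14] == ETHERTYPE_8021Q.to_bytes(2, "big"):
        raise ValueError("inner 802.1Q tag present; frame dropped")
    return frame
```

This sketch checks only the single C-tag case the section names; stacked (802.1ad) tags would need additional handling that this document does not specify.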
<span class="h3"><a class="selflink" id="section-3.4" href="#section-3.4">3.4</a>. Reserved VSID</span>
The VSID range 0-0xFFF is reserved for future use.
The VSID 0xFFFFFF is reserved for vendor-specific NVE-to-NVE
communication. The sender NVE SHOULD verify the receiver NVE's
vendor before sending a packet using this VSID; however, such a
verification mechanism is out of scope of this document.
Implementations SHOULD choose a mechanism that meets their
requirements.
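The reserved ranges above suggest a simple validity check an implementation might apply before using a VSID on the wire; this is an illustrative sketch of the policy, not mandated behavior:

```python
def validate_vsid(vsid: int, allow_vendor: bool = False) -> int:
    """Reject VSID values reserved by Section 3.4 (illustrative policy)."""
    if not 0 <= vsid <= 0xFFFFFF:
        raise ValueError("VSID must fit in 24 bits")
    if vsid <= 0xFFF:
        raise ValueError("VSIDs 0-0xFFF are reserved for future use")
    if vsid == 0xFFFFFF and not allow_vendor:
        raise ValueError("0xFFFFFF is reserved for vendor-specific "
                         "NVE-to-NVE communication")
    return vsid
```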
<span class="grey">Garg & Wang Informational [Page 8]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-9" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. NVGRE Deployment Considerations</span>
<span class="h3"><a class="selflink" id="section-4.1" href="#section-4.1">4.1</a>. ECMP Support</span>
Equal-Cost Multipath (ECMP) may be used to provide load balancing.
If ECMP is used, it is RECOMMENDED that the ECMP hash be calculated
either using the outer IP frame fields and entire Key field (32 bits)
or the inner IP and transport frame fields.
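The first recommended option can be illustrated with a toy hash over the outer addresses plus the full 32-bit Key (VSID and FlowID). Real devices use vendor-specific hash functions; CRC32 here is merely a deterministic stand-in:

```python
import struct
import zlib

def ecmp_path(src_pa: bytes, dst_pa: bytes, gre_key: int, n_paths: int) -> int:
    """Illustrative ECMP path selection over outer IP addresses and the
    entire 32-bit GRE Key field, per the recommendation in Section 4.1."""
    material = src_pa + dst_pa + struct.pack("!I", gre_key)
    return zlib.crc32(material) % n_paths
```

Because the FlowID occupies the Key's low byte, two flows in the same VSID with different FlowIDs hash differently, which is exactly the per-flow entropy the FlowID exists to provide.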
<span class="h3"><a class="selflink" id="section-4.2" href="#section-4.2">4.2</a>. Broadcast and Multicast Traffic</span>
To support broadcast and multicast traffic inside a virtual subnet,
one or more administratively scoped multicast addresses [<a href="#ref-8" title=""IP Version 6 Addressing Architecture"">8</a>] [<a href="#ref-9" title=""Administratively Scoped IP Multicast"">9</a>] can
be assigned for the VSID. All multicast or broadcast traffic
originating from within a VSID is encapsulated and sent to the
assigned multicast address. From an administrative standpoint, it is
possible for network operators to configure a PA multicast address
for each multicast address that is used inside a VSID; this
facilitates optimal multicast handling. Depending on the hardware
capabilities of the physical network devices and the physical network
architecture, multiple virtual subnets may use the same physical IP
multicast address.
Alternatively, based upon the configuration at the NVE, broadcast and
multicast in the virtual subnet can be supported using N-way unicast.
In N-way unicast, the sender NVE would send one encapsulated packet
to every NVE in the virtual subnet. The sender NVE can encapsulate
and send the packet as described in <a href="#section-4.3">Section 4.3</a> ("Unicast Traffic").
This alleviates the need for multicast support in the physical
network.
<span class="h3"><a class="selflink" id="section-4.3" href="#section-4.3">4.3</a>. Unicast Traffic</span>
The NVGRE endpoint encapsulates a Layer 2 packet in GRE using the
source PA associated with the endpoint with the destination PA
corresponding to the location of the destination endpoint. As
outlined earlier, there can be one or more PAs associated with an
endpoint and policy will control which ones get used for
communication. The encapsulated GRE packet is bridged and routed
normally by the physical network to the destination PA. Bridging
uses the outer Ethernet encapsulation for scope on the LAN. The only
requirement is bidirectional IP connectivity from the underlying
physical network. On the destination, the NVGRE endpoint
decapsulates the GRE packet to recover the original Layer 2 frame.
Traffic flows similarly on the reverse path.
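The encapsulation step described above can be sketched end to end in Python. This illustrative code builds a minimal outer IPv4 header (protocol 47 for GRE) and the Key-carrying GRE header around an inner Ethernet frame, leaving the outer Ethernet header to the sending NIC; the TTL, DF flag, and zero Identification are arbitrary illustrative choices, not requirements of this document:

```python
import socket
import struct

GRE_PROTOCOL = 47  # IP protocol number 0x2F for GRE

def ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Minimal outer IPv4 header (no options) with a computed checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45, 0, 20 + payload_len,   # version/IHL, TOS, total length
                      0, 0x4000,                   # identification, flags (DF set)
                      64, GRE_PROTOCOL, 0,         # TTL, protocol, checksum placeholder
                      socket.inet_aton(src), socket.inet_aton(dst))
    s = sum(struct.unpack("!10H", hdr))            # RFC 791 one's-complement sum
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return hdr[:10] + struct.pack("!H", ~s & 0xFFFF) + hdr[12:]

def nvgre_encap(inner_frame: bytes, vsid: int, flow_id: int,
                src_pa: str, dst_pa: str) -> bytes:
    """Wrap an inner Ethernet frame in GRE over IPv4, per Section 4.3."""
    gre = struct.pack("!HHI", 0x2000, 0x6558, (vsid << 8) | flow_id)
    payload = gre + inner_frame
    return ipv4_header(src_pa, dst_pa, len(payload)) + payload
```

Decapsulation is the mirror image: strip the outer IP header, verify the GRE flags and Protocol Type, recover the VSID for policy scoping, and deliver the inner frame.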
<span class="grey">Garg & Wang Informational [Page 9]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-10" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
<span class="h3"><a class="selflink" id="section-4.4" href="#section-4.4">4.4</a>. IP Fragmentation</span>
<a href="./rfc2003#section-5.1">Section 5.1 of RFC 2003</a> [<a href="#ref-12" title=""IP Encapsulation within IP"">12</a>] specifies mechanisms for handling
fragmentation when encapsulating IP within IP. The subset of
mechanisms NVGRE selects is intended to ensure that NVGRE-
encapsulated frames are not fragmented after encapsulation en route
to the destination NVGRE endpoint and that traffic sources can
leverage Path MTU discovery.
A sender NVE MUST NOT fragment NVGRE packets. A receiver NVE MAY
discard fragmented NVGRE packets. It is RECOMMENDED that the MTU of
the physical network accommodates the larger frame size due to
encapsulation. Path MTU or configuration via control plane can be
used to meet this requirement.
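The "larger frame size due to encapsulation" is a fixed arithmetic: an untagged outer Ethernet header aside, each packet grows by the outer IPv4 header (20 bytes, no options) plus the 8-byte Key-carrying GRE header, wrapped around the inner Ethernet frame (whose FCS is not encapsulated). A small worked sketch, assuming IPv4 underlay and no outer VLAN tag:

```python
OUTER_IPV4 = 20      # outer IPv4 header, no options
GRE_WITH_KEY = 8     # 4-byte base GRE header + 4-byte Key field
INNER_ETH = 14       # inner Ethernet header (inner FCS is not encapsulated)

def required_underlay_ip_mtu(tenant_ip_mtu: int = 1500) -> int:
    """IP MTU the physical network must support so that NVGRE packets
    carrying tenant frames of the given IP MTU never need fragmentation."""
    return tenant_ip_mtu + INNER_ETH + GRE_WITH_KEY + OUTER_IPV4
```

For a standard 1500-byte tenant IP MTU this yields 1542 bytes, which is why NVGRE underlays are commonly provisioned with jumbo or "baby giant" frames.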
<span class="h3"><a class="selflink" id="section-4.5" href="#section-4.5">4.5</a>. Address/Policy Management and Routing</span>
Address acquisition is beyond the scope of this document; addresses
can be obtained statically, dynamically, or via stateless address
autoconfiguration. CA and PA space can be either IPv4 or IPv6. In
fact, the address families don't have to match; for example, a CA can
be IPv4 while the PA is IPv6, and vice versa.
<span class="h3"><a class="selflink" id="section-4.6" href="#section-4.6">4.6</a>. Cross-Subnet, Cross-Premise Communication</span>
One application of this framework is that it provides a seamless path
for enterprises looking to expand their virtual machine hosting
capabilities into public clouds. Enterprises can bring their entire
IP subnet(s) and isolation policies, thus making the transition to or
from the cloud simpler. It is possible to move portions of an IP
subnet to the cloud; however, that requires additional configuration
on the enterprise network and is not discussed in this document.
Enterprises can continue to use existing communications models like
site-to-site VPN to secure their traffic.
A VPN gateway is used to establish a secure site-to-site tunnel over
the Internet, and all the enterprise services running in virtual
machines in the cloud use the VPN gateway to communicate back to the
enterprise. For simplicity, we use a VPN gateway configured as a VM
(shown in Figure 2) to illustrate cross-subnet, cross-premise
communication.
<span class="grey">Garg & Wang Informational [Page 10]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-11" ></span>
<span class="grey"><a href="./rfc7637">RFC 7637</a> NVGRE September 2015</span>
+-----------------------+ +-----------------------+
| Server 1 | | Server 2 |
| +--------+ +--------+ | | +-------------------+ |
| | VM1 | | VM2 | | | | VPN Gateway | |
| | IP=CA1 | | IP=CA2 | | | | Internal External| |
| | | | | | | | IP=CAg IP=GAdc | |
| +--------+ +--------+ | | +-------------------+ |
| Hypervisor | | | Hypervisor| ^ |
+-----------------------+ +-------------------:---+
| IP=PA1 | IP=PA4 | :
| | | :
| +-------------------------+ | : VPN
+-----| Layer 3 Network |------+ : Tunnel
+-------------------------+ :
| :
+-----------------------------------------------:--+
| : |
| Internet : |
| : |
+-----------------------------------------------:--+
| v
| +-------------------+
| | VPN Gateway |
|---| |
IP=GAcorp| External IP=GAcorp|
+-------------------+
|
+-----------------------+
| Corp Layer 3 Network |
| (In CA Space) |
+-----------------------+
|
+---------------------------+
| Server X |
| +----------+ +----------+ |
| | Corp VMe1| | Corp VMe2| |
| | IP=CAe1 | | IP=CAe2 | |
| +----------+ +----------+ |
| Hypervisor |
+---------------------------+
Figure 2: Cross-Subnet, Cross-Premise Communication
The packet flow is similar to the unicast traffic flow between VMs;
the key difference in this case is that the packet needs to be sent
to a VPN gateway before it gets forwarded to the destination. As
part of routing configuration in the CA space, a per-tenant VPN
gateway is provisioned for communication back to the enterprise. The
example illustrates an outbound connection between VM1 inside the
data center and VMe1 inside the enterprise network. When the
outbound packet from CA1 to CAe1 reaches the hypervisor on Server 1,
the NVE in Server 1 can perform the equivalent of a route lookup on
the packet. The cross-premise packet will match the default gateway
rule, as CAe1 is not part of the tenant virtual network in the data
center. The virtualization policy will indicate that the packet is to
be encapsulated and sent to the PA of the tenant VPN gateway (PA4)
running as a VM on Server 2. The packet is decapsulated on Server 2
and delivered to the VPN gateway. The gateway in turn validates and
sends the packet on the site-to-site VPN tunnel back to the
enterprise network. As the communication here is external to the
data center, the PA address for the VPN tunnel is globally routable.
The outer header of this packet is sourced from GAdc destined to
GAcorp. This packet is routed through the Internet to the enterprise
VPN gateway, which is the other end of the site-to-site tunnel; at
that point, the VPN gateway decapsulates the packet and sends it
inside the enterprise, where CAe1 is routable on the network. The
reverse path is similar once the packet reaches the enterprise VPN
gateway.
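The route-lookup and default-gateway step described above can be sketched as follows. The subnet, mapping table, and address values are illustrative stand-ins (loosely following Figure 2), not part of the specification:

```python
import ipaddress

# Illustrative tenant state held by the NVE in Server 1.
TENANT_SUBNET = ipaddress.ip_network("10.1.0.0/16")          # tenant virtual subnet (CA space)
CA_TO_PA = {"10.1.0.1": "20.0.0.1", "10.1.0.2": "20.0.0.1"}  # CA1, CA2 hosted behind PA1
VPN_GATEWAY_PA = "20.0.0.4"                                  # PA4: per-tenant VPN gateway VM

def select_tunnel_destination(dest_ca):
    """Pick the outer (PA) destination for a packet addressed to dest_ca.

    A destination inside the tenant virtual subnet uses its mapped PA;
    any other destination matches the default-gateway rule and is
    encapsulated toward the per-tenant VPN gateway.
    """
    if ipaddress.ip_address(dest_ca) in TENANT_SUBNET:
        return CA_TO_PA[dest_ca]
    return VPN_GATEWAY_PA  # cross-premise traffic goes to PA4
```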
<span class="h3"><a class="selflink" id="section-4.7" href="#section-4.7">4.7</a>. Internet Connectivity</span>
To enable connectivity to the Internet, an Internet gateway is needed
that bridges the virtualized CA space to the public Internet address
space. The gateway needs to perform translation between the
virtualized world and the Internet. For example, the NVGRE endpoint
can be part of a load balancer or a NAT that replaces the VPN gateway
on Server 2 shown in Figure 2.
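A minimal sketch of the translation such a gateway performs, with a hypothetical NAT table and addresses (the function and table names are illustrative):

```python
# Hypothetical NAT state on the Internet gateway: a tenant CA is rewritten
# to a publicly routable address before the packet leaves the CA space.
NAT_TABLE = {("10.1.0.1", 0x6001): "203.0.113.10"}

def translate_outbound(src_ca, vsid):
    """Rewrite a tenant source CA to its public IP for Internet-bound traffic."""
    public_ip = NAT_TABLE.get((src_ca, vsid))
    if public_ip is None:
        raise LookupError("no public mapping for this tenant address")
    return public_ip
```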
<span class="h3"><a class="selflink" id="section-4.8" href="#section-4.8">4.8</a>. Management and Control Planes</span>
Several protocols can manage and distribute policy; however,
specifying them is outside the scope of this document.
Implementations SHOULD choose a mechanism that meets their scale
requirements.
<span class="h3"><a class="selflink" id="section-4.9" href="#section-4.9">4.9</a>. NVGRE-Aware Devices</span>
One example of a typical deployment consists of virtualized servers
deployed across multiple racks connected by one or more layers of
Layer 2 switches, which in turn may be connected to a Layer 3 routing
domain. Even though routing in the physical infrastructure will work
without any modification with NVGRE, devices that perform specialized
processing in the network need to be able to parse GRE to get access
to tenant-specific information. Devices that understand and parse
the VSID can provide rich multi-tenant-aware services inside the data
center. As outlined earlier, it is imperative to exploit multiple
paths inside the network through techniques such as ECMP. The Key
field (a 32-bit field, including both the VSID and the optional
FlowID) can provide additional entropy to the switches to exploit
path diversity inside the network. A diverse ecosystem is expected
to emerge as more and more devices become multi-tenant aware. In the
interim, without requiring any hardware upgrades, path diversity can
still be exploited with GRE by associating multiple PAs with an NVGRE
endpoint, with policy controlling which PA to use.
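The 32-bit Key field layout described above (24-bit VSID plus 8-bit FlowID) can be sketched as below. The CRC-32-based flow hash is an arbitrary illustrative stand-in for whatever per-flow-stable function an endpoint actually uses:

```python
import zlib

def pack_gre_key(vsid, flow_id):
    """Pack the 24-bit VSID and 8-bit FlowID into the 32-bit GRE Key field."""
    assert 0 <= vsid < (1 << 24) and 0 <= flow_id < (1 << 8)
    return (vsid << 8) | flow_id

def flow_id_for(inner_five_tuple):
    """Derive an 8-bit FlowID from the inner flow.

    A per-flow-stable value here gives ECMP switches extra entropy so
    distinct flows within one VSID can take different paths.
    CRC-32 is used purely as an illustrative hash.
    """
    return zlib.crc32(repr(inner_five_tuple).encode()) & 0xFF

key = pack_gre_key(0x00ABCD, flow_id_for(("10.1.0.1", "10.1.0.2", 6, 50000, 443)))
```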
It is expected that communication can span multiple data centers and
also cross the virtual/physical boundary. Typical scenarios that
require virtual-to-physical communication include access to storage
and databases. Scenarios demanding lossless Ethernet functionality
may not be amenable to NVGRE, as traffic is carried over an IP
network. NVGRE endpoints mediate between the network-virtualized and
non-network-virtualized environments. This functionality can be
incorporated into Top-of-Rack switches, storage appliances, load
balancers, routers, etc., or built as a stand-alone appliance.
It is imperative to consider the impact of any solution on host
performance. Today's server operating systems employ sophisticated
acceleration techniques such as checksum offload, Large Send Offload
(LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS),
Virtual Machine Queue (VMQ), etc. These technologies should become
NVGRE aware. IPsec Security Associations (SAs) can be offloaded to
the NIC so that computationally expensive cryptographic operations
are performed at line rate in the NIC hardware. These SAs are based
on the IP addresses of the endpoints. As each packet on the wire
gets translated, the NVGRE endpoint SHOULD intercept the offload
requests and do the appropriate address translation. This will
ensure that IPsec continues to be usable with network virtualization
while taking advantage of hardware offload capabilities for improved
performance.
<span class="h3"><a class="selflink" id="section-4.10" href="#section-4.10">4.10</a>. Network Scalability with NVGRE</span>
One of the key benefits of using NVGRE is the IP address scalability
and in turn MAC address table scalability that can be achieved. An
NVGRE endpoint can use one PA to represent multiple CAs. This lowers
the burden on the MAC address table sizes at the Top-of-Rack
switches. One obvious benefit is in the context of server
virtualization, which has increased the demands on the network
infrastructure. By embedding an NVGRE endpoint in a hypervisor, it
is possible to scale significantly. This framework enables location
information to be preconfigured inside an NVGRE endpoint, thus
allowing broadcast ARP traffic to be proxied locally. This approach
can scale to large-sized virtual subnets. These virtual subnets can
be spread across multiple Layer 3 physical subnets. It allows
workloads to be moved around without imposing a huge burden on the
network control plane. By eliminating most broadcast traffic and
converting the rest to multicast, routers and switches can operate
more efficiently by building efficient multicast trees. By using
server and network capacity efficiently, it is possible to drive down
the cost of building and managing data centers.
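The local ARP proxying mentioned above can be sketched as follows; the preconfigured table, addresses, and function name are hypothetical examples of state pushed to the endpoint by the (out-of-scope) policy plane:

```python
# Preconfigured CA -> MAC table on the NVGRE endpoint, populated by the
# out-of-scope policy/control plane; entries here are illustrative.
ARP_TABLE = {"10.1.0.2": "00:11:22:33:44:02"}

def proxy_arp(target_ca):
    """Answer an ARP request locally if the mapping is preconfigured.

    Returning a MAC means the broadcast never leaves the hypervisor,
    which is what lets virtual subnets scale across Layer 3 subnets;
    None falls back to whatever the deployment's policy dictates.
    """
    return ARP_TABLE.get(target_ca)
```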
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Security Considerations</span>
This proposal extends the Layer 2 subnet across the data center and
increases the scope for spoofing attacks. Mitigations of such
attacks are possible with authentication/encryption using IPsec or
any other IP-based mechanism. The control plane for policy
distribution is expected to be secured by using any of the existing
security protocols. Further, management traffic can be isolated in a
separate subnet/VLAN.
The checksum in the GRE header is not supported. The mitigation for
this is to deploy an NVGRE-based solution in a network that provides
error detection along the NVGRE packet path, for example, using the
Ethernet Cyclic Redundancy Check (CRC), IPsec, or any other
error-detection mechanism.
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Normative References</span>
[<a id="ref-1">1</a>] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, DOI 10.17487/RFC2119, March 1997,
<<a href="http://www.rfc-editor.org/info/rfc2119">http://www.rfc-editor.org/info/rfc2119</a>>.
[<a id="ref-2">2</a>] IANA, "IEEE 802 Numbers",
<<a href="http://www.iana.org/assignments/ieee-802-numbers">http://www.iana.org/assignments/ieee-802-numbers</a>>.
[<a id="ref-3">3</a>] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
"Generic Routing Encapsulation (GRE)", <a href="./rfc2784">RFC 2784</a>,
DOI 10.17487/RFC2784, March 2000,
<<a href="http://www.rfc-editor.org/info/rfc2784">http://www.rfc-editor.org/info/rfc2784</a>>.
[<a id="ref-4">4</a>] Dommety, G., "Key and Sequence Number Extensions to GRE",
<a href="./rfc2890">RFC 2890</a>, DOI 10.17487/RFC2890, September 2000,
<<a href="http://www.rfc-editor.org/info/rfc2890">http://www.rfc-editor.org/info/rfc2890</a>>.
[<a id="ref-5">5</a>] IEEE, "IEEE Standard for Local and metropolitan area
networks--Media Access Control (MAC) Bridges and Virtual Bridged
Local Area Networks", IEEE Std 802.1Q.
[<a id="ref-6">6</a>] Greenberg, A., et al., "VL2: A Scalable and Flexible Data Center
Network", Communications of the ACM,
DOI 10.1145/1897852.1897877, 2011.
[<a id="ref-7">7</a>] Greenberg, A., et al., "The Cost of a Cloud: Research Problems
in Data Center Networks", ACM SIGCOMM Computer Communication
Review, DOI 10.1145/1496091.1496103, 2009.
[<a id="ref-8">8</a>] Hinden, R. and S. Deering, "IP Version 6 Addressing
Architecture", <a href="./rfc4291">RFC 4291</a>, DOI 10.17487/RFC4291, February 2006,
<<a href="http://www.rfc-editor.org/info/rfc4291">http://www.rfc-editor.org/info/rfc4291</a>>.
[<a id="ref-9">9</a>] Meyer, D., "Administratively Scoped IP Multicast", <a href="https://www.rfc-editor.org/bcp/bcp23">BCP 23</a>,
<a href="./rfc2365">RFC 2365</a>, DOI 10.17487/RFC2365, July 1998,
<<a href="http://www.rfc-editor.org/info/rfc2365">http://www.rfc-editor.org/info/rfc2365</a>>.
[<a id="ref-10">10</a>] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger,
L., and M. Napierala, "Problem Statement: Overlays for Network
Virtualization", <a href="./rfc7364">RFC 7364</a>, DOI 10.17487/RFC7364, October 2014,
<<a href="http://www.rfc-editor.org/info/rfc7364">http://www.rfc-editor.org/info/rfc7364</a>>.
[<a id="ref-11">11</a>] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter,
"Framework for Data Center (DC) Network Virtualization",
<a href="./rfc7365">RFC 7365</a>, DOI 10.17487/RFC7365, October 2014,
<<a href="http://www.rfc-editor.org/info/rfc7365">http://www.rfc-editor.org/info/rfc7365</a>>.
[<a id="ref-12">12</a>] Perkins, C., "IP Encapsulation within IP", <a href="./rfc2003">RFC 2003</a>,
DOI 10.17487/RFC2003, October 1996,
<<a href="http://www.rfc-editor.org/info/rfc2003">http://www.rfc-editor.org/info/rfc2003</a>>.
[<a id="ref-13">13</a>] Touch, J. and R. Perlman, "Transparent Interconnection of Lots
of Links (TRILL): Problem and Applicability Statement",
<a href="./rfc5556">RFC 5556</a>, DOI 10.17487/RFC5556, May 2009,
<<a href="http://www.rfc-editor.org/info/rfc5556">http://www.rfc-editor.org/info/rfc5556</a>>.
Contributors
Murari Sridharan
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: muraris@microsoft.com
Albert Greenberg
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: albert@microsoft.com
Narasimhan Venkataramiah
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: navenkat@microsoft.com
Kenneth Duda
Arista Networks, Inc.
5470 Great America Pkwy
Santa Clara, CA 95054
United States
Email: kduda@aristanetworks.com
Ilango Ganga
Intel Corporation
2200 Mission College Blvd.
M/S: SC12-325
Santa Clara, CA 95054
United States
Email: ilango.s.ganga@intel.com
Geng Lin
Google
1600 Amphitheatre Parkway
Mountain View, CA 94043
United States
Email: genglin@google.com
Mark Pearson
Hewlett-Packard Co.
8000 Foothills Blvd.
Roseville, CA 95747
United States
Email: mark.pearson@hp.com
Patricia Thaler
Broadcom Corporation
3151 Zanker Road
San Jose, CA 95134
United States
Email: pthaler@broadcom.com
Chait Tumuluri
Emulex Corporation
3333 Susan Street
Costa Mesa, CA 92626
United States
Email: chait@emulex.com
Authors' Addresses
Pankaj Garg (editor)
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: pankajg@microsoft.com
Yu-Shun Wang (editor)
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
United States
Email: yushwang@microsoft.com
</pre>