# What is ikvswitch
## What this package does
This package emulates 3 racks of a datacenter and their matching
switch infrastructure, with a BGP-to-the-host setup.
Its goal is to make it possible to deploy VMs the same way one
would in a real production environment, with IPv6 unnumbered
connectivity between the spines and the leaf switches in the racks,
and between the leaf switches and the servers.
Each rack has 2 switches, each of them connected to 2 spines over
BGP, using FRR and BGP-to-the-host over IPv6 unnumbered routing.
The number of racks is limited to 3, as this is enough for full
HA redundancy. However, the number of ports in the leaf switches (i.e.
the number of U in your virtual racks) is configurable and unlimited
(the only limits are those of the Linux kernel).
## Designed to leave your host server configuration untouched
To avoid making any (possibly disruptive) changes to the host
networking, there is an "internet" VM that is connected to the host
using very simple L2 connectivity. The only thing added
to the host server for routing is a simple SNAT rule:
```
iptables -t nat -A POSTROUTING -s 192.168.96.0/19 ! -d 192.168.96.0/19 -j SNAT --to-source ${MY_IP}
```
Therefore, if your test server uses BGP-to-the-host itself for
its outbound connectivity (or any other type of setup), there is no
need to add any special configuration to your local BGP daemon, or
any complex routing rules. The above iptables rule (and that rule
only) is enough for full bi-directional connectivity from your
host server to all switches and hosted VMs.
## Designed for a very low number of dependencies
There are a number of other tools designed to set up complex
networking environments, which are probably a better fit for testing
a datacenter infrastructure. However, these are often non-free,
and bring a lot of dependencies that need to be installed on
your host server.
Instead, the ikvswitch approach is to require very few
dependency packages. Basically, it needs Qemu, a few networking
utilities like bridge-utils, iptables and so on, and openstack-debian-images
to create the operating system images of your switches, and that's
about it.
## Designed for application developers, not for network engineers
The goal of ikvswitch is not to help design a network infrastructure,
but rather to help developers test clustering applications (like
OpenStack, but maybe something else...) using a "standard"
BGP-to-the-host L3 network fabric. The only thing one
needs to do is to "plug VMs into the virtual racks" in the
standard KVM way, knowing that ikvswitch will provide
BGP routing and DHCP.
## Network schema of the virtual infrastructure
What's below is probably the easiest way to understand what the
heck ikvswitch does.
```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ host server ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
│ ikhn ((dummy net) host NIC)
│
│ ikhb (host bridge) doing an L2 connectivity to the host server
│ 192.168.96.1
│
│ ikht (host (internet VM) tap)
│ 192.168.96.2
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ internet ┃
┃ 192.168.98.1 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
│ ikin1 │ ikin2 (internet (VM) nic (TAP))
│ │
│ ikib1 │ ikib2 (internet bridge)
│ │
│ ikit1 │ ikit2 (internet (VM) TAP)
┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃1 ┃ ┃1 ┃
┃ spine1 ┃ ┃ spine 2 ┃
┃192.168.98.2┃ ┃192.168.98.3┃
┃2 3 4 5 6 7 ┃ ┃2 3 4 5 6 7 ┃
┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
│ │ │ │ │ │ ╭─────────┘ │ │ │ │ │ ifnames (spine tap): ikstXY, where X is 1 (for spine 1) or 2, and Y 2 to 7
│ │ │ │ │ │ │ │ │ │ │ │
│ └─│─│─│─│───────────┐ │ └─│─│─│─────────────┐ bridge names: iksbXY, where X is 1 or 2, and Y 2 to 7
│ │ │ │ │ │ │ │ │ │ └─────────────│─────────────────────────────────────────────────────┐
│ │ │ │ │ │ │ │ │ └───────────────│──────────────────────────────────┐ │
│ └─│─│─│─────│─────│─────│───│────────┐ │ │ │
│ │ │ │ │ │ │ └────────│────────│─────────────────┐ │ │
│ └─│─│─────│─────│─────│────────────│────────│────────┐ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ └─│─────│─────│─────│────────────│────────│────────│────────│────────┐ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ └─────│─────│─────│────────────│────────│────────│────────│────────│───────│─────────┐ │
│ │ │ │ │ │ │ │ │ │ │ │
╭--│---------------│-----│-----│-------┐ ╭--│--------│--------│--------│---┐ ╭--│-------│---------│--------│---┐
| │ │ │ │ | | │ │ │ │ | | ifnames: ikltXY, where X is the leaf number from 1 to 6, and Y is the port number, 1 or 2
| ╭─────────────────┐ ╭─────────────┐| |╭─────────────┐ ╭─────────────┐| |╭─────────────┐ ╭─────────────┐|
| │1 2│ │1 2 │| |│ 1 2 │ │1 2 │| |│1 2 │ │1 2 │|
| │ rack1 leaf1 │ │ rack1 leaf2 │| |│ rack2 leaf1 │ │ rack2 leaf2 │| |│ rack3 leaf1 │ │ rack3 leaf2 │|
| │ hostname: leaf1 │ │ hn: leaf2 │| |│ hn: leaf3 │ │ hn: leaf4 │| |│ hn: leaf5 │ │ hn: leaf6 │|
| │ 192.168.98.4 │ │ 192.168.98.5│| |│ 192.168.98.6│ │ 192.168.98.7│| |│192.168.98.8 │ │ 192.168.98.9│|
| │3 4 5 6 7 9 10 ..│ │3 4 5 6 7... │| |│ 3 4... │ │ 3 4... │| |│ 3 4... │ │ 3 4... │|
| └─────────────────┘ └─────────────┘| |└─────────────┘ └─────────────┘| |└─────────────┘ └─────────────┘|
| │ │ │ │ | | │ │ │ │ ifnames: iklsX-Y, where X is the leaf switch number from 1 to 6, and Y is the port number, from 3 to 15
| │ └───────┐ │ │ | | │ └───────┐ │ │ | | │ └───────┐ │ │ |
| │ │ │ │ (end) bridge names: ikebXYY, where X is the number of the rack switch, and YY the interface number
| │ ╭───│───────────┘ │ | | │ ╭───│───────┘ │ | | │ ╭───│───────┘ │ |
| │ │ │ ╭───────┘ | | │ │ │ ╭───┘ | | │ │ │ ╭───┘ |
| │ │ │ │ | | │ │ │ │ | | │ │ │ │ |
| │ │ │ │ | | │ │ │ │ | | │ │ │ │ |
| │ │ │ │ | | │ │ │ │ | | │ │ │ │ |
| │ │ │ │ ifnames (tap): ikvmX-Y-Z where X is the rack number from 1 to 3, Y the VM number (or position in the rack) from 3 to 15, and Z the NIC number (1 or 2)
| ┏━━━━━━━┓ ┏━━━━━━━┓ | | ┏━━━━━━━┓ ┏━━━━━━━┓ | | ┏━━━━━━━┓ ┏━━━━━━━┓ |
| ┃1 2┃ ┃1 2┃ | | ┃1 2┃ ┃1 2┃ | | ┃1 2┃ ┃1 2┃ |
| ┃ VM 1 ┃ ┃ VM 2 ┃ ..... | | ┃ VM 3 ┃ ┃ VM 4 ┃ ..... | | ┃ VM 5 ┃ ┃ VM 6 ┃ ..... |
| ┃ U3 ┃ ┃ U4 ┃ | | ┃ U3 ┃ ┃ U4 ┃ | | ┃ U3 ┃ ┃ U4 ┃ |
| ┗━━━━━━━┛ ┗━━━━━━━┛ | | ┗━━━━━━━┛ ┗━━━━━━━┛ | | ┗━━━━━━━┛ ┗━━━━━━━┛ |
| vrack 1| | vrack 2| | vrack 3|
└--------------------------------------┘ └---------------------------------┘ └---------------------------------┘
```
## What is run when starting the infrastructure
Each switch is emulated by a virtual machine (see below), which is connected
to its neighbouring switch or server over IPv6 unnumbered (i.e.
link-local IPv6 connectivity only). On the local loopback of each
switch, the management IPv4 address is added, so it is possible to ssh
into them across the whole infrastructure, thanks to BGP routing.
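As a rough illustration of the idea, here is what the FRR configuration
of a leaf switch such as leaf3 could conceptually look like. The AS
number, interface names and exact statements below are a hypothetical
sketch, not necessarily what ikvswitch generates:
```
! hypothetical FRR sketch: BGP unnumbered peering on a leaf switch
! (ens4/ens5 = uplinks to spine1/spine2, IPv6 link-local only)
router bgp 65103
 neighbor ens4 interface remote-as external
 neighbor ens5 interface remote-as external
 address-family ipv4 unicast
  ! advertise the leaf3 management IP that sits on the loopback
  network 192.168.98.6/32
 exit-address-family
```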
## What is routed (bridges and VM TAPs)
The whole 192.168.96.0/19 network is routed, with NAT on the
host server. What would normally be physical wires are virtualized
using Linux bridges. Every VM interface has its counterpart
TAP device on the host server, and each of these TAPs is
connected to the corresponding bridge.
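For illustration, plumbing one such virtual "wire" (here for spine1
port 2, following the ikstXY/iksbXY naming from the schema above)
boils down to something like the following; the actual commands run
by ikvswitch may differ:
```
# create the bridge emulating the wire for spine1, port 2
ip link add name iksb12 type bridge
ip link set iksb12 up
# create the TAP backing the spine VM's port 2, plug it into the bridge
ip tuntap add dev ikst12 mode tap
ip link set ikst12 master iksb12 up
```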
## Nice property with link-local
Since ikvswitch sets predictable MAC addresses for the switch
ports, the IPv6 link-local addresses are always the same, and therefore
it is rather easy to check IPv6 link-local connectivity
between switch ports. For example, to check whether spine1 is properly
connected to the leaf3 switch, we use the port 2 IPv6 address of the
leaf3 switch (i.e. fe80::a00:27ff:fe06:ac52), through the port
ens7 of the spine1 switch:
```
host-server> # ssh 192.168.98.2
spine1> # ping fe80::a00:27ff:fe06:ac52%ens7
```
In the same way, even without FRR BGP routing working, it is
possible to ssh into leaf3 from spine1 using the link-local address:
```
spine1> # ssh fe80::a00:27ff:fe06:ac52%ens7
```
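These link-local addresses are predictable because they follow the
standard EUI-64 derivation from the port's MAC address. For instance,
the address used above decomposes as follows (the MAC is inferred
back from the address itself):
```
MAC address of the port:      08:00:27:06:ac:52
insert ff:fe in the middle:   08:00:27:ff:fe:06:ac:52
flip the U/L bit (08 -> 0a):  0a:00:27:ff:fe:06:ac:52
prepend the fe80:: prefix:    fe80::a00:27ff:fe06:ac52
```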
## Included DHCP server
ikvswitch provides a DHCP server that runs on each odd-numbered leaf
switch. That's leaf1, leaf3 and leaf5 on the above schematic. These
servers set the "next-server" option to 192.168.100.2 by default.
Typically, a virtual machine would be set up using that IP
address, connected to any of the leaf switches. It may
later be used as a PXE server, to boot (and install)
your other virtual machines, just like you would do in the
real world with physical servers.
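This document does not state which DHCP daemon is used; as a purely
hypothetical sketch, an ISC-dhcpd-style configuration for this would
look something like the following (the subnet, range and filename are
assumptions, only next-server comes from above):
```
# hypothetical ISC-dhcpd-style excerpt
subnet 192.168.100.0 netmask 255.255.255.0 {
  range 192.168.100.10 192.168.100.200;
  # point booting clients at the VM acting as the PXE server
  next-server 192.168.100.2;
  filename "pxelinux.0";
}
```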
## Target audience for this package
Anyone who wishes to test a complex BGP-to-the-host setup with
VMs connected to 2 leaf switches, themselves connected to 2 spines.
I wrote this to test a deployment of OpenStack, but any deployment
(for example Kubernetes) may use this infrastructure, if set up
inside VMs.
## What this package does not provide
Even though it prepares the TAP interfaces for the VMs, this package
does *not* start the VMs that are to be connected to the virtual
networking infrastructure that ikvswitch provides.
# How to use
## Configuration
Edit /etc/ikvswitch/ikvswitch.conf to your liking. The most
important value is MY_IP at the top of the file, as it is
used for the NAT of everything inside the virtual network.
It is also probably a good idea to edit VM_ROOT_PASS at the
very end of the file.
Nothing else needs to be configured, unless there is
a conflict with your local network (for example, if the LAN
your host server sits on uses the 192.168.96.0/19 range).
Absolutely all IP subnets, MAC addresses, interface and
bridge names can be configured and named the way you like.
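For illustration, the two values mentioned above could look like this
in /etc/ikvswitch/ikvswitch.conf (the example IP is a placeholder;
only MY_IP and VM_ROOT_PASS are documented here):
```
# IP of the host server, used as the SNAT source for 192.168.96.0/19
MY_IP=203.0.113.10
# root password inside the switch VMs ("changeme" is the default)
VM_ROOT_PASS=changeme
```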
## Starting up the host networking
Before starting up the 9 VMs that will host the virtual
switches, it is mandatory to configure the host server.
Simply do:
```
# ikvswitch-host-networking start
```
to configure the host. This will create nearly 200 virtual NICs
and nearly 100 bridges by default, so it takes a bit
of time to start.
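As a quick sanity check (assuming the default ik* naming prefix used
throughout the schema above), the created interfaces can be counted
like this:
```
# count the virtual NICs and bridges created by ikvswitch
ip -br link show | grep -c '^ik'
```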
## Starting up the switches and connecting to them
It is as simple as:
```
# ikvswitch-setup start
```
That's all there is to it! :)
After that command, it should be possible to ssh into
the "internet" virtual switch:
```
host-server> # ssh 192.168.96.2
```
The ssh authorized_keys inside the switches are
copied from your host's /root/.ssh/authorized_keys, and
the password is set to what's in /etc/ikvswitch/ikvswitch.conf
(in case you need to debug over VNC).
If everything goes well, from any VM, or even from the
host server, it's possible to reach any of the virtual
switches:
```
spine1> # ssh leaf3
```
or from the host server:
```
host-server> # ssh 192.168.98.6
```
## Debugging over VNC
Each switch binds a VNC console on one of the ports 5950 to
5958 (by default). So to connect to the leaf1 switch,
here's the command:
```
laptop> # vncviewer host-server:5953
```
then log in as root, with the password matching what
is set up in /etc/ikvswitch/ikvswitch.conf ("changeme"
is the default).
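Assuming the VMs take consecutive ports in the order they appear in
the schema (only leaf1 on 5953 is stated explicitly; the rest is
inferred), the mapping would be:
```
5950 internet    5953 leaf1    5956 leaf4
5951 spine1      5954 leaf2    5957 leaf5
5952 spine2      5955 leaf3    5958 leaf6
```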
# Spawning a VM connected to a switch
To create a VM just like "VM 1" in the schematic above (i.e. plugged
into port 3 of both leaf switches of rack 1), use this in the KVM
command line for the network interfaces:
```
-device virtio-net-pci,netdev=net0,mac=VM_NIC1_MAC_ADDR -netdev tap,id=net0,ifname=ikvm1-3-1 \
-device virtio-net-pci,netdev=net1,mac=VM_NIC2_MAC_ADDR -netdev tap,id=net1,ifname=ikvm1-3-2
```
The last parameter "ifname=ikvm1-3-1" at the end of the command line
contains the rack number (the first "1"), then the position in the
rack (the "3" in the middle), and the last number should be
set to "1" for the first NIC, and "2" for the second NIC, as described
below:
```
ifname=ikvm1-3-2
│ │ │
│ │ └─ Leaf switch 1 or 2
│ └─── Port number of switches in the rack (from 3 to 15)
└───── Rack number (from 1 to 3)
```
Now, as another example, let's say the VM is in rack 2, position 6:
```
-device virtio-net-pci,netdev=net0,mac=VM_NIC1_MAC_ADDR -netdev tap,id=net0,ifname=ikvm2-6-1 \
-device virtio-net-pci,netdev=net1,mac=VM_NIC2_MAC_ADDR -netdev tap,id=net1,ifname=ikvm2-6-2
```
So only the ifname needs to be modified to change the location
of a VM in the virtual datacenter aisle. If we want to "rack
the VM" in position 5, in rack 3:
```
-device virtio-net-pci,netdev=net0,mac=VM_NIC1_MAC_ADDR -netdev tap,id=net0,ifname=ikvm3-5-1 \
-device virtio-net-pci,netdev=net1,mac=VM_NIC2_MAC_ADDR -netdev tap,id=net1,ifname=ikvm3-5-2
```
Since the DHCP server listens on ports ens6 to ens18 of the
leaf1, leaf3 and leaf5 switches, it could conflict with the
BGP-to-the-host setup. Therefore, on the leaf switches, there
is a VLAN on each of these interfaces, and that VLAN is what
the final VMs must connect to. By default it is VLAN number 10
(this can be changed in ikvswitch.conf).
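In practice, this means the guest VM must tag its traffic. A minimal
sketch, assuming the guest's first NIC shows up as ens4 (the interface
name is a guess) and the default VLAN 10:
```
# inside the guest VM: create a VLAN 10 subinterface on the first NIC
ip link add link ens4 name ens4.10 type vlan id 10
ip link set ens4.10 up
# DHCP is served on the VLAN, not on the untagged interface
dhclient ens4.10
```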