File: README.md

# What is ikvswitch
## What this package does

This package emulates 3 racks of a datacenter and their
matching switch infrastructure, with a BGP-to-the-host setup.

Its goal is to make it possible to deploy VMs the same way one
would in a real production environment, with IPv6 unnumbered
connectivity between the spine and the leaf switches in the
racks, and between the leaf switches and the servers.

Each rack has 2 leaf switches, each of them connected to the 2
spines over BGP, using FRR and BGP-to-the-host over IPv6
unnumbered routing.

The number of racks is limited to 3, as this is enough for full
HA redundancy. However, the number of ports on the leaf switches
(i.e. the number of U in your virtual racks) is configurable and
unlimited (the only limits are those of the Linux kernel).

## Designed to leave your host server configuration untouched

To avoid making any (possibly disruptive) changes to the host
networking, there is an "internet" VM that is connected to the
host using very simple L2 connectivity. The only thing added to
the host server for routing is a single SNAT rule:

```
iptables -t nat -A POSTROUTING -s 192.168.96.0/19 ! -d 192.168.96.0/19 -j SNAT --to-source ${MY_IP}
```

Therefore, if your test server itself uses BGP-to-the-host for
its outbound connectivity (or any other type of setup), there is
no need to add special configuration to your local BGP daemon,
or any complex routing rules. The above iptables rule (and that
rule only) is enough for full bi-directional connectivity from
your host server to all switches and hosted VMs.
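
Once the host networking is up, the rule can be checked like
this (a minimal sanity check; the SNAT source shown will be the
MY_IP value from ikvswitch.conf):

```
# List the NAT POSTROUTING rules and look for the ikvswitch entry
iptables -t nat -L POSTROUTING -n -v | grep 192.168.96.0/19
```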

## Designed for a very low number of dependencies

There are a number of other tools designed to set up complex
networking environments, which may be a better fit for testing
a datacenter infrastructure. However, these are often non-free,
and bring a lot of dependencies that need to be installed on
your host server.

Instead, the ikvswitch approach is to require a very small
number of dependency packages. Basically, it needs QEMU, a few
networking utilities like bridge-utils and iptables, and
openstack-debian-images to create the operating system images
of your switches, and that's about it.

## Designed for application developers, not for network engineers

The goal of ikvswitch is not to help design a network
infrastructure, but rather to help developers test clustering
applications (like OpenStack, but maybe something else...)
over a "standard" BGP-to-the-host L3 network fabric. The only
thing one needs to do is to "plug VMs into the virtual racks"
in a fairly standard KVM way, knowing that ikvswitch will
provide BGP routing and DHCP.

## Network schema of the virtual infrastructure

What's below is probably the easiest way to understand what the
heck ikvswitch does.

```
  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃ host server                  ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
   │ ikhn ((dummy net) host NIC)
   │ ikhb (host bridge) doing an L2 connectivity to the host server
   │ 192.168.96.1
   │ ikht (host (internet VM) tap)
   │ 192.168.96.2
  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃ internet                     ┃
  ┃ 192.168.98.1                 ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
   │ ikin1                   │ ikin2 (internet (VM) nic (TAP))
   │                         │
   │ ikib1                   │ ikib2 (internet bridge)
   │                         │
   │ ikit1                   │ ikit2 (internet (VM) TAP)
  ┏━━━━━━━━━━━━┓            ┏━━━━━━━━━━━━┓
  ┃1           ┃            ┃1           ┃
  ┃ spine1     ┃            ┃ spine2     ┃
  ┃192.168.98.2┃            ┃192.168.98.3┃
  ┃2 3 4 5 6 7 ┃            ┃2 3 4 5 6 7 ┃
  ┗━━━━━━━━━━━━┛            ┗━━━━━━━━━━━━┛
   │ │ │ │ │ │     ╭─────────┘ │ │ │ │ │ ifnames (spine tap): ikstXY, where X is 1 (for spine 1) or 2, and Y 2 to 7
   │ │ │ │ │ │     │           │ │ │ │ │  
   │ └─│─│─│─│───────────┐     │ └─│─│─│─────────────┐ bridge names: iksbXY, where X is 1 or 2, and Y 2 to 7
   │   │ │ │ │     │     │     │   │ │ └─────────────│─────────────────────────────────────────────────────┐
   │   │ │ │ │     │     │     │   │ └───────────────│──────────────────────────────────┐                  │
   │   └─│─│─│─────│─────│─────│───│────────┐        │                                  │                  │
   │     │ │ │     │     │     │   └────────│────────│─────────────────┐                │                  │
   │     └─│─│─────│─────│─────│────────────│────────│────────┐        │                │                  │
   │       │ │     │     │     │            │        │        │        │                │                  │
   │       └─│─────│─────│─────│────────────│────────│────────│────────│────────┐       │                  │
   │         │     │     │     │            │        │        │        │        │       │                  │
   │         └─────│─────│─────│────────────│────────│────────│────────│────────│───────│─────────┐        │
   │               │     │     │            │        │        │        │        │       │         │        │
╭--│---------------│-----│-----│-------┐ ╭--│--------│--------│--------│---┐ ╭--│-------│---------│--------│---┐
|  │               │     │     │       | |  │        │        │        │   | | ifnames: ikltXY, where X is the leaf number from 1 to 6, and Y is the port number, 1 or 2
| ╭─────────────────┐   ╭─────────────┐| |╭─────────────┐   ╭─────────────┐| |╭─────────────┐   ╭─────────────┐|
| │1               2│   │1     2      │| |│ 1        2  │   │1        2   │| |│1       2    │   │1        2   │|
| │ rack1 leaf1     │   │ rack1 leaf2 │| |│ rack2 leaf1 │   │ rack2 leaf2 │| |│ rack3 leaf1 │   │ rack3 leaf2 │|
| │ hostname: leaf1 │   │ hn: leaf2   │| |│ hn: leaf3   │   │ hn: leaf4   │| |│ hn: leaf5   │   │ hn: leaf6   │|
| │ 192.168.98.4    │   │ 192.168.98.5│| |│ 192.168.98.6│   │ 192.168.98.7│| |│192.168.98.8 │   │ 192.168.98.9│|
| │3 4 5 6 7 9 10 ..│   │3 4 5 6 7... │| |│ 3 4...      │   │ 3 4...      │| |│ 3 4...      │   │ 3 4...      │|
| └─────────────────┘   └─────────────┘| |└─────────────┘   └─────────────┘| |└─────────────┘   └─────────────┘|
|  │ │                   │ │           | |  │ │               │ │ ifnames: iklsX-Y, where X is the leaf switch number from 1 to 6, and Y is the port number, from 3 to 15
|  │ └───────┐           │ │           | |  │ └───────┐       │ │          | |  │ └───────┐       │ │          |
|  │         │           │ │ (end) bridge names: ikebXYY, where X is the number of the rack switch, and YY the interface number
|  │     ╭───│───────────┘ │           | |  │     ╭───│───────┘ │          | |  │     ╭───│───────┘ │          |
|  │     │   │     ╭───────┘           | |  │     │   │     ╭───┘          | |  │     │   │     ╭───┘          |
|  │     │   │     │                   | |  │     │   │     │              | |  │     │   │     │              |
|  │     │   │     │                   | |  │     │   │     │              | |  │     │   │     │              |
|  │     │   │     │                   | |  │     │   │     │              | |  │     │   │     │              |
|  │     │   │     │ ifnames (tap): ikvmX-Y-Z where X is the rack number from 1 to 3, and Y the VM number (or position in the rack) from 3 to 15, and last is the NIC number (1 or 2)
| ┏━━━━━━━┓ ┏━━━━━━━┓                  | | ┏━━━━━━━┓ ┏━━━━━━━┓             | | ┏━━━━━━━┓ ┏━━━━━━━┓             |
| ┃1     2┃ ┃1     2┃                  | | ┃1     2┃ ┃1     2┃             | | ┃1     2┃ ┃1     2┃             |
| ┃ VM  1 ┃ ┃ VM  2 ┃  .....           | | ┃ VM  3 ┃ ┃ VM  4 ┃  .....      | | ┃ VM  5 ┃ ┃ VM  6 ┃  .....      |
| ┃ U3    ┃ ┃ U4    ┃                  | | ┃ U3    ┃ ┃ U4    ┃             | | ┃ U3    ┃ ┃ U4    ┃             |
| ┗━━━━━━━┛ ┗━━━━━━━┛                  | | ┗━━━━━━━┛ ┗━━━━━━━┛             | | ┗━━━━━━━┛ ┗━━━━━━━┛             |
|                               vrack 1| |                          vrack 2| |                          vrack 3|
└--------------------------------------┘ └---------------------------------┘ └---------------------------------┘

```

## What is run when starting the infrastructure

Each switch is emulated by a virtual machine (see below) that
is connected to its neighbouring switch or server over IPv6
unnumbered (i.e. link-local IPv6 connectivity only). On the
local loopback of each switch, the management IPv4 address is
added, so it is possible to ssh into them across the whole
infrastructure, thanks to BGP routing.
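
As a rough sketch of what this implies on a switch, here is how
leaf1 could be set up with FRR (the AS number and interface
names are illustrative assumptions; ikvswitch generates the
real configuration):

```
# Give leaf1 its management IPv4 on the loopback...
ip addr add 192.168.98.4/32 dev lo

# ...then peer with both spines over the uplinks, unnumbered,
# and advertise the loopback so BGP makes it reachable
vtysh \
  -c 'configure terminal' \
  -c 'router bgp 65101' \
  -c 'neighbor ens1 interface remote-as external' \
  -c 'neighbor ens2 interface remote-as external' \
  -c 'address-family ipv4 unicast' \
  -c 'network 192.168.98.4/32' \
  -c 'exit-address-family'
```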

## What is routed (bridges and VM TAPs)

The whole 192.168.96.0/19 network is routed, with NAT on the
host server. What would normally be physical wires is
virtualized using Linux bridges. Every VM interface has its
counterpart TAP device on the host server, and each of these
TAPs is then connected to the corresponding bridge.
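
As a minimal sketch of how one such virtual "wire" is built
(the names follow the schema above; ikvswitch-host-networking
creates the real ones):

```
# The wire between spine1 port 2 and leaf1 port 1:
ip link add iksb12 type bridge       # the bridge acting as the wire
ip link set iksb12 up
ip tuntap add dev ikst12 mode tap    # TAP backing spine1's port 2
ip link set ikst12 master iksb12 up
ip tuntap add dev iklt11 mode tap    # TAP backing leaf1's port 1
ip link set iklt11 master iksb12 up
```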

## A nice property of link-local addresses

Since ikvswitch sets predictable MAC addresses for the switch
ports, the IPv6 link-local addresses are always the same, and
it is therefore rather easy to check IPv6 link-local
connectivity between switch ports. For example, to check that
spine2 is properly connected to the leaf3 switch, we use the
port 2 IPv6 address of the leaf3 switch (i.e.
fe80::a00:27ff:fe06:ac52) from the port ens7 of the spine2
switch:

```
host-server> # ssh 192.168.98.3
spine2> # ping fe80::a00:27ff:fe06:ac52%ens7
```

In the same way, even without FRR BGP routing working, it is
possible to ssh into leaf3 from spine2 using the link-local
address:

```
spine2> # ssh fe80::a00:27ff:fe06:ac52%ens7
```

## Included DHCP server

ikvswitch provides a DHCP server that runs on each odd-numbered
leaf switch: that's leaf1, leaf3 and leaf5 on the above schema.
By default, these set the DHCP "next-server" to 192.168.100.2.
Typically, a virtual machine would be set up with that IP
address and connected to any of the leaf switches. It may later
serve as a PXE server, to boot (and install) your other virtual
machines, just like you would do in the real world with
physical servers.
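
For instance, the future PXE-server VM would claim that address
on its VLAN interface (a hypothetical sketch; the guest
interface name and the /24 prefix length are assumptions, and
the VLAN is explained in the last section below):

```
# Inside the PXE-server VM, on its VLAN 10 sub-interface
ip addr add 192.168.100.2/24 dev ens3.10
```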

## Target audience for this package

Anyone who wishes to use a complex BGP-to-the-host setup, with
VMs connected to 2 leaf switches, themselves connected to 2
spines, and to test in that environment.

I wrote this to test a deployment of OpenStack, but any
deployment (for example Kubernetes) may use this
infrastructure, if set up inside VMs.

## What this package does not provide

Even though it prepares the TAP interfaces for the VMs, this
package does *not* start the VMs that are to be connected to
the virtual networking infrastructure that ikvswitch provides.

# How to use
## Configuration

Edit /etc/ikvswitch/ikvswitch.conf to your liking. The most
important value is MY_IP at the top of the file, as it is used
to NAT what is inside the virtual network.

It is also probably a good idea to edit VM_ROOT_PASS at the
very end of the file.

Nothing else needs to be configured, unless there is a conflict
with your local network (for example, if the LAN your host
server sits on uses the 192.168.96.0/19 range).

Absolutely all IP subnets, MAC addresses, and interface and
bridge names can be configured and named the way you like.
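
As a hedged illustration, the two values mentioned above could
look like this (the addresses are examples only):

```
# /etc/ikvswitch/ikvswitch.conf (illustrative excerpt)
MY_IP=203.0.113.10        # host address used as the SNAT source
VM_ROOT_PASS=changeme     # root password of the switch VMs
```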

## Starting up the host networking

Before starting the 9 VMs that will host the virtual switches,
it is mandatory to configure the host server. Simply run:

```
# ikvswitch-host-networking start
```

to configure the host. This will create nearly 200 virtual NICs
and nearly 100 bridges by default, so it takes a bit of time to
start.
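
An easy sanity check (a sketch, relying on the fact that all
generated names use the "ik" prefix shown in the schema) is to
count the interfaces that were created:

```
# Count the host-side interfaces created by ikvswitch
ip -br link | grep -c '^ik'
```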

## Starting up the switches and connecting to them

It is as simple as:

```
# ikvswitch-setup start
```

That's all there is to it! :)

After that command, it should be possible to ssh into
the "internet" virtual switch:

```
host-server> # ssh 192.168.96.2
```

The ssh authorized_keys inside the switches are copied from
your host's /root/.ssh/authorized_keys, and the root password
is set to what is in /etc/ikvswitch/ikvswitch.conf (in case you
need to debug over VNC).

If everything goes well, from any VM, or even from the
host server, it's possible to reach any of the virtual
switches:

```
spine1> # ssh leaf3
```

or from the host server:

```
host-server> # ssh 192.168.98.6
```

## Debugging over VNC

Each switch binds a VNC console on ports 5950 to 5958 (by
default). So, to connect to the leaf1 switch, here is the
command:

```
laptop> # vncviewer host-server:5953
```

then log in as root, with the password matching what is set in
/etc/ikvswitch/ikvswitch.conf (changeme is the default).
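
Assuming the default ordering of the 9 VMs (internet on 5950,
spine1 and spine2 on 5951 and 5952, then leaf1 to leaf6 on 5953
to 5958; an ordering inferred from the leaf1 example above),
connecting to leaf4 would then be:

```
laptop> # vncviewer host-server:5956
```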

# Spawning a VM connected to a switch

To create a VM just like "VM 1" in the schema above (i.e.
plugged into port 3 of both leaf switches of rack 1), use this
in the KVM command line for the network interfaces:

```
 -device virtio-net-pci,netdev=net0,mac=VM_NIC1_MAC_ADDR -netdev tap,id=net0,ifname=ikvm1-3-1 \
 -device virtio-net-pci,netdev=net1,mac=VM_NIC2_MAC_ADDR -netdev tap,id=net1,ifname=ikvm1-3-2
```

The last parameter, "ifname=ikvm1-3-1", contains the rack
number (the first "1"), then the position in the rack (the "3"
in the middle), and finally the NIC number, which should be set
to "1" for the first NIC and "2" for the second NIC, as
described below:

```
 ifname=ikvm1-3-2
            │ │ │
            │ │ └─ Leaf switch 1 or 2
            │ └─── Port number of switches in the rack (from 3 to 15)
            └───── Rack number (from 1 to 3)
```

Now, as another example, let's say the VM is in rack 2, position 6:

```
-device virtio-net-pci,netdev=net0,mac=VM_NIC1_MAC_ADDR -netdev tap,id=net0,ifname=ikvm2-6-1 \
-device virtio-net-pci,netdev=net1,mac=VM_NIC2_MAC_ADDR -netdev tap,id=net1,ifname=ikvm2-6-2
```

So only the ifname needs to be modified to change the location
of a VM in the virtual datacenter aisle. If we want to "rack
the VM" in position 5 of rack 3:

```
-device virtio-net-pci,netdev=net0,mac=VM_NIC1_MAC_ADDR -netdev tap,id=net0,ifname=ikvm3-5-1 \
-device virtio-net-pci,netdev=net1,mac=VM_NIC2_MAC_ADDR -netdev tap,id=net1,ifname=ikvm3-5-2
```
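
Putting it all together, a minimal, hypothetical full
invocation for that last VM could look like this (the disk
image path and MAC addresses are placeholders, and
script=no,downscript=no is used on the assumption that
ikvswitch has already created and bridged the TAPs):

```
qemu-system-x86_64 -enable-kvm -m 2048 \
  -drive file=/var/lib/ikvswitch/vm.qcow2,if=virtio \
  -device virtio-net-pci,netdev=net0,mac=52:54:00:00:35:01 \
  -netdev tap,id=net0,ifname=ikvm3-5-1,script=no,downscript=no \
  -device virtio-net-pci,netdev=net1,mac=52:54:00:00:35:02 \
  -netdev tap,id=net1,ifname=ikvm3-5-2,script=no,downscript=no
```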

Since a DHCP server is listening on ports ens6 to ens18 of the
leaf1, leaf3 and leaf5 switches, it could conflict with the
BGP-to-the-host setup. Therefore, on the leaf switches, there
is a VLAN on each of these interfaces, and that VLAN is what
the final VMs must connect to. By default it is VLAN number 10
(this can be changed in ikvswitch.conf).
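
Concretely, inside a guest VM this means tagging traffic with
that VLAN before requesting a DHCP lease. A hedged sketch (the
guest interface name ens3 is an assumption):

```
# Create the VLAN 10 sub-interface on the first NIC
ip link add link ens3 name ens3.10 type vlan id 10
ip link set ens3.10 up
# The DHCP server on the odd-numbered leaf answers on the VLAN
dhclient ens3.10
```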