Wiki source code of Open VSwitch
Version 2.1 by Sebastian Marsching on 2022/03/27 14:30
author | version | line-number | content |
---|---|---|---|
1 | {{toc/}} | ||
2 | |||
3 | # Setting up Open vSwitch on Ubuntu 12.04 LTS | ||
4 | |||
5 | I mostly followed an [existing tutorial for Ubuntu 12.04 LTS](http://blog.allanglesit.com/2012/03/linux-kvm-ubuntu-12-04-with-openvswitch/); however, I also used some ideas from a [tutorial for Ubuntu 12.10](http://blog.allanglesit.com/2012/10/linux-kvm-ubuntu-12-10-with-openvswitch/). | ||
6 | |||
7 | First we install two packages needed for Open vSwitch (the other packages are automatically pulled in, because they are dependencies): | ||
8 | |||
9 | ```bash | ||
10 | aptitude install openvswitch-brcompat openvswitch-controller | ||
11 | ``` | ||
12 | |||
13 | Next we add the `brcompat` module to `/etc/modules`. I am not entirely sure whether this is really necessary; however, I found that it helps to keep the traditional bridge module from being loaded first. | ||
14 | |||
15 | We also have to enable the bridge compatibility layer by setting `BRCOMPAT=yes` in `/etc/default/openvswitch-switch`. | ||
16 | |||
17 | At this point it is a good idea to reboot, because the `brcompat` module cannot be loaded if the `bridge` module is already loaded. We could probably just unload the module, but rebooting is easier and also confirms that the setup still works after the next reboot. | ||
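
Whether the traditional `bridge` module is currently loaded can be checked before deciding to reboot. The following is only a minimal sketch: the helper name `module_loaded` is made up for this example, and it assumes the usual `/proc/modules` format, in which every line starts with the module name followed by a space.

```bash
# Hypothetical helper (not part of Open vSwitch): succeed if the named
# kernel module appears in /proc/modules (or in the file given as the
# second argument, which is useful for testing).
module_loaded() {
    local name="$1" file="${2:-/proc/modules}"
    grep -q "^${name} " "$file" 2>/dev/null
}

if module_loaded bridge; then
    echo "The bridge module is loaded; unload it or reboot before using brcompat."
else
    echo "The bridge module is not loaded."
fi
```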
18 | |||
19 | Now everything should be ready for configuration. In this example we create a bridge `ovsbr0` that is connected to the physical interface `eth0`. This interface carries one untagged VLAN and several tagged VLANs (we just show one as an example). The untagged VLAN is also the one used for the management interface of the server. For each of the VLANs we create a bridge that provides the traffic from this VLAN untagged, so that we can bind a virtual machine's interface to each VLAN individually. | ||
20 | |||
21 | First we create the main bridge and the bridges for the individual VLANs: | ||
22 | |||
23 | ```bash | ||
24 | ovs-vsctl add-br ovsbr0 | ||
25 | ovs-vsctl add-br ovsbr0v1 1 # Create a bridge for VLAN 1. | ||
26 | ovs-vsctl add-br ovsbr0v2 2 # Create a bridge for VLAN 2. | ||
27 | ``` | ||
28 | |||
29 | Now, assuming that we want to use VLAN 1 for the management interface of the server, we add a port with this VLAN ID: | ||
30 | |||
31 | ```bash | ||
32 | ovs-vsctl add-port ovsbr0 ovsbr0p1 | ||
33 | ovs-vsctl set port ovsbr0p1 tag=1 | ||
34 | ovs-vsctl set interface ovsbr0p1 type=internal | ||
35 | ovs-vsctl set interface ovsbr0p1 mac="00\:01\:02\:03\:04\:05" | ||
36 | ``` | ||
37 | |||
38 | Please note that some of the attributes are set in the _port_ table, while others are set in the _interface_ table. Obviously the MAC address should be replaced by a proper random MAC address. The page about KVM describes [[how to generate a random MAC address|doc:Linux.KVM.WebHome|anchor="Generating_a_random_MAC_address"]]. You do not have to set a MAC address explicitly; however, in that case the MAC address will change after each reboot, which is typically not desirable for the network interface of a server. | ||
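
As an illustration, a random MAC address can also be generated directly in the shell. This is a sketch and not necessarily the method described on the KVM page; the function name `random_mac` is hypothetical. It sets the "locally administered" bit and clears the multicast bit of the first octet, so the result is a unicast address that should not collide with vendor-assigned addresses.

```bash
# Hypothetical helper: print a random, locally administered, unicast
# MAC address in the form xx:xx:xx:xx:xx:xx.
random_mac() {
    local bytes first
    # Read six random bytes as space-separated two-digit hex values.
    bytes=$(od -An -N6 -tx1 /dev/urandom)
    # Split the bytes into the positional parameters of this function.
    set -- $bytes
    # Clear the multicast bit (0x01) and set the locally administered bit (0x02).
    first=$(printf '%02x' $(( (0x$1 & 0xfe) | 0x02 )))
    printf '%s:%s:%s:%s:%s:%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

random_mac
```

The colons still have to be escaped with backslashes when the address is passed to `ovs-vsctl set interface`, as in the command above.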
39 | |||
40 | Now we change the network configuration in `/etc/network/interfaces`. We have to make sure that each virtual interface is brought up, even if we only use it as a bridge. We do this by bringing it up but disabling any IP configuration on it: | ||
41 | |||
42 | auto eth0 | ||
43 | iface eth0 inet manual | ||
44 | up ifconfig $IFACE 0.0.0.0 up | ||
45 | up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6 | ||
46 | down ifconfig $IFACE down | ||
47 | |||
48 | auto ovsbr0 | ||
49 | iface ovsbr0 inet manual | ||
50 | up ifconfig $IFACE 0.0.0.0 up | ||
51 | up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6 | ||
52 | down ifconfig $IFACE down | ||
53 | |||
54 | auto ovsbr0p1 | ||
55 | # This is just a place-holder. Replace it with the proper configuration for the | ||
56 | # management interface. Typically this is the configuration you had for eth0 | ||
57 | # before. | ||
58 | iface ovsbr0p1 inet dhcp | ||
59 | iface ovsbr0p1 inet6 auto | ||
60 | |||
61 | auto ovsbr0v1 | ||
62 | iface ovsbr0v1 inet manual | ||
63 | up ifconfig $IFACE 0.0.0.0 up | ||
64 | up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6 | ||
65 | down ifconfig $IFACE down | ||
66 | |||
67 | auto ovsbr0v2 | ||
68 | iface ovsbr0v2 inet manual | ||
69 | up ifconfig $IFACE 0.0.0.0 up | ||
70 | up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6 | ||
71 | down ifconfig $IFACE down | ||
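
The stanzas for `eth0`, `ovsbr0`, and the VLAN bridges differ only in the interface name, so they can be generated instead of typed by hand. The following sketch prints them to standard output; the function name `print_manual_stanza` is made up for this example, and the output is meant to be reviewed and then pasted into `/etc/network/interfaces`.

```bash
# Hypothetical helper: print an "inet manual" stanza that brings the
# given interface up without any IP configuration.
print_manual_stanza() {
    local iface="$1"
    # \$IFACE is escaped so that it ends up literally in the output;
    # ifupdown substitutes it when the interface is brought up.
    cat <<EOF
auto $iface
iface $iface inet manual
  up ifconfig \$IFACE 0.0.0.0 up
  up echo 1 >/proc/sys/net/ipv6/conf/\$IFACE/disable_ipv6
  down ifconfig \$IFACE down

EOF
}

for iface in eth0 ovsbr0 ovsbr0v1 ovsbr0v2; do
    print_manual_stanza "$iface"
done
```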
72 | |||
73 | For some reason, the system does not detect that the network has already been configured when Open vSwitch is used, and thus delays startup. Therefore we modify `/etc/init/failsafe.conf` so that it does not wait for the network configuration to finish. You can do this by applying the following patch: | ||
74 | |||
75 | ```patch | ||
76 | --- failsafe.conf.dpkg-dist 2013-01-18 22:17:33.000000000 +0100 | ||
77 | +++ failsafe.conf 2013-01-18 22:18:08.000000000 +0100 | ||
78 | @@ -29,10 +29,10 @@ | ||
79 | # the end of this script to avoid letting the system spin forever | ||
80 | # waiting on it to start. | ||
81 | $PLYMOUTH message --text="Waiting for network configuration..." || : | ||
82 | - sleep 40 | ||
83 | + sleep 1 | ||
84 | |||
85 | - $PLYMOUTH message --text="Waiting up to 60 more seconds for network configuration..." || : | ||
86 | - sleep 59 | ||
87 | + $PLYMOUTH message --text="Waiting one more second for network configuration..." || : | ||
88 | + sleep 1 | ||
89 | $PLYMOUTH message --text="Booting system without full network configuration..." || : | ||
90 | |||
91 | # give user 1 second to see this message since plymouth will go | ||
92 | ``` | ||
93 | |||
94 | The last steps have to be performed directly at the server's console, because they will interrupt the network connection. We add `eth0` to the bridge and configure it for the right VLAN mode (VLAN 1 is untagged, all other VLANs are tagged): | ||
95 | |||
96 | ```bash | ||
97 | ovs-vsctl add-port ovsbr0 eth0 | ||
98 | ovs-vsctl set port eth0 tag=1 | ||
99 | ovs-vsctl set port eth0 vlan_mode=native-untagged | ||
100 | ``` | ||
101 | |||
102 | That's it. After rebooting the server again, the network should be working and you can specify the bridges `ovsbr0v1` and `ovsbr0v2` in virtual-machine configurations. | ||
103 | |||
104 | # Setting up Open vSwitch on Ubuntu 14.04 LTS | ||
105 | |||
106 | The instructions are nearly the same as for Ubuntu 12.04 LTS, so I only mention the differences. | ||
107 | |||
108 | Instead of installing `openvswitch-brcompat` and `openvswitch-controller`, install `openvswitch-switch`. You also do not have to enable the `brcompat` module. | ||
109 | |||
110 | You also do not have to make the changes to `failsafe.conf`. The system will boot fine without those changes. | ||
111 | |||
112 | # Using Open vSwitch for a high-availability / fail-over interface | ||
113 | |||
114 | A simple HA setup for an IP address can easily be created using [Pacemaker](http://clusterlabs.org/) and the `ocf:heartbeat:IPaddr2` and `ocf:heartbeat:IPv6addr` scripts. However, this kind of setup has one weakness: During fail-over, the MAC address changes because the IP address is now associated with a different computer and thus a different NIC. This can cause problems with old entries in ARP tables. Linux systems will typically deal with this correctly (they will see the unsolicited ARP message and update their caches), but some other operating systems or dedicated network equipment might cause trouble. For example, I had problems with the ARP cache of a Netgear GSM7328v2 switch, which could only be resolved by waiting a long time or manually clearing the ARP cache. Obviously, neither option is viable for an HA setup, where fail-over has to happen automatically and within seconds. | ||
115 | |||
116 | Therefore, it is desirable to keep the MAC address and transfer it together with the IP address. However, Linux does not allow more than one MAC address for a single interface and (to my knowledge) does not allow explicit configuration of MAC addresses on bridges. Luckily, the latter limitation does not apply to Open vSwitch bridges. Each interface associated with a specific port of a bridge can be explicitly configured with a MAC address. This way, we can dynamically add or remove a port with the MAC address for which we want the fail-over setup. | ||
117 | |||
118 | We use the following commands to create the OVS bridge and add the NIC as a port. In our example, the bridge has the name `ovsbr0` and the NIC has the name `eth0`. We assume that no tagged VLANs are used. | ||
119 | |||
120 | ```bash | ||
121 | ovs-vsctl add-br ovsbr0 | ||
122 | ovs-vsctl add-port ovsbr0 eth0 | ||
123 | ``` | ||
124 | |||
125 | In `/etc/network/interfaces` we create the following configuration: | ||
126 | |||
127 | ``` | ||
128 | auto eth0 | ||
129 | iface eth0 inet manual | ||
130 | up ip link set dev $IFACE up | ||
131 | up sysctl -q -w net.ipv6.conf.$IFACE.disable_ipv6=1 | ||
132 | down ip link set dev $IFACE down | ||
133 | |||
134 | auto ovsbr0 | ||
135 | iface ovsbr0 inet manual | ||
136 | |||
137 | up ip link set dev $IFACE up | ||
138 | up sysctl -q -w net.ipv6.conf.$IFACE.disable_ipv6=1 | ||
139 | down ip link set dev $IFACE down | ||
140 | ``` | ||
141 | |||
142 | We have to bring up the interfaces because otherwise the bridge will not work. On the other hand, we want to disable IPv6 so that the interfaces do not get automatically assigned IPv6 addresses. | ||
143 | |||
144 | In order to manage an OVS bridge port with Pacemaker, we need a corresponding resource script. The following script does the job and should be saved as `$OCF_ROOT/resource.d/marsching/OVSPort` (on most systems, `$OCF_ROOT` is `/usr/lib/ocf`): | ||
145 | |||
146 | ```bash | ||
147 | #!/bin/bash | ||
148 | |||
149 | # OVS bridge port script for Pacemaker - Copyright 2014 Sebastian Marsching | ||
150 | # | ||
151 | # Permission is hereby granted, free of charge, to any person obtaining | ||
152 | # a copy of this software and associated documentation files (the | ||
153 | # "Software"), to deal in the Software without restriction, including | ||
154 | # without limitation the rights to use, copy, modify, merge, publish, | ||
155 | # distribute, sublicense, and/or sell copies of the Software, and to | ||
156 | # permit persons to whom the Software is furnished to do so, subject to | ||
157 | # the following conditions: | ||
158 | # | ||
159 | # The above copyright notice and this permission notice shall be included | ||
160 | # in all copies or substantial portions of the Software. | ||
161 | # | ||
162 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS | ||
163 | # OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | ||
164 | # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. | ||
165 | # IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY | ||
166 | # CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, | ||
167 | # TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE | ||
168 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | ||
169 | |||
170 | : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat} | ||
171 | . ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs | ||
172 | |||
173 | # Avoid localization issues | ||
174 | unset LC_ALL; export LC_ALL | ||
175 | unset LANGUAGE; export LANGUAGE | ||
176 | LC_ALL=C; export LC_ALL | ||
177 | LC_MESSAGES=C; export LC_MESSAGES | ||
178 | |||
179 | meta_data() { | ||
180 | cat <<EOF | ||
181 | <?xml version="1.0"?> | ||
182 | <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd"> | ||
183 | <resource-agent name="OVSPort"> | ||
184 | <version>1.0</version> | ||
185 | <longdesc lang="en"> | ||
186 | This script manages an OpenVSwitch port. | ||
187 | It adds a port to a bridge or removes it | ||
188 | respectively and configures the | ||
189 | corresponding network interface. | ||
190 | The network interface has to be already | ||
191 | configured in /etc/network/interfaces, | ||
192 | because this script relies on the ifup | ||
193 | and ifdown tools. | ||
194 | </longdesc> | ||
195 | <shortdesc lang="en">Adds/removes an OVS port and configures the corresponding interface.</shortdesc> | ||
196 | <parameters> | ||
197 | <parameter name="bridge" unique="1" required="1"> | ||
198 | <longdesc lang="en"> | ||
199 | The name of the OVS bridge that the port is added to. | ||
200 | </longdesc> | ||
201 | <shortdesc lang="en">Bridge name</shortdesc> | ||
202 | <content type="string" default="" ></content> | ||
203 | </parameter> | ||
204 | <parameter name="interface" unique="1" required="1"> | ||
205 | <longdesc lang="en"> | ||
206 | The name of the port and the corresponding interface. | ||
207 | </longdesc> | ||
208 | <shortdesc lang="en">Interface/port name</shortdesc> | ||
209 | <content type="string" default="" ></content> | ||
210 | </parameter> | ||
211 | <parameter name="mac" unique="1" required="0"> | ||
212 | <longdesc lang="en"> | ||
213 | The MAC address of the interface. If not specified, | ||
214 | a random MAC address is used. | ||
215 | </longdesc> | ||
216 | <shortdesc lang="en">Interface MAC address</shortdesc> | ||
217 | <content type="string" default="" ></content> | ||
218 | </parameter> | ||
219 | </parameters> | ||
220 | <actions> | ||
221 | <action name="start" timeout="20s" ></action> | ||
222 | <action name="stop" timeout="20s" ></action> | ||
223 | <action name="monitor" depth="0" timeout="20s" interval="15s" ></action> | ||
224 | <action name="validate-all" timeout="20s" ></action> | ||
225 | <action name="meta-data" timeout="5s" ></action> | ||
226 | </actions> | ||
227 | </resource-agent> | ||
228 | EOF | ||
229 | exit $OCF_SUCCESS | ||
230 | } | ||
231 | |||
232 | usage() { | ||
233 | echo "usage: $0 {start|stop|status|monitor|validate-all|meta-data}" >&2 | ||
234 | } | ||
235 | |||
236 | check_is_root() { | ||
237 | if ocf_is_root ; then | ||
238 | : | ||
239 | else | ||
240 | echo "ERROR: This action requires root privileges." >&2 | ||
241 | exit $OCF_ERR_PERM | ||
242 | fi | ||
243 | } | ||
244 | |||
245 | ovs_validate_all() { | ||
246 | check_is_root | ||
247 | if ifquery "$OCF_RESKEY_interface" >/dev/null 2>&1 ; then | ||
248 | if ovs-vsctl br-exists "$OCF_RESKEY_bridge" ; then | ||
249 | return $OCF_SUCCESS | ||
250 | else | ||
251 | echo "ERROR: Bridge \"$OCF_RESKEY_bridge\" not found." >&2 | ||
252 | return $OCF_ERR_CONFIGURED | ||
253 | fi | ||
254 | else | ||
255 | echo "ERROR: No configuration for interface \"$OCF_RESKEY_interface\" found." >&2 | ||
256 | return $OCF_ERR_CONFIGURED | ||
257 | fi | ||
258 | } | ||
259 | |||
260 | ovs_status_internal() { | ||
261 | if ifconfig "$OCF_RESKEY_interface" >/dev/null 2>&1 ; then | ||
262 | return $OCF_SUCCESS | ||
263 | else | ||
264 | return $OCF_NOT_RUNNING | ||
265 | fi | ||
266 | } | ||
267 | |||
268 | ovs_monitor() { | ||
269 | check_is_root | ||
270 | if ovs_status_internal ; then | ||
271 | echo "Interface \"$OCF_RESKEY_interface\" is up." | ||
272 | return $OCF_SUCCESS | ||
273 | else | ||
274 | local rc=$? | ||
275 | echo "Interface \"$OCF_RESKEY_interface\" is down." | ||
276 | return $rc | ||
277 | fi | ||
278 | } | ||
279 | |||
280 | ovs_start() { | ||
281 | check_is_root | ||
282 | ovs_add_port_command="ovs-vsctl add-port $OCF_RESKEY_bridge $OCF_RESKEY_interface -- set interface $OCF_RESKEY_interface type=internal" | ||
283 | if [ -n "$OCF_RESKEY_mac" ]; then | ||
284 | mac="`echo $OCF_RESKEY_mac|sed -e "s/:/\\\\\\\\:/g"`" | ||
285 | ovs_add_port_command="$ovs_add_port_command -- set interface $OCF_RESKEY_interface mac=$mac" | ||
286 | fi | ||
287 | if $ovs_add_port_command >/dev/null 2>&1 && ifup "$OCF_RESKEY_interface" >/dev/null 2>&1 && ovs_status_internal; then | ||
288 | echo "Interface \"$OCF_RESKEY_interface\" is up." | ||
289 | return $OCF_SUCCESS | ||
290 | else | ||
291 | if ifup --force "$OCF_RESKEY_interface" >/dev/null 2>&1 && ovs_status_internal; then | ||
292 | echo "Interface \"$OCF_RESKEY_interface\" is up." | ||
293 | return $OCF_SUCCESS | ||
294 | else | ||
295 | echo "ERROR: Could not bring up interface \"$OCF_RESKEY_interface\"." >&2 | ||
296 | return $OCF_ERR_GENERIC | ||
297 | fi | ||
298 | fi | ||
299 | } | ||
300 | |||
301 | ovs_stop() { | ||
302 | check_is_root | ||
303 | ifdown "$OCF_RESKEY_interface" >/dev/null 2>&1 | ||
304 | if ovs_status_internal; then | ||
305 | ifdown --force "$OCF_RESKEY_interface" >/dev/null 2>&1 | ||
306 | ovs-vsctl del-port "$OCF_RESKEY_interface" | ||
307 | if ovs_status_internal; then | ||
308 | echo "ERROR: Could not bring down interface \"$OCF_RESKEY_interface\"." >&2 | ||
309 | return $OCF_ERR_GENERIC | ||
310 | else | ||
311 | echo "Interface \"$OCF_RESKEY_interface\" is down." | ||
312 | return $OCF_SUCCESS | ||
313 | fi | ||
314 | else | ||
315 | echo "Interface \"$OCF_RESKEY_interface\" is down." | ||
316 | return $OCF_SUCCESS | ||
317 | fi | ||
318 | } | ||
319 | |||
320 | case $1 in | ||
321 | meta-data) | ||
322 | meta_data | ||
323 | ;; | ||
324 | start) | ||
325 | ovs_validate_all && ovs_start | ||
326 | ;; | ||
327 | stop) | ||
328 | ovs_stop | ||
329 | ;; | ||
330 | monitor|status) | ||
331 | ovs_monitor | ||
332 | ;; | ||
333 | validate-all) | ||
334 | ovs_validate_all | ||
335 | ;; | ||
336 | *) | ||
337 | usage | ||
338 | exit $OCF_ERR_UNIMPLEMENTED | ||
339 | ;; | ||
340 | esac | ||
341 | |||
342 | exit $? | ||
343 | ``` | ||
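
The colon escaping that `ovs_start` applies to the `mac` parameter can be tried in isolation. The following sketch uses a hypothetical function name `escape_mac` and the `$(...)`-friendly single-quoted form of the `sed` expression, which needs far fewer backslashes than the backtick form used in the script but produces the same result.

```bash
# Hypothetical helper: escape each colon of a MAC address with a
# backslash, which is the form passed to "ovs-vsctl set interface".
escape_mac() {
    printf '%s\n' "$1" | sed -e 's/:/\\:/g'
}

escape_mac "02:00:00:00:00:01"   # prints 02\:00\:00\:00\:00\:01
```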
344 | |||
345 | We can now define a resource using `crm`: | ||
346 | |||
347 | primitive iface-ovsbr0p1 ocf:marsching:OVSPort \ | ||
348 | params bridge="ovsbr0" interface="ovsbr0p1" mac="02:00:00:00:00:01" \ | ||
349 | op monitor interval="15s" | ||
350 | |||
351 | However, in order for this to work, we also need a configuration for this interface in `/etc/network/interfaces`: | ||
352 | |||
353 | iface ovsbr0p1 inet static | ||
354 | address 192.0.2.1 | ||
355 | netmask 255.255.255.0 | ||
356 | iface ovsbr0p1 inet6 static | ||
357 | address 2001:db8::1 | ||
358 | netmask 64 | ||
359 | accept_ra 0 | ||
360 | autoconf 0 | ||
361 | |||
362 | Please note that there is no `auto` line for this interface, because it is brought up and down by the resource management script. If there are other interfaces on the same bridge (with IP addresses in the same subnet), some additional configuration is needed. We have to use source-based routing to make sure the right interface (and thus the right MAC address) is used in ARP replies (see the [discussion on Server Fault](http://serverfault.com/questions/247472/arp-replies-contain-wrong-mac-address)). In addition to that, the [arp_filter](https://lwn.net/Articles/45386/#arp_filter) flag might need to be set on the affected interfaces (it was not needed in my case). | ||
363 | |||
364 | iface ovsbr0p1 inet static | ||
365 | address 192.0.2.1 | ||
366 | netmask 255.255.255.0 | ||
367 | post-up ip route add 192.0.2.0/24 table $IFACE dev $IFACE | ||
368 | post-up ip route add default table $IFACE via 192.0.2.254 dev $IFACE | ||
369 | post-up ip rule add from 192.0.2.1/32 pref 1000 table $IFACE | ||
370 | pre-down ip rule del from 192.0.2.1/32 pref 1000 table $IFACE | ||
371 | iface ovsbr0p1 inet6 static | ||
372 | address 2001:db8::1 | ||
373 | netmask 64 | ||
374 | accept_ra 0 | ||
375 | autoconf 0 | ||
376 | |||
377 | The exact routes obviously depend on the actual environment. Typically, you will want the same routes as in the "normal" routing table, just with the different source device. This configuration assumes that the name of the interface is also a valid alias for a routing table ID. These aliases are configured in `/etc/iproute2/rt_tables`. You could also use a numeric identifier in the commands; however, I find it easier to manage the numeric IDs in one central file and just use aliases in all other locations. | ||
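
Registering such an alias amounts to adding one line of the form `ID name` to `/etc/iproute2/rt_tables`. The following is a minimal sketch; the function name `add_rt_table_alias` and the table ID `100` are assumptions for this example, and the ID must not already be used by another table on your system (the sketch does not check for that).

```bash
# Hypothetical helper: register a routing table alias unless an alias
# with the same name already exists. The third argument allows a
# different file to be used, for example for testing.
add_rt_table_alias() {
    local id="$1" name="$2" file="${3:-/etc/iproute2/rt_tables}"
    # Skip if a line "<number> <name>" is already present.
    if grep -Eq "^[[:space:]]*[0-9]+[[:space:]]+${name}[[:space:]]*\$" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$id" "$name" >>"$file"
}

# Example (run as root): add_rt_table_alias 100 ovsbr0p1
```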
378 | |||
379 | For IPv6, source-based routing seems not to be needed. The NDP implementation seems to use the MAC address of the interface to which the IP address is actually assigned. |