Open vSwitch

Setting up Open vSwitch on Ubuntu 12.04 LTS

I mainly followed an existing tutorial for Ubuntu 12.04 LTS, but I also borrowed some ideas from a tutorial for Ubuntu 12.10.

First we install two packages needed for Open vSwitch (the other packages are automatically pulled in, because they are dependencies):

aptitude install openvswitch-brcompat openvswitch-controller

Next we add the brcompat module to /etc/modules. I am not entirely sure whether this is really necessary, but I found that it helped to prevent the traditional bridge module from being loaded first.

We also have to enable the bridge compatibility layer by setting BRCOMPAT=yes in /etc/default/openvswitch-switch.
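
The two configuration changes above could be scripted roughly as follows. This is only a sketch: the helper names ensure_module and enable_brcompat are made up for illustration, and it assumes that the defaults file contains at most a (possibly commented-out) BRCOMPAT line. The files are passed as arguments so that the functions can be tried on copies first.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not part of any package.

# Append a module name to a modules file unless it is already listed.
ensure_module() {
    module=$1
    file=$2
    grep -qxF "$module" "$file" 2>/dev/null || printf '%s\n' "$module" >>"$file"
}

# Set BRCOMPAT=yes in a defaults file, replacing an existing
# (possibly commented-out) BRCOMPAT line or appending a new one.
enable_brcompat() {
    file=$1
    if grep -Eq '^[[:space:]]*#?[[:space:]]*BRCOMPAT=' "$file" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        sed -E 's/^[[:space:]]*#?[[:space:]]*BRCOMPAT=.*/BRCOMPAT=yes/' "$file" >"$tmp" &&
            cat "$tmp" >"$file"
        rm -f "$tmp"
    else
        printf 'BRCOMPAT=yes\n' >>"$file"
    fi
}

# Example (as root):
#   ensure_module brcompat /etc/modules
#   enable_brcompat /etc/default/openvswitch-switch
```

Both functions are idempotent, so running them a second time should leave the files unchanged.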

At this point it is a good idea to reboot, because the brcompat module cannot be loaded if the bridge module is already loaded. We could probably just unload the bridge module, but rebooting is easier and also confirms that the setup will still work after the next reboot.

Now everything should be ready for configuration. In this example we create a bridge ovsbr0 that is connected to the physical interface eth0. This interface carries one untagged VLAN and several tagged VLANs (we just show one as an example). The untagged VLAN is also the one that is supposed to be used for the management interface of the server. For each of the VLANs we create a bridge that provides the traffic from this VLAN untagged, so that we can bind a virtual machine's interface to each VLAN individually.

First we create the main bridge and the bridges for the individual VLANs:

ovs-vsctl add-br ovsbr0
ovs-vsctl add-br ovsbr0v1 ovsbr0 1 # Create a bridge for VLAN 1 on top of ovsbr0.
ovs-vsctl add-br ovsbr0v2 ovsbr0 2 # Create a bridge for VLAN 2 on top of ovsbr0.

Now, assuming that we want to use VLAN 1 for the management interface of the server, we add a port with this VLAN ID:

ovs-vsctl add-port ovsbr0 ovsbr0p1
ovs-vsctl set port ovsbr0p1 tag=1
ovs-vsctl set interface ovsbr0p1 type=internal
ovs-vsctl set interface ovsbr0p1 mac="00\:01\:02\:03\:04\:05"

Please note that some of the attributes are set in the port table, while others are set in the interface table. Obviously the MAC address should be replaced by a proper random MAC address. The page about KVM describes how to generate a random MAC address. You do not have to set a MAC address explicitly; however, in that case the MAC address will change after each reboot, which is typically not desirable for the network interface of a server.
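
One possible way to generate such an address is sketched below. This is not necessarily the method described on the KVM page; the helper names random_mac and escape_mac are hypothetical. The sketch picks six random bytes and forces the first octet to be unicast and locally administered, so that the address cannot collide with a vendor-assigned one.

```shell
#!/bin/sh
# Sketch only: random_mac and escape_mac are hypothetical helper names.

# Print a random unicast, locally administered MAC address.
random_mac() {
    # Read six random bytes as decimal numbers into $1..$6.
    set -- $(od -An -N6 -tu1 /dev/urandom)
    # Clear the multicast bit (0x01) and set the locally administered
    # bit (0x02) in the first octet.
    first=$(( ($1 & 254) | 2 ))
    printf '%02x:%02x:%02x:%02x:%02x:%02x\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

# Escape the colons in the same way as in the ovs-vsctl call above.
escape_mac() {
    printf '%s\n' "$1" | sed 's/:/\\:/g'
}

# Example:
#   ovs-vsctl set interface ovsbr0p1 mac="$(escape_mac "$(random_mac)")"
```

The address should be generated once and then written down, because generating a new one on every boot would defeat the purpose of setting it explicitly.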

Now we change the network configuration in /etc/network/interfaces. We have to make sure that each virtual interface is brought up, even if we only use it as a bridge. We do this by bringing it up but disabling any IP configuration on it:

auto eth0
iface eth0 inet manual
        up ifconfig $IFACE up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down

auto ovsbr0
iface ovsbr0 inet manual
        up ifconfig $IFACE up 
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down

auto ovsbr0p1
# This is just a place-holder. Replace it with the proper configuration for the
# management interface. Typically this is the configuration you had for eth0
# before.
iface ovsbr0p1 inet dhcp
iface ovsbr0p1 inet6 auto

auto ovsbr0v1
iface ovsbr0v1 inet manual
        up ifconfig $IFACE up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down

auto ovsbr0v2
iface ovsbr0v2 inet manual
        up ifconfig $IFACE up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down
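
Because the stanzas for eth0 and the bridges are identical apart from the interface name, they could also be generated. The following is a small sketch; manual_stanza is a hypothetical helper name, and the output is meant to be reviewed before it is copied into /etc/network/interfaces.

```shell
#!/bin/sh
# Sketch only: manual_stanza is a hypothetical helper name.

# Print an /etc/network/interfaces stanza that brings an interface up
# without any IP configuration and with IPv6 disabled.
manual_stanza() {
    cat <<EOF
auto $1
iface $1 inet manual
        up ifconfig \$IFACE up
        up echo 1 >/proc/sys/net/ipv6/conf/\$IFACE/disable_ipv6
        down ifconfig \$IFACE down

EOF
}

# Print the stanzas for the physical interface and all bridges.
for iface in eth0 ovsbr0 ovsbr0v1 ovsbr0v2; do
    manual_stanza "$iface"
done
```

The stanza for the management interface ovsbr0p1 is deliberately left out, because it carries the real IP configuration of the server.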

For some reason, the system does not detect that the network has already been configured when Open vSwitch is used, and therefore delays startup. We work around this by modifying /etc/init/failsafe.conf so that it does not wait for the network configuration to finish. You can do this by applying the following patch:

--- failsafe.conf.dpkg-dist     2013-01-18 22:17:33.000000000 +0100
+++ failsafe.conf       2013-01-18 22:18:08.000000000 +0100
@@ -29,10 +29,10 @@
     # the end of this script to avoid letting the system spin forever
     # waiting on it to start.
        $PLYMOUTH message --text="Waiting for network configuration..." || :
-       sleep 40
+       sleep 1
-       $PLYMOUTH message --text="Waiting up to 60 more seconds for network configuration..." || :
-       sleep 59
+       $PLYMOUTH message --text="Waiting one more second for network configuration..." || :
+       sleep 1
        $PLYMOUTH message --text="Booting system without full network configuration..." || :
     # give user 1 second to see this message since plymouth will go

The last steps have to be performed directly from the server's operator's console, because they will interrupt the network connection. We add eth0 to the bridge and configure it for the right VLAN mode (VLAN 1 is untagged, all other VLANs are tagged):

ovs-vsctl add-port ovsbr0 eth0
ovs-vsctl set port eth0 tag=1
ovs-vsctl set port eth0 vlan_mode=native-untagged

That's it. After rebooting the server again, the network should be working and you can specify the bridges ovsbr0v1 and ovsbr0v2 in virtual-machine configurations.

Setting up Open vSwitch on Ubuntu 14.04 LTS

The instructions are nearly the same as for Ubuntu 12.04 LTS, so I only mention the differences.

Instead of installing openvswitch-brcompat and openvswitch-controller, install openvswitch-switch. You also do not have to enable the brcompat module.

You also do not have to make the changes to failsafe.conf. The system will boot fine without those changes.

Using Open vSwitch for a high-availability / fail-over interface

A simple HA setup for an IP address can easily be created using Pacemaker and the ocf:heartbeat:IPaddr2 and ocf:heartbeat:IPv6addr scripts. However, this kind of setup has one weakness: During fail-over, the MAC address changes because the IP address is now associated with a different computer and thus a different NIC. This can cause problems with old entries in ARP tables. Linux systems will typically deal with this correctly (they will see the unsolicited ARP message and update their caches), but some other operating systems or dedicated network equipment might cause trouble. For example, I had problems with the ARP cache of a Netgear GSM7328v2 switch, which could only be resolved by waiting a long time or manually clearing the ARP cache. Obviously, both options are not viable for an HA setup, where fail-over has to happen automatically and within seconds.

Therefore, it is desirable to keep the MAC address and transfer it together with the IP address. However, Linux does not allow more than one MAC address for a single interface and (to my knowledge) does not allow explicit configuration of MAC addresses on bridges. Luckily, the latter limitation does not apply to Open vSwitch bridges. Each interface associated with a specific port of a bridge can be explicitly configured with a MAC address. This way, we can dynamically add or remove a port that carries the MAC address for which we want the fail-over setup.
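
Adding and removing such a port by hand could look roughly like the following sketch. The function names and the DRY_RUN switch are hypothetical and only serve to illustrate the idea; the colons of the MAC address are escaped in the same way as in the ovs-vsctl call for the 12.04 setup earlier on this page.

```shell
#!/bin/sh
# Sketch only: the function names and DRY_RUN are hypothetical.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Add an internal port with a fixed MAC address to a bridge.
add_failover_port() {
    bridge=$1
    port=$2
    mac=$(printf '%s' "$3" | sed 's/:/\\:/g')
    run ovs-vsctl add-port "$bridge" "$port" \
        -- set interface "$port" type=internal \
        -- set interface "$port" "mac=$mac"
}

# Remove the port again, for example before it moves to another node.
remove_failover_port() {
    run ovs-vsctl del-port "$1" "$2"
}

# Example:
#   DRY_RUN=1
#   add_failover_port ovsbr0 ovsbr0p1 02:00:00:00:00:01
#   remove_failover_port ovsbr0 ovsbr0p1
```

With DRY_RUN=1 the commands are only printed, which makes it possible to check them before touching a live bridge.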

We use the following commands to create the OVS bridge and add the NIC as a port. In our example, the bridge has the name ovsbr0 and the NIC has the name eth0. We assume that no tagged VLANs are used.

ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 eth0

In /etc/network/interfaces we create the following configuration:

auto eth0
iface eth0 inet manual
        up ip link set dev $IFACE up
        up sysctl -q -w net.ipv6.conf.$IFACE.disable_ipv6=1
        down ip link set dev $IFACE down

auto ovsbr0
iface ovsbr0 inet manual
        up ip link set dev $IFACE up
        up sysctl -q -w net.ipv6.conf.$IFACE.disable_ipv6=1
        down ip link set dev $IFACE down

We have to bring up the interfaces because otherwise the bridge will not work. On the other hand, we want to disable IPv6 so that the interfaces do not get automatically assigned IPv6 addresses.

In order to manage an OVS bridge port with Pacemaker, we need a corresponding resource script. The following script does the job and should be saved as $OCF_ROOT/resource.d/marsching/OVSPort (on most systems, $OCF_ROOT is /usr/lib/ocf):

#!/bin/bash

#   OVS bridge port script for Pacemaker - Copyright 2014 Sebastian Marsching
#
#   Permission is hereby granted, free of charge, to any person obtaining
#   a copy of this software and associated documentation files (the
#   "Software"), to deal in the Software without restriction, including
#   without limitation the rights to use, copy, modify, merge, publish,
#   distribute, sublicense, and/or sell copies of the Software, and to
#   permit persons to whom the Software is furnished to do so, subject to
#   the following conditions:
#
#   The above copyright notice and this permission notice shall be included
#   in all copies or substantial portions of the Software.
#

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

# Avoid localization issues
unset LC_ALL; export LC_ALL
unset LANGUAGE; export LANGUAGE
LC_ALL=C; export LC_ALL

meta_data() {
  cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="OVSPort">
  <version>1.0</version>
  <longdesc lang="en">
    This script manages an OpenVSwitch port.
    It adds a port to a bridge or removes it
    respectively and configures the
    corresponding network interface.
    The network interface has to be already
    configured in /etc/network/interfaces,
    because this script relies on the ifup
    and ifdown tools.
  </longdesc>
  <shortdesc lang="en">Adds/removes an OVS port and configures the corresponding interface.</shortdesc>
  <parameters>
    <parameter name="bridge" unique="1" required="1">
      <longdesc lang="en">
        The name of the OVS bridge that the port is added to.
      </longdesc>
      <shortdesc lang="en">Bridge name</shortdesc>
      <content type="string" default="" />
    </parameter>
    <parameter name="interface" unique="1" required="1">
      <longdesc lang="en">
        The name of the port and the corresponding interface.
      </longdesc>
      <shortdesc lang="en">Interface/port name</shortdesc>
      <content type="string" default="" />
    </parameter>
    <parameter name="mac" unique="1" required="0">
      <longdesc lang="en">
        The MAC address of the interface. If not specified,
        a random MAC address is used.
      </longdesc>
      <shortdesc lang="en">Interface MAC address</shortdesc>
      <content type="string" default="" />
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s" />
    <action name="stop" timeout="20s" />
    <action name="monitor" depth="0" timeout="20s" interval="15s" />
    <action name="validate-all" timeout="20s" />
    <action name="meta-data" timeout="5s" />
  </actions>
</resource-agent>
EOF
  exit $OCF_SUCCESS
}

usage() {
  echo "usage: $0 {start|stop|status|monitor|validate-all|meta-data}" >&2
}

check_is_root() {
  if ocf_is_root ; then
    :
  else
    echo "ERROR: This action requires root privileges." >&2
    exit $OCF_ERR_PERM
  fi
}

ovs_validate_all() {
  check_is_root
  if ifquery "$OCF_RESKEY_interface" >/dev/null 2>&1 ; then
    if ovs-vsctl br-exists "$OCF_RESKEY_bridge" ; then
      return $OCF_SUCCESS
    else
      echo "ERROR: Bridge \"$OCF_RESKEY_bridge\" not found." >&2
      return $OCF_ERR_CONFIGURED
    fi
  else
    echo "ERROR: No configuration for interface \"$OCF_RESKEY_interface\" found." >&2
    return $OCF_ERR_CONFIGURED
  fi
}

ovs_status_internal() {
  if ifconfig "$OCF_RESKEY_interface" >/dev/null 2>&1 ; then
    return $OCF_SUCCESS
  else
    return $OCF_NOT_RUNNING
  fi
}

ovs_monitor() {
  check_is_root
  if ovs_status_internal ; then
    echo "Interface \"$OCF_RESKEY_interface\" is up."
    return $OCF_SUCCESS
  else
    local rc=$?
    echo "Interface \"$OCF_RESKEY_interface\" is down."
    return $rc
  fi
}

ovs_start() {
  check_is_root
  ovs_add_port_command="ovs-vsctl add-port $OCF_RESKEY_bridge $OCF_RESKEY_interface -- set interface $OCF_RESKEY_interface type=internal"
  if [ -n "$OCF_RESKEY_mac" ]; then
    mac="`echo $OCF_RESKEY_mac|sed -e "s/:/\\\\\\\\:/g"`"
    ovs_add_port_command="$ovs_add_port_command -- set interface $OCF_RESKEY_interface mac=$mac"
  fi
  if $ovs_add_port_command >/dev/null 2>&1 && ifup "$OCF_RESKEY_interface" >/dev/null 2>&1 && ovs_status_internal; then
    echo "Interface \"$OCF_RESKEY_interface\" is up."
    return $OCF_SUCCESS
  else
    if ifup --force "$OCF_RESKEY_interface" >/dev/null 2>&1 && ovs_status_internal; then
      echo "Interface \"$OCF_RESKEY_interface\" is up."
      return $OCF_SUCCESS
    else
      echo "ERROR: Could not bring up interface \"$OCF_RESKEY_interface\"." >&2
      return $OCF_ERR_GENERIC
    fi
  fi
}

ovs_stop() {
  check_is_root
  ifdown "$OCF_RESKEY_interface" >/dev/null 2>&1
  if ovs_status_internal; then
    ifdown --force "$OCF_RESKEY_interface" >/dev/null 2>&1
    ovs-vsctl del-port "$OCF_RESKEY_interface"
    if ovs_status_internal; then
      echo "ERROR: Could not bring down interface \"$OCF_RESKEY_interface\"." >&2
      return $OCF_ERR_GENERIC
    else
      echo "Interface \"$OCF_RESKEY_interface\" is down."
      return $OCF_SUCCESS
    fi
  else
    echo "Interface \"$OCF_RESKEY_interface\" is down."
    return $OCF_SUCCESS
  fi
}

case $1 in
  meta-data)
    meta_data
    ;;
  start)
    ovs_validate_all && ovs_start
    ;;
  stop)
    ovs_stop
    ;;
  monitor)
    ovs_monitor
    ;;
  validate-all)
    ovs_validate_all
    ;;
  *)
    usage
    ;;
esac

exit $?

We can now define a resource using crm:

primitive iface-ovsbr0p1 ocf:marsching:OVSPort \
        params bridge="ovsbr0" interface="ovsbr0p1" mac="02:00:00:00:00:01" \
        op monitor interval="15s"

However, in order for this to work, we also need a configuration for this interface in /etc/network/interfaces:

iface ovsbr0p1 inet static
iface ovsbr0p1 inet6 static
        address 2001:db8::1
        netmask 64
        accept_ra 0
        autoconf 0

Please note that there is no auto line for this interface, because it is brought up and down by the resource management script. If there are other interfaces on the same bridge (with IP addresses in the same subnet), some additional configuration is needed. We have to use source-based routing to make sure the right interface (and thus the right MAC address) is used in ARP replies (see the discussion on Server Fault). In addition to that, the arp_filter flag might need to be set on the affected interfaces (it was not needed in my case).

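# NETWORK/PREFIX, GATEWAY and ADDRESS are placeholders for the subnet, the
# default gateway and the fail-over IPv4 address of your environment.
iface ovsbr0p1 inet static
        post-up ip route add NETWORK/PREFIX table $IFACE dev $IFACE
        post-up ip route add default table $IFACE via GATEWAY dev $IFACE
        post-up ip rule add from ADDRESS pref 1000 table $IFACE
        pre-down ip rule del from ADDRESS pref 1000 table $IFACE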
iface ovsbr0p1 inet6 static
        address 2001:db8::1
        netmask 64
        accept_ra 0
        autoconf 0

The exact routes obviously depend on the actual environment. Typically, you will want the same routes as in the "normal" routing table, just with a different source device. This configuration assumes that the name of the interface is also a valid alias for a routing table ID. These aliases are configured in /etc/iproute2/rt_tables. You could also use a numeric identifier in the commands; however, I find it easier to manage the numeric IDs in one central file and just use aliases in all other locations.
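
Registering such an alias could be done with a small helper like the one sketched below. The name ensure_rt_table is hypothetical and the table ID 100 in the example is an arbitrary choice; any ID that is not yet used in rt_tables should work. The sketch only checks whether the name is already defined, not whether the numeric ID is still free.

```shell
#!/bin/sh
# Sketch only: ensure_rt_table is a hypothetical helper and the
# table ID 100 in the example is an arbitrary choice.

# Add "<id> <name>" to an rt_tables file unless the name is already
# defined there. The file defaults to /etc/iproute2/rt_tables.
ensure_rt_table() {
    id=$1
    name=$2
    file=${3:-/etc/iproute2/rt_tables}
    if awk -v n="$name" '$1 !~ /^#/ && $2 == n { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$id" "$name" >>"$file"
}

# Example (as root):
#   ensure_rt_table 100 ovsbr0p1
```

After this, "table ovsbr0p1" in the ip route and ip rule commands refers to the table with the chosen ID.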

For IPv6, source-based routing seems not to be needed. The NDP implementation seems to use the MAC address of the interface to which the IP address is actually assigned.

Linux/OpenVSwitch (last edited 2017-08-13 10:14:33 by SebastianMarsching)