Open vSwitch

Last modified by Sebastian Marsching on 2022/05/29 14:05

Setting up Open vSwitch on Ubuntu 12.04 LTS

I mostly followed an existing tutorial for Ubuntu 12.04 LTS, but I also used some ideas from a tutorial for Ubuntu 12.10.

First we install two packages needed for Open vSwitch (the other packages are automatically pulled in, because they are dependencies):

aptitude install openvswitch-brcompat openvswitch-controller

Next we add the brcompat module to /etc/modules. I am not entirely sure whether this is really necessary, but I found that it helped to keep the traditional bridge module from being loaded first.
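Adding the module name by hand works fine. As a minimal sketch, the step can also be made repeatable with a small helper. The function name ensure_module_listed is made up for this example, and the module name is taken from the text above:

```shell
# Append a module name to a modules file unless it is already listed.
# Usage: ensure_module_listed MODULE [FILE]; FILE defaults to /etc/modules.
# (Hypothetical helper, not part of any package.)
ensure_module_listed() {
  local module="$1" file="${2:-/etc/modules}"
  # -x matches whole lines only, -F treats the name as a fixed string.
  if ! grep -qxF -- "$module" "$file" 2>/dev/null; then
    printf '%s\n' "$module" >> "$file"
  fi
}
```

Run as root, "ensure_module_listed brcompat" would add the line once and do nothing on later runs.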

We also have to enable the bridge compatibility layer by setting BRCOMPAT=yes in /etc/default/openvswitch-switch.
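The setting can be changed in an editor. The following sketch does the same from the shell, assuming GNU sed (as shipped with Ubuntu). The helper name set_default is an invention for this example:

```shell
# Set KEY=VALUE in a shell-style defaults file. An existing assignment
# (also a commented-out one) is replaced; otherwise a line is appended.
# Usage: set_default KEY VALUE [FILE]
# (Hypothetical helper; requires GNU sed for "-i -E".)
set_default() {
  local key="$1" value="$2" file="${3:-/etc/default/openvswitch-switch}"
  if grep -qE "^[#[:space:]]*${key}=" "$file" 2>/dev/null; then
    sed -i -E "s|^[#[:space:]]*${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Run as root, "set_default BRCOMPAT yes" would then enable the compatibility layer in the default file.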

At this point it is a good idea to reboot, because the brcompat module cannot be loaded if the bridge module is already loaded. We could probably just unload the bridge module, but rebooting is easier and also confirms that the setup will work the next time we reboot.
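After the reboot it may be worth checking which of the two modules actually got loaded. The sketch below reads a listing in the format of /proc/modules, where the first field of each line is the module name. The function name module_loaded is hypothetical:

```shell
# Succeed if MODULE appears in a /proc/modules-style listing on stdin.
# The first whitespace-separated field of each line is the module name.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

For example, "module_loaded bridge < /proc/modules && echo 'bridge module is loaded'" prints the warning only if the traditional bridge module is present.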

Now everything should be ready for configuration. In this example we create a bridge ovsbr0 that is connected to the physical interface eth0. This interface carries one untagged VLAN and several tagged VLANs (we just show one as an example). The untagged VLAN is also the one used for the management interface of the server. For each of the VLANs we create a bridge that provides the traffic from this VLAN untagged, so that we can bind a virtual machine's interface to each VLAN individually.

First we create the main bridge and the bridges for the individual VLANs:

ovs-vsctl add-br ovsbr0
ovs-vsctl add-br ovsbr0v1 ovsbr0 1 # Create a bridge for VLAN 1 on top of ovsbr0.
ovs-vsctl add-br ovsbr0v2 ovsbr0 2 # Create a bridge for VLAN 2 on top of ovsbr0.
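With more than a few VLANs, typing these commands by hand gets tedious. The following sketch only prints the commands so that they can be reviewed first. It assumes the three-argument form of add-br (bridge, parent, VLAN) that ovs-vsctl documents for VLAN bridges and the naming scheme used in this example. The function name is made up:

```shell
# Print (do not run) the ovs-vsctl commands for one parent bridge and a
# list of VLAN IDs, naming each VLAN bridge <parent>v<vlan>.
# Usage: print_vlan_bridge_commands PARENT VLAN...
print_vlan_bridge_commands() {
  local parent="$1" vlan
  shift
  printf 'ovs-vsctl add-br %s\n' "$parent"
  for vlan in "$@"; do
    printf 'ovs-vsctl add-br %sv%s %s %s\n' "$parent" "$vlan" "$parent" "$vlan"
  done
}
```

Once the output looks right, it can be piped into a root shell, for example "print_vlan_bridge_commands ovsbr0 1 2 | sh".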

Now, assuming that we want to use VLAN 1 for the management interface of the server, we add a port with this VLAN ID:

ovs-vsctl add-port ovsbr0 ovsbr0p1
ovs-vsctl set port ovsbr0p1 tag=1
ovs-vsctl set interface ovsbr0p1 type=internal
ovs-vsctl set interface ovsbr0p1 mac="00\:01\:02\:03\:04\:05"

Please note that some of the attributes are set in the port table, while others are set in the interface table. Obviously the MAC address should be replaced by a proper random MAC address. The page about KVM describes how to generate a random MAC address. You do not have to set a MAC address explicitly, but without one the MAC address will change after each reboot, which typically is not desirable for the network interface of a server.
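For convenience, here is one possible way to generate such an address in the shell. This is a sketch of my own and not necessarily the method described on the KVM page. It forces the first octet to be locally administered and unicast, which is the usual convention for self-assigned addresses:

```shell
# Print a random, locally administered, unicast MAC address.
# (Hypothetical helper; reads six random bytes from /dev/urandom.)
random_mac() {
  local first
  # Split the six hex bytes printed by od into $1..$6.
  set -- $(od -An -N6 -tx1 /dev/urandom)
  # Clear the multicast bit (0x01) and set the locally administered bit (0x02).
  first=$(printf '%02x' $(( (0x$1 & 0xfe) | 0x02 )))
  printf '%s:%s:%s:%s:%s:%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}
```

The result can be used in the ovs-vsctl set interface command shown earlier, with the colons escaped in the same way as in that command.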

Now we change the network configuration in /etc/network/interfaces. We have to make sure that each virtual interface is brought up, even if we only use it as a bridge. We do this by bringing it up but disabling any IP configuration on it:

auto eth0
iface eth0 inet manual
        up ifconfig $IFACE 0.0.0.0 up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down

auto ovsbr0
iface ovsbr0 inet manual
        up ifconfig $IFACE 0.0.0.0 up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down

auto ovsbr0p1
# This is just a place-holder. Replace it with the proper configuration for the
# management interface. Typically this is the configuration you had for eth0
# before.
iface ovsbr0p1 inet dhcp
iface ovsbr0p1 inet6 auto

auto ovsbr0v1
iface ovsbr0v1 inet manual
        up ifconfig $IFACE 0.0.0.0 up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down

auto ovsbr0v2
iface ovsbr0v2 inet manual
        up ifconfig $IFACE 0.0.0.0 up
        up echo 1 >/proc/sys/net/ipv6/conf/$IFACE/disable_ipv6
        down ifconfig $IFACE down
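The stanzas for eth0, ovsbr0 and the VLAN bridges are identical apart from the interface name. As a sketch, a small (hypothetical) helper can print such a stanza for any further VLAN bridge:

```shell
# Print an /etc/network/interfaces stanza that brings an interface up
# without any IP configuration, in the same form as the stanzas above.
# Usage: print_manual_stanza INTERFACE
print_manual_stanza() {
  cat <<EOF
auto $1
iface $1 inet manual
        up ifconfig \$IFACE 0.0.0.0 up
        up echo 1 >/proc/sys/net/ipv6/conf/\$IFACE/disable_ipv6
        down ifconfig \$IFACE down

EOF
}
```

For example, "print_manual_stanza ovsbr0v3 >> /etc/network/interfaces" would append the stanza for a third VLAN bridge.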

For some reason the system does not detect that the network has already been configured and thus delays startup when using Open vSwitch. Therefore we modify /etc/init/failsafe.conf so that it does not wait for the network configuration to finish. You can do this by applying the following patch:

--- failsafe.conf.dpkg-dist     2013-01-18 22:17:33.000000000 +0100
+++ failsafe.conf       2013-01-18 22:18:08.000000000 +0100
@@ -29,10 +29,10 @@
     # the end of this script to avoid letting the system spin forever
     # waiting on it to start.
        $PLYMOUTH message --text="Waiting for network configuration..." || :
-       sleep 40
+       sleep 1

-       $PLYMOUTH message --text="Waiting up to 60 more seconds for network configuration..." || :
-       sleep 59
+       $PLYMOUTH message --text="Waiting one more second for network configuration..." || :
+       sleep 1
        $PLYMOUTH message --text="Booting system without full network configuration..." || :

     # give user 1 second to see this message since plymouth will go

The last steps have to be performed directly on the server's console, because they will interrupt the network connection. We add eth0 to the bridge and configure it for the right VLAN mode (VLAN 1 is untagged, all other VLANs are tagged):

ovs-vsctl add-port ovsbr0 eth0
ovs-vsctl set port eth0 tag=1
ovs-vsctl set port eth0 vlan_mode=native-untagged

That's it. After rebooting the server again, the network should be working and you can specify the bridges ovsbr0v1 and ovsbr0v2 in virtual-machine configurations.
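A quick check after the reboot is to compare the bridges that Open vSwitch knows about with the ones we expect. The sketch below reads the output of "ovs-vsctl list-br" (one bridge name per line) on standard input. The function name missing_bridges is made up for this example:

```shell
# Read bridge names (one per line, as printed by "ovs-vsctl list-br") on
# stdin and print every expected name that is absent.
# Returns non-zero if at least one expected bridge is missing.
missing_bridges() {
  local listing name rc=0
  listing=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qxF -- "$name"; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return $rc
}
```

For the setup described here, this would be called as "ovs-vsctl list-br | missing_bridges ovsbr0 ovsbr0v1 ovsbr0v2". No output means that all three bridges exist.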

Setting up Open vSwitch on Ubuntu 14.04 LTS

The instructions are nearly the same as for Ubuntu 12.04 LTS, so I only mention the differences.

Instead of installing openvswitch-brcompat and openvswitch-controller, install openvswitch-switch. You also do not have to enable the brcompat module.

You also do not have to make the changes to failsafe.conf. The system will boot fine without those changes.

Using Open vSwitch for a high-availability / fail-over interface

A simple HA setup for an IP address can easily be created using Pacemaker and the ocf:heartbeat:IPaddr2 and ocf:heartbeat:IPv6addr scripts. However, this kind of setup has one weakness: During fail-over, the MAC address changes because the IP address is now associated with a different computer and thus a different NIC. This can cause problems with old entries in ARP tables. Linux systems will typically deal with this correctly (they will see the unsolicited ARP message and update their caches), but some other operating systems or dedicated network equipment might cause trouble. For example, I had problems with the ARP cache of a Netgear GSM7328v2 switch, which could only be resolved by waiting a long time or manually clearing the ARP cache. Obviously, both options are not viable for an HA setup, where fail-over has to happen automatically and within seconds.

Therefore, it is desirable to keep the MAC address and transfer it together with the IP address. However, Linux does not allow more than one MAC address for a single interface and (to my knowledge) does not allow explicit configuration of MAC addresses on bridges. Luckily, the latter limitation does not apply to Open vSwitch bridges. Each interface associated with a specific port of a bridge can be explicitly configured with a MAC address. This way, we can dynamically add or remove a port with the MAC address for which we want the fail-over setup.

We use the following commands to create the OVS bridge and add the NIC as a port. In our example, the bridge has the name ovsbr0 and the NIC has the name eth0. We assume that no tagged VLANs are used.

ovs-vsctl add-br ovsbr0
ovs-vsctl add-port ovsbr0 eth0

In /etc/network/interfaces we create the following configuration:

auto eth0
iface eth0 inet manual
        up ip link set dev $IFACE up
        up sysctl -q -w net.ipv6.conf.$IFACE.disable_ipv6=1
        down ip link set dev $IFACE down

auto ovsbr0
iface ovsbr0 inet manual
        up ip link set dev $IFACE up
        up sysctl -q -w net.ipv6.conf.$IFACE.disable_ipv6=1
        down ip link set dev $IFACE down

We have to bring up the interfaces because otherwise the bridge will not work. On the other hand, we want to disable IPv6 so that the interfaces do not get automatically assigned IPv6 addresses.

In order to manage an OVS bridge port with Pacemaker, we need a corresponding resource script. The following script does the job and should be saved as $OCF_ROOT/resource.d/marsching/OVSPort (on most systems, $OCF_ROOT is /usr/lib/ocf):

#!/bin/bash

#   OVS bridge port script for Pacemaker - Copyright 2014 Sebastian Marsching
#
#   Permission is hereby granted, free of charge, to any person obtaining
#   a copy of this software and associated documentation files (the
#   "Software"), to deal in the Software without restriction, including
#   without limitation the rights to use, copy, modify, merge, publish,
#   distribute, sublicense, and/or sell copies of the Software, and to
#   permit persons to whom the Software is furnished to do so, subject to
#   the following conditions:
#
#   The above copyright notice and this permission notice shall be included
#   in all copies or substantial portions of the Software.
#
#   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
#   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
#   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
#   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
#   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
#   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
#   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

# Avoid localization issues
unset LC_ALL; export LC_ALL
unset LANGUAGE; export LANGUAGE
LC_ALL=C; export LC_ALL
LC_MESSAGES=C; export LC_MESSAGES

meta_data() {
  cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="OVSPort">
  <version>1.0</version>
  <longdesc lang="en">
    This script manages an Open vSwitch port.
    It adds a port to a bridge or removes it
    respectively and configures the
    corresponding network interface.
    The network interface has to be already
    configured in /etc/network/interfaces,
    because this script relies on the ifup
    and ifdown tools.
  </longdesc>
  <shortdesc lang="en">Adds/removes an OVS port and configures the corresponding interface.</shortdesc>
  <parameters>
    <parameter name="bridge" unique="1" required="1">
      <longdesc lang="en">
        The name of the OVS bridge that the port is added to.
      </longdesc>
      <shortdesc lang="en">Bridge name</shortdesc>
      <content type="string" default="" ></content>
    </parameter>
    <parameter name="interface" unique="1" required="1">
      <longdesc lang="en">
        The name of the port and the corresponding interface.
      </longdesc>
      <shortdesc lang="en">Interface/port name</shortdesc>
      <content type="string" default="" ></content>
    </parameter>
    <parameter name="mac" unique="1" required="0">
      <longdesc lang="en">
        The MAC address of the interface. If not specified,
        a random MAC address is used.
      </longdesc>
      <shortdesc lang="en">Interface MAC address</shortdesc>
      <content type="string" default="" ></content>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s" ></action>
    <action name="stop" timeout="20s" ></action>
    <action name="monitor" depth="0" timeout="20s" interval="15s" ></action>
    <action name="validate-all" timeout="20s" ></action>
    <action name="meta-data" timeout="5s" ></action>
  </actions>
</resource-agent>
EOF

  exit $OCF_SUCCESS
}

usage() {
  echo "usage: $0 {start|stop|status|monitor|validate-all|meta-data}" >&2
}

check_is_root() {
  if ! ocf_is_root ; then
    echo "ERROR: This action requires root privileges." >&2
    exit $OCF_ERR_PERM
  fi
}

ovs_validate_all() {
  check_is_root
  # The interface must be defined in /etc/network/interfaces and the
  # bridge must already exist in Open vSwitch.
  if ifquery "$OCF_RESKEY_interface" >/dev/null 2>&1 ; then
    if ovs-vsctl br-exists "$OCF_RESKEY_bridge" ; then
      return $OCF_SUCCESS
    else
      echo "ERROR: Bridge \"$OCF_RESKEY_bridge\" not found." >&2
      return $OCF_ERR_CONFIGURED
    fi
  else
    echo "ERROR: No configuration for interface \"$OCF_RESKEY_interface\" found." >&2
    return $OCF_ERR_CONFIGURED
  fi
}

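ovs_status_internal() {
  # ifconfig succeeds as long as the interface exists, so this effectively
  # checks whether the port's interface is present.
  if ifconfig "$OCF_RESKEY_interface" >/dev/null 2>&1 ; then
    return $OCF_SUCCESS
  else
    return $OCF_NOT_RUNNING
  fi
}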

ovs_monitor() {
  check_is_root
  if ovs_status_internal ; then
    echo "Interface \"$OCF_RESKEY_interface\" is up."
    return $OCF_SUCCESS
  else
    local rc=$?
    echo "Interface \"$OCF_RESKEY_interface\" is down."
    return $rc
  fi
}

ovs_start() {
  check_is_root
  # Add the port and mark its interface as internal in one ovs-vsctl call.
  ovs_add_port_command="ovs-vsctl add-port $OCF_RESKEY_bridge $OCF_RESKEY_interface -- set interface $OCF_RESKEY_interface type=internal"
  if [ -n "$OCF_RESKEY_mac" ]; then
    # Escape each colon as "\:" before handing the value to ovs-vsctl.
    mac="$(echo "$OCF_RESKEY_mac" | sed -e 's/:/\\:/g')"
    ovs_add_port_command="$ovs_add_port_command -- set interface $OCF_RESKEY_interface mac=$mac"
  fi
  if $ovs_add_port_command >/dev/null 2>&1 && ifup "$OCF_RESKEY_interface" >/dev/null 2>&1 && ovs_status_internal; then
    echo "Interface \"$OCF_RESKEY_interface\" is up."
    return $OCF_SUCCESS
  else
    # Fall back to a forced ifup, for example if the port already exists.
    if ifup --force "$OCF_RESKEY_interface" >/dev/null 2>&1 && ovs_status_internal; then
      echo "Interface \"$OCF_RESKEY_interface\" is up."
      return $OCF_SUCCESS
    else
      echo "ERROR: Could not bring up interface \"$OCF_RESKEY_interface\"." >&2
      return $OCF_ERR_GENERIC
    fi
  fi
}

ovs_stop() {
  check_is_root
  ifdown "$OCF_RESKEY_interface" >/dev/null 2>&1
  # If the interface still exists, force it down and remove the port.
  if ovs_status_internal; then
    ifdown --force "$OCF_RESKEY_interface" >/dev/null 2>&1
    ovs-vsctl del-port "$OCF_RESKEY_interface"
    if ovs_status_internal; then
      echo "ERROR: Could not bring down interface \"$OCF_RESKEY_interface\"." >&2
      return $OCF_ERR_GENERIC
    else
      echo "Interface \"$OCF_RESKEY_interface\" is down."
      return $OCF_SUCCESS
    fi
  else
    echo "Interface \"$OCF_RESKEY_interface\" is down."
    return $OCF_SUCCESS
  fi
}

case $1 in
  meta-data)
    meta_data
    ;;
  start)
    ovs_validate_all && ovs_start
    ;;
  stop)
    ovs_stop
    ;;
  monitor|status)
    ovs_monitor
    ;;
  validate-all)
    ovs_validate_all
    ;;
  *)
    usage
    exit $OCF_ERR_UNIMPLEMENTED
    ;;
esac

exit $?
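The least obvious part of the script is the escaping of the MAC address in ovs_start, where every colon is turned into a backslash followed by a colon before the value is passed to ovs-vsctl. As a sketch, the same transformation can be tried in isolation. The function name escape_mac is made up and does not appear in the script:

```shell
# Escape the colons of a MAC address in the same way as ovs_start does
# before it builds the "ovs-vsctl ... set interface ... mac=..." command.
escape_mac() {
  printf '%s\n' "$1" | sed -e 's/:/\\:/g'
}
```

Calling "escape_mac 02:00:00:00:00:01" prints the address with a backslash in front of each colon, which is the form used in the ovs-vsctl set interface command earlier on this page.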

We can now define a resource using crm:

primitive iface-ovsbr0p1 ocf:marsching:OVSPort \
        params bridge="ovsbr0" interface="ovsbr0p1" mac="02:00:00:00:00:01" \
        op monitor interval="15s"

However, in order for this to work, we also need a configuration for this interface in /etc/network/interfaces:

iface ovsbr0p1 inet static
        address 192.0.2.1
        netmask 255.255.255.0
iface ovsbr0p1 inet6 static
        address 2001:db8::1
        netmask 64
        accept_ra 0
        autoconf 0

Please note that there is no auto line for this interface, because it is brought up and down by the resource management script. If there are other interfaces on the same bridge (with IP addresses in the same subnet), some additional configuration is needed. We have to use source-based routing to make sure the right interface (and thus the right MAC address) is used in ARP replies (see the discussion on Server Fault). In addition to that, the arp_filter flag might need to be set on the affected interfaces (it was not needed in my case).

iface ovsbr0p1 inet static
        address 192.0.2.1
        netmask 255.255.255.0
        post-up ip route add 192.0.2.0/24 table $IFACE dev $IFACE
        post-up ip route add default table $IFACE via 192.0.2.254 dev $IFACE
        post-up ip rule add from 192.0.2.1/32 pref 1000 table $IFACE
        pre-down ip rule del from 192.0.2.1/32 pref 1000 table $IFACE
iface ovsbr0p1 inet6 static
        address 2001:db8::1
        netmask 64
        accept_ra 0
        autoconf 0

The exact routes obviously depend on the actual environment. Typically, you will want the same routes as in the "normal" routing table, just with the different source device. This configuration assumes that the name of the interface is also a valid alias for a routing table ID. These aliases are configured in /etc/iproute2/rt_tables. You could also use a numeric identifier in the commands, but I find it easier to manage the numeric IDs in one central file and just use aliases in all other locations.
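Each line in /etc/iproute2/rt_tables consists of a numeric table ID followed by its alias. As a sketch, the alias for our interface can be added with a small helper. The function name and the table ID 100 used below are assumptions for this example. Any ID that is not yet in use should do, and the helper does not check for ID conflicts:

```shell
# Add "ID NAME" to an rt_tables-style file unless NAME is already an alias.
# Usage: ensure_rt_table ID NAME [FILE]
# (Hypothetical helper; it does not check whether ID is already taken.)
ensure_rt_table() {
  local id="$1" name="$2" file="${3:-/etc/iproute2/rt_tables}"
  # Skip comment lines; the second field of each entry is the alias.
  if ! awk -v n="$name" '!/^#/ && $2 == n { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
    printf '%s\t%s\n' "$id" "$name" >> "$file"
  fi
}
```

Run as root, "ensure_rt_table 100 ovsbr0p1" makes the "table $IFACE" arguments in the configuration above resolve to table 100.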

For IPv6, source-based routing does not seem to be needed. The NDP implementation seems to use the MAC address of the interface to which the IP address is actually assigned.