Open-Source | Entries from November 2014

Process tracking in Upstart

Trouble with IPv6 in a KVM guest running the 3.13 kernel

Posted by Sebastian Marsching on Saturday, November 15. 2014

Recently, I exprienced a problem with the process tracking in Upstart:

I wanted start a daemon process running as a specific user. However, I needed root privileges in the pre-start script, so I could not use the setuid/setgid options. I tried to use su in the exec option, but then Upstart would not track the right process. The expect fork and expect daemon options did not help either. As a nasty side effect, these options cannot be tested easily, because having the wrong option will lead to Upstart waiting for an already dead process to die and there is no way to reset the status in Upstart. At least, there is a workaround for effectively resetting the status without restarting the whole computer.

The problem is that su forks when running the command it is asked to run instead of calling exec from the main process. Unfortunately, The process I had to run would fork again because I had to run it as a daemon (not running it as a daemon had some undesirable side effects). Finally, I found the solution: Instead of using su, start-stop-daemon can be used. This tool will not fork and therefore it will not upset Upstart's process tracking. For example, the line

exec start-stop-daemon --start --chuid daemonuser --exec /bin/server_cmd

will run /bin/server_cmd as daemonuser without forking.

This way, expect fork or expect daemon can be used, just depending on the fork behavior of the final process.

November '14

Posted by Sebastian Marsching on Sunday, November 2. 2014

Some time ago, I wrote about two problems with the 3.13 kernel shipping with Ubuntu 14.04 LTS Trusty Tahr: One turned out to be a problem with KSM on NUMA machines acting as Linux KVM hosts and was fixed in later releases of the 3.13 kernel. The other one affected IPv6 routing between virtual machines on the same host. Finally, I figured out the cause of the second problem and how it can be solved.

I use two different kinds of network setups for Linux KVM hosts: For virtual-machine servers in our own network, the virtual machines get direct bridged access to the network (actually I use OpenVSwitch on the VM hosts for bridging specific VLANs, but this is just a technical detail). For this kind of setup, everything works fine, even when using the 3.13 kernel. However, we also have some VM hosts that are actually not in our own network, but are hosted in various data centers. For these VM hosts, I use a routed network configuration. This means that all traffic coming from and going to the virtual machines is routed by the VM host. On layer 2 (Ethernet), the virtual machines only see the VM host and the hosting provider's router only sees the physical machine.

This kind of setup has two advantages: First, it always works, even if the hosting provider expects to only see a single, well-known MAC address (which might be desirable for security reasons). Second, the VM host can act as a firewall, only allowing specific traffic to and from the outside world. In fact, the VM host can also act as a router between different virtual machines, thus protecting them from each other should one be compromised.

The problems with IPv6 only appear when using this kind of setup, where the Linux KVM host acts as a router, not a bridge. The symptoms are that IPv6 packets between two virtual machines are occasionally dropped, while communication with the VM host and the outside world continues to work fine. This is caused by the neigbor-discovery mechanism in IPv6. From the perspective of the VM host, all virtual machines are in the same network. Therefore, it sends an ICMPv6 redirect message in order to indicate that the VM should contact the other VM directly. However, this does not work because the network setup only allows traffic between the VM host and individual virtual machines, but no traffic between two virtual machines (otherwise it could not act as a firewall). Therefore, the neighbor-discovery mechanism determines the other VM to be not available (it should be on the same network but does not answer). After some time, the entry in the neighbor table (that you can inspect with ip neigh show) will expire and communication will work again for a short time, until the next redirect message is received and the same story starts again.

There are two possible solutions to this: The proper one would be to use an individual interface for each guest on the VM host. In this case, the VM host would not expect the virtual machines to be on the same network and thus stop sending redirect packets. Unfortunately, this makes the setup more complex and - if using a separate /64 for each interface - needs a lot of address space. The simpler albeit sketchy solution is to prevent the redict messages from having any effect. For IPv4, one could disable the sending of redirect messages through the sysctl option net.ipv4.conf.<interface>.send_redirects. For IPv6 however, this option is not available. So one could either use an iptables rule on the OUTPUT chain for blocking those packets or simply configure the KVM guests to ignore such packets. I chose the latter approach and added

# IPv6 redirects cause problems because of our routing scheme.
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

to /etc/sysctl.conf in all affected virtual machines.

I do not know, why this behavior changed with kernel 3.13. One would expect the same problem to appear with older kernel versions, but I guess there must have been some change in the details of how NDP and redirect messages are handled.

Addendum (2014-11-02):

Adding the suggested options to sysctl.conf does not seem to fix the problem completely. For some reasons, an individual network interface can still have this setting enabled. Therefore, I now added the following line to the IPv6 configuration of the affected interface in /etc/network/interfaces:

        post-up sysctl net.ipv6.conf.$IFACE.accept_redirects=0

This finally fixes it, even if the other options are not added to sysctl.conf.