
Update on KVM problems with kernel 3.13

A few weeks ago, I wrote about problems with kernel 3.13 on Ubuntu 12.04 LTS and 14.04 LTS.

Most likely, the problem that caused the excessive CPU load and occasional high network latency has been fixed by now, and the fix is going to be included in version 3.13.0-33 of the kernel package. I experienced this problem on a multi-processor machine, so it was probably the problem with KSM and NUMA that has been fixed.
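To find out which machines still run a 3.13 build older than the announced fix, the kernel release string can be compared against 3.13.0-33. The following is only a minimal sketch: the function names are made up for illustration, and it assumes that the release string has the usual Ubuntu form (for example 3.13.0-30-generic) and that only the 3.13.0 series is of interest.

```python
import re

# Assumption: 3.13.0-33 is the first package build that carries the fix.
FIRST_FIXED_ABI = 33


def parse_kernel_release(release):
    """Split a release string such as '3.13.0-30-generic' into (3, 13, 0, 30)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def is_affected_build(release):
    """Return True for 3.13.0 builds older than the assumed first fixed build."""
    major, minor, patch, abi = parse_kernel_release(release)
    return (major, minor, patch) == (3, 13, 0) and abi < FIRST_FIXED_ABI
```

On a running system, the string to check could come from platform.release() in the Python standard library or from the output of uname -r.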

I am not sure whether the problems that I had with IPv6 connectivity are also solved by this fix: I experienced those problems on a single-processor (but multi-core) machine, so it does not sound like a NUMA problem to me.

Anyhow, I will give the 3.13 kernel another try when the updated version is released. For the moment, I have migrated all server machines back to the 3.2 kernel, because the 3.5 kernel will reach its end of life soon and the 3.13 kernel is not ready for production use yet. I do not expect considerable gains from using a newer kernel version on the servers anyway, so for the moment, the 3.2 kernel is a good option.


Linux KVM Problems with Ubuntu 14.04 LTS / Kernel 3.13.0-30

A few days ago I upgraded a virtual-machine host from Ubuntu 12.04 LTS (Precise Pangolin) to Ubuntu 14.04 LTS (Trusty Tahr). At first, I thought that everything was working fine.

However, a short time later I noticed funny problems with the network connectivity, particularly (but not only) affecting Windows guests. Occasionally, ICMP echo requests would only be answered with an enormous delay (seconds) or sometimes not be answered at all. TCP connections to guests would stall very often. At the same time, the load on the host system would be high even though the CPU usage would not be extremely heavy.
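One way to make such symptoms measurable is to scan the output of ping for replies that took very long and for sequence numbers that never came back. The following is a rough sketch, not a tool I actually used: it assumes the reply format printed by the Linux ping utility (icmp_seq, or icmp_req in older versions), the one-second threshold is arbitrary, and the sample lines are invented for illustration rather than taken from a real measurement.

```python
import re

# Matches reply lines such as:
#   64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=2310 ms
REPLY = re.compile(r"icmp_(?:seq|req)=(\d+).*?time=([\d.]+) ms")


def summarize_ping(output, slow_ms=1000.0):
    """Return sequence numbers of slow replies and of gaps between replies."""
    times = {}
    for line in output.splitlines():
        match = REPLY.search(line)
        if match:
            times[int(match.group(1))] = float(match.group(2))
    if not times:
        return {"slow": [], "missing": []}
    slow = sorted(seq for seq, ms in times.items() if ms >= slow_ms)
    # Only gaps between the first and last reply are detected; requests lost
    # at the very beginning or end of the run do not show up here.
    missing = [seq for seq in range(min(times), max(times) + 1)
               if seq not in times]
    return {"slow": slow, "missing": missing}


# Invented sample output (hypothetical guest address from a documentation range).
SAMPLE = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.41 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=2310 ms
64 bytes from 192.0.2.10: icmp_seq=4 ttl=64 time=0.39 ms
"""

print(summarize_ping(SAMPLE))
```

For the invented sample, reply 2 counts as slow and reply 3 as missing.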

After I downgraded the virtual-machine host back to Ubuntu 12.04 LTS (and consequently to kernel 3.5), these problems disappeared immediately.

It seems like this is a bug related to the 3.13 kernel shipped with Ubuntu 14.04 LTS. There is a bug report on Launchpad and a discussion on Server Fault. It might be that the other problems that I experienced with the backported 3.13 kernel are related to this issue.

For the moment I will keep our virtual-machine hosts on Ubuntu 12.04 LTS and kernel 3.5, until the problems with the 3.13 kernel have been sorted out.

Trouble after installing linux-generic-lts-trusty in Ubuntu 12.04 LTS

Yesterday I updated a lot of computers (hosts as well as virtual machines) running Ubuntu 12.04 LTS (Precise Pangolin) to the backported version of the 3.13 kernel. This kernel is provided by the linux-image-generic-lts-trusty package, which is installed (together with the linux-headers-generic-lts-trusty package) when installing linux-generic-lts-trusty. Before the update, all Ubuntu 12.04 LTS systems were running the 3.5 kernel provided by linux-generic-lts-quantal. By installing the backported kernel, I wanted to increase the uniformity between the Ubuntu 12.04 LTS and Ubuntu 14.04 LTS systems.

After installing the new kernel and rebooting the machines, funny network problems started to happen. For some virtual machines, IPv6 communication between virtual machines running on the same VM host became very unreliable. For other virtual machines, I experienced occasional huge delays (up to several seconds) for IPv4 packets.

After testing around for a few hours (at the same time I had upgraded a virtual-machine host to Ubuntu 14.04 LTS and first suspected this upgrade, specifically the new version of Open vSwitch), I found out that these network problems were indeed caused by the new kernel in the virtual machines. If one of two virtual machines running on the same host had the new kernel running, the problems with IPv6 appeared. If both were running the old kernel version, the problems disappeared. The other problem with the massively delayed IPv4 packets was a bit harder to reproduce. Funnily, it already became much better when I downgraded just one of the virtual machines on the host.

At the current stage (linux-image-generic-lts-trusty, version 3.13.0-30), there seems to be a massive problem with the IP stack of the kernel. For some reason, these problems only seem to be triggered if the kernel is running in a (Linux KVM) virtual machine. For now, I downgraded all virtual machines back to the old kernel version.

I have to do some more tests to find out whether these problems are caused by the newer kernel in general or whether they are specifically caused by the backported version. At the moment I only have one virtual machine with Ubuntu 14.04 LTS, so I will have to set up some test VMs to carry out more tests.

Until then, I can only recommend staying away from the backported 3.13 kernel, at least for virtual machines.

Nagios check_linux_raid missing in Ubuntu 14.04 LTS

I just upgraded a KVM virtual machine host from Ubuntu 12.04 LTS (Precise Pangolin) to Ubuntu 14.04 LTS (Trusty Tahr). Everything went smoothly except for one problem: the check_linux_raid script is missing in the updated version of the nagios-plugins-standard package.

The nagios-plugins-contrib package seems to contain a script which basically does the same job, but this package has a lot of other plugins that pull tons of additional dependencies, so I did not want to install this package. Luckily, just copying the check_linux_raid script from a system with the older version of Ubuntu worked fine for me.
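For anyone who cannot copy the script from an older system, the core idea of such a check can be sketched in a few lines. This is not the original check_linux_raid script, only a minimal illustration under some assumptions: it looks at text in the format of /proc/mdstat, treats an underscore in the member status map (for example [U_]) as a degraded array, ignores arrays without such a status line (for example RAID 0), and does not handle resync states. The function name and the sample text are made up; the return codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import re

# Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

ARRAY = re.compile(r"^(md\w+)\s*:")                    # e.g. "md0 : active raid1 ..."
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # e.g. "[2/1] [U_]"


def check_mdstat(text):
    """Return (exit_code, message) for text in the format of /proc/mdstat."""
    current = None
    seen = []
    degraded = []
    for line in text.splitlines():
        array = ARRAY.match(line)
        if array:
            current = array.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            seen.append(current)
            # An underscore marks a member device that is missing or failed.
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    if not seen:
        return UNKNOWN, "UNKNOWN: no md arrays found"
    if degraded:
        return CRITICAL, "CRITICAL: degraded arrays: " + ", ".join(degraded)
    return OK, "OK: %d array(s) healthy" % len(seen)


# Made-up sample in the format of /proc/mdstat, with one degraded array.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      976630336 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

print(check_mdstat(SAMPLE))
```

To use this as an actual check, the text would be read from /proc/mdstat, the message printed, and the code passed to sys.exit().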