Skip to content

Trouble after installing linux-generic-lts-trusty in Ubuntu 12.04 LTS

Yesterday I updated a lot of computers (hosts as well as virtual machines) running Ubuntu 12.04 LTS (Precise Pangolin) to the backported version of the 3.13 kernel. This kernel is provided by the linux-image-generic-lts-trusty package which is installed (together with the linux-headers-generic-lts-trusty package) when installing linux-generic-lts-trusty. By installing the backported kernel (before the update all Ubuntu 12.04 LTS systems where running on the 3.5 kernel provided by linux-generic-lts-quantal) I wanted to increase the uniformity between the Ubuntu 12.04 LTS and Ubuntu 14.04 LTS systems.

After installing the new kernel and rebooting the machines, funny network problems started to happen. For some virtual machines, IPv6 communication between virtual machines running on the same VM host became very unreliable. For other virtual machines, I experienced occassional huge delays (up to several seconds) for IPv4 packets.

After testing around for a few hours (at the same time I had upgraded a virtual-machine host to Ubuntu 14.04 LTS and first suspected this upgrade, specifically the new version of OpenVSwitch), I found out that these network problems were indeed caused by the new kernel in the virtual machines. If one of two virtual machines running on the same host had the new kernel running, the problems with IPv6 appeared. If both were running the old kernel version, the problems disappeared. The other problem with the massively delayed IPv4 packets was a bit harder to reproduce. Funnily, it already became much better when I downgraded just one of the virtual machines on the host.

At the current stage (linux-image-generic-lts-quantal-3.13.0-30), there seems to be a massive problem with the IP stack of the kernel. For some reasons, this problems only seem to be triggered if the kernel is running in a (Linux KVM) virtual machine. For now, I downgraded all virtual machines back to the old kernel version.

I have to do some more tests to find out whether these problems are caused by the newer kernel in general or whether they are specifically caused by the backported version. At the moment I only have one virtual machine with Ubuntu 14.04 LTS, so I will have to setup some test VMs to carry out more tests.

Until then, I can only recommend to stay away from the backported 3.13 kernel, at least for virtual machines.

Nagios check_linux_raid missing in Ubuntu 14.04 LTS

I just upgrade a KVM virtual machine host from Ubuntu 12.04 LTS (Precise Pangolin) to Ubuntu 14.04 LTS (Trusty Tahr). Everything went smoothly except for one problem: The check_linux_raid script is missing in the updated version of the nagios-plugins-standard package.

The nagios-plugins-contrib package seems to contain a script which basically does the same job, but this package has a lot of other plugins that pull tons of additional dependencies, so I did not want to install this package. Luckily, just copying the check_linux_raid script from a system with the older version of Ubuntu worked fine for me.

Windows Server 2012 R2 Windows Update Error 80072F8F

After installing the KB2919355 update, Windows Update would always present error 80072F8F when checking for updates.

Now, one might assume that this is the problem with WSUS 3.x that you can read about everywhere. However, in my case it was not. The WSUS server was using SSL/TLS, however it was running on Windows Server 2012 R2 as well. I looked into this problem for many hours and was mislead by two things: First, if you search for this problem on the Internet, there are so many articles talking about the well-known problem with old WSUS servers, that you hardly find anything else. Second, I also could not the WSUS site (or any other SSL-enabled site) in Internet Explorer on the affected machines.

I still do not know where the problem with Internet Explorer comes from - it might well have existed from the beginning and uninstalling KB2919355 did not fix it. The problem with WSUS however was indeed caused by KB2919355...

The certificate used by the WSUS site is signed by one of our internal certificate authorities. For some reason, which does not matter, the CRL for this specific CA could not be downloaded from the location specified in the server certificate. The server which should have served the CRL sent an HTTP redirect to an invalid URL instead. Before installing KB2919355, this did not matter. Windows Update was still working fine. After installing the update however, Windows seems to download the URL and fail the connection, if the CRL cannot be downloaded. Obviously, this is a much more secure approach. However, there is no message indicating the cause of the problem in the event log, so the administrator has to find the cause of the problem by trial-and-error. This is something Microsoft could really improve.

After fixing the problem with the CRL, so that it could be downloaded correctly, Windows Update worked again without any problems. I am just a bit annoyed because I spent nearly an entire day figuring this out...

Migrating the EJBCA database from H2 to PostgreSQL

I recently installed EJBCA for managing our internal public key infrastructure (PKI). Before using EJBCA, I used openssl from the command-line, but this got uncomfortable, in particular for managing certificate revocation lists (CRLs).

Unfortunately, I made a small but significant mistake when setting up EJBCA: I chose to use the default embedded H2 database. While this database for sure could handle the load for our small PKI, it is inconvenient when trying to make backups: The whole application server needs to be stopped in order to ensure consistency of the backups, a solution which is rather impractical. Therefore I wanted to migrate the EJBCA database from H2 to PostgreSQL.

However, H2 and PostgreSQL are quite different, and the SQL dump generated by H2 could not be easily imported into PostgreSQL. After trying various approaches, I luckily found the nice tool SQuirreL SQL, which (besides other things) can copy tables between databases - even databases of different type. Obviously, this will not solve all migration problems, but for my situation it worked quite well.

I documented the whole migration process in my wiki, in case someone else wants to do the same.

Listing local users with PowerShell in Windows Server 2012 R2

Listing local users with PowerShell is easy: A reference to the local computer object can be created with $computer = [ADSI] "WinNT://." and then user and group accounts can be accessed through the Children property ($computer.Children).

There is only one problem with this: On Windows Server 2012 R2 (and also on Windows 8.1 according to here and here), the process will hang indefinitely when trying to iterate over the children. Even canceling the action with CTRL-C will not work. The only way to get out of this is closing and restarting PowerShell.

Luckily there is a workaround for this bug: When explicitly filtered for user and group objects, the children can be iterated. So before iterating over the Children property, you should configure the schema filter by calling $computer.Children.SchemaFilter.AddRange(@("user", "group")) and everything should work nicely.

PS: I would have posted this solution to the two site discussing this issue. However, both force you to register before posting anything. In my opinion this is a great way of keeping people from participating.

Bare-Metal Recovery of Windows Server 2012 R2 using Bacula

I have been using Bacula as our main backup system for years. While Bacula works perfectly for Linux systems, bare-metal recovery (also known as disaster recovery) of Windows systems has been an open issue ever since.

The Bacula manual describes some procedures, but they only apply to systems running an operating system not newer than Windows Server 2003 R2. Even these procedures remain a bit unclear. If you look for solutions that cover Windows Server 2008 and newer versions of Windows, you will only find a few mailing-list posts that discuss using Windows Server Backup in combination with Bacula. However, none of these solutions sound very appealing.

I believe that you do not have a backup unless you tested the restore, I wanted to find out the best way for backing up a Windows system with Bacula. So I spent some time and installed a Windows Server 2012 R2 system in a virtual machine, made a backup with Bacula, and then tried to restore this backup in a new virtual machine. I actually succeeded without using Windows Server Backup or any other third-party tool. It really seems to work with a Bacula-only solution.

I documented the steps I used in the wiki, just in case I might have to restore a Windows System from a Bacula backup in the future. Maybe this guide is useful for you as well.

Blinking Cursor but System not Booting on HP ProLiant DL160 G6 Server

Recently I experienced a strange problem with a HP ProLiant DL160 G6 server:

Sometimes after seeing the BIOS initialization messages, the system would not boot but just show a blank screen with a blinking cursor. After power-cycling, this problem sometimes would disappear and sometimes it would appear again.

Frankly this problem puzzled me. Luckily, someone else had experienced this problem before and found the reason:

This problem can be caused by an incompatible KVM console. In my case I had been using a Sharkoon PS/2 to USB adapter in order to connect the system to an ATEN KVM switch (the server does not have PS/2 and the KVM switch does not have USB). After I connected a USB keyboard directly, the problem disappeared, even if the PS/2 to USB adapter was connected in parallel.

Unfortunately I have not figured out yet, whether the problem is caused by the adapter, the KVM switch or the PS/2 keyboard. Maybe I will try a different adapter and report, whether this fixes the problem.

OpenLDAP Server not listening on IPv6 Socket in Zimbra 8

Recently I have been experiencing a strange with an installation of the Community Edition of Zimbra Collaboration Server 8: Although all services were running, no e-mails were delivered. In the log file /var/log/zimbra.log I found messages like "zimbra amavis[9323]: (09323-01) (!!)TROUBLE in process_request: connect_to_ldap: unable to connect at (eval 111) line 152.".

The strange things about this was, that the OpenLDAP daemon (slapd) was running and answering requests. After restarting Zimbra (/etc/init.d/zimbra restart), the problem disappeared, however it reappeared after the next reboot.

After some time I figured out, that - right after the reboot - slapd was only listening on an IPv4 socket, not on an IPv6 socket. After restarting the OpenLDAP server (ldap stop && ldap start as user zimbra), the problem disappeared again and netstat showed that now slapd was also listening on the IPv6 socket.

In the end I could not figure out, why the OpenLDAP daemon would only listen on IPv4 when started during system boot but would listen on both IPv4 and IPv6 when started later. I was suspecting some problem with name resolution in the early boot process (although both the IPv4 and the IPv6 address were listed in /etc/hosts).

However, I found a work-around for the problem: By setting the local configuration option ldap_bind_url to ldap:/// (zmlocalconfig -e ldap_bind_url=ldap:///) , I could configure OpenLDAP to listen on all local interfaces, which apparently fixed the problem.

RTFM or better don't...

While I am writing about curious bugs, here is another one, although technically it is not really a bug.

When setting up Icinga with mod_gearman, I wondered why service-checks where running on the assigned mod_gearman worker node, but host-checks were running on the main Icinga server and were not distributed using mod_gearman. I checked the configuration again and again, but could not find an error. Also searching the web did not bring much useful information.

The only thing that I could find were hints that do_hostchecks had to set to "yes" in /etc/mod-gearman/module.conf. But according to the mod_gearman documentation, this option was set to "yes" by default.

Well, as it turns out, the flag is set to "no" by default, at least in the version of mod_gearman that is available in the software repositories of Ubuntu 12.04 LTS (Precise Pangolin). By the way, the manual that is distributed in the source archive of mod_gearman 1.2.2 (this is the same version that comes with Ubuntu) says the same, so it is not a thing that was changed recently.

OpenDKIM bug in Zimbra Collaboration Server

Recently I stumbled across a bug in the OpenDKIM configuration of the Zimbra Collaboration Server.

In ZCS 8.0.3 (Community Edition, but I guess the same applies to the Network Edition), the file /opt/zimbra/conf/ specifies the socket that OpenDKIM listens on in the following way:

Socket                %%zimbraInetMode%%:8465@[%%zimbraLocalBindAddress%%]

This results in the following socket address of "inet6:8465@[::1]" in the final file (opendkim.conf). However, the Postfix configuration file /opt/zimbra/postfix/conf/ specifies the socket as "inet:localhost:8465". This leads to Postfix trying to connect to an IPv4 socket, while OpenDKIM is listening on an IPv6 socket, so that the connection cannot be established.

The fix is quite easy: By changing "%%zimbraInetMode%%:8465@[%%zimbraLocalBindAddress%%]" to "inet:8465@[]" in and restarting Zimbra, OpenDKIM can be made to listen on an IPv4 socket, so that Postfix can connect again.

The curious thing is, that this bug has already been reported half a year ago and has supposedly been fixed. However, it seems like this fix was only applied to the 9.0 branch of Zimbra and not to Zimbra 8.0.

Update on KVM Shutdown on Ubuntu

About two years ago, I wrote an article about how to make libvirt on Ubuntu 10.04 LTS to shutdown the virtual machines gracefully, when the host system is shutdown or rebooted.

Now I recently found out, that they implemented a similar approach in Ubuntu 12.04 LTS. The only problem with this is, that the default timeout is too short (30 seconds) for virtual machines running complex services. Therefore, I documented how to change this timeout in my wiki.

Less Trouble with KVM virtio and DHCP

In an earlier blog post I claimed that I was seeing problems with VMs using the virtio driver for networking on an Ubuntu 12.04 LTS KVM host using DHCP.

However, as far as I am concerned, this claim was wrong. I now figured out, that the messages about bad UDP checksums had nothing to do with my problem. I was rather experiencing the problems caused by a configuration that did not list the VLAN network interface (eth0.X) on which the DHCP relay agent received the answers from the DHCP server.

The mean thing is, that switch away from virtio fixed this problem. However, this was not because of the UDP checksums now being right (this was merely a side effect). It fixed the problem, because when not using the virtio driver, the DHCP relay agent would receive the answer packets, even if they were received on a VLAN interface it was not listening to. I can only guess that the implementation for VLAN-tagged interfaces is slightly different when using the virtio driver.

After adding the interface to the list of interfaces used by the DHCP relay agent, the DHCP packets are relayed correctly, even if using the virtio driver. The messages about bad UDP checksums now reappeared in the log file, but obviously this is not causing any trouble.

On the other hand, according to a bug report some users really seem to have problems with DHCP when using the virtio driver. However, this might only affect Ubuntu 12.04 LTS guests but not VMs on a Ubuntu 12.04 LTS host.

Trouble with KVM virtio and DHCP

Lately I experienced a problem with KVM-based virtual machine running a DHCP server and another one running a DHCP relay (for both I use the ISC implementations). The DHCP relay was complaining about "bad udp checksums". Using tcpdump and wireshark I quickly found out, that the software was right and the UDP checksums were in fact wrong. After some searching, I found a bug report, that basically described the same problem.

Although I cannot verify this, I think the problem might be related to the fact that I recently upgraded the host machine from Ubuntu 10.04 LTS (Lucid) to Ubuntu 12.04 LTS (Precise). As a workaround, I deactivated the use of the "virtio" support for the network interface in both virtual machines, which seems to fix the problem, because then the UDP checksums are correct.

However, when I performed the same change for a virtual machine still running on an Ubuntu 10.04 LTS host, this actually caused a problem: If VLAN interfaces are used inside the virtual machine, the normal non-virtio driver will screw things up on a Ubuntu 10.04 LTS host.

Long story short: For virtual machines running on a Ubuntu 10.04 LTS host you should use and for a Ubuntu 12.04 LTS you should avoid virtio networking.

Update [2012-07-08]: It seems like the conclusions I draw in this article are actually wrong. Therefore I posted an update clarifying the situation.

Trouble with globally installed Firefox extensions after software upgrade in Ubuntu

Some time ago Ubuntu 10.04 LTS received a Firefox update from the 3.6 branch to the more recent versions (10.0+).

After that upgrade, Firefox on one of my Ubuntu systems suddenly appeared in English instead of the correct locale (German in my case). I first thought that this might be a problem with some localization packages not being installed correctly. However, the problem persisted after upgrading the system to Ubuntu 12.04 LTS.

Some globally extensions (in particular the language pack) showed up in an old version in the add-ons list, although the newest version was installed. Finally I found out, that Firefox looks for the extensions in /usr/lib/firefox/extensions. The new language packs however had been installed in /usr/lib/firefox-addons/extensions. On other systems, /usr/lib/firefox/extensions is a symbol link to /usr/lib/firefox-addons/extensions. In my case however the directory existed and contained the files from the old versions of the language packs.

For some reasons, the old language packs (which had different package names) had not been removed and thus the Firefox upgrade did not place the symbol link (because the directory was not empty). After manually removing the old versions of the language packs, deleting the directory and reinstalling Firefox, the symbol link was created automatically and suddenly the globally installed Firefox add-ons worked again.

KVM and Graceful Shutdown on Ubuntu

For quite some time, I have been trying to figure out, how to gracefully shutdown the KVM-based virtual machines running on a Ubuntu 10.04 LTS (Lucid Lynx) host system. This problem consists of two parts: First you have to make the virtual machines support the shutdown event from libvirt and second you have to call the shutdown action for each virtual machine on system shutdown.

The first part is very easy for Linux VMs and also not too hard for Windows VMs. I described the necessary steps in my wiki

The seconds part is harder to accomplish: On Ubuntu 8.04 LTS (Hardy Heron) I just modified the /etc/init.d/libvirt-bin script to call a Python script in the stop action. This solution was not perfect, as it meant that the virtual machines were also shutdown, when libvirtd was just restarted, however it was a quick and easy solution.

For Ubuntu 10.04, the init script has been converted to an Upstart job. So the easiest way was to create a upstart job that is starting on the stopping libvirt-bin event. However, this did not solve the problem, because the system powered off or rebooted before the shutdown of the virtual machines was finished. As it turns out, Ubuntu 10.04 uses an odd combination of Upstart jobs and traditional init scripts. This leads to a situation, where /etc/init.d/halt or /etc/init.d/reboot are called, before all upstart jobs have stopped, when one of the upstart jobs needs a significant amount of time to stop. This can be solved by adding an init script, than runs before the halt or reboot scripts and waits for the respective Upstart job to finish. In fact, it is best to run this script before the sendsigs script to avoid processes started by one of the upstart jobs to receive a SIGKILL.

I added the complete scripts and configuration files needed for this feature to my wiki. In fact, this solution also ensures, that the virtual machines are only shutdown if libvirt is stopped because of a runlevel change. Thus, the libvirt-bin package can now be upgraded without resulting in a restart of the VMs.

For me, automatically shutting down the virtual machines is very important. The KVM hosts I manage are connected to an uninterruptible power supply with limited battery time. Although in the past years I remember only a single time, the host systems were shutdown because the battery was nearly empty (most power interrupts are very short), I want to make sure that all virtual machines are in a safe, consistent state, when the power finally goes off. So I hope that the scripts in the wiki are also helpful to other people, having the same problem.