Apt: Writing more data than expected

If apt (apt-get, aptitude, etc.) fails with an error message like

Get:1 http://example.com/ubuntu xenial/main amd64 python-tornado amd64 4.2.1-2~ds+1 [275 kB]
Err:1 http://example.com/ubuntu xenial/main amd64 python-tornado amd64 4.2.1-2~ds+1
  Writing more data than expected (275122 > 275100)

but the file in the repository has the expected size, a caching proxy (e.g. Apt-Cacher-NG) might be at fault. This can happen when the package in the repository has been changed instead of releasing a new package with a new version number. This will typically not happen for the official repositories, but it might happen for third-party repositories.

In this case, there are only two solutions: Bypass the proxy or remove the old file from the proxy's cache. In the case of Apt-Cacher-NG, this can be achieved by going to the web interface, checking the “Validate by file name AND file directory (use with care)” and “then validate file contents through checksum (SLOW), also detecting corrupt files” options, and clicking “Start Scan and/or Expiration”. This scan should detect the broken packages, which can then be selected by checking “Tag” next to each package and subsequently deleted by clicking “Delete selected files”.
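
If you prefer the command line over the web interface, both approaches can also be sketched as shell commands. The cache directory, the layout below it, and the assumption that the proxy is configured via Acquire::http::Proxy are based on a default Apt-Cacher-NG setup and may need to be adjusted:

# Remove the stale package file from Apt-Cacher-NG's cache
# (default cache directory on Debian/Ubuntu assumed)
find /var/cache/apt-cacher-ng -name 'python-tornado_*_amd64.deb' -delete

# Alternatively, bypass the proxy for this repository host for a single run
apt-get -o Acquire::http::Proxy::example.com=DIRECT install python-tornado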

Using CRLs in Icinga 2

Icinga 2.x offers a cluster mode which (from an administrator's point of view) is one of the most important features introduced with the 2.x release. Using the cluster feature, check commands can be executed on satellite nodes, or even the complete scheduling of checks can be delegated to other nodes, while the configuration is still kept in a single place.

In order to enable secure communication within the cluster, Icinga 2 uses a public key infrastructure (PKI). This PKI can be managed with the icinga2 pki commands. However, there is no command for generating a CRL. For this reason, it is necessary to use the openssl ca command for generating a CRL. I have documented the steps necessary for generating a CRL in my wiki.
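
The core of it is an openssl ca invocation along the following lines. This is only a sketch: the openssl.cnf (with its CA database) and the file names are assumptions and have to match the way your Icinga 2 CA is laid out (by default, the CA certificate and key live in /var/lib/icinga2/ca):

# Revoke the certificate of a decommissioned node
openssl ca -config openssl.cnf -revoke old-node.crt

# Generate the CRL signed by the Icinga 2 CA
openssl ca -config openssl.cnf \
    -keyfile /var/lib/icinga2/ca/ca.key \
    -cert /var/lib/icinga2/ca/ca.crt \
    -gencrl -out icinga2.crl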

Funnily enough, it seems that no one has used a CRL with Icinga 2 so far. I know this because, as of today, Icinga 2 has a bug that makes it impossible to load a CRL. Luckily, yours truly has already fixed this bug, and the fix is going to be included in the next Icinga 2 release.

I find it strange that obviously no one is using CRLs, because Icinga 2 uses a very long validity period when generating certificates (15 years), so it is quite likely that at some point a node is decommissioned and the corresponding certificate needs to be revoked.

Design flaws of the Linux page cache

Like most modern operating systems, Linux has a page cache. This means that when reading from or writing to a file, the data is actually cached in the system’s memory, so that subsequent read requests can be served much faster, without having to access the non-volatile storage.

This is an extremely useful facility, which has a huge impact on system performance. For typical workloads, a significant amount of data is used repeatedly and caching this data reduces the latency dramatically. For example, imagine that every time you ran “ls”, the executable would actually be read from the hard disk. This would undoubtedly make the interactive experience much less responsive.

Unfortunately, there is a downside to how Linux caches file-system data: As long as there is free system memory, caching is a no-brainer and simply caching everything works perfectly. After some time, however, the system memory will be used completely and the kernel has to decide how to free memory in order to cache I/O data. Basically, Linux will evict old data from the cache and even move process data to the swap area in order to make space for data newly arriving in the cache. In most cases, this makes sense: If data has not been used for some time (even data that is not from a disk but belongs to a process which simply has not needed it in a while), it makes sense to evict it from the cache or put it into the swap area so that the fast memory can be used for fresh data that is more likely to be needed again soon. Most of the time, these heuristics work well, but there is one big exception: processes that read or write a lot of data, but only do so once.

For example, imagine a backup process. A backup process will read nearly all data stored on disk (and maybe write it to an archive file), but caching the data from these read or write operations does not make a lot of sense, because it is not very likely to be accessed again soon (at least not more likely than if the backup process did not run at all). However, the Linux page cache will still cache that data and move other data out of memory. Accessing that evicted data again will then result in slow read requests to the hardware, effectively slowing the system down, sometimes to a point that is worse than if the page cache had been disabled completely.

For a long time, I never thought about this problem. In retrospect, systems sometimes seemed slow after running a backup, but I never connected the dots until recently, when I started to see a problem that looked weird at first: Every night, I got an e-mail from the Icinga monitoring system warning me that the swap space on two (very similar) systems was running extremely low. At first, I suspected that some nightly maintenance job might simply need a lot of memory (I had recently installed a software upgrade on these systems), so I assigned more memory to the virtual machines. I expected that the amount of free swap space would increase by the amount of extra memory assigned, but it did not. The swap space was still used almost completely. Therefore, I looked at the memory consumption while the problem was present, and the result was astonishing: Both memory and swap were virtually fully used, but about 90 percent of the memory was actually used by the page cache, not by any process.

Investigating the problem more closely revealed that the high usage of swap space always occurred shortly after the nightly backup process started running. To be absolutely sure, I manually started the process during the day and could immediately see the page cache and swap usage growing rapidly. It was clear that the backup process reading a lot of data was responsible for the problem. If you think about it, this is not the kernel's fault: It simply cannot know that the data will not be needed again, and moving other data out of the way is actually counterproductive.
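
If you want to observe this behavior yourself, standard tools are sufficient to watch the page cache and swap usage grow while such a process is running (interval and fields are a matter of taste):

# Watch overall memory, cache, and swap usage, refreshing every five seconds
watch -n 5 free -m

# Or look at the relevant counters directly
grep -E '^(Cached|SwapTotal|SwapFree)' /proc/meminfo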

So all that is needed to fix this kind of problem is a way to tell the kernel “please do not cache the data that I read, I will not need it again anyway”. While this sounds simple, it actually is a huge problem with current kernel releases.

There is no way to tell the kernel “do not cache the data accessed by this process”. In fact, there are only four ways for influencing the caching behavior that I am aware of:

  1. The page cache can be disabled globally for the time of running the backup process. While it might alleviate the impact of the problem, it is still not desirable because the page cache actually is useful and disabling it will have a negative impact on system performance. However, during the backup process, disabling the page cache might still result in better performance than not doing anything at all.
  2. Some suggest mounting the file-system with the “sync” flag. However, this only circumvents the page cache for write requests, not for read requests. It also has a very negative impact on performance, so I only list it here because it is suggested by some people.
  3. The files can be opened with the O_DIRECT flag. This tells the kernel that I/O should bypass the caches. Unfortunately, using O_DIRECT has a lot of side effects. In particular, the size and offset of the memory buffer used for I/O operations have to match certain alignment restrictions that depend on the kernel version and the file-system type. There is no API for querying these restrictions, so one can only align to a rather large size and hope that it is sufficient. Of course, this also means that simply modifying the code of an application that opens a file is not sufficient. Every piece of code that reads from or writes to a file has to be changed so that it matches the alignment restrictions. This definitely is not an easy task, so using O_DIRECT is more of a theoretical approach.
  4. Last but not least, there is the posix_fadvise function. This function allows a program in user space to tell the kernel how it is going to use a file (e.g. read it sequentially), so that the kernel can optimize things like the file-system cache. There are two flags that can be passed to this function which sound particularly promising: According to the man page, the POSIX_FADV_NOREUSE flag specifies that “the specified data will be accessed only once” and the POSIX_FADV_DONTNEED flag specifies that “the specified data will not be accessed in the near future”.

So it sounds like posix_fadvise can solve the problem, and all we need to do is find a way to make the program (tar in my case) call it. As it turns out, there even is a small tool called “nocache” that acts as a wrapper around arbitrary programs: it uses the LD_PRELOAD mechanism to catch attempts to open a file and adds POSIX_FADV_NOREUSE right after the file has been opened.
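
Using the wrapper is straightforward; a sketch of how the tar call in a backup script might be wrapped (the archive and source paths are made up):

# nocache intercepts tar's attempts to open files via LD_PRELOAD
nocache tar -czf /backup/home.tar.gz /home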

This would be the perfect solution if the Linux kernel actually cared about POSIX_FADV_NOREUSE. Unfortunately, to this day, it simply ignores this flag. In 2011, there was a patch that tried to add support for this flag to the kernel, but it never made it into the mainline kernel (the reasons are unknown to me).

Actually, the nocache tool is aware of this and adds a workaround: When closing the file, it calls posix_fadvise with the POSIX_FADV_DONTNEED flag. It has to do this when closing the file (instead of when opening it) because POSIX_FADV_DONTNEED only removes data from the page cache, it does not prevent it from being added to the page cache.

So I installed nocache and wrapped the call to tar with it, expecting that this would finally solve the problem. Surprisingly, it did not. At first, it seemed like the page cache was filling less rapidly, but after a while, everything looked quite like before. I tried to add the “-n” parameter to nocache, telling it to call posix_fadvise multiple times (the documentation suggested doing so), but this did not help either.

After giving this some more thought, I realized why: Using POSIX_FADV_DONTNEED when closing the file works great when working with many small or medium-sized files. For large files, however, it does not work: by the time it is called, the whole file has already been put into the page cache, causing the same problems as without nocache. This means that posix_fadvise has to be called repeatedly while reading from the file, to ensure that the amount of cached data never grows too large. And while there are only a few API calls for opening files, there are ways of reading from a file (e.g. memory-mapped I/O) that nocache simply cannot catch. This means that the only solution is actually patching the program that is reading or writing the data.

This is why I created a patch for GNU tar version 1.29. This patch calls posix_fadvise after reading each block of data and thus ensures that the page cache is never polluted by tar. Unfortunately, this patch is not portable, nor does it provide a command-line argument for enabling or disabling this behavior, so it is not really suitable for inclusion into the general source code of GNU tar. The patch only takes care of not polluting the file-system cache when reading files. It does not do so when writing them, but for me this is sufficient because in my backup script, tar writes to the standard output anyway.

When using this patched version of tar, the memory problems disappear completely. This makes me confident that the change is sufficient, so I am going to use this patched version of tar for all systems that I back up with tar. Unfortunately, I use Bareos for the backup of most systems, so I will have to find a solution for that software, too. Maybe we will be lucky and support for POSIX_FADV_NOREUSE will finally be added to Linux at some point in the future, but until then, patching the software involved in the backup process seems like the only feasible way to go.

ISC DHCP server not starting on Ubuntu 16.04 LTS

After upgrading two systems running the ISC DHCP server from Ubuntu 14.04 LTS (Trusty) to Ubuntu 16.04 LTS (Xenial), I experienced some trouble with the DHCP server not starting on system boot. The log file contained the following message:

Internet Systems Consortium DHCP Server 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /etc/dhcp/dhcpd.conf
Database file: /var/lib/dhcp/dhcpd.leases
PID file: /run/dhcp-server/dhcpd.pid
Wrote 0 class decls to leases file.
Wrote 0 deleted host decls to leases file.
Wrote 0 new dynamic host decls to leases file.
Wrote 389 leases to leases file.

No subnet declaration for ovsbr0p1 (no IPv4 addresses).
** Ignoring requests on ovsbr0p1.  If this is not what
   you want, please write a subnet declaration
   in your dhcpd.conf file for the network segment
   to which interface ovsbr0p1 is attached. **

 

Not configured to listen on any interfaces!

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug.  These pages explain the proper
process and the information we find helpful for debugging..

exiting.

Obviously, the problem was that the DHCP server was started before the network interface ovsbr0p1 (the primary network interface of the host) had been brought up. I guessed that this problem was somehow related to the migration from Upstart to Systemd. Still, it was strange, because the isc-dhcp-server.service was configured to be started after the network-online.target.

I could not figure out why Systemd thought that the network-online.target had been reached before the interface was configured completely, but I found a simple workaround: I added a file /etc/systemd/system/isc-dhcp-server.service.d/retry.conf with the following content:

[Service]
Restart=on-failure
RestartSec=15

This means that the first start of the DHCP server will still fail most of the time, but Systemd will retry 15 seconds later and typically the network interface will be online by then (if it is not, Systemd will try again another 15 seconds later).
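
For completeness, creating and activating the drop-in looks like this (standard systemd commands):

# Create the drop-in shown above and make systemd pick it up
mkdir -p /etc/systemd/system/isc-dhcp-server.service.d
editor /etc/systemd/system/isc-dhcp-server.service.d/retry.conf
systemctl daemon-reload
systemctl restart isc-dhcp-server.service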

Network problems after upgrading to Ubuntu 16.04 LTS

After upgrading a virtual machine from Ubuntu 14.04 LTS to Ubuntu 16.04 LTS, I was getting weird problems with the network configuration. The network would still be brought up, but scripts that were specified in /etc/network/interfaces would not be run when the corresponding interface was brought up.

The logfile /var/log/syslog contained messages like "ifup: interface eth0 already configured". On other VMs that had also been upgraded from Ubuntu 14.04 LTS and used a similar network configuration, this problem did not appear.

I found the solution to this problem in the Ubuntu Forums: There was a file /etc/udev/rules.d/85-ifupdown.rules that caused problems with the network initialization. After deleting this file, the problems went away. I guess that this file was shipped by a rather old release of Ubuntu, so the problem only appears on systems that have previously been upgraded from that release.
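
In concrete terms, the workaround boils down to the following (followed by a reboot to make sure the change takes effect):

# Check whether the stale rules file is present, then remove it
ls -l /etc/udev/rules.d/85-ifupdown.rules
rm /etc/udev/rules.d/85-ifupdown.rules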

Disabling the annoying "Visit ..." entry in the Firefox address-bar drop-down - Part 2

Some time ago, I wrote about how to disable the annoying “Visit ...” entry in the Firefox address-bar drop-down.

Unfortunately, this method no longer works with Firefox 48. Fortunately, I found an article that gives a great overview of the methods that still work with Firefox 48.

I chose to install the Classic Theme Restorer add-on because I felt that this option might be a bit more robust against future updates than manually tweaking the CSS.

Open vSwitch and Multicasting

Recently, I noticed the following messages in the system log of an Ubuntu 14.04 LTS host that is running radvd:

Jun 28 13:15:33 myhost radvd[5782]:    do you need to add the UnicastOnly flag?
Jun 28 13:15:33 myhost radvd[5782]: interface ovsbr0v20p0 does not support multicast

At first, I was surprised, but after writing a small program that checks for the IFF_MULTICAST flag in the interface's attributes, I realized that the interface indeed does not support multicast (or at least claims not to).
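
Incidentally, a custom program is not strictly necessary for this check: whether IFF_MULTICAST is set can also be seen in the flag list that ip link prints for the interface:

# If the interface supports multicast, the flag list in angle brackets contains MULTICAST
ip link show ovsbr0v20p0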

As it turns out, virtual interfaces added to an Open vSwitch bridge do not support multicast in older versions of Open vSwitch (Ubuntu 14.04 LTS ships with Open vSwitch 2.0.2). I cannot tell for sure in which version multicast support was added; looking at the changelog, it seems to have been present since Open vSwitch 2.4.0. In any case, the version of Open vSwitch shipped with Ubuntu 16.04 LTS (Open vSwitch 2.5.0) supports multicast on virtual interfaces, and the IFF_MULTICAST flag is set for those interfaces.

This means that radvd should not have any problems when using an Open vSwitch virtual interface on Ubuntu 16.04 LTS.

Bug in the Apache Maven Javadoc Plugin

This afternoon, I spent several hours figuring out a problem that in the end turned out to be a bug in the Apache Maven Javadoc Plugin (version 2.10.3).

I wanted to use a custom stylesheet when building the Javadocs of all modules of a multi-module Maven project, so I generated a JAR that contained the stylesheet file, added it to the dependencies of the plugin and referenced it in the <stylesheetfile> tag.

To my surprise, Maven kept complaining with a message like

[WARNING] Unable to find the resource 'path/to/my/stylesheet.css'. Using default Javadoc resources.

I checked everything and tried various ways of configuring the dependency, but I could not get it to work. So I resorted to the last thing one can do when a piece of software does not work as expected: I grabbed the source code of the plugin, found the relevant part that generated the message, and attached to the Maven process with a debugger. As it turned out, the problem was actually caused by a bug in the plugin that led to resources from dependencies not being resolved correctly.
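
In case someone wants to do the same: Maven ships with a wrapper script that makes attaching a debugger easy (the goal shown is just an example; any goal of the misbehaving plugin works):

# Starts Maven with remote debugging enabled and waits for a debugger on port 8000
mvnDebug javadoc:javadoc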

I filed a bug report and attached a patch to the bug report that fixes the problem for me. I hope that this patch will soon make it into a release version of the plugin. Until then, maybe this article helps someone else by saving the time to look for the cause of the issue.

UDP sockets broken again in Ubuntu 14.04 LTS

Some time ago, a regression was introduced into the 3.13 kernel line used by Ubuntu 14.04 LTS that broke UDP sockets when they were used in a certain way (e.g. the way FreeRADIUS uses them). This bug was fixed in 3.13.0-67, and I hoped to never see it again.

Two days ago, I realized that one of our RADIUS servers was no longer working correctly. I could not tell how long this problem had existed (the second RADIUS server still worked, and according to the monitoring the primary one also worked, so the problem went undetected for a very long time).

After looking for the cause of the problem for quite some time, I remembered the problem described earlier and tried an old kernel version. Bingo! This fixed the problem. After looking at the changelog of the current 3.13 kernel from trusty-proposed (which also fixes the problem), I found a reference to another bug report that describes the problem (don't be fooled by the bug's description, it also applies to IPv4).

As it turns out, the first regression had been caused by backporting an optimization regarding UDP checksum calculation from a newer Linux kernel. However, this change exposed a problem that had been fixed in the newer kernel but not in Ubuntu's branch of kernel 3.13. This regression was fixed by simply removing the patch again. This was okay because it was just an optimization.

Some time later, someone (who obviously was not aware of this regression) again thought that backporting the optimization was a good idea, so it got reintroduced in 3.13.0-69. Now, it looks like they fixed the bug in 3.13.0-78 by actually fixing the underlying problem and not by removing the patch again. Therefore, I hope that we will not see this regression a third time. However, I am a bit annoyed that they did not test more carefully when backporting a patch that had already caused a regression once. Maybe the Ubuntu team's decision not to use a long-term-support kernel and to do the maintenance themselves was not so wise after all.

Disabling the annoying "Visit ..." entry in the Firefox address-bar drop-down

I find the new address-bar features introduced in a recent Firefox version very annoying. The suggestion of search terms can be disabled easily (does anyone really want to have everything entered into the address bar sent to an external service?), but this still leaves the annoying "Visit ..." entry at the top of the list of visited addresses.

This entry is annoying for several reasons:

  1. It does not add any functionality: Just hitting enter has (nearly) the same effect as selecting this entry.
  2. It can easily be selected accidentally when you actually want the top entry from the list of visited addresses.
  3. Typically, it does not suggest the address you actually want to visit, for two reasons: First, it suggests visiting the top URL of the suggested site, even if you always visit a specific path. Second, it suggests a plain (HTTP) URL, even if the site actually only supports HTTPS and you never visited it via HTTP.

I have no idea why they added such a stupid feature to Firefox at all (I suspect that it was originally designed for a different purpose that only makes sense if you also enable the other features) and why they did not add an option to the UI for disabling it.

Fortunately, someone found out that it can be disabled by setting the browser.urlbar.unifiedcomplete option to false in about:config. Seeing how often the page with the solution has already been visited, I am definitely not the only person who is annoyed by this stupid new feature.

Spring's @RequestMapping annotation works on private methods

Recently, I spent a lot of time on debugging a nasty problem with Spring WebMVC and Spring Security.

I had a class annotated with @Controller and a method annotated with @RequestMapping. I wanted to protect this method using the @Secured annotation. So I turned on global method security by adding @EnableGlobalMethodSecurity with the right parameters to my @Configuration class, but it did not work. The method could still be called without having the proper privileges (or without being authenticated at all).

After hours of debugging, I found out that the AOP advice was not applied to my controller method because Spring would not find the method when processing the controller class. At that moment I realized that the method had been declared package-private. AOP proxies are not applied to non-public methods (for CGLIB proxies this would be possible, but in general it is not desirable, and Spring does not do it).

This left me with the question: Why does the request mapping work at all? The answer is simple: When looking for methods with the @RequestMapping annotation, Spring does not check the method's access modifiers. As the method is invoked using reflection, it will work even if the method has been declared private (unless there is a SecurityManager in charge, but for most Spring applications there will not be one).

This leaves us with a very awkward situation: Private methods might be called by external code, and if there is a @Secured annotation on them, it will be ignored. In my opinion, this is a bug: The @RequestMapping annotation should only work on public methods. There are actually four places in Spring where this could be fixed (Spring 4.1.7):

  1. AbstractHandlerMapping.java: line 172
  2. AbstractHandlerMapping.java: line 207
  3. HandlerMethodSelector.java: line 60
  4. RequestMappingHandlerMapping.java: line 187

It would be completely sufficient to check whether the method is public in one of these places. Until this is fixed in Spring (and it might never get fixed because the fix would break backward compatibility), I use my own RequestMappingHandlerMapping which does the check:

public class PublicOnlyRequestMappingHandlerMapping extends
        RequestMappingHandlerMapping {

    @Override
    protected RequestMappingInfo getMappingForMethod(Method method,
            Class<?> handlerType) {
        RequestMappingInfo info = super
                .getMappingForMethod(method, handlerType);
        if (info != null && !Modifier.isPublic(method.getModifiers())) {
            logger.warn("Ignoring non-public method with @RequestMapping annotation: "
                    + method);
            return null;
        } else {
            return info;
        }
    }

}

As you can see, the implementation is very simple. I first call the super method and then check whether the method is public, so that I can generate a warning message when @RequestMapping has been used on a non-public method. If one does not care about such a message, one can check the method's access modifier first and only invoke the super method when the investigated method is public.

In order to use the custom RequestMappingHandlerMapping, we have to use a custom implementation of WebMvcConfigurationSupport (when using Java Config):

@Configuration
public class CustomWebMvcConfiguration extends
        DelegatingWebMvcConfiguration {

    @Bean
    @Override
    public RequestMappingHandlerAdapter requestMappingHandlerAdapter() {
        RequestMappingHandlerAdapter adapter = super
                .requestMappingHandlerAdapter();
        adapter.setIgnoreDefaultModelOnRedirect(true);
        return adapter;
    }

    @Bean
    @Override
    public RequestMappingHandlerMapping requestMappingHandlerMapping() {
        RequestMappingHandlerMapping handlerMapping = new PublicOnlyRequestMappingHandlerMapping();
        handlerMapping.setOrder(0);
        handlerMapping.setInterceptors(getInterceptors());
        handlerMapping
                .setContentNegotiationManager(mvcContentNegotiationManager());

        PathMatchConfigurer configurer = getPathMatchConfigurer();
        if (configurer.isUseSuffixPatternMatch() != null) {
            handlerMapping.setUseSuffixPatternMatch(configurer
                    .isUseSuffixPatternMatch());
        }
        if (configurer.isUseRegisteredSuffixPatternMatch() != null) {
            handlerMapping.setUseRegisteredSuffixPatternMatch(configurer
                    .isUseRegisteredSuffixPatternMatch());
        }
        if (configurer.isUseTrailingSlashMatch() != null) {
            handlerMapping.setUseTrailingSlashMatch(configurer
                    .isUseTrailingSlashMatch());
        }
        if (configurer.getPathMatcher() != null) {
            handlerMapping.setPathMatcher(configurer.getPathMatcher());
        }
        if (configurer.getUrlPathHelper() != null) {
            handlerMapping.setUrlPathHelper(configurer.getUrlPathHelper());
        }

        return handlerMapping;
    }

}

This implementation copies the implementation of requestMappingHandlerMapping() from the parent class, but replaces the handler mapping class actually used with our own. In addition, this configuration also overrides requestMappingHandlerAdapter() in order to set the ignoreDefaultModelOnRedirect attribute. This is the recommended setting for new Spring WebMVC applications, but it cannot be made the default in Spring because that would break backward compatibility. Of course, the two changes are completely independent, so you can also choose to implement only one of them.

Using the Red Pitaya with an Asus N10 Nano WiFi dongle

Recently, I got a Red Pitaya, which is a very neat toy for my electronics lab. A few months ago, I had already gotten a BitScope Micro. Back then, the Red Pitaya cost about 450 EUR, while the BitScope Micro (with BNC adapter) cost only about 150 EUR. However, compared to the Red Pitaya, the features of the BitScope Micro are quite limited. In particular, the features of the signal generator are quite limited, and the sampling rate (and sampling length) of the two analog inputs is low. Now the price of the Red Pitaya has been reduced, so that it is available for about 250 EUR, and I could not resist any longer.

One of the many neat features of the Red Pitaya is the fact that you do not need a PC with any special software installed and a USB connection to use it. You can simply connect over the network with a browser, so a tablet is enough to use it, which makes it much more flexible in use. However, by default a wired Ethernet connection is still needed.

Luckily, it can act as a WiFi access point when the right WiFi USB dongle is installed. The Red Pitaya manual recommends the Edimax EW7811Un, but the supplier where I ordered the Red Pitaya did not have this dongle in stock. Basically, the choice of the dongle is limited by the kernel module(s) compiled into the Linux kernel used by the Red Pitaya ecosystem, so any device that is compatible with the rtl8192cu driver should work.

Therefore, I got an Asus N10 Nano USB WiFi dongle, which supposedly uses a chip from this family. There seem to be two versions of this device, and only one of them uses the Realtek chip. However, the datasheet from the supplier explicitly specified the Realtek chip, so I expected it to work.

When the hardware arrived, I prepared the Micro SD card for the Red Pitaya and plugged everything in, but to my surprise the WiFi did not show up. So I added a wired connection in order to log in using SSH and investigate the situation.

The USB WiFi dongle was detected by Linux (it showed up in /sys/bus/usb); however, the corresponding network device was missing and could not be brought up. In the end, I found out that the driver did not claim the device, because the Linux kernel was rather old and the device ID was simply not yet known to the driver.
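
The USB vendor and product ID needed for the next step can be read with lsusb, and the driver directory in sysfs shows whether the driver has bound to the device (the ID shown is the one of my dongle):

# The Asus dongle shows up with vendor ID 0b05 and product ID 17ba
lsusb | grep -i 0b05

# List the devices currently claimed by the rtl8192cu driver
ls /sys/bus/usb/drivers/rtl8192cu/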

Such a problem can easily be fixed by explicitly telling the driver to support the device ID (echo "0b05 17ba" >/sys/bus/usb/drivers/rtl8192cu/new_id for my Asus N10 Nano WiFi USB dongle), and suddenly I could bring up the network device wlan0. Unfortunately, there was still no "Red Pitaya" WiFi available.

A closer investigation of the startup scripts showed that the software for hosting an access point was only started when the device wlan0 was already available during startup. Luckily, there was a slightly unconventional but rather easy way to ensure this: The necessary code could be added to the file /etc/network/config on the SD card (/opt/etc/network/config in the Red Pitaya's file system). This file is sourced by the initialization script, so any shell code present in this file will be executed before the network is brought up. I simply added the following lines to the beginning of the file:

# Add device ID for Asus N10 Nano to rtl8192cu driver
echo "0b05 17ba" >/sys/bus/usb/drivers/rtl8192cu/new_id

After making this change and rebooting the Red Pitaya, the WiFi access-point mode worked like a charm.

Debian installer with custom proxy and custom repository key

When using the mirror/http/proxy option in a preseed file for the Debian installer, I experienced a problem: The GPG key for a local Apt repository could not be loaded, because the installer uses the proxy when downloading the GPG key. This fails if the proxy is an instance of Apt-Cacher NG (which only allows access to certain well-defined paths).

As it turns out, I am not the first one to experience this problem: There is an open bug in both the Ubuntu and the Debian bug trackers. Unfortunately, these bugs have been open for four years without any significant action.

Luckily, there is a workaround: This bug can be avoided by specifying a URL starting with "file:///" instead of "http://" in the preseed file. Of course, you then have to use a preseed script at an early stage to actually download the key file to the specified path. However, this is possible because the preseed script does not have to use the Apt-Cacher NG proxy.

You might want to add unset http_proxy to the start of every shell script that you run from the preseed file. This includes scripts that are run through in-target, because in-target seems to add this environment variable.
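
Putting the pieces together, the relevant preseed entries can look roughly like this. The host name, path, and key file name are placeholders; apt-setup/local0 is the standard preseed namespace for the first local repository:

# Download the repository key early, bypassing the Apt-Cacher NG proxy
d-i preseed/early_command string unset http_proxy; wget -O /tmp/local-repo.key http://repo.example.com/local-repo.key
# Reference the downloaded file instead of an http:// URL
d-i apt-setup/local0/key string file:///tmp/local-repo.key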

Better entropy in virtual machines

I had the problem that a Tomcat 7 server in a virtual machine would take ages to start, even though there were only a few rather small applications deployed in it. I think that these problems first appeared after upgrading to Ubuntu 14.04 LTS, but this might just have been a coincidence. Fortunately, the log file gave a hint to the cause of the problem:

INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [206,789] milliseconds.
[...]
INFO: Server startup in 220309 ms

So the initialization of the random number generator (RNG) was responsible for most of the startup time. When you think about it, this is not that surprising: When the system has just booted, there is virtually no entropy available, so the read from /dev/random might block for a very long time. In a physical system, one can use something like haveged or a hardware RNG to fill the kernel's entropy pool, but what about a virtual machine?

Luckily, in recent versions of Linux KVM and libvirt, there is a way to feed entropy from a virtualization host to a virtual machine. In the virtual machine, the device appears as a hardware RNG (/dev/hwrng). Have a look at my wiki for a configuration example.

In the virtual machine, one still needs to read the data from the virtual RNG and feed it into the kernel's entropy pool. For this purpose, the daemon from the rng-tools package does a good job.
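
On an Ubuntu guest, this boils down to a few commands (a sketch; the rng-tools init script starts rngd, which reads from /dev/hwrng by default):

# Check that the paravirtualized RNG device is present in the VM
ls -l /dev/hwrng

# Install the daemon that feeds the device into the kernel's entropy pool
apt-get install rng-tools

# Watch the available entropy; it should now stay reasonably high
cat /proc/sys/kernel/random/entropy_avail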

Using the combination of haveged on the VM host and rng-tools in the VM, I could reduce the startup time of the Tomcat server significantly:

INFO: Server startup in 11831 ms

Process tracking in Upstart

Recently, I experienced a problem with the process tracking in Upstart:

I wanted to start a daemon process running as a specific user. However, I needed root privileges in the pre-start script, so I could not use the setuid/setgid options. I tried to use su in the exec option, but then Upstart would not track the right process. The expect fork and expect daemon options did not help either. As a nasty side effect, these options cannot be tested easily, because having the wrong option will lead to Upstart waiting for an already dead process to die, and there is no way to reset the status in Upstart. At least there is a workaround for effectively resetting the status without restarting the whole computer.

The problem is that su forks when running the command it is asked to run instead of calling exec from the main process. Unfortunately, the process I had to run would fork again, because I had to run it as a daemon (not running it as a daemon had some undesirable side effects). Finally, I found the solution: Instead of using su, start-stop-daemon can be used. This tool does not fork and therefore does not upset Upstart's process tracking. For example, the line

exec start-stop-daemon --start --chuid daemonuser --exec /bin/server_cmd

will run /bin/server_cmd as daemonuser without forking.

This way, expect fork or expect daemon can be used, depending on the fork behavior of the final process.
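
For reference, a minimal job definition along these lines might look as follows. The job name, paths, and the pre-start contents are placeholders, and whether expect fork or expect daemon is correct depends on how /bin/server_cmd daemonizes (expect daemon for the usual double fork):

# /etc/init/server-cmd.conf (hypothetical job name)
description "server_cmd started via start-stop-daemon"

start on runlevel [2345]
stop on runlevel [016]

# server_cmd daemonizes itself with a double fork
expect daemon

pre-start script
    # tasks that still need root privileges
    mkdir -p /var/run/server_cmd
    chown daemonuser /var/run/server_cmd
end script

exec start-stop-daemon --start --chuid daemonuser --exec /bin/server_cmd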