Skip to content

Issues with SYSVOL share after installing KB3161561

Recently, I got funny issues with group policies on Windows Server 2012 R2. These issues manifested themselves with the following symptoms:

  • When trying to edit a group policy, the Group Policy Management tool would present an error like “Group Policy Error: You do not have permission to perfrom this operation. Details: Access is denied.” The Group Policy Management Editor would still open, but the group policy would not be displayed.
  • Sometimes, the group policy editor would open, but when trying to navigate through the tree, it would display an error message like “Error (0x80070005) occurred parsing file. Access is denied.”  I believe that this error is only present when using the central store for administrative templates.
  • The event log would contain messages like: “The processing of Group Policy failed. Windows attempted to read the file \\domain\sysvol\domain\Policies\uuid\gpt.ini from a domain controller and was not successful. Group Policy settings may not be applied until this event is resolved. This issue may be transient and could be caused by one or more of the following:
    a) Name Resolution/Network Connectivity to the current domain controller.
    b) File Replication Service Latency (a file created on another domain controller has not replicated to the current domain controller).
    c) The Distributed File System (DFS) client has been disabled.”
  • When trying to open \\\SYSVOL in the file brower, a prompt to enter credentials or an “Access is denied” error message would be displayed.

Like suggested in the TechNet forums, disabling the “Hardened UNC paths” feature that was introduced with KB3000483 fixed these issues, but obviously this is not a solution because this will actually reintroduce the vulnerability (MITM-attack on SYSVOL share) that was addressed by KB3000483.

After some time, I realized that these problems had first appeared after installing the June security updates, so I looked through the corresponding knowledge base articles and found KB3161561. This article actually mentions (some of) the issues described earlier in the “Known issues in this security update” section. It also offers a different workaround that works without disabling the “Hardened UNC paths” feature: Setting the “SmbServerNameHardeningLevel” to 0. However, using this workaround has other security implications (described in an MSDN article). Last but not least, MS15-083 describes a third workaround that involves disabling version 1 of the SMB protocol on the server, but this workaround did not solve the problem for me.

Changing the “SmbServerNameHardeningLevel” to 0 might not work when this setting is reset by a group policy (as it was in my case). In this case, the corresponding group policy needs to be changed and the “Computer Configuration\Windows Settings\Local Policies\Security Options\Microsoft network server: Server SPN target name validation level” option needs to be set to “Off”.

Open vSwitch and Multicasting

Recently, I noticed the following messages in the system log of a Ubuntu 14.04 LTS host that is running radvd:

Jun 28 13:15:33 myhost radvd[5782]:    do you need to add the UnicastOnly flag?
Jun 28 13:15:33 myhost radvd[5782]: interface ovsbr0v20p0 does not support multicast

At first, I was surprised, but after writing a small program, that checks for the IFF_MULTICAST flag in the interfaces attributes, I realized that the interface in fact does not support multicasts (or at least says so).

As it turns out, virtual interfaces added to an Open vSwitch bridge do not support multicasts in older versions of Open vSwitch (Ubuntu 14.04 LTS ships with Open vSwitch 2.0.2). I cannot tell for sure, in which version muticast support has been added. Looking at the changelog, it looks like this is present since Open vSwitch 2.4.0. Anyway, the version of Open vSwitch shipped with Ubuntu 16.04 LTS (Open vSwitch 2.5.0) supports multicasts on virtual interfaces and the IFF_MULTICAST flag is set for those interfaces.

This means that radvd should not have any problems when using an Open vSwitch virtual interface on Ubuntu 16.04 LTS.

Bug in the Apache Maven Javadoc Plugin

This afternoon, I spent several hours to figure out a problem that in the end turned out to be a bug in the Apache Maven Javadoc Plugin (version 2.10.3).

I wanted to use a custom stylesheet when building the Javadocs of all modules of a multi-module Maven project, so I generated a JAR that contained the stylesheet file, added it to the dependencies of the plugin and referenced it in the <stylesheetfile> tag.

To my suprise, Maven kept complaining with a message like

[WARNING] Unable to find the resource 'path/to/my/stylesheet.css'. Using default Javadoc resources.

I checked everything, tried various ways to configure the dependency, etc., but I could not get it to work. So I resorted to the last thing one can do when a software does not work as expected. I grabbed the source code of the plugin, found the relevant part that generated the message, and attached to the Maven process with a debugger. As it turned out, the problem was actually caused by a bug in the plugin that lead to resources from dependencies not being resolved correctly.

I filed a bug report and attached a patch to the bug report that fixes the problem for me. I hope that this patch will soon make it into a release version of the plugin. Until then, maybe this article helps someone else by saving the time to look for the cause of the issue.

UDP sockets broken again in Ubuntu 14.04 LTS

Some time ago, a regression was introduced into the 3.13 line kernel used by Ubuntu 14.04 LTS that broke UDP sockets when they were used in a certain way (e.g. like FreeRADIUS does). This bug was fixed in 3.13.0-67 and I hoped to never see it again.

Two days ago, I realized that one of our RADIUS servers was not working correctly any longer. I could not tell how long this problem had existed (the second RADIUS server still worked and in monitoring the primary one also worked, so the problem went undetected for a very long time).

After looking for the cause of the problem for quite some time, I remembered the problem described earlier and tried an old kernel version. Bingo! This fixed the problem. After looking at the changelog of the current 3.13 line kernel from trusty-proposed (that also fixes the problem) I found a reference to another bug report that describes the problem (don't be fooled by the bug's description, it also applies to IPv4).

As it turns out, the first regression had been caused by backporting an optimization regarding UDP checksum calculation from a newer Linux kernel. However, this change exposed a problem that had been fixed in the newer kernel but not in Ubuntu's branch of kernel 3.13. This regression was fixed by simply removing the patch again. This was okay because it was just an optimization.

Some time later, someone (who obviously was not aware of this regression) again thought that backporting the optimzation was a good idea, so it got reintroduced in 3.13.0-69. Now, it looks like they fixed the bug in 3.13.0-78 by actually fixing the underlying problem and not by removing the patch again. Therefore, I hope that we will not see the regression a third time. However, I am a bit annoyed that they did not do better testing when backporting the patch after there had already been a regression around it once. Maybe the Ubuntu team's decision to not use a kernel with long-term support and do maintenance themselves was not so wise after all.

Disabling the annoying "Visit ..." entry in the Firefox address-bar drop-down

I find the new address bar features introduced in a recent Firefox version very annoying. The suggestion of search terms can be disabled easily (does anyone really want to have everything entered into the address bar sent to an external service?), but this still leaves this annoying "Visit ..." entry at the top of the list of visited addresses.

This entry is annoying for several reasons:

  1. It does not add any functionality: Just hitting enter has (nearly) the same effect as selecting this entry.
  2. It can easily be selected accidentally when you actually want the top entry from the list of visited addresses.
  3. Typically, it does not suggest the address you actually want to visit for two reasons: First, it suggests visiting the top URL of the suggested site, even if you always visit a specific path. Second, it suggest a plain (HTTP) URL, even if the site actually only supports HTTPS and you never visited it with HTTP.

I have no idea, why they added such a stupid feature to Firefox at all (I suspect that it was originally designed for a different purpose that really only makes sense if you also enable the other features) and why they did not add an option to the UI for disabling it.

Fortunately, someone found out that it can be disabled by setting the browser.urlbar.unifiedcomplete option to false in about:config. Seeing how often the page with the solution has already been visited, I am definitely not the only person who is annoyed by this stupid new feature.

Time synchronization done right

Time synchronization between computers is important for many applications. For some applications (e.g. Apache Cassandra databases), it is even critical for data consistency.

Still, there are quite a few common misconceptions about how time synchronization in a network should be done correctly. Unfortunately, those misconceptions can easily lead to synchronization schemes that are by far less than optimal.

Accidentally, when looking for something else, I came a cross a series of two articles (part one, part two) that excellently describe the problems of the synchronization schemes that are commonly used and explain how to setup a scheme that actually provides precise synchronization.

In short, the only proper way for getting clock synchronization with the properties that most people want is setting up an internal pool of NTP servers that synchronizes against external references and have all other computers in the network synchronize against this internal pool. This is also the scheme that I have been successfully using for years.

However, you do not have worry: As long as you only need rough synchronization (so that the clock will show about the right time and not drift away more and more), the common scheme of synchronizing each individual computer against an external pool is typically okay, too. You just should be aware that there will allways be some clock skew between the computers and that you the computers will drift apart significantly if the connection to the external pool is interrupted for an extended period of time.


Shortly after writing this article, I found two more articles that are loosely connected to this topic. The first one basically tells us that we cannot rely on synchronized clocks because there are just too many ways how things can go wrong. The second one gives a practical example of how quickly things can go wrong when it comes to time synchronization.

Spring's @RequestMapping annotation works on private methods

Recently, I spent a lot of time on debugging a nasty problem with Spring WebMVC and Spring Security.

I had a class annotated with @Controller and a method annotated with @RequestMapping. I wanted to protected this method using the @Secured annotation. So I turned on global method security by adding @EnabledGlobalMethodSecurity with the right parameters to my @Configuration class, but it did not work. The method could still be called without having the proper privileges (or being authenticated at all).

After hours of debugging, I found out that the AOP advice was not applied to my controller method because it would not find the method when processing the controller class. At that moment I realized that the method had been declared package private. AOP proxies are not applied to non-public methods (for CGLIB proxies this would be possible, but in general it is not desirable and Spring does not do it).

This left me with the question: Why does the request mapping work. The answer is simple: When looking for methods with the @RequestMapping annotation, Spring does not check the method's access modifiers. As the method is invoked using reflection, it will work even if the method has been declared private (unless there is a SecurityManager in charge, but for most Spring applications there will not be one).

This leaves us with a very awkward situation: Private methods might be called by external code and if there is an @Secured annotation on them, it will be ignored. In my opinion, this is a bug: The @RequestMapping annotation should only work on public methods. There are actually four places in Spring where this could be fixed (Spring 4.1.7):

  1. line 172
  2. line 207
  3. line 60
  4. line 187

It would be completely sufficient to check whether the method is public in one of these places. Until this is fixed in Spring (and it might never get fixed because the fix would break backward compatibility), I use my own RequestMappingHandlerMapping which does the check:

public class PublicOnlyRequestMappingHandlerMapping extends
        RequestMappingHandlerMapping {

    protected RequestMappingInfo getMappingForMethod(Method method,
            Class<?> handlerType) {
        RequestMappingInfo info = super
                .getMappingForMethod(method, handlerType);
        if (info != null && !Modifier.isPublic(method.getModifiers())) {
            logger.warn("Ignoring non-public method with @RequestMapping annotation: "
                    + method);
            return null;
        } else {
            return info;


As you can see, the implementation is very simple. I first call the super method and then check whether the method is public so that I can generate a warning message when @RequestMapping has been used on a non-public method. If one does not care about such a message, once can check the method's access modifier fist and only invoke the super method when the investigated method is public.

In order to use the custom RequestMappingHandlerMapping, we have to use a custom implementation of WebMvcConfigurationSupport (when using Java Config):

public class CustomWebMvcConfiguration extends
        DelegatingWebMvcConfiguration {

    public RequestMappingHandlerAdapter requestMappingHandlerAdapter() {
        RequestMappingHandlerAdapter adapter = super
        return adapter;

    public RequestMappingHandlerMapping requestMappingHandlerMapping() {
        RequestMappingHandlerMapping handlerMapping = new PublicOnlyRequestMappingHandlerMapping();

        PathMatchConfigurer configurer = getPathMatchConfigurer();
        if (configurer.isUseSuffixPatternMatch() != null) {
        if (configurer.isUseRegisteredSuffixPatternMatch() != null) {
        if (configurer.isUseTrailingSlashMatch() != null) {
        if (configurer.getPathMatcher() != null) {
        if (configurer.getUrlPathHelper() != null) {

        return handlerMapping;


This implementation copies the implementation of requestMappingHandlerMapping() from the parent class, but replaces the actual implementation used with our own class. In addition to that, this configuration also overrides requestMappingHandlerAdapter() in order to the the ignoreDefaultModelOnRedirect attribute. This is the recommended setting for new Spring WebMVC applications, but it cannot be made the default in Spring because it would break backward compatibility. Of course, the two changes are completely independent, so you can choose to only implement either of them.

Why I don't like checked exceptions

One of the rather obscure features of the Java programming language is the support for checked exceptions. Most other languages running on the Java virtual machine (JVM) do not have them and most non-JVM programming languages do not have them either.

You might be surprised that I call checked exceptions "obscure" even though it is easy to understand their concept and to use them. However, I suspect that most experienced Java developers share my sentiment (if you don't, please speak up in the comments), while it is anything but obvious to beginners why checked exceptions might be problematic.

Actually, I have to admit that when I first learned the Java programming language (which must have been around Java 1.2 or 1.3), I liked the concept of checked exceptions. I typically prefer statically typed languages over dynamically typed ones because I like to have every support that a compiler can give me in statically verifying my code. Checked exceptions seem to be a logical extension of this concept, where the compiler can check whether all error conditions that might occur are actually handled by the code.

Unfortunately, the concept of checked exceptions has rather severe limitations which become apparent in larger projects. In this article, I want to explore why checked exceptions are a good idea that unfortunately fails when being put to practical use. I hope that this might be useful to Java beginners who are writing their first library and have to decide where to use checked and where to use unchecked exceptions.

Before taking a closer look at the problems with checked exceptions, we want to quickly revisit the top level of the exception hierarchy in Java and how the three different types of exceptions are handled differently.

In Java, all exceptions inherit from Throwable. There are three distinct types of exceptions: Unchecked exceptions that signal an error condition in the JVM (for example when a class cannot be loaded or a memory allocation fails) are derived from Error, which in turn is derived from Throwable. These exceptions are unchecked, which means that they can be thrown by any code without having been declared explicitly. Exception, which is also derived from Throwable, is the base class for all checked exceptions. Checked exceptions have to be declared explicitly in a method declaration. If a method declares that it throws a checked exception, the calling code must either catch this exception or must also declare that it throws the exception. Finally, there is the RuntimeException which is derived from Exception, but like Error is a base class for unchecked exceptions.

Even though both Errors and RuntimeExceptions represent unchecked exceptions, they are used for different purposes. Errors are typically thrown by the JVM only and are typically non-recoverable. For example, it is hard to recover from an error when loading a class, because this is typical caused by a problem with the class file. Therefore, Errors are rarely caught but will typically lead to program termination. Even if they are caught, the program will often behave erratically after getting such an exception (everyone who has experienced an OutOfMemoryError in Eclipse knows what I am talking about). RuntimeExceptions, on the other hand, often signal errors in the program's logic. For example, a NoSuchElementException happens when trying to get a non-existing element from a List.

Exception is used for checked exceptions which are typically caused by an exceptional situation (not necessarily considered an error). For example, an IOException is triggered when an I/O operation cannot be finished. Such a situation might not necessarily indicate an error, because it can simply happen when trying to access a resouce that no longer exists (e.g. a network connection might have been closed by a peer).

In summary, exceptions of type Error are typically only thrown by the JVM, exceptions of type RuntimeException are thrown by Java code, but usually do not have to be expected, and exceptions of type Exception (checked exceptions) have to be expected and need to be handled somehow.

In my opinion, there are two flaws in this concept: a minor obvious one and two major less obvious ones. The minor flaw is the class hierarchy. RuntimeException, the base class for unchecked exceptions, is derived from Exception, the base class for checked exceptions. It would be more reasonable to derive RuntimeException from Throwable directly, but this design flaw does not cause any actual trouble.

The first major flaw is that the distinction between exceptions that have to be expected (and thus should be checked exceptions) and exceptions that do not have to be expected (and thus should be unchecked exceptions) is not always clear. What if an exception has to be expected but cannot be reasonably handled locally? For example, a FileNotFoundException might be non-recoverable if an important configuration or database file is missing. There are three potential solutions for such a case: We can let the exception bubble up the stack (which means that now a lot of methods have to declare that they throw a FileNotFoundException), we can wrap it in a different kind of exception (e.g. in a MyLibraryException), or we can wrap it in a RuntimeException.

The first solution is problematic because of the second flaw described in the next paragraph. The second solution (which is the one recommended in the API docs) is not perfect either, because it means that information about the actual cause is lost. The actual cause can still be attached to the new exception (since Java 1.4), however it gets harder to catch the individual cause because a catch clause cannot test the cause of an exception and there is typically no documentation about which kind of exceptions might be wrapped in another exception. The third solution (which is very common) converts the checked exception to an unchecked exception, but like the second one, information about the actual cause is now more difficult to access.

The second major flaw is related to the concept of having checked exceptions bubble up the stack. This approach does not work well when using an inversion of control (IoC) pattern. This pattern is very prominent (for reasons that are outside the scope of this article), but cannot be properly used with checked exceptions. The generic (framework) code has to know which checked exceptions are thrown by user code that is called from the generic code. Obviously, it cannot, so the interface to the user code has to specify no checked exceptions at all or a checked exception specific to the calling code. This means that the user code has to wrap its exceptions in the checked exception specified by the framework or in an unchecked exception. This leaves the problem described earlier, where the code calling the framework code would now need to unwrap exceptions and handle their causes, even though it cannot (always) know which exceptions might be causing the exception that is caught.

Now, let's see how this changes when we consistently use unchecked exceptions instead of checked exceptions. Unchecked exceptions can easily bubble up through framework code, so we can catch them where we want to, but we do not have to catch them where we cannot handle them anyway. We do not lose any information about the exception, meaning we can still catch a very specific exception at a rather high level.

Obviously, it is important to document which unchecked exceptions are thrown by a method, so that calling code can know which exceptions it might want to catch. If there is framework code in between, it might not always be obvious which exceptions can occur, however this is not worse than with an exceptions of an unknown type that is wrapped in a checked exception.

The Spring framework, for example, chooses to use unchecked exceptions for most error conditions. I tend to use the same approach when I write library code.

It is tempting to use checked exceptions in order to force the user of a library method to handle a certain situation, but this rarely works. The InterruptedException thrown by many standard library methods is a good example: This exception is thrown when a thread is interrupted while it is blocked waiting for some event. This makes sense, because the thread should not wait any longer, when it has been interrupted. However, it is very common to see code like the following:

try {
} catch (InterruptedException e) {
    // Ignore the exception

This is very dangerous because Java will clear the interruption status of a thread when throwing an InterruptedException. This means that code on a higher level, that checks whether the thread has been interrupted (you will often find this as a loop condition), will never know that the thread has been interrupted.

Therefore, the correct way to handle an InterruptedException is the following:

try {
} catch (InterruptedException e) {

This will mark the current thread as interrupted, so that code on a higher level will get correct results when checking whether the thread has been interrupted.

Now, imagine that InterruptedException was an unchecked exception: Code that did not want to deal with it simply would not. This would result in the exception bubbling up until it is handled explicitly or the current thread is terminated. In most cases, this is exactly what the programmer wants (ensure that the thread terminates when it is interrupted). In the few cases where one wants to react on the InterruptedException in a different way than terminating the thread, one could still catch the exception explicitly. Everyone doing this would probably be aware of the fact that the thread has to be interrupted again if other code should see that it has been interrupted. So use of an unchecked instead of a checked exception would actually result in better code in most cases.

Actually, the tendency to abandon checked exceptions can also be seen in other languages. C++ had a feature that is simlar to checked exceptions in C++ 98 and 03: A function can explicitly declare which exceptions it throws and the compiler will enforce that it does not throw any other exceptions (when it does, the program terminates). In C++ 11, this feature has been deprecated. Instead, C++ now offers the nothrow keyword in order to specify that a function does never throw any kind of exception. This is a consequence of the experience that it is rarely practical to explicitly specify which exceptions a function might throw.

In summary, checked exceptions are a nice concept that unfortunately is not useful in practice. Therefore, it is my opinion that they should be avoided in general and unchecked exceptions should be preferred where feasible.

Using the Red Pitaya with an Asus N10 Nano WiFi dongle

Recently, I got a Red Pitaya, which is a very neat toy for my electronics lab. A few months ago I already got a BitScope Micro. Back then, the Red Pitaya costed about 450 EUR, while the BitScope Micro (with BNC Adapter) costed  only about 150 EUR. However, compared to the Red Pitaya, the features of the BitScope Micro are quite limited. In particular, the features of the signal generator are quite limited and the sampling rate (and sampling length) of the two analog inputs is low. Now, the Red Pitaya has been reduced in price, so that it is available for about 250 EUR, so I could not resist any longer.

One of the many neat features of the Red Pitaya is the fact that you do not need a PC with any special software installed and a USB connection to use it. You can simply connect over the network with a browser, so a table it enough to use it, making it much more flexible in use. However, by default a wired Ethernet connection is still needed.

Luckily, it can act as a WiFi access point when installing the right WiFi USB dongle. The Red Pitaya manual recommends the Edimax EW7811Un, but the supplier where I ordered the Red Pitaya did not have this dongle on stock. Basically, the choice of the dongle is limited by the kernel module(s) compiled into the Linux kernel used by the Red Pitaya eco-system, so any device that is compatible with the rtl8192cu driver should work.

Therefore, I got an Asus N10 nano USB WiFi dongle, which supposedly uses a chip from this family. There seem to be two versions of this device and only one of them is using the Realtek chip. However, the datasheet from the supplier explictly specified the Realtek chip, so I expected it to work.

When the hardware arrived, I prepared the Micro SD card for the Red Pitaya, plugged everything in, and to my surprise the WiFi did not show up. So I added a wired connection in order to login using SSH and investigate the situation.

The USB WiFi dongle was detected by Linux (as it showed up in /sys/bus/usb), however, the corresponding network device was missing and could not be brought up. In the end, I found out, that the driver was not enabled, because the Linux kernel was rather old and the device ID was simply not known yet to the driver.

Such a problem can easily be fixed by explicitly telling the driver to support the device ID (echo "0b05 17ba" >/sys/bus/usb/drivers/rtl8192cu/new_id for my Asus N10 nano WiFI USB dongle) and suddenly I could bring up the network device wlan0. Unfortunately, there was still no "Red Pitaya" WiFI available.

A closer investigation of the startup scripts showed, that the software for hosting an access point was only started, when the device wlan0 was already available during startup. Luckily, there was a slightly uncoventional but rather easy way to ensure this: The necessary code could be added to the file /etc/network/config on the SD card (/opt/etc/network/config in the Red Pitaya's file system). This file is sourced by the initialization script, so any shell code present in this file will be executed before bringing up the network. I simply added the following lines to the beginning of the file:

# Add device ID for Asus N10 Nano to rtl8192cu driver
echo "0b05 17ba" >/sys/bus/usb/drivers/rtl8192cu/new_id

After making this change and rebooting the Red Pitaya, the WiFi access-point mode worked like a charm.

Debian installer with custom proxy and custom repository key

When using the mirror/http/proxy option in a preseed file for the Debian installer, I experienced a problem: The GPG key for a local Apt repository could not be loaded because the installer is using the proxy when downloading the GPG key. This fails, if the proxy is an instance of Apt-Cacher NG (which only allows access to certain well-defined paths).

As it turns out, I am not the first one to experience this problem: There is an open bug in both the Ubuntu and the Debian bug trackers. Unfortunately, these bugs have been open for four years without any significant action.

Luckily, there is a workaround: By specifying a URL starting with "file:///" instead of "http://" in the preseed file, this bug can be avoided. Of course, you have to use a preseed script at the early stage to actually download the file to the specified path. However, this is possible, because the preseed script does not have to use the Apt-Cacher NG proxy.

You might want to add unset http_proxy to the start of every shell script that you run from the preseed file. This includes scripts that are run through in-target, because in-target seems to add this environment variable.

Better entropy in virtual machines

I had the problem that a Tomcat 7 server in a virtual machine would take ages to start, even though there were only a few rather small applications deployed in it. I think that these problems first appeared after upgrading to Ubuntu 14.04 LTS, but this might just have been a coincidence. Fortunately, the log file gave a hint to the cause of the problem:

INFO: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [206,789] milliseconds.
INFO: Server startup in 220309 ms

So the initialization of the random number generator (RNG) was responsible for most of the startup time. When you think about it, this is not that surprising: When the system has just booted, there is virtually no entropy available, so the read from /dev/random might block for a very long time. In a physical system, one can use something like haveged or a hardware RNG to fill the kernel's entropy pool, but what about a virtual machine?

Luckily, in recent versions of Linux KVM and libvirt, there is a way to feed entropy from a virtualization host to a virtual machine. In the virtual machine, the device appears as a hardware RNG (/dev/hwrng). Have a look at my wiki for a configuration example.

In the virtual machine, one still needs to read the data from the virtual RNG and feed it into the kernel's entropy pool. For this purpose, the daemon from the rng-tools package does a good job.

Using the combination of haveged in the VM host and rng-tools in the VM, I could significantly boost the startup time of the Tomcat server:

INFO: Server startup in 11831 ms

Non-blocking DatagramChannel and empty UDP packets

I just found out the hard way that there are two bugs when using a non-blocking DatagramChannel in Java with empty (zero payload size) UDP packets.

The first one is not so much a bug but more an API limitation: When sending an empty UDP packet, you cannot tell whether it has been actually sent. The method returns the number of bytes sent and returns zero when the packet has not been sent, but if you send an empty packet, the number of bytes sent is zero even if the send operation suceeded. So there is no way to tell whether the send operation was successful.

The second bug is more serious and this one clearly is a bug in the implementation. When using a DatagramChannel with a Selector, the selector does not return from its select operation when an empty packet has been received. This means that the select call might block forever and you will only see the empty packets received once a non-empty packet arrives.

I describe the two problems and a possible workaround in more detail in my wiki. In the project that I am working on, I can live with not knowing for sure whether the packet was sent (I just try again later if there is no reaction) and for the case where I have to receive packets, I am now using blocking I/O. However, I still think that this is a nasty bug that should be fixed in a future release of Java.

Process tracking in Upstart

Recently, I exprienced a problem with the process tracking in Upstart:

I wanted start a daemon process running as a specific user. However, I needed root privileges in the pre-start script, so I could not use the setuid/setgid options. I tried to use su in the exec option, but then Upstart would not track the right process. The expect fork and expect daemon options did not help either. As a nasty side effect, these options cannot be tested easily, because having the wrong option will lead to Upstart waiting for an already dead process to die and there is no way to reset the status in Upstart. At least, there is a workaround for effectively resetting the status without restarting the whole computer.

The problem is that su forks when running the command it is asked to run instead of calling exec from the main process. Unfortunately, The process I had to run would fork again because I had to run it as a daemon (not running it as a daemon had some undesirable side effects). Finally, I found the solution: Instead of using su, start-stop-daemon can be used. This tool will not fork and therefore it will not upset Upstart's process tracking. For example, the line

exec start-stop-daemon --start --chuid daemonuser --exec /bin/server_cmd
will run /bin/server_cmd as daemonuser without forking.

This way, expect fork or expect daemon can be used, just depending on the fork behavior of the final process.

iTunes Bug: Apps not syncing to iPhone or iPad

While setting up my new iPad yesterday, I experienced a strange problem. iTunes (on Windows) would repeatedly crash with a problem in msvcrt10.dll when trying to copy the apps to the iPad.

In the Apple support forums, I found the explanation and a workaround for this problem: It seems like iTunes 11.4 introduces a but (that is still present in iTunes 12) that causes a crash when apps are stored on network share and referenced using a UNC path. In my case, the Music folder (which is the default location for the iTunes library) is redirected to a UNC path pointing to a DFS share. Interestingly, this bug only affects apps, not music or videos.

In order to make the apps sync again, the path that iTunes uses for referencing the files needs to be shared to a regular path with a drive letter. This can either be achieved by copying the apps to a local driver or mapping the network share to a drive letter. Either way, all apps need to be deleted from the iTunes library (but not deleted on disk) and re-added using the regular path. Obviosuly, iTunes has to be configured to not automatically copy files to its default library location. After this change, the synchronization should work. Finally, the apps can be deleted again and re-added using the UNC path - once the apps are on the device (with the newest version) iTunes will not try to copy them again, thus avoiding the bug.

However, I find it annoying that this bug has been known since mid of September and still has not been fixed by Apple.

Trouble with IPv6 in a KVM guest running the 3.13 kernel

Some time ago, I wrote about two problems with the 3.13 kernel shipping with Ubuntu 14.04 LTS Trusty Tahr: One turned out to be a problem with KSM on NUMA machines acting as Linux KVM hosts and was fixed in later releases of the 3.13 kernel. The other one affected IPv6 routing between virtual machines on the same host. Finally, I figured out the cause of the second problem and how it can be solved.

I use two different kinds of network setups for Linux KVM hosts: For virtual-machine servers in our own network, the virtual machines get direct bridged access to the network (actually I use OpenVSwitch on the VM hosts for bridging specific VLANs, but this is just a technical detail). For this kind of setup, everything works fine, even when using the 3.13 kernel. However, we also have some VM hosts that are actually not in our own network, but are hosted in various data centers. For these VM hosts, I use a routed network configuration. This means that all traffic coming from and going to the virtual machines is routed by the VM host. On layer 2 (Ethernet), the virtual machines only see the VM host and the hosting provider's router only sees the physical machine.

This kind of setup has two advantages: First, it always works, even if the hosting provider expects to only see a single, well-known MAC address (which might be desirable for security reasons). Second, the VM host can act as a firewall, only allowing specific traffic to and from the outside world. In fact, the VM host can also act as a router between different virtual machines, thus protecting them from each other should one be compromised.

The problems with IPv6 only appear when using this kind of setup, where the Linux KVM host acts as a router, not a bridge. The symptoms are that IPv6 packets between two virtual machines are occasionally dropped, while communication with the VM host and the outside world continues to work fine. This is caused by the neigbor-discovery mechanism in IPv6. From the perspective of the VM host, all virtual machines are in the same network. Therefore, it sends an ICMPv6 redirect message in order to indicate that the VM should contact the other VM directly. However, this does not work because the network setup only allows traffic between the VM host and individual virtual machines, but no traffic between two virtual machines (otherwise it could not act as a firewall). Therefore, the neighbor-discovery mechanism determines the other VM to be not available (it should be on the same network but does not answer). After some time, the entry in the neighbor table (that you can inspect with ip neigh show) will expire and communication will work again for a short time, until the next redirect message is received and the same story starts again.

There are two possible solutions to this: The proper one would be to use an individual interface for each guest on the VM host. In this case, the VM host would not expect the virtual machines to be on the same network and thus stop sending redirect packets. Unfortunately, this makes the setup more complex and - if using a separate /64 for each interface - needs a lot of address space. The simpler albeit sketchy solution is to prevent the redict messages from having any effect. For IPv4, one could disable the sending of redirect messages through the sysctl option net.ipv4.conf.<interface>.send_redirects. For IPv6 however, this option is not available. So one could either use an iptables rule on the OUTPUT chain for blocking those packets or simply configure the KVM guests to ignore such packets. I chose the latter approach and added

# IPv6 redirects cause problems because of our routing scheme.
net.ipv6.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

to /etc/sysctl.conf in all affected virtual machines.

I do not know, why this behavior changed with kernel 3.13. One would expect the same problem to appear with older kernel versions, but I guess there must have been some change in the details of how NDP and redirect messages are handled.

Addendum (2014-11-02):

Adding the suggested options to sysctl.conf does not seem to fix the problem completely. For some reasons, an individual network interface can still have this setting enabled. Therefore, I now added the following line to the IPv6 configuration of the affected interface in /etc/network/interfaces:

        post-up sysctl net.ipv6.conf.$IFACE.accept_redirects=0

This finally fixes it, even if the other options are not added to sysctl.conf.