Apt: Writing more data than expected

If apt (apt-get, aptitude, etc.) fails with an error message like

Get:1 xenial/main amd64 python-tornado amd64 4.2.1-2~ds+1 [275 kB]
Err:1 xenial/main amd64 python-tornado amd64 4.2.1-2~ds+1
  Writing more data than expected (275122 > 275100)

but the file in the repository has the expected size, a caching proxy (e.g. Apt-Cacher-NG) might be at fault. This can happen when the package in the repository has been changed instead of releasing a new package with a new version number. This will typically not happen for the official repositories, but it might happen for third-party repositories.

In this case, there are only two solutions: Bypass the proxy or remove the old file from the proxy's cache. In the case of Apt-Cacher-NG, this can be achieved by going to the web interface, checking the “Validate by file name AND file directory (use with care),” and “then validate file contents through checksum (SLOW), also detecting corrupt files,” options and clicking “Start Scan and/or Expiration”. This scan should detect the broken packages, which can then be selected by checking “Tag” next to each package and subsequently deleted by clicking “Delete selected files”.

Using CRLs in Icinga 2

Icinga 2.x offers a cluster mode which (from an administrator's point of view) is one of the most important features introduced with the 2.x release. Using the cluster feature, check commands can be executed on satellite nodes or even the complete scheduling of checks can be delegated to other nodes, while still keeping the configuration in a single place.

In order to enable secure communication within the cluster, Icinga 2 uses a public key infrastructure (PKI). This PKI can be managed with the icinga2 pki commands. However, there is no command for generating a CRL. For this reason, it is necessary to use the openssl ca command for generating a CRL. I have documented the steps necessary for generating a CRL in my wiki.

Funnily, it seems like no one has used a CRL in Icinga 2 so far. I know this, because up to today, Icinga 2 has a bug that makes it impossible to load a CRL. Luckily, yours truly already fixed this bug and this bugfix is going to be included in the next Icinga 2 release.

I find it strange that obviously no one is using CRLs, because Icinga 2 uses a very long validity period when generating certificates (15 years), so it is quite likely that at some point a node is decommissioned and thus the corresponding certificate should be revoked.

The war in Syria: Why it does not seem to end

Recently, I came across an excellent New York Times article that gives an explanation about why the situation in Syria is worse than in most civil wars:

Syria’s Paradox: Why the War Only Ever Seems to Get Worse

The article is already about four months old, so it might look outdated regarding the latest developments. However, after reading it, I am pessimistic about the current attempts to establish peace and fear that violence will return soon.

Design flaws of the Linux page cache

Like most modern operating systems, Linux has a page cache. This means that when reading from or writing to a file, the data is actually cached in the system’s memory, so that subsequent read requests can be served much faster, without having to access the non-volatile storage.

This is an extremely useful facility, which has a huge impact on system performance. For typical workloads, a significant amount of data is used repeatedly and caching this data reduces the latency dramatically. For example, imagine that every time you ran “ls”, the executable would actually be read from the hard disk. This would undoubtedly make the interactive experience much less responsive.

Unfortunately, there is a downside to how Linux caches file-system data: As long as there is free system memory, the caching is a no-brainer and simply caching everything works perfectly. After some time however, the system memory will be used completely and the kernel has to decide how to free memory in order to cache I/O data. Basically, Linux will evict old data from the cache and even move process data to the swap area in order to make space for data newly arriving in the cache. In most cases, this makes sense: If data has not been used for some time (even data that is not from a disk but belongs to a process which simply has not needed it in some time), it makes sense to evict it from the cache or put it into the swap area so that the fast memory can be used for fresh data that is more likely to be needed again soon. Most times, these heuristics work great, but there is one big exception: Processes that read or write a lot of data, but only do so once.

For example, imagine a backup process. A backup process will read nearly all data stored on disk (and maybe write it to an archive file), but caching the data from these read or write operations does not make a lot of sense because it is not very likely to be accessed again soon (at least not more likely than if the backup process did not run at all). However, the Linux page cache will still cache that data and move other data out of memory. Now, accessing that data again will result in slow read requests from the hardware, effectively slowing the system down, sometimes to a point that is worse than if the page cache had been disabled completely.

For a long time, I never thought about this problem. In retrospect, systems sometimes seemed slow after running a backup, but I never connected the dots until recently, when I started to see a problem that looked weird at first: Every night, I got an e-mail from the Icinga monitoring system warning me about the swap space on two (very similar) systems running extremely low. First, I expected that some nightly maintenance process might simply need a lot of memory (I had recently installed a software upgrade on these systems), so I assigned more memory to the virtual machines. I expected that the amount of free swap space would increase by the amount of extra memory assigned, but it did not. The swap space was still used almost completely. Therefore, I looked at the memory consumption while the problem was present, and the result was astonishing: Both memory and swap were virtually fully used, but about 90 percent of the memory was actually used by the page cache, not by any process.

Investigating the problem more closely revealed that the high usage of swap space always occurred shortly after the nightly backup process started running. To be absolutely sure, I manually started the process during the day and I could immediately see the page cache and swap usage growing rapidly. It was clear that the backup process reading a lot of data was responsible for the problem. If you think about it, this is not the kernel’s fault: It simply cannot know that the data will not be needed again and that moving other data out of the way is actually counterproductive.

So everything that is needed to fix this kind of problem is a way to tell the kernel “please do not cache the data that I read, I will not need it again anyway”. While this sounds simple, it actually is a huge problem with current kernel releases.

There is no way to tell the kernel “do not cache the data accessed by this process”. In fact, there are only four ways for influencing the caching behavior that I am aware of:

  1. The page cache can be disabled globally for the time of running the backup process. While it might alleviate the impact of the problem, it is still not desirable because the page cache actually is useful and disabling it will have a negative impact on system performance. However, during the backup process, disabling the page cache might still result in better performance than not doing anything at all.
  2. Some suggest mounting the file-system with the “sync” flag. However, this will only circumvent the page cache for write requests, not for read requests. It also has very negative impacts on performance, so I only list it here because it is suggested by some people.
  3. The files can be opened with the O_DIRECT flag. This tells the kernel that I/O should bypass the caches. Unfortunately, using O_DIRECT has a lot of side effects. In particular, the size and offset of the memory buffer used for I/O operations have to match certain alignment restrictions that depend on the kernel version used and the file-system type. There is no API for querying these restrictions, so one can only choose alignment to a rather large size and hope that it is sufficient. Of course, this also means that simply modifying the code of an application that opens a file is not sufficient. Every piece of code that reads from or writes to a file has to be changed so that it matches the alignment restrictions. This definitely is not an easy task, so using O_DIRECT is more of a theoretical approach.
  4. Last but not least, there is the posix_fadvise function. This function allows a program in user space to tell the kernel how it is going to use a file (e.g. read it sequentially), so that the kernel can optimize things like the file-system cache. There are two flags that can be passed to this function which sound particularly promising: According to the man page, the POSIX_FADV_NOREUSE flag specifies that “the specified data will be accessed only once” and the POSIX_FADV_DONTNEED flag specifies that “the specified data will not be accessed in the near future”.

So it sounds like posix_fadvise can solve the problem and all we need to do is find a way to tell the program (tar in my case) to call it. As it turns out, there even is a small tool called “nocache” that acts as a wrapper around arbitrary programs and uses the LD_PRELOAD mechanism to catch attempts to open a file and adds POSIX_FADV_NOREUSE right after opening the file.

This would be the perfect solution if the Linux kernel actually cared about POSIX_FADV_NOREUSE. Unfortunately, till this day, it simply ignores this flag. In 2011, there was a patch that tried to add support for this flag to the kernel, but it never made it into the mainline kernel (the reasons are unknown to me).

Actually, the nocache tool is aware of this and adds a workaround: When closing the file, it calls posix_fadvise with the POSIX_FADV_DONTNEED flag. It has to do this when closing the file (instead of when opening it) because POSIX_FADV_DONTNEED only removes data from the page cache, it does not prevent it from being added to the page cache.

So I installed nocache and wrapped the call to tar with it, expecting that this would finally solve the problem. Surprisingly, it did not. At first, it seemed like the page cache was filling less rapidly, but after a while, everything looked quite like before. I tried to add the “-n” parameter to nocache, telling it to call posix_fadvise multiple times (the documentation suggested doing so), but this did not help either.

After giving this some more thought, I realized why: Using POSIX_FADV_DONTNEED when closing the file works great when working with many small or medium-sized files. For large files, however, it does not work. By the time it is called, all of the file has already been put into the page cache, causing the same problems as when many small files are read without nocache. This means that posix_fadvise has to be called repeatedly while reading from the file to ensure that the amount of data cached never grows too large. While there are only a few API calls for opening files, there are ways for reading from a file (e.g. memory-mapped I/O) that nocache simply cannot catch. This means that the only solution is actually patching the program that is reading or writing data.

This is why I created a patch for GNU tar version 1.29. This patch calls posix_fadvise after reading each block of data and thus ensures that the page cache is never polluted by tar. Unfortunately, this patch is not portable, nor does it provide a command-line argument for enabling or disabling this behavior, so it is not really suitable for inclusion into the general source code of GNU tar. The patch only takes care of not polluting the file-system cache when reading files. It does not do so when writing them, but for me this is sufficient because in my backup script, tar writes to the standard output anyway.

When using this patched version of tar, the memory problems disappear completely. This makes me confident that the change is sufficient, so I am going to use this patched version of tar for all systems that I back up with tar. Unfortunately, I use Bareos for the backup of most systems, so I will have to find a solution for that software, too. Maybe we are lucky, and support for POSIX_FADV_NOREUSE will finally be added to Linux at some point in the future, but until then patching the software involved in the backup process seems like the only feasible way to go.

Zimbra backup broken after upgrade to Ubuntu 16.04 LTS

After upgrading from Zimbra ZCS 7.8.0 on Ubuntu 14.04 LTS to Zimbra ZCS 7.8.1 on Ubuntu 16.04 LTS, backups were suddenly not working any longer. Instead of the usual “SUCCESS” e-mail, I would get two e-mails about a failure, both essentially containing the error message “Error occurred: system failure: LDAP backup failed: system failure: exception during auth {RemoteManager:>}”.

As it turns out, this was caused by an SSH authentication problem. Zimbra was still using an old DSA key for SSH, which is not supported in Ubuntu 16.04 LTS any longer (at least it is deactivated by default). The fix is simple: After running zmsshkeygen and zmupdateauthkeys (both must be run as the zimbra user), Zimbra uses an RSA key-pair and authentication works again.

ISC DHCP server not starting on Ubuntu 16.04 LTS

After upgrading two systems running the ISC DHCP server from Ubuntu 14.04 LTS (Trusty) to Ubuntu 16.04 LTS (Xenial), I experienced some trouble with the DHCP server not starting on system boot. The log file contained the following message:

Internet Systems Consortium DHCP Server 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit
Config file: /etc/dhcp/dhcpd.conf
Database file: /var/lib/dhcp/dhcpd.leases
PID file: /run/dhcp-server/
Wrote 0 class decls to leases file.
Wrote 0 deleted host decls to leases file.
Wrote 0 new dynamic host decls to leases file.
Wrote 389 leases to leases file.

No subnet declaration for ovsbr0p1 (no IPv4 addresses).
** Ignoring requests on ovsbr0p1.  If this is not what
   you want, please write a subnet declaration
   in your dhcpd.conf file for the network segment
   to which interface ovsbr0p1 is attached. **


Not configured to listen on any interfaces!

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at or in the README file
before submitting a bug.  These pages explain the proper
process and the information we find helpful for debugging..


Obviously the problem was that the DHCP server was started before the network interface ovsbr0p1 (the primary network interface of the host) was brought up. I guessed that this problem was somehow related to the migration from Upstart to Systemd. Still, it was strange because the isc-dhcp-server.service was configured to be started after the

I could not figure out why Systemd thought that the had been reached before the interface was configured completely, but I found a simple workaround: I added a file /etc/systemd/system/isc-dhcp-server.service.d/retry.conf with the following content:


This means that the first start of the DHCP server will still fail most of the time, but Systemd will retry 15 seconds later and typically the network interface will be online by then (if it is not, Systemd will try again another 15 seconds later).
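
A drop-in that produces this retry behavior could look like the following sketch. The two directives are an assumption derived from the behavior described above (restart after a failed start, wait 15 seconds between attempts), not a verbatim copy of the file:

```ini
# /etc/systemd/system/isc-dhcp-server.service.d/retry.conf
# Assumed content: retry a failed start after a 15 second delay.
[Service]
Restart=on-failure
RestartSec=15
```

After creating or changing a drop-in file, systemctl daemon-reload has to be run so that Systemd picks up the change.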

Network problems after upgrading to Ubuntu 16.04 LTS

After upgrading a virtual machine from Ubuntu 14.04 LTS to Ubuntu 16.04 LTS, I was getting weird problems with the network configuration. The network would still be brought up, but scripts that were specified in /etc/network/interfaces would not be run when the corresponding interface was brought up.

The logfile /var/log/syslog would contain messages like "ifup: interface eth0 already configured". On other VMs that had also been updated from Ubuntu 14.04 LTS and use a similar network configuration, this problem would not appear.

I found the solution to this problem in the Ubuntu Forums: There was a file /etc/udev/rules.d/85-ifupdown.rules that caused problems with the network initialization. After deleting this file, the problems went away. I guess that this file was present in a rather old release of Ubuntu and thus the problem only appears when a system has previously been upgraded from that release of Ubuntu.

Issues with SYSVOL share after installing KB3161561

Recently, I got funny issues with group policies on Windows Server 2012 R2. These issues manifested themselves with the following symptoms:

  • When trying to edit a group policy, the Group Policy Management tool would present an error like “Group Policy Error: You do not have permission to perform this operation. Details: Access is denied.” The Group Policy Management Editor would still open, but the group policy would not be displayed.
  • Sometimes, the group policy editor would open, but when trying to navigate through the tree, it would display an error message like “Error (0x80070005) occurred parsing file. Access is denied.”  I believe that this error is only present when using the central store for administrative templates.
  • The event log would contain messages like: “The processing of Group Policy failed. Windows attempted to read the file \\domain\sysvol\domain\Policies\uuid\gpt.ini from a domain controller and was not successful. Group Policy settings may not be applied until this event is resolved. This issue may be transient and could be caused by one or more of the following:
    a) Name Resolution/Network Connectivity to the current domain controller.
    b) File Replication Service Latency (a file created on another domain controller has not replicated to the current domain controller).
    c) The Distributed File System (DFS) client has been disabled.”
  • When trying to open \\\SYSVOL in the file browser, a prompt to enter credentials or an “Access is denied” error message would be displayed.

As suggested in the TechNet forums, disabling the “Hardened UNC paths” feature that was introduced with KB3000483 fixed these issues, but obviously this is not a solution because it reintroduces the vulnerability (MITM attack on the SYSVOL share) that was addressed by KB3000483.

After some time, I realized that these problems had first appeared after installing the June security updates, so I looked through the corresponding knowledge base articles and found KB3161561. This article actually mentions (some of) the issues described earlier in the “Known issues in this security update” section. It also offers a different workaround that works without disabling the “Hardened UNC paths” feature: Setting the “SmbServerNameHardeningLevel” to 0. However, using this workaround has other security implications (described in an MSDN article). Last but not least, MS15-083 describes a third workaround that involves disabling version 1 of the SMB protocol on the server, but this workaround did not solve the problem for me.

Changing the “SmbServerNameHardeningLevel” to 0 might not work when this setting is reset by a group policy (as it was in my case). In this case, the corresponding group policy needs to be changed and the “Computer Configuration\Windows Settings\Local Policies\Security Options\Microsoft network server: Server SPN target name validation level” option needs to be set to “Off”.

Open vSwitch and Multicasting

Recently, I noticed the following messages in the system log of an Ubuntu 14.04 LTS host that is running radvd:

Jun 28 13:15:33 myhost radvd[5782]:    do you need to add the UnicastOnly flag?
Jun 28 13:15:33 myhost radvd[5782]: interface ovsbr0v20p0 does not support multicast

At first, I was surprised, but after writing a small program that checks for the IFF_MULTICAST flag in the interface's attributes, I realized that the interface in fact does not support multicasts (or at least says so).
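
Such a check can be sketched in a few lines of Java. This is only an illustration, not the original program: the class and method names are made up, and it assumes that NetworkInterface.supportsMulticast() reflects the multicast flag that the operating system reports for the interface.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

public class MulticastCheck {

    // Maps each interface name to whether the interface reports multicast
    // support. Returns an empty map if the interfaces cannot be queried.
    public static Map<String, Boolean> multicastSupport() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface
                    .getNetworkInterfaces();
            if (interfaces == null) {
                return result;
            }
            for (NetworkInterface nif : Collections.list(interfaces)) {
                result.put(nif.getName(), nif.supportsMulticast());
            }
        } catch (SocketException e) {
            System.err.println("Could not query network interfaces: "
                    + e.getMessage());
        }
        return result;
    }

    public static void main(String[] args) {
        multicastSupport().forEach((name, supported) -> System.out
                .println(name + ": multicast "
                        + (supported ? "supported" : "not supported")));
    }

}
```

Running the main method prints one line per interface, so an interface like ovsbr0v20p0 can be compared directly with a regular interface on the same host.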

As it turns out, virtual interfaces added to an Open vSwitch bridge do not support multicasts in older versions of Open vSwitch (Ubuntu 14.04 LTS ships with Open vSwitch 2.0.2). I cannot tell for sure in which version multicast support was added. Looking at the changelog, it looks like it has been present since Open vSwitch 2.4.0. Anyway, the version of Open vSwitch shipped with Ubuntu 16.04 LTS (Open vSwitch 2.5.0) supports multicasts on virtual interfaces and the IFF_MULTICAST flag is set for those interfaces.

This means that radvd should not have any problems when using an Open vSwitch virtual interface on Ubuntu 16.04 LTS.

Bug in the Apache Maven Javadoc Plugin

This afternoon, I spent several hours figuring out a problem that in the end turned out to be a bug in the Apache Maven Javadoc Plugin (version 2.10.3).

I wanted to use a custom stylesheet when building the Javadocs of all modules of a multi-module Maven project, so I generated a JAR that contained the stylesheet file, added it to the dependencies of the plugin and referenced it in the <stylesheetfile> tag.

To my surprise, Maven kept complaining with a message like

[WARNING] Unable to find the resource 'path/to/my/stylesheet.css'. Using default Javadoc resources.

I checked everything, tried various ways to configure the dependency, etc., but I could not get it to work. So I resorted to the last thing one can do when a piece of software does not work as expected. I grabbed the source code of the plugin, found the relevant part that generated the message, and attached to the Maven process with a debugger. As it turned out, the problem was actually caused by a bug in the plugin that led to resources from dependencies not being resolved correctly.

I filed a bug report and attached a patch to the bug report that fixes the problem for me. I hope that this patch will soon make it into a release version of the plugin. Until then, maybe this article helps someone else by saving the time to look for the cause of the issue.

UDP sockets broken again in Ubuntu 14.04 LTS

Some time ago, a regression was introduced into the 3.13 line kernel used by Ubuntu 14.04 LTS that broke UDP sockets when they were used in a certain way (e.g. like FreeRADIUS does). This bug was fixed in 3.13.0-67 and I hoped to never see it again.

Two days ago, I realized that one of our RADIUS servers was not working correctly any longer. I could not tell how long this problem had existed (the second RADIUS server still worked and in monitoring the primary one also worked, so the problem went undetected for a very long time).

After looking for the cause of the problem for quite some time, I remembered the problem described earlier and tried an old kernel version. Bingo! This fixed the problem. After looking at the changelog of the current 3.13 line kernel from trusty-proposed (that also fixes the problem) I found a reference to another bug report that describes the problem (don't be fooled by the bug's description, it also applies to IPv4).

As it turns out, the first regression had been caused by backporting an optimization regarding UDP checksum calculation from a newer Linux kernel. However, this change exposed a problem that had been fixed in the newer kernel but not in Ubuntu's branch of kernel 3.13. This regression was fixed by simply removing the patch again. This was okay because it was just an optimization.

Some time later, someone (who obviously was not aware of this regression) again thought that backporting the optimization was a good idea, so it got reintroduced in 3.13.0-69. Now, it looks like they fixed the bug in 3.13.0-78 by actually fixing the underlying problem and not by removing the patch again. Therefore, I hope that we will not see the regression a third time. However, I am a bit annoyed that they did not do better testing when backporting the patch after there had already been a regression around it once. Maybe the Ubuntu team's decision to not use a kernel with long-term support and do maintenance themselves was not so wise after all.

Disabling the annoying "Visit ..." entry in the Firefox address-bar drop-down

I find the new address bar features introduced in a recent Firefox version very annoying. The suggestion of search terms can be disabled easily (does anyone really want to have everything entered into the address bar sent to an external service?), but this still leaves this annoying "Visit ..." entry at the top of the list of visited addresses.

This entry is annoying for several reasons:

  1. It does not add any functionality: Just hitting enter has (nearly) the same effect as selecting this entry.
  2. It can easily be selected accidentally when you actually want the top entry from the list of visited addresses.
  3. Typically, it does not suggest the address you actually want to visit for two reasons: First, it suggests visiting the top URL of the suggested site, even if you always visit a specific path. Second, it suggests a plain (HTTP) URL, even if the site actually only supports HTTPS and you never visited it with HTTP.

I have no idea why they added such a stupid feature to Firefox at all (I suspect that it was originally designed for a different purpose that really only makes sense if you also enable the other features) and why they did not add an option to the UI for disabling it.

Fortunately, someone found out that it can be disabled by setting the browser.urlbar.unifiedcomplete option to false in about:config. Seeing how often the page with the solution has already been visited, I am definitely not the only person who is annoyed by this stupid new feature.

Time synchronization done right

Time synchronization between computers is important for many applications. For some applications (e.g. Apache Cassandra databases), it is even critical for data consistency.

Still, there are quite a few common misconceptions about how time synchronization in a network should be done correctly. Unfortunately, those misconceptions can easily lead to synchronization schemes that are by far less than optimal.

Accidentally, when looking for something else, I came across a series of two articles (part one, part two) that excellently describe the problems of the synchronization schemes that are commonly used and explain how to set up a scheme that actually provides precise synchronization.

In short, the only proper way for getting clock synchronization with the properties that most people want is setting up an internal pool of NTP servers that synchronizes against external references and having all other computers in the network synchronize against this internal pool. This is also the scheme that I have been successfully using for years.

However, you do not have to worry: As long as you only need rough synchronization (so that the clock will show about the right time and not drift away more and more), the common scheme of synchronizing each individual computer against an external pool is typically okay, too. You should just be aware that there will always be some clock skew between the computers and that the computers will drift apart significantly if the connection to the external pool is interrupted for an extended period of time.


Shortly after writing this article, I found two more articles that are loosely connected to this topic. The first one basically tells us that we cannot rely on synchronized clocks because there are just too many ways how things can go wrong. The second one gives a practical example of how quickly things can go wrong when it comes to time synchronization.

Spring's @RequestMapping annotation works on private methods

Recently, I spent a lot of time on debugging a nasty problem with Spring WebMVC and Spring Security.

I had a class annotated with @Controller and a method annotated with @RequestMapping. I wanted to protect this method using the @Secured annotation. So I turned on global method security by adding @EnableGlobalMethodSecurity with the right parameters to my @Configuration class, but it did not work. The method could still be called without having the proper privileges (or being authenticated at all).

After hours of debugging, I found out that the AOP advice was not applied to my controller method because it would not find the method when processing the controller class. At that moment I realized that the method had been declared package private. AOP proxies are not applied to non-public methods (for CGLIB proxies this would be possible, but in general it is not desirable and Spring does not do it).

This left me with the question: Why does the request mapping work? The answer is simple: When looking for methods with the @RequestMapping annotation, Spring does not check the method's access modifiers. As the method is invoked using reflection, it will work even if the method has been declared private (unless there is a SecurityManager in charge, but for most Spring applications there will not be one).
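
The effect can be reproduced without Spring. The following sketch (all class and method names are hypothetical) uses plain reflection to invoke a private method and shows that a Modifier.isPublic check would have flagged it. It assumes that no SecurityManager or module restriction prevents setAccessible from succeeding.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ReflectionAccessDemo {

    public static class DemoController {
        // Private on purpose: regular Java code outside this class
        // cannot call this method.
        private String hidden() {
            return "invoked";
        }
    }

    // Invokes a no-argument method by name, ignoring its access modifier.
    public static Object invokeIgnoringAccess(Object target, String name)
            throws Exception {
        Method method = target.getClass().getDeclaredMethod(name);
        method.setAccessible(true);
        return method.invoke(target);
    }

    // The check that rejects non-public methods before they are invoked.
    public static boolean isPublic(Object target, String name)
            throws Exception {
        Method method = target.getClass().getDeclaredMethod(name);
        return Modifier.isPublic(method.getModifiers());
    }

    public static void main(String[] args) throws Exception {
        DemoController controller = new DemoController();
        System.out.println("public: " + isPublic(controller, "hidden"));
        System.out.println("result: "
                + invokeIgnoringAccess(controller, "hidden"));
    }

}
```

The call succeeds even though the method is private, which is why the access modifier has to be checked explicitly if non-public methods are supposed to be off limits.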

This leaves us with a very awkward situation: Private methods might be called by external code and if there is an @Secured annotation on them, it will be ignored. In my opinion, this is a bug: The @RequestMapping annotation should only work on public methods. There are actually four places in Spring where this could be fixed (Spring 4.1.7):

  1. line 172
  2. line 207
  3. line 60
  4. line 187

It would be completely sufficient to check whether the method is public in one of these places. Until this is fixed in Spring (and it might never get fixed because the fix would break backward compatibility), I use my own RequestMappingHandlerMapping which does the check:

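import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

import org.springframework.web.servlet.mvc.method.RequestMappingInfo;
import org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerMapping;

public class PublicOnlyRequestMappingHandlerMapping extends
        RequestMappingHandlerMapping {

    @Override
    protected RequestMappingInfo getMappingForMethod(Method method,
            Class<?> handlerType) {
        // Let Spring build the mapping first so that a warning can be
        // logged for annotated methods that are not public.
        RequestMappingInfo info = super
                .getMappingForMethod(method, handlerType);
        if (info != null && !Modifier.isPublic(method.getModifiers())) {
            logger.warn("Ignoring non-public method with @RequestMapping annotation: "
                    + method);
            return null;
        } else {
            return info;
        }
    }

}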

As you can see, the implementation is very simple. I first call the super method and then check whether the method is public so that I can generate a warning message when @RequestMapping has been used on a non-public method. If one does not care about such a message, one can check the method's access modifier first and only invoke the super method when the investigated method is public.

In order to use the custom RequestMappingHandlerMapping, we have to use a custom implementation of WebMvcConfigurationSupport (when using Java Config):

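// Imports (org.springframework.web.servlet.config.annotation.*, org.springframework.web.servlet.mvc.method.annotation.*) omitted.
public class CustomWebMvcConfiguration extends
        DelegatingWebMvcConfiguration {

    public RequestMappingHandlerAdapter requestMappingHandlerAdapter() {
        RequestMappingHandlerAdapter adapter = super
                .requestMappingHandlerAdapter();
        adapter.setIgnoreDefaultModelOnRedirect(true);
        return adapter;
    }

    public RequestMappingHandlerMapping requestMappingHandlerMapping() {
        RequestMappingHandlerMapping handlerMapping = new PublicOnlyRequestMappingHandlerMapping();
        handlerMapping.setOrder(0);
        handlerMapping.setInterceptors(getInterceptors());
        handlerMapping.setContentNegotiationManager(mvcContentNegotiationManager());

        PathMatchConfigurer configurer = getPathMatchConfigurer();
        if (configurer.isUseSuffixPatternMatch() != null) {
            handlerMapping.setUseSuffixPatternMatch(configurer.isUseSuffixPatternMatch());
        }
        if (configurer.isUseRegisteredSuffixPatternMatch() != null) {
            handlerMapping.setUseRegisteredSuffixPatternMatch(configurer.isUseRegisteredSuffixPatternMatch());
        }
        if (configurer.isUseTrailingSlashMatch() != null) {
            handlerMapping.setUseTrailingSlashMatch(configurer.isUseTrailingSlashMatch());
        }
        if (configurer.getPathMatcher() != null) {
            handlerMapping.setPathMatcher(configurer.getPathMatcher());
        }
        if (configurer.getUrlPathHelper() != null) {
            handlerMapping.setUrlPathHelper(configurer.getUrlPathHelper());
        }

        return handlerMapping;
    }
}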

This implementation copies the implementation of requestMappingHandlerMapping() from the parent class, but replaces the actual implementation used with our own class. In addition to that, this configuration also overrides requestMappingHandlerAdapter() in order to set the ignoreDefaultModelOnRedirect attribute. This is the recommended setting for new Spring WebMVC applications, but it cannot be made the default in Spring because it would break backward compatibility. Of course, the two changes are completely independent, so you can choose to only implement either of them.

Why I don't like checked exceptions

One of the rather obscure features of the Java programming language is the support for checked exceptions. Most other languages running on the Java virtual machine (JVM) do not have them and most non-JVM programming languages do not have them either.

You might be surprised that I call checked exceptions "obscure" even though it is easy to understand their concept and to use them. However, I suspect that most experienced Java developers share my sentiment (if you don't, please speak up in the comments), while it is anything but obvious to beginners why checked exceptions might be problematic.

Actually, I have to admit that when I first learned the Java programming language (which must have been around Java 1.2 or 1.3), I liked the concept of checked exceptions. I typically prefer statically typed languages over dynamically typed ones because I like to have every support that a compiler can give me in statically verifying my code. Checked exceptions seem to be a logical extension of this concept, where the compiler can check whether all error conditions that might occur are actually handled by the code.

Unfortunately, the concept of checked exceptions has rather severe limitations which become apparent in larger projects. In this article, I want to explore why checked exceptions are a good idea that unfortunately fails when being put to practical use. I hope that this might be useful to Java beginners who are writing their first library and have to decide where to use checked and where to use unchecked exceptions.

Before taking a closer look at the problems with checked exceptions, we want to quickly revisit the top level of the exception hierarchy in Java and how the three different types of exceptions are handled differently.

In Java, all exceptions inherit from Throwable. There are three distinct types of exceptions: Unchecked exceptions that signal an error condition in the JVM (for example when a class cannot be loaded or a memory allocation fails) are derived from Error, which in turn is derived from Throwable. These exceptions are unchecked, which means that they can be thrown by any code without having been declared explicitly. Exception, which is also derived from Throwable, is the base class for all checked exceptions. Checked exceptions have to be declared explicitly in a method declaration. If a method declares that it throws a checked exception, the calling code must either catch this exception or must also declare that it throws the exception. Finally, there is the RuntimeException which is derived from Exception, but like Error is a base class for unchecked exceptions.
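The rules of this hierarchy can be tried out with a few lines of Java. The following is only a small sketch: the class ExceptionKinds and its methods are names I made up for this illustration.

```java
import java.io.IOException;

// Hypothetical helper class, only meant to illustrate the hierarchy.
final class ExceptionKinds {

    // A throwable is checked unless it derives from RuntimeException or Error.
    static boolean isChecked(Throwable t) {
        return !(t instanceof RuntimeException) && !(t instanceof Error);
    }

    // Checked: the exception has to be declared in the throws clause.
    static void readConfig() throws IOException {
        throw new IOException("cannot read configuration");
    }

    // Unchecked: no throws clause is needed.
    static void failFast() {
        throw new IllegalStateException("not initialized");
    }

    public static void main(String[] args) {
        System.out.println(isChecked(new IOException()));           // true
        System.out.println(isChecked(new IllegalStateException())); // false
        System.out.println(isChecked(new OutOfMemoryError()));      // false
    }
}
```

Code calling readConfig() only compiles if it catches the IOException or declares it, while failFast() can be called from anywhere without any declaration.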

Even though both Errors and RuntimeExceptions represent unchecked exceptions, they are used for different purposes. Errors are typically thrown by the JVM only and are typically non-recoverable. For example, it is hard to recover from an error when loading a class, because this is typically caused by a problem with the class file. Therefore, Errors are rarely caught but will typically lead to program termination. Even if they are caught, the program will often behave erratically after getting such an exception (everyone who has experienced an OutOfMemoryError in Eclipse knows what I am talking about). RuntimeExceptions, on the other hand, often signal errors in the program's logic. For example, a NoSuchElementException happens when trying to get the next element from an Iterator that has no more elements.

Exception is used for checked exceptions which are typically caused by an exceptional situation (not necessarily considered an error). For example, an IOException is triggered when an I/O operation cannot be finished. Such a situation might not necessarily indicate an error, because it can simply happen when trying to access a resource that no longer exists (e.g. a network connection might have been closed by a peer).

In summary, exceptions of type Error are typically only thrown by the JVM, exceptions of type RuntimeException are thrown by Java code, but usually do not have to be expected, and exceptions of type Exception (checked exceptions) have to be expected and need to be handled somehow.

In my opinion, there are three flaws in this concept: a minor obvious one and two major less obvious ones. The minor flaw is the class hierarchy. RuntimeException, the base class for unchecked exceptions, is derived from Exception, the base class for checked exceptions. It would be more reasonable to derive RuntimeException from Throwable directly, but this design flaw does not cause any actual trouble.

The first major flaw is that the distinction between exceptions that have to be expected (and thus should be checked exceptions) and exceptions that do not have to be expected (and thus should be unchecked exceptions) is not always clear. What if an exception has to be expected but cannot be reasonably handled locally? For example, a FileNotFoundException might be non-recoverable if an important configuration or database file is missing. There are three potential solutions for such a case: We can let the exception bubble up the stack (which means that now a lot of methods have to declare that they throw a FileNotFoundException), we can wrap it in a different kind of exception (e.g. in a MyLibraryException), or we can wrap it in a RuntimeException.

The first solution is problematic because of the second flaw described in the next paragraph. The second solution (which is the one recommended in the API docs) is not perfect either, because it means that information about the actual cause is lost. The actual cause can still be attached to the new exception (since Java 1.4). However, it gets harder to catch the individual cause, because a catch clause cannot test the cause of an exception and there is typically no documentation about which kinds of exceptions might be wrapped in another exception. The third solution (which is very common) converts the checked exception to an unchecked exception, but like the second one, information about the actual cause is now more difficult to access.
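To make the problem with wrapping more concrete, here is a minimal sketch. MyLibraryException is the hypothetical exception mentioned above, and WrappingDemo, loadDatabase and describeFailure are names I invented for this example. The file access is only simulated by throwing the exception directly.

```java
import java.io.FileNotFoundException;

// Hypothetical library exception that wraps lower-level exceptions.
class MyLibraryException extends Exception {
    MyLibraryException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class WrappingDemo {

    // Library code: wraps the low-level exception in its own checked exception.
    static void loadDatabase(String path) throws MyLibraryException {
        try {
            // Stands in for a real file access that fails.
            throw new FileNotFoundException(path);
        } catch (FileNotFoundException e) {
            throw new MyLibraryException("Could not load database", e);
        }
    }

    // Calling code: a catch clause cannot match on the cause,
    // so the cause has to be inspected by hand.
    static String describeFailure(String path) {
        try {
            loadDatabase(path);
            return "ok";
        } catch (MyLibraryException e) {
            if (e.getCause() instanceof FileNotFoundException) {
                return "missing file: " + e.getCause().getMessage();
            }
            return "other failure";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure("/tmp/example.db"));
    }
}
```

The calling code only knows to look for a FileNotFoundException because I wrote both sides. With a real library, this knowledge would have to come from documentation that often does not exist.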

The second major flaw is related to the concept of having checked exceptions bubble up the stack. This approach does not work well when using an inversion of control (IoC) pattern. This pattern is very prominent (for reasons that are outside the scope of this article), but cannot be properly used with checked exceptions. The generic (framework) code has to know which checked exceptions are thrown by user code that is called from the generic code. Obviously, it cannot, so the interface to the user code has to specify no checked exceptions at all or a checked exception specific to the calling code. This means that the user code has to wrap its exceptions in the checked exception specified by the framework or in an unchecked exception. This leaves the problem described earlier, where the code calling the framework code would now need to unwrap exceptions and handle their causes, even though it cannot (always) know which exceptions might be causing the exception that is caught.
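The standard library's ExecutorService is a good place to observe this effect: the user code is a Callable, and any exception thrown by it is reported to the caller wrapped in an ExecutionException. The following sketch shows the unwrapping that becomes necessary. IocDemo and runTask are names I made up, and the handling shown is just one possible choice.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; only the java.util.concurrent types are real.
final class IocDemo {

    // Runs the user code through the "framework" (an executor) and
    // translates the outcome into a message.
    static String runTask(Callable<String> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(task).get();
        } catch (ExecutionException e) {
            // The framework cannot know the user's exceptions, so it wraps
            // them. We have to guess which causes might be inside.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                return "I/O problem: " + cause.getMessage();
            }
            return "unexpected: " + cause;
        } catch (InterruptedException e) {
            // Restore the interruption status for code further up.
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTask(() -> {
            throw new IOException("connection closed");
        }));
    }
}
```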

Now, let's see how this changes when we consistently use unchecked exceptions instead of checked exceptions. Unchecked exceptions can easily bubble up through framework code, so we can catch them where we want to, but we do not have to catch them where we cannot handle them anyway. We do not lose any information about the exception, meaning we can still catch a very specific exception at a rather high level.
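As a rough sketch of this idea, assume a tiny piece of generic code (applyToAll) that calls user code through a java.util.function.Function. All class and method names here, including ConfigFileMissingException, are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical unchecked exception thrown by the user code.
class ConfigFileMissingException extends RuntimeException {
    ConfigFileMissingException(String message) {
        super(message);
    }
}

final class BubblingDemo {

    // Generic "framework" code: it knows nothing about the exceptions
    // that the callback might throw and does not need to.
    static <T, R> List<R> applyToAll(List<T> items, Function<T, R> callback) {
        List<R> results = new ArrayList<>();
        for (T item : items) {
            results.add(callback.apply(item));
        }
        return results;
    }

    // User code that is called from the framework code.
    static String load(String name) {
        if (name.isEmpty()) {
            throw new ConfigFileMissingException("no file name given");
        }
        return "loaded " + name;
    }

    // High-level code: catches the specific exception, no unwrapping needed.
    static String run(List<String> names) {
        try {
            return String.join(", ", applyToAll(names, BubblingDemo::load));
        } catch (ConfigFileMissingException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a.conf", "b.conf")));
        System.out.println(run(Arrays.asList("a.conf", "")));
    }
}
```

The generic code in the middle stays free of any exception handling, and the high-level code still sees the original exception type.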

Obviously, it is important to document which unchecked exceptions are thrown by a method, so that calling code can know which exceptions it might want to catch. If there is framework code in between, it might not always be obvious which exceptions can occur; however, this is not worse than with an exception of an unknown type that is wrapped in a checked exception.

The Spring framework, for example, chooses to use unchecked exceptions for most error conditions. I tend to use the same approach when I write library code.

It is tempting to use checked exceptions in order to force the user of a library method to handle a certain situation, but this rarely works. The InterruptedException thrown by many standard library methods is a good example: This exception is thrown when a thread is interrupted while it is blocked waiting for some event. This makes sense, because the thread should not wait any longer, when it has been interrupted. However, it is very common to see code like the following:

try {
    Thread.sleep(1000L); // for example; any blocking call will do
} catch (InterruptedException e) {
    // Ignore the exception
}
This is very dangerous because Java will clear the interruption status of a thread when throwing an InterruptedException. This means that code on a higher level, that checks whether the thread has been interrupted (you will often find this as a loop condition), will never know that the thread has been interrupted.

Therefore, the correct way to handle an InterruptedException is the following:

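try {
    Thread.sleep(1000L); // for example; any blocking call will do
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}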
This will mark the current thread as interrupted, so that code on a higher level will get correct results when checking whether the thread has been interrupted.
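The difference between the two variants can be observed directly. In the following sketch (InterruptDemo and its method names are made up), the current thread interrupts itself to simulate an interrupt request and then reports whether the interruption status is still visible after the catch block.

```java
// Hypothetical demo class; the names are made up for this example.
final class InterruptDemo {

    // Swallows the exception: the interruption status is lost.
    static boolean interruptedAfterIgnoring() {
        Thread.currentThread().interrupt(); // simulate an interrupt request
        try {
            Thread.sleep(10L); // throws at once and clears the status
        } catch (InterruptedException e) {
            // Ignore the exception
        }
        // Thread.interrupted() reads the status and clears it again.
        return Thread.interrupted();
    }

    // Restores the status: code on a higher level can still see it.
    static boolean interruptedAfterRestoring() {
        Thread.currentThread().interrupt(); // simulate an interrupt request
        try {
            Thread.sleep(10L); // throws at once and clears the status
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Thread.interrupted();
    }

    public static void main(String[] args) {
        System.out.println(interruptedAfterIgnoring());  // false
        System.out.println(interruptedAfterRestoring()); // true
    }
}
```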

Now, imagine that InterruptedException was an unchecked exception: Code that did not want to deal with it simply would not. This would result in the exception bubbling up until it is handled explicitly or the current thread is terminated. In most cases, this is exactly what the programmer wants (ensure that the thread terminates when it is interrupted). In the few cases where one wants to react to the InterruptedException in a different way than terminating the thread, one could still catch the exception explicitly. Everyone doing this would probably be aware of the fact that the thread has to be interrupted again if other code should see that it has been interrupted. So use of an unchecked instead of a checked exception would actually result in better code in most cases.

Actually, the tendency to abandon checked exceptions can also be seen in other languages. C++ had a feature that is similar to checked exceptions in C++ 98 and 03: A function can explicitly declare which exceptions it throws, and this declaration is enforced at run time (when the function throws any other exception, the program terminates by default). In C++ 11, this feature has been deprecated. Instead, C++ now offers the noexcept keyword in order to specify that a function never throws any kind of exception. This is a consequence of the experience that it is rarely practical to explicitly specify which exceptions a function might throw.

In summary, checked exceptions are a nice concept that unfortunately is not useful in practice. Therefore, it is my opinion that they should be avoided in general and unchecked exceptions should be preferred where feasible.