Kernel-based virtual machine (KVM)

Last modified by Sebastian Marsching on 2023/05/16 20:12

Migrating Windows Server 2003 from VMware Server 1.x to KVM
Generating a random MAC address
Mounting a Virtual Disk Partition in the Host System
Graceful Shutdown
- Linux
- Windows
Shutdown virtual machines on host system shutdown
- Ubuntu 12.04 LTS (Precise Pangolin)
- Ubuntu 16.04 LTS (Xenial Xerus)
Choosing the disk-cache settings
Problems and their solutions
Using virt-manager with OS X
Feeding entropy from the host to the VM
Installing or upgrading to Windows 10
I/O errors in VM caused by GRUB OS prober on host system
Enabling trim support for SSDs or sparse VM disk images
Create a QCOW2 image that is not sparse
Using qemu-img to make an image available as a device node
Recovering unused space from a QCOW2 image

Migrating Windows Server 2003 from VMware Server 1.x to KVM

Step 1

Create a KVM domain with ACPI enabled, disk(s) of the same size as in the original VM, and the same MAC address. Example configuration (using libvirt):

<domain type='kvm'>
<name>myVirtualMachine</name>
<uuid>345b2956-c610-4a0e-94b4-a96c5ebd4a0f</uuid>
<memory>524288</memory>
<currentMemory>524288</currentMemory>
<vcpu>2</vcpu>
<os>
   <type>hvm</type>
   <boot dev='hd'></boot>
</os>
<features>
   <acpi></acpi>
</features>
<clock offset='localtime'></clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
   <emulator>/usr/bin/kvm</emulator>
   <disk type='block' device='disk'>
     <source dev='/dev/vg0/myVirtualMachine-disk1'></source>
     <target dev='hda' bus='ide'></target>
   </disk>


   <interface type='bridge'>
     <mac address='00:0c:29:3e:9a:d4'></mac>
     <source bridge='br0'></source>
   </interface>
   <input type='tablet' bus='usb'/>
   <input type='mouse' bus='ps2'/>
   <graphics type='vnc' port='-1' listen='127.0.0.1'></graphics>
</devices>
</domain>

Step 2

Prepare the old virtual machine: Download and execute the MergeIDE tool (http://www.virtualbox.org/attachment/wiki/Migrate_Windows/MergeIDE.zip). Remove VMware Tools and (if necessary) change the HAL (see http://support.microsoft.com/kb/309283/en). For the configuration presented above (two VCPUs), halmacpi.dll worked fine for me.

Step 3

Boot the original VM using a Linux live CD and use dd to copy the hard disk data to the new VM:

dd if=/dev/sda | ssh -C root@kvmhost "dd of=/dev/vg0/myVirtualMachine-disk1"

Step 4

Shut down the original VM and start the new VM. You might need one or two reboots before Windows has installed all new hardware drivers, but then everything should run perfectly.

Generating a random MAC address

Random MAC address generator script
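
The linked script is not reproduced here. As a stand-in, the following is a minimal sketch (not the original script) that prints a random MAC address. It assumes the 52:54:00 prefix that is conventionally used for QEMU/KVM guests; the function name random_mac is made up for this example.

```shell
#!/bin/sh
# Print a random MAC address of the form 52:54:00:xx:xx:xx.
# The last three bytes are read from /dev/urandom and printed as hex.
random_mac() {
    od -An -N3 -tx1 /dev/urandom |
        awk '{ printf "52:54:00:%s:%s:%s\n", $1, $2, $3 }'
}

random_mac
```

The printed address can be used in the mac element of the domain configuration when a VM does not have to keep an existing address.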

Mounting a Virtual Disk Partition in the Host System

In order to mount a specific partition of a virtual machine's disk image, you have to use the loop option and specify the offset at which the partition begins in the image.

First run parted <disk image> unit B print in order to find the offset of the partition (this is the number in the Start column, without the trailing B). Then run mount -o loop,offset=<offset> <disk image> <mount point> in order to mount the partition.
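
The lookup of the offset can also be scripted. The following sketch assumes parted's machine-readable output (option -m), in which each partition is printed as one colon-separated line with the start offset in the second field. The function name partition_offset, disk.img and /mnt are placeholders chosen for this example.

```shell
#!/bin/sh
# Read the output of "parted -s -m IMAGE unit B print" on stdin and print
# the start offset (in bytes) of the partition with number $1.
# Assumed line format: number:startB:endB:sizeB:filesystem:name:flags;
partition_offset() {
    awk -F: -v part="$1" '$1 == part { sub(/B$/, "", $2); print $2 }'
}

# Example usage (needs root):
#   offset=$(parted -s -m disk.img unit B print | partition_offset 1)
#   mount -o loop,offset="$offset" disk.img /mnt
```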

I found this solution on linuxwiki.de.

Graceful Shutdown

In order to shut down a virtual machine using virt-manager, you have to prepare the virtual machine. QEMU-KVM sends an ACPI signal to the virtual machine, which has to be caught and processed.

Linux

Install the acpid package. This daemon will catch the ACPI signal and initiate the shutdown.

Windows

By default, Windows will not shut down unless a user is logged in to the local console. In order to make Windows shut down at any time, you have to open the local security policies (how to open them depends on the Windows version), go to the security options, and activate the "Shutdown: Allow system to be shut down without having to log on" option (in German versions "Herunterfahren: Herunterfahren des Systems ohne Anmeldung zulassen"). This will allow Windows to be shut down from the local console or by pushing the (virtual) power button, even if no user is logged on.

However, there is still a problem if a user is logged on to the system when the shutdown is initiated. Windows will present a dialog on the local console asking whether the shutdown should proceed. To get rid of this dialog (which is important if you want to automate shutdown from a script), you have to go to the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows and change the value of ShutdownWarningDialogTimeout from 0xffffffff to 0x00000001. If the value does not exist yet, create a new DWORD value with this name. I found this information here.

Shutdown virtual machines on host system shutdown

Ubuntu 12.04 LTS (Precise Pangolin)

On Ubuntu 12.04 LTS (Precise Pangolin), the shutdown scripts already take care of stopping the virtual machines (at least in the newest version of the libvirt-bin package). However, by default the script will only wait 30 seconds for the VMs to shut down. Depending on the services running in the VM, this can be too short.

In this case, you have to create a file /etc/init/libvirt-bin.override with the following content:

# extend wait time for vms to shut down to 4 minutes
env libvirtd_shutdown_timeout=240

You could choose a longer timeout here. However, the init script /etc/init.d/sendsigs will only wait up to 5 minutes for Upstart services to stop. Therefore, if you want to use a timeout longer than approximately 4.5 minutes in the libvirt-bin.override file, you will have to change the timeout in this script as well.

Ubuntu 16.04 LTS (Xenial Xerus)

In Ubuntu 16.04 LTS, the timeout for shutting down the virtual machines can be changed by editing /etc/default/libvirt-guests and changing SHUTDOWN_TIMEOUT.

However, there is the problem that each VM will only receive one shutdown request. For Linux VMs this is usually fine, but Windows VMs sometimes do not react to the first request, resulting in the VM being killed forcibly after the timeout. The only fix I found for this problem is modifying the shutdown script so that it sends shutdown requests again and again until the VM has shut down or the timeout has been reached.

These changes could be made directly to /usr/lib/libvirt/libvirt-guests.sh, but they would be overwritten on package upgrades. For this reason, I made a copy of this script and placed it in /usr/local/lib/libvirt/libvirt-guests.sh. To this copy, I applied the following patch:

--- /usr/lib/libvirt/libvirt-guests.sh 2016-10-10 09:33:38.000000000 +0200
+++ /usr/local/lib/libvirt/libvirt-guests.sh    2016-11-08 11:58:33.000000000 +0100
@@ -339,6 +339,19 @@
     retval run_virsh "$uri" shutdown "$guest" > /dev/null
 }

+# shutdown_guest_retry URI GUEST
+# Start a ACPI shutdown of GUEST on URI. This function returns after the command
+# was issued to libvirt to allow parallel shutdown.
+# This command does the same as shutdown_guest_async, but does not print a
+# message.
+shutdown_guest_retry()
+{
+    uri=$1
+    guest=$2
+
+    retval run_virsh "$uri" shutdown "$guest" > /dev/null
+}
+
 # guest_count GUEST_LIST
 # Returns number of guests in GUEST_LIST
 guest_count()
@@ -407,6 +420,14 @@
         format=$(eval_gettext "Waiting for %d guests to shut down\n")
     fi
     while [ -n "$on_shutdown" ] || [ -n "$guests" ]; do
+        guests_retry=$on_shutdown
+        while [ -n "$guests_retry" ]; do
+            set -- $guests_retry
+            guest=$1
+            shift
+            guests_retry=$*
+            shutdown_guest_retry "$uri" "$guest"
+        done
         while [ -n "$guests" ] &&
               [ $(guest_count "$on_shutdown") -lt "$PARALLEL_SHUTDOWN" ]; do
             set -- $guests
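
The retry idea behind this patch can also be tried out in isolation, for example when shutting down a single Windows VM by hand. The following is only a sketch and not part of libvirt-guests.sh: the function name shutdown_with_retry and its parameters are made up for this example, and it relies on virsh domstate printing "shut off" for a stopped domain.

```shell
#!/bin/sh
# Repeatedly send an ACPI shutdown request to GUEST until it is shut off
# or roughly TIMEOUT seconds have passed.
# Usage: shutdown_with_retry GUEST TIMEOUT [INTERVAL]
# Returns 0 if the guest stopped in time, 1 otherwise.
shutdown_with_retry() {
    guest=$1
    timeout=$2
    interval=${3:-5}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        state=$(virsh domstate "$guest" 2>/dev/null)
        case $state in
            # An empty state means virsh failed, for example because a
            # transient domain has already disappeared.
            "shut off"|"") return 0 ;;
        esac
        virsh shutdown "$guest" > /dev/null 2>&1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}

# Example usage: shutdown_with_retry myVirtualMachine 240 10
```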

In order to use this script instead of the standard script, the configuration for libvirt-guests.service has to be overridden. This can be done by creating the file /etc/systemd/system/libvirt-guests.service.d/custom-stop-script.conf with the following contents (run systemctl daemon-reload afterwards so that systemd picks up the change):

[Service]
ExecStop=
ExecStop=/usr/local/lib/libvirt/libvirt-guests.sh stop

Choosing the disk-cache settings

Three different cache modes can be configured for each storage device in libvirt: none, writeback and writethrough. However, these names are a bit misleading.

none: Data will be written to the disk's (or disk controller's) cache before reporting success.
writeback: Data will just be written to the in-memory block-device cache before reporting success.
writethrough: Data will always be written directly to disk.

There are two excellent articles explaining these options in more detail.

In up-to-date versions of KVM, the caches will also be bypassed completely (equivalent to the writethrough mode) when a synchronous write is explicitly requested for a specific write operation. In order for this to work, the guest's file system and operating system as well as the host's operating system (and, if the virtual disk is stored on a file system, also the host's file system) have to support these synchronous write operations. To my knowledge, this holds true when using Ubuntu 12.04 LTS, a supported file system, and a drive that correctly implements FLUSH CACHE.

I usually prefer the none caching mode, because it limits the amount of damage that can occur for applications that do not sync correctly when writing data (compared to the writeback mode). The writethrough mode is even safer, but there is a significant penalty on performance and it might still not be 100% safe if the hard disk is faking cache flushes. This might sound like a theoretical issue, but there are reports on the internet about SSDs not flushing their cache when they are requested to do so.

On at least one host system, I experienced the problem that the load on the host system would increase significantly (by a factor of 10-20) when setting the caching mode to none instead of using the default (which would be the same as writeback). In this case, it helped to set the I/O mode to native instead of using the default (threads), as suggested on Server Fault. Contrary to what is suggested in the answer on Server Fault, I did not have to use the cfq scheduler on the host system. The default (deadline) worked as well. There is also a presentation from Red Hat indicating that the native I/O mode is better than the threads mode in many cases. However, there are a few cases where the threads mode might be better.
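
In the domain XML, both settings are attributes of the driver element inside a disk element. The following sketch simply prints such an element so that it can be pasted into the configuration (for example via virsh edit). The function name disk_driver_xml is made up for this example, and type='raw' is an assumption that matches block devices like the LVM volume used earlier on this page.

```shell
#!/bin/sh
# Print a libvirt <driver> element with the given cache mode ($1) and
# I/O mode ($2). Defaults: cache=none, io=native.
disk_driver_xml() {
    printf "<driver name='qemu' type='raw' cache='%s' io='%s'/>\n" \
        "${1:-none}" "${2:-native}"
}

disk_driver_xml
```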

Problems and their solutions

Windows complains about parallel port service

If Windows complains that the parallel port service could not be started and you have the following message in the event log

The Parallel port driver service failed to start due to the following error: The service cannot be started, either because it is disabled or because it has no enabled devices associated with it.

or (in the German version)

Der Dienst "Treiber für parallelen Anschluss" wurde aufgrund folgenden Fehlers nicht gestartet: Der angegebene Dienst kann nicht gestartet werden. Er ist deaktiviert oder nicht mit aktivierten Geräten verbunden.

you should just disable the "Parport" service. I found the necessary steps here. You have to change the value Start under the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Parport from 3 to 4. This will disable the service.

DHCP server or client within a VM do not work correctly / UDP Checksum Errors

A DHCP server or client (or both, e.g. a DHCP relay service) might not run correctly in a VM. In the log files, you will find error messages about wrong UDP checksums (e.g. dhcrelay: 3 bad udp checksums in 5 packets).

I first discovered this problem after upgrading a KVM host to Ubuntu 12.04 LTS Precise Pangolin. When I switched the VM running the DHCP relay agent from virtio network to normal network, the problem disappeared and suddenly the DHCP relay was working correctly. However, later I found out that this effect was very misleading. In fact the problem was that at about the same time the network structure was reconfigured and thus the DHCP relay agent was receiving the answers from the DHCP server on a different interface than before, which was not listed in the configuration file. Surprisingly, when not using virtio the DHCP relay agent was still accepting these packets. Both interfaces were different VLANs on the same physical interface. Therefore I suspect that with the virtio driver the internal handling of VLANs is slightly different and thus the DHCP relay agent does not accept the DHCP Offer packets any longer. When adding the other (VLAN) interface to the list of interfaces, DHCP worked again using the virtio driver.

However, the messages about bad UDP checksums still appear in the log now and then. It seems like dhcrelay3 gets confused by its own UDP packets, which from the perspective of the DHCP relay agent in fact do have a wrong UDP checksum. However, this does not affect operation.

There is, however, a bug report that attributes DHCP problems to the virtio driver. I cannot tell whether the virtio driver does in fact cause problems in some situations.

Just be warned: If you see messages from the DHCP relay agent complaining about bad UDP checksums, it might be related to your problem. But it might also be totally unrelated and just lead you in the wrong direction.

The escape character for exiting virsh console does not work when using a German keyboard

When using a German keyboard, pressing CTRL + AltGr + "9" (AltGr + "9" gives "]" on a German keyboard) might not work for exiting the virsh console mode, although the escape character is "^]". In this case, try pressing CTRL + "+" (the plus sign on the normal part of the keyboard, not the numpad). This should work as an escape character. I found this solution here.

Using virt-manager with OS X

While virt-manager cannot easily be run on OS X itself (there are some attempts to build it with Homebrew, but it has a lot of dependencies), running it on a Linux host and using X11 forwarding works perfectly fine. However, there is a small problem: Once the mouse pointer has been "caught" by a virtual machine, it cannot be released. Instead, the program has to be killed. The solution is simple: The option "Option keys send Alt_L and Alt_R" in the XQuartz settings has to be enabled (I got the idea here). As a consequence, the option keys no longer act as modifier keys. This means that certain characters are not available. Therefore, you might want to activate this option only while using virt-manager.

Feeding entropy from the host to the VM

Recent versions of KVM and libvirt allow you to feed entropy from the host system into a virtual machine. This can be very useful if a virtual machine needs a lot of entropy (in particular while booting) and the host has a source for this entropy (e.g. haveged). In the devices section of the VM configuration file, you can add the following device:

<rng model='virtio'>
  <rate bytes='4096' period='10000'></rate>
  <backend model='random'>/dev/random</backend>
</rng>

The rate specifies how much entropy a VM is allowed to drain from the host. In this example, a VM can drain up to 4096 bytes in an interval of 10 seconds (the period is given in milliseconds). This option ensures that a single VM cannot drain all entropy from the host.

In the virtual machine, a new device /dev/hwrng turns up through which the entropy can be read. So, you only need a tool that feeds the entropy from this device to the kernel's entropy pool. On Ubuntu, the daemon from the rng-tools package can do this job.
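
To check inside the guest whether the device has been picked up, the kernel's view can be inspected through sysfs and procfs. This is a minimal sketch under the assumption that the guest kernel exposes /sys/class/misc/hw_random/rng_current and /proc/sys/kernel/random/entropy_avail. The function name show_rng_status is made up, and the two directory parameters only exist so that the function can be tried against other paths.

```shell
#!/bin/sh
# Print the hardware RNG currently used by the kernel and the kernel's
# estimate of the available entropy.
# Usage: show_rng_status [SYSFS_DIR] [PROCFS_DIR]
show_rng_status() {
    sysfs=${1:-/sys/class/misc/hw_random}
    procfs=${2:-/proc/sys/kernel/random}
    printf 'current rng: %s\n' "$(cat "$sysfs/rng_current")"
    printf 'entropy available: %s\n' "$(cat "$procfs/entropy_avail")"
}

# Example usage inside the guest: show_rng_status
```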

Installing or upgrading to Windows 10

When installing Windows 10 inside a Linux KVM VM (or upgrading an existing system to Windows 10), the installer might crash with a "SYSTEM THREAD EXCEPTION NOT HANDLED" error after rebooting. This problem can be solved by setting the CPU model to "core2duo" as described in the TechNet Forums. For example, the following entry in the VM's configuration file might help:

<cpu mode='custom' match='exact'>
   <model fallback='allow'>core2duo</model>
   <vendor>Intel</vendor>
   <feature policy='require' name='tm2'></feature>
   <feature policy='require' name='est'></feature>
   <feature policy='require' name='monitor'></feature>
   <feature policy='require' name='ds'></feature>
   <feature policy='require' name='ss'></feature>
   <feature policy='require' name='vme'></feature>
   <feature policy='require' name='dtes64'></feature>
   <feature policy='require' name='rdtscp'></feature>
   <feature policy='require' name='ht'></feature>
   <feature policy='require' name='dca'></feature>
   <feature policy='require' name='pbe'></feature>
   <feature policy='require' name='tm'></feature>
   <feature policy='require' name='pdcm'></feature>
   <feature policy='require' name='vmx'></feature>
   <feature policy='require' name='ds_cpl'></feature>
   <feature policy='require' name='xtpr'></feature>
   <feature policy='require' name='acpi'></feature>
</cpu>

I/O errors in VM caused by GRUB OS prober on host system

When the GRUB update script runs the os-prober utility on the virtual-machine host, this can cause I/O errors in the VMs when os-prober tries to mount the VMs' logical volumes. This kind of problem will typically manifest itself with symptoms like partitions in the VM being remounted read-only and the log containing messages like end_request: I/O error, dev vda, sector 12345.

There is a Debian bug report for this issue. As suggested in the bug report, a workaround is disabling os-prober by adding the following line to /etc/default/grub on the host system:

# Disable os-prober. Trying to mount VM disks can cause I/O errors in the VM.
# See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788062.
GRUB_DISABLE_OS_PROBER=true

Enabling trim support for SSDs or sparse VM disk images

In order to make sure that a "trim" command on a block device in the guest is forwarded to the respective host device, forwarding has to be enabled specifically. This is done by setting the discard attribute of the driver element inside the respective disk element of the domain XML file to unmap (see here). It seems like trim is currently (May 2018) only available for VirtIO SCSI devices, not for traditional VirtIO devices.

Of course, the "trim" command still needs to be used in the guest. This can be done either by mounting the filesystem(s) with the discard option or by periodically running fstrim. Ubuntu has chosen the latter option and runs fstrim once a week (through a script in /etc/cron.weekly).
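
Whether discard requests can reach the virtual disk at all can be checked inside the guest before running fstrim. The following sketch assumes that the guest kernel reports the largest supported discard request in /sys/block/<device>/queue/discard_max_bytes and that a value of 0 means "not supported". The function name supports_discard is made up, and the second parameter only exists so that the function can be tried against another directory.

```shell
#!/bin/sh
# Succeed if the block device $1 (e.g. sda) advertises discard support.
# Usage: supports_discard DEVICE [SYSFS_BLOCK_DIR]
supports_discard() {
    sysfs=${2:-/sys/block}
    max=$(cat "$sysfs/$1/queue/discard_max_bytes" 2>/dev/null) || return 1
    [ "${max:-0}" -gt 0 ]
}

# Example usage inside the guest (needs root for fstrim):
#   supports_discard sda && fstrim -v /
```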

Create a QCOW2 image that is not sparse

When creating a new QCOW2 storage volume using virsh or virt-manager, it is by default created as a sparse volume. libvirt does not seem to provide any options for choosing falloc or full allocation. Using a sparse image file can result in a performance impact and might thus not be desirable. There are two ways to avoid this:

The first option is to create the image manually using qemu-img and passing preallocation=full or preallocation=falloc. Using falloc allocation should result in approximately the same performance as using full allocation, but the allocation should happen a lot quicker. An example for creating such a disk image:

qemu-img create -f qcow2 -o preallocation=falloc /var/lib/libvirt/images/example.qcow2 128M

The second option is running fallocate on the image file after the disk image has been created. In this case, the disk image can be created with the libvirt tools. However, one might also change other options (like cluster_size) for the image, so using qemu-img directly might not be so bad after all.
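
Whichever option is used, it can be useful to verify afterwards whether the image file is still sparse. The following sketch compares the allocated size with the apparent size using GNU stat. The function name is_sparse is made up for this example, and the result can be misleading on file systems with transparent compression.

```shell
#!/bin/sh
# Succeed if FILE occupies fewer bytes on disk than its apparent size,
# i.e. if it is at least partly sparse.
# GNU stat format: %s = apparent size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
is_sparse() {
    stat -c '%s %b %B' "$1" | {
        read -r size blocks blocksize
        [ $((blocks * blocksize)) -lt "$size" ]
    }
}

# Example usage:
#   is_sparse /var/lib/libvirt/images/example.qcow2 && echo "still sparse"
```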

Using qemu-img to make an image available as a device node

modprobe nbd
qemu-nbd --bind=127.0.0.1 --nocache --aio=native --discard=unmap --connect=/dev/nbd0 /path/to/img

Obviously, the options used in the command above are just an example and can be different.
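
When the device node is no longer needed, the image should be detached again. The following sketch wraps both steps in two functions; the names attach_image and detach_image are made up for this example, the device /dev/nbd0 is taken from the command above, and both functions have to be run as root.

```shell
#!/bin/sh
# Attach IMAGE ($1) to /dev/nbd0, using the same options as above.
attach_image() {
    modprobe nbd &&
        qemu-nbd --bind=127.0.0.1 --nocache --aio=native --discard=unmap \
            --connect=/dev/nbd0 "$1"
}

# Detach whatever image is currently connected to /dev/nbd0.
detach_image() {
    qemu-nbd --disconnect /dev/nbd0
}

# Example usage:
#   attach_image /path/to/img
#   ... work with /dev/nbd0 ...
#   detach_image
```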

Recovering unused space from a QCOW2 image

A QCOW2 image often grows significantly over time, sometimes even beyond the size of the virtual disk represented by this image. In this case, creating a fresh image and copying the data can help. This can be done with the qemu-img convert command. However, the virt-sparsify command offers a much more convenient way of achieving this goal and can even work in-place, without having to copy the image.
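
Both approaches can be sketched as follows. The function names compact_by_copy and compact_in_place are made up for this example, and I assume that the VM using the image has been shut down before either command is run.

```shell
#!/bin/sh
# Copy the image OLD ($1) into a fresh QCOW2 image NEW ($2). The old image
# is left untouched and has to be replaced by the new one manually.
compact_by_copy() {
    qemu-img convert -O qcow2 "$1" "$2"
}

# Reclaim unused space inside IMAGE ($1) without creating a copy.
compact_in_place() {
    virt-sparsify --in-place "$1"
}

# Example usage:
#   compact_by_copy old.qcow2 new.qcow2
#   compact_in_place disk.qcow2
```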