{{toc/}}

# Migrating Windows Server 2003 from VMware Server 1.x to KVM

## Step 1

Create a KVM domain with ACPI enabled, disk(s) of the same size as in the original VM, and the same MAC address. Example configuration (using libvirt):

```xml
<domain type='kvm'>
  <name>myVirtualMachine</name>
  <uuid>345b2956-c610-4a0e-94b4-a96c5ebd4a0f</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <source dev='/dev/vg0/myVirtualMachine-disk1'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <!--
    <disk type='file' device='cdrom'>
      <source file='/root/ubuntu-8.10-desktop-i386.iso'/>
      <target dev='hdc' bus='ide'/>
    </disk>
    -->
    <!--
    <disk type='block' device='cdrom'>
      <source dev='/dev/cdrom'/>
      <target dev='hdc' bus='ide'/>
    </disk>
    -->
    <interface type='bridge'>
      <mac address='00:0c:29:3e:9a:d4'/>
      <source bridge='br0'/>
    </interface>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' listen='127.0.0.1'/>
  </devices>
</domain>
```

## Step 2

Prepare the old virtual machine: Download and run the MergeIDE tool ([http://www.virtualbox.org/attachment/wiki/Migrate_Windows/MergeIDE.zip](http://www.virtualbox.org/attachment/wiki/Migrate_Windows/MergeIDE.zip)). Remove the VMware Tools and, if necessary, change the HAL (see [http://support.microsoft.com/kb/309283/en](http://support.microsoft.com/kb/309283/en)). For the configuration presented above (two VCPUs), `halmacpi.dll` worked fine for me.

## Step 3

Boot the original VM using a Linux live CD and use `dd` to copy the hard disk data to the new VM:

```bash
dd if=/dev/sda | ssh -C root@kvmhost "dd of=/dev/vg0/myVirtualMachine-disk1"
```

## Step 4

Shut down the original VM and start the new VM. You might need one or two reboots before Windows has installed all the new hardware drivers, but after that everything should run perfectly.

# Generating a random MAC address

[Random MAC address generator script](http://www.marsching.com/2009/mac-address-generator/)
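
If you just need a quick one-liner instead of the linked script, something like the following should work (a sketch; it uses the locally administered `52:54:00` prefix that is conventionally used for KVM guests):

```bash
# Print a random MAC address with the 52:54:00 prefix used for QEMU/KVM guests.
printf '52:54:00:%02x:%02x:%02x\n' $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))
```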

# Mounting a Virtual Disk Partition in the Host System

In order to mount a specific partition of a virtual machine's disk image, you have to use the `loop` option and specify the offset at which the partition begins in the image.

First, run `parted <disk image> unit B print` to find the byte offset of the partition (the number in the "Start" column). Then run `mount -o loop,offset=<offset> <disk image> <mount point>` to mount the partition.
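
For example, for an image `disk.img` whose first partition starts at byte 32256 (the offset is only an assumption; use the value that `parted` reports for your image):

```bash
# Find the byte offsets of the partitions in the image.
parted disk.img unit B print

# Mount the partition that starts at byte 32256.
mkdir -p /mnt/vmdisk
mount -o loop,offset=32256 disk.img /mnt/vmdisk
```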

I found this solution on [linuxwiki.de](http://www.linuxwiki.de/QEMU).

# Graceful Shutdown

In order to shut down a virtual machine using virt-manager, you have to prepare the virtual machine: QEMU-KVM sends an ACPI signal to the virtual machine, which has to be caught and processed.

## Linux

Install the `acpid` package. This daemon will catch the ACPI signal and initiate the shutdown.

## Windows

By default, Windows will not shut down unless a user is logged in to the local console. In order to make Windows shut down at any time, you have to open the local security policies (how to open them depends on the Windows version), go to the security options, and activate the "Shutdown: Allow system to be shut down without having to log on" option (in German versions "Herunterfahren: Herunterfahren des Systems ohne Anmeldung zulassen"). This will allow Windows to be shut down from the local console or by pushing the (virtual) power button, even if no user is logged on.

However, there is still a problem if a user is logged on to the system when the shutdown is initiated: Windows will present a dialog on the local console asking whether the shutdown should proceed. To get rid of this dialog (which is important if you want to automate shutdowns from a script), you have to go to the registry key `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows` and change the value of `ShutdownWarningDialogTimeout` from `0xffffffff` to `0x00000001`. If the value does not exist yet, create a new DWORD value with this name. I found this information [here](http://kerneltrap.org/mailarchive/linux-kvm/2010/1/26/6257297/thread).
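
If you prefer, the same change can be expressed as a `.reg` file and imported with the Registry Editor (a sketch based on the key and value described above):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows]
"ShutdownWarningDialogTimeout"=dword:00000001
```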

# Shutdown virtual machines on host system shutdown

## Ubuntu 12.04 LTS (Precise Pangolin)

On Ubuntu 12.04 LTS (Precise Pangolin) the shutdown scripts already take care of stopping the virtual machines (at least in the newest version of the libvirt-bin package). However, by default the script will only wait 30 seconds for the VMs to shut down. Depending on the services running in a VM, this can be too short.

In this case, you have to create a file `/etc/init/libvirt-bin.override` with the following content:

```
# extend wait time for vms to shut down to 4 minutes
env libvirtd_shutdown_timeout=240
```

You could choose a longer timeout here; however, the init script `/etc/init.d/sendsigs` will only wait up to 5 minutes for Upstart services to stop. Therefore, you will have to change the timeout in that script as well if you want to use a timeout longer than approximately 4.5 minutes in the `libvirt-bin.override` file.

## Ubuntu 16.04 LTS (Xenial Xerus)

In Ubuntu 16.04 LTS, the timeout for shutting down the virtual machines can be changed by editing `/etc/default/libvirt-guests` and changing `SHUTDOWN_TIMEOUT`.
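
For example, to wait up to five minutes (the value is in seconds; a sketch of the relevant line in `/etc/default/libvirt-guests`):

```
# Wait up to 5 minutes for the virtual machines to shut down.
SHUTDOWN_TIMEOUT=300
```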

However, there is the problem that each VM only receives a single shutdown request. For Linux VMs this is usually fine, but Windows VMs sometimes do not react to the first request, resulting in the VM being killed forcibly after the timeout. The only way to fix this problem is to modify the shutdown script so that it sends shutdown requests again and again until a VM has shut down or the timeout has been reached.

These changes could be made directly to `/usr/lib/libvirt/libvirt-guests.sh`, but they would be overwritten on package upgrades. For this reason, I made a copy of this script and placed it in `/usr/local/lib/libvirt/libvirt-guests.sh`. To this copy, I applied the following patch:

```patch
--- /usr/lib/libvirt/libvirt-guests.sh 2016-10-10 09:33:38.000000000 +0200
+++ /usr/local/lib/libvirt/libvirt-guests.sh 2016-11-08 11:58:33.000000000 +0100
@@ -339,6 +339,19 @@
     retval run_virsh "$uri" shutdown "$guest" > /dev/null
 }
 
+# shutdown_guest_retry URI GUEST
+# Start an ACPI shutdown of GUEST on URI. This function returns after the
+# command was issued to libvirt to allow parallel shutdown.
+# This command does the same as shutdown_guest_async, but does not print a
+# message.
+shutdown_guest_retry()
+{
+    uri=$1
+    guest=$2
+
+    retval run_virsh "$uri" shutdown "$guest" > /dev/null
+}
+
 # guest_count GUEST_LIST
 # Returns number of guests in GUEST_LIST
 guest_count()
@@ -407,6 +420,14 @@
         format=$(eval_gettext "Waiting for %d guests to shut down\n")
     fi
     while [ -n "$on_shutdown" ] || [ -n "$guests" ]; do
+        guests_retry=$on_shutdown
+        while [ -n "$guests_retry" ]; do
+            set -- $guests_retry
+            guest=$1
+            shift
+            guests_retry=$*
+            shutdown_guest_retry "$uri" "$guest"
+        done
         while [ -n "$guests" ] &&
             [ $(guest_count "$on_shutdown") -lt "$PARALLEL_SHUTDOWN" ]; do
             set -- $guests
```

In order to use this script instead of the standard script, the configuration for `libvirt-guests.service` has to be overridden. This can be done by creating the file `/etc/systemd/system/libvirt-guests.service.d/custom-stop-script.conf` with the following contents:

```
[Service]
ExecStop=
ExecStop=/usr/local/lib/libvirt/libvirt-guests.sh stop
```

# {{id name="disk-cache-settings"/}}Choosing the disk-cache settings

Three different cache modes can be configured for each storage device in libvirt: _none_, _writeback_ and _writethrough_. However, these names are a bit misleading.

* _none_: Data will be written to the disk's (or disk controller's) cache before success is reported.
* _writeback_: Data will just be written to the in-memory block-device cache before success is reported.
* _writethrough_: Data will always be written directly to disk.

There are [two](http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaatbpkvmguestcache.htm) [excellent](http://www.ilsistemista.net/index.php/virtualization/23-kvm-storage-performance-and-cache-settings-on-red-hat-enterprise-linux-62.html) articles explaining these options in more detail.

In up-to-date versions of KVM, the caches will also be bypassed completely (equivalent to _writethrough_) when a synchronous write is explicitly requested for a specific write operation. In order for this to work, the guest's file system and operating system as well as the host's operating system (and, if the virtual disk is stored on a file system, also the host's file system) have to support such synchronous write operations. To my knowledge, this holds true when using Ubuntu 12.04 LTS, a supported file system, and a drive that correctly implements `FLUSH CACHE`.

I usually prefer the _none_ caching mode, because, compared to the _writeback_ mode, it limits the amount of damage that can occur for applications that do not sync correctly when writing data. The _writethrough_ mode is even safer, but it carries a significant performance penalty and might still not be 100% safe if the hard disk fakes cache flushes. This might sound like a theoretical issue, but there are stories on the internet about SSDs not flushing their cache when they are requested to do so.

On at least one host system, I experienced the problem that the load on the host system would increase significantly (by a factor of 10 to 20) when setting the caching mode to _none_ instead of using the default (which is the same as _writeback_). In this case, it helped to set the I/O mode to _native_ instead of using the default (_threads_), as [suggested on Server Fault](http://serverfault.com/a/427436/261808). Contrary to the suggestion in that answer, I did not have to use the _cfq_ scheduler on the host system; the default (_deadline_) worked as well. There is also a [presentation from Red Hat](http://www.slideshare.net/pradeepkumarsuvce/qemu-disk-io-which-performs-better-native-or-threads) indicating that the _native_ I/O mode is better than the _threads_ mode in many cases, although there are a few cases where the _threads_ mode might be better.
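
Both settings go on the `driver` element of the respective disk in the domain XML. A minimal sketch (the device path and target are placeholders):

```xml
<disk type='block' device='disk'>
  <!-- cache='none' bypasses the host page cache, io='native' uses Linux native AIO -->
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/vg0/myVirtualMachine-disk1'/>
  <target dev='vda' bus='virtio'/>
</disk>
```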

# Problems and their solutions

## Windows complains about parallel port service

If Windows complains that the parallel port service could not be started and you have the following message in the event log

* The Parallel port driver service failed to start due to the following error: The service cannot be started, either because it is disabled or because it has no enabled devices associated with it.

or (in the German version)

* Der Dienst "Treiber für parallelen Anschluss" wurde aufgrund folgenden Fehlers nicht gestartet: Der angegebene Dienst kann nicht gestartet werden. Er ist deaktiviert oder nicht mit aktivierten Geräten verbunden.

you should just disable the "Parport" service. I found the necessary steps [here](http://www.itexperience.net/2009/08/13/fix-the-parallel-port-driver-service-failed-to-start/): You have to change the registry value `Start` under the registry path `HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Parport` from `3` to `4`. This will disable the service.
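
The same change as a `.reg` file (a sketch based on the path and value above):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Parport]
"Start"=dword:00000004
```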

## DHCP server or client within a VM do not work correctly / UDP Checksum Errors

You might have the problem that a DHCP server or client (or both, e.g. a DHCP relay service) does not run correctly in a VM. In the log files you will find error messages about wrong UDP checksums (e.g. `dhcrelay: 3 bad udp checksums in 5 packets`).

I first discovered this problem after upgrading a KVM host to Ubuntu 12.04 LTS (Precise Pangolin). When I switched the VM running the DHCP relay agent from a virtio network device to an emulated one, the problem disappeared and the DHCP relay suddenly worked correctly. However, this effect turned out to be very misleading: The actual problem was that at about the same time the network structure had been reconfigured, so the DHCP relay agent was receiving the answers from the DHCP server on a different interface than before, one that was not listed in its configuration file. Surprisingly, the DHCP relay agent still accepted these packets when not using virtio. Both interfaces were different VLANs on the same physical interface, so I suspect that with the virtio driver the internal handling of VLANs is slightly different and the DHCP relay agent therefore no longer accepts the DHCP Offer packets. After adding the other (VLAN) interface to the list of interfaces, DHCP worked again with the virtio driver.

Still, the messages about bad UDP checksums appear in the log now and then. It seems like dhcrelay3 gets confused by its own UDP packets, which, from the perspective of the DHCP relay agent, do in fact have a wrong UDP checksum. However, this does not affect operation.

There is, however, a [bug report](https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/930962) that actually claims DHCP problems to be connected to the virtio driver. I cannot tell whether there is in fact a problem with the virtio driver in some situations.

Just be warned: If you see messages from the DHCP relay agent complaining about bad UDP checksums, they might be related to your problem, but they might also be totally unrelated and just lead you in the wrong direction.

## The escape character for exiting virsh console does not work when using a German keyboard

When using a German keyboard, pressing CTRL + AltGr + "9" (AltGr + "9" produces "]" on a German keyboard) might not work for exiting the virsh console, although the escape character is "^]". In this case, try pressing CTRL + "+" (the plus sign on the main part of the keyboard, not on the numpad). This should work as an escape character. I found this solution [here](http://blog.frosty-geek.net/2010/11/putty-and-better-known-as-escape.html).

# Using virt-manager with OS X

While virt-manager cannot easily be run on OS X itself (there are some attempts to build it with [Homebrew](http://brew.sh/), but it has a lot of dependencies, so it is not that easy), running it on a Linux host and using X11 forwarding works perfectly fine. However, there is a small problem: Once the mouse pointer has been "caught" by a virtual machine, it cannot be released; instead, the program has to be killed. The solution is simple: The option "Option keys send Alt_L and Alt_R" in the XQuartz settings has to be enabled (I got the idea [here](http://maxheapsize.com/2013/06/03/using-xquartz-virt-manager-and-macosx/)). As a consequence, the option keys no longer act as modifier keys, which means that certain characters are not available. Therefore, you might want to activate this option only while using virt-manager.

# Feeding entropy from the host to the VM

Recent versions of KVM and libvirt allow you to feed entropy from the host system into a virtual machine. This can be very useful if a virtual machine needs a lot of entropy (in particular while booting) and the host has a source for this entropy (e.g. haveged). In the devices section of the VM configuration file, you can add the following device:

```xml
<rng model='virtio'>
  <rate bytes='4096' period='10000'/>
  <backend model='random'>/dev/random</backend>
</rng>
```

The rate specifies how much entropy a VM is allowed to drain from the host. In this example, a VM can drain up to 4096 bytes in an interval of 10 seconds. This ensures that a single VM cannot drain all entropy from the host.

In the virtual machine, a new device `/dev/hwrng` appears, through which the entropy can be read. So you only need a tool that feeds the entropy from this device into the kernel's entropy pool. On Ubuntu, the daemon from the `rng-tools` package can do this job.
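
On an Ubuntu guest, the setup should roughly look like this (a sketch; depending on the version, `rngd` may pick up `/dev/hwrng` automatically, so the explicit `HRNGDEVICE` line is only an assumption about your configuration):

```bash
# Install rngd, which feeds entropy from a hardware RNG into the kernel pool.
apt-get install rng-tools

# Point rngd at the virtio RNG device (in /etc/default/rng-tools).
echo 'HRNGDEVICE=/dev/hwrng' >> /etc/default/rng-tools
service rng-tools restart
```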

# Installing or upgrading to Windows 10

When installing Windows 10 inside a Linux KVM VM (or upgrading an existing system to Windows 10), the installer might crash with a "SYSTEM THREAD EXCEPTION NOT HANDLED" error after rebooting. This problem can be solved by setting the CPU model to "core2duo", as described in the [TechNet Forums](https://social.technet.microsoft.com/Forums/en-US/695c8997-52cf-4c30-a3f7-f26a40dc703a/failed-install-of-build-10041-in-the-kvm-virtual-machine-system-thread-exception-not-handled?forum=WinPreview2014Setup). For example, the following entry in the VM's configuration file might help:

```xml
<cpu mode='custom' match='exact'>
  <model fallback='allow'>core2duo</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='rdtscp'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='acpi'/>
</cpu>
```

# I/O errors in VM caused by GRUB OS prober on host system

When the GRUB update script runs the `os-prober` utility on the virtual-machine host, this can cause I/O errors in the VMs, because the `os-prober` script tries to mount the VMs' logical volumes. This kind of problem typically manifests itself with symptoms like partitions in the VM being remounted read-only and the log containing messages like `end_request: I/O error, dev vda, sector 12345`.

There is a [Debian bug report](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788062) for this issue. As suggested in the bug report, a workaround is disabling `os-prober` by adding the following line to `/etc/default/grub` on the host system:

```bash
# Disable os-prober. Trying to mount VM disks can cause I/O errors in the VM.
# See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788062.
GRUB_DISABLE_OS_PROBER=true
```

# Enabling trim support for SSDs or sparse VM disk images

In order to make sure that a "trim" command on a block device in the guest is forwarded to the respective host device, forwarding has to be enabled explicitly. This is done by setting the `discard` option of the `driver` tag in the `device` tag of the domain XML file to `unmap` (see [here](https://blog.zencoffee.org/2016/05/trim-support-kvm-virtual-machines/)). It seems like trim support is currently (May 2018) only available for VirtIO SCSI devices, not for traditional VirtIO devices.
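
A sketch of the relevant part of the domain XML, assuming a QCOW2 image attached through a VirtIO SCSI controller (paths and device names are placeholders):

```xml
<controller type='scsi' model='virtio-scsi'/>
<disk type='file' device='disk'>
  <!-- discard='unmap' forwards trim requests from the guest to the host -->
  <driver name='qemu' type='qcow2' discard='unmap'/>
  <source file='/var/lib/libvirt/images/example.qcow2'/>
  <target dev='sda' bus='scsi'/>
</disk>
```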

Of course, the "trim" command still needs to be used in the guest. This can be done either by mounting the file system(s) with the `discard` option or by periodically running `fstrim`. Ubuntu has chosen the latter option and runs `fstrim` once a week (through a script in `/etc/cron.weekly`).
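
For a manual run in the guest, the following should work:

```bash
# Trim all mounted file systems that support it and report the result.
fstrim -av
```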

# Create a QCOW2 image that is not sparse

When creating a new QCOW2 storage volume using `virsh` or `virt-manager`, it is created as a sparse volume by default. libvirt does not seem to provide any options for choosing `falloc` or `full` allocation. Using a sparse image file can result in a performance impact and might thus not be desirable. There are two ways to avoid this:

The first option is to create the image manually using `qemu-img`, passing `preallocation=full` or `preallocation=falloc`. Using `falloc` allocation should result in approximately the same performance as `full` allocation, but the allocation happens a lot more quickly. An example of creating such a disk image:

```bash
qemu-img create -f qcow2 -o preallocation=falloc /var/lib/libvirt/images/example.qcow2 128M
```

The second option is running `fallocate` on the image file _after_ the disk image has been created. In this case, the disk image can be created with the libvirt tools. However, one might also want to change other options (like `cluster_size`) for the image, so using `qemu-img` directly might not be so bad after all.
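
A sketch of the second option (the path is just an example; `fallocate` allocates blocks for the existing file without changing its contents):

```bash
IMAGE=/var/lib/libvirt/images/example.qcow2

# Allocate blocks for the entire current length of the image file.
fallocate -l "$(stat --format=%s "$IMAGE")" "$IMAGE"
```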

# Using qemu-nbd to make an image available as a device node

```bash
modprobe nbd
qemu-nbd --bind=127.0.0.1 --nocache --aio=native --discard=unmap --connect=/dev/nbd0 /path/to/img
```

Obviously, the options used in the command above are just an example and can be chosen differently.
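
After connecting, partitions inside the image show up as `/dev/nbd0p1`, `/dev/nbd0p2`, and so on (a sketch; the mount point and partition number are assumptions, and on older kernels the `nbd` module may have to be loaded with `max_part` set, e.g. `modprobe nbd max_part=8`, for the partition nodes to appear):

```bash
# Re-read the partition table of the NBD device.
partprobe /dev/nbd0

# Mount the first partition of the image.
mount /dev/nbd0p1 /mnt

# When done, unmount and disconnect the image again.
umount /mnt
qemu-nbd --disconnect /dev/nbd0
```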

# Recovering unused space from a QCOW2 image

A QCOW2 image often grows significantly over time, sometimes even becoming larger than the size of the virtual disk it represents. In this case, creating a fresh image and copying the data can help. This can be done with the `qemu-img convert` command. However, the [[virt-sparsify command offers a much more convenient way|doc:Linux.libvirt.WebHome|anchor="virt-sparsify"]] of achieving this goal and can even work in-place, without having to copy the image.
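
For reference, the copy-based approach roughly looks like this (file names are just examples; the VM should be shut down while the image is copied):

```bash
# Copy the allocated data into a fresh, compact QCOW2 image.
qemu-img convert -O qcow2 old.qcow2 new.qcow2
```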