Migrating a Windows VM from VirtualBox to libvirt/KVM
For building Windows binaries on our CI infrastructure, we are using VirtualBox VMs on an oldish Linux host.
While this works, building takes forever...
Our build fleet consists of a couple of machines:
- several Linux builders (based on Docker) on various hosts (the fastest using an AMD Ryzen 5965WX with 128GB of RAM, the slowest using an Intel i7 870 and 8GB of RAM)
- a single Windows builder (based on VirtualBox), running on the slowest Linux builder
- a single macOS builder (based on tart) running on a Mac Studio 13,1
Building Pd on the Apple runner takes about 1½ minutes (fat binaries for amd64 and arm64); on the slowest Linux runner it takes 4 minutes. On the Windows runner (which uses the same hardware as the slowest Linux runner, although in a VM), it takes... 20 minutes.
A typical pipeline for Pd has 4 Linux jobs, 4 Apple jobs, and 7 Windows jobs (on Windows we build for both i386 and amd64, and the build runs in different environments, hence the many jobs). It's of course rather unfortunate that there are so many Windows jobs running on a single slow runner. Anyhow, a typical pipeline takes about 2 hours to complete. Which is not fun, especially before releases, when multiple branches are updated often.
My gut feeling tells me that there are two issues:
- the host that runs the Windows VMs is outdated and slow
- VirtualBox is not the most efficient virtualization solution.
So the task was to get the CI running on a new host (for now that would be my desktop machine, which features an Intel i7-13700K with 64GB of RAM), using KVM/libvirt (which I'm already using for our virtualized servers).
Requirements
We are using GitLab-CI as the software to drive the iem-ci.
All builds need to happen in an isolated, reproducible environment (so the build does not depend on some grown and undocumented knowledge embedded in the build machine).
The setup we have been using for the last few years is like this:
- A master VM contains a recent snapshot of Windows 10 (the golden image), with some build tools pre-installed. These build tools include:
  - MSYS2 (MinGW64-i686, MinGW64-x86_64) with pacman
  - chocolatey
  - VisualStudio
  - gitlab-runner
- Whenever a new job is scheduled, the master VM gets cloned to a new throwaway VM
- The throwaway VM is started and runs the build job
- Once the build job completes, the throwaway VM is destroyed
Now the master VM image is rather large (~60 GB), so doing a full clone (which involves copying the entire harddisk) is not very feasible (as it takes >10 minutes). Instead, we want to use some "thin clone" that uses some copy-on-write (COW) storage, so the VM only needs to store the additional data (as a difference image with respect to the golden image).
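To give an idea of what such a thin clone looks like in practice: with a qcow2 golden image (the format we will end up with below), creating a throwaway disk boils down to a single command. The paths here are just placeholders, not our actual setup.

```sh
# create a throwaway disk that only stores the differences to the golden image
qemu-img create -f qcow2 \
    -b /var/lib/libvirt/images/W10-golden.qcow2 -F qcow2 \
    /var/lib/libvirt/images/throwaway-job.qcow2
```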
We used to clone our throwaway VMs from a live snapshot of the master VM (in order to reduce startup time), but this gave us a bit of trouble in the past, as the system clock in the VM would need to be synchronized (the golden image is only updated every year or so). So we switched to doing full boots of the VMs about 6 months ago.
Migration from VirtualBox to libvirt/KVM
Exporting from VirtualBox
In order to migrate away from VirtualBox, the first thing required is to extract the last golden image. With VirtualBox, we use snapshots a lot (allowing us to do upgrades of the machine that can be easily rolled back).
So the first step is to just create a full disk clone from the last (used) snapshot, which I did by creating a VM clone via the VirtualBox GUI (using the "Current State" option).
This took a while, but eventually I got my 57GB `W10 Clone-disk1.vmdk`.
Once done, I copied the `.vmdk` file to the new host.
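The copy itself is nothing special; something along these lines did the job (hostname and target directory are placeholders for whatever applies on the new host):

```sh
# copy the exported disk image over to the new virtualization host
rsync --progress --partial "W10 Clone-disk1.vmdk" newhost:/var/lib/libvirt/images/
```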
Importing to libvirt/KVM
In order to allow thin cloning of the master VM, the disk ought to be in some format that allows COW.
For libvirt/KVM, the natural choice is the `qcow2` format.
In the Proxmox VE wiki I read that `qcow2` is not a very good format for Windows VMs.
Since we are not running a Microsoft SQL database, I hope we can get away with the performance penalty.
In any case, I think the penalty of copying full `raw` images is always greater than the penalty for using `qcow2`.
The disk can be converted using `qemu-img`:

```sh
in_img="W10 Clone-disk1.vmdk"
out_img="W10.qcow2"
qemu-img convert -f vmdk "${in_img}" -O qcow2 "${out_img}"
```
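Just to be sure nothing went wrong, the converted image can be inspected before going any further (`qemu-img info` reports the format, the virtual size and how much space is actually allocated):

```sh
# sanity-check the converted image
qemu-img info "${out_img}"
```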
I also found some blog post that suggests converting the VirtualBox image first to `raw` (using `VBoxManage clonehd`) and then converting the result to `qcow2` (using `qemu-img`).
I do not really see the point of doing that.
In practice, the VirtualBox host was running low on disk space and the uncompressed image would have been an additional 100GB, so I took the short route.
After that, we can create a new libvirt/KVM VM.
I'm pretty sure that this can be done with `virt-install` or some such tool (a rough sketch follows after the table below), but today I was lazy and used the VM creation wizard of `virt-manager`:
The options I chose in the wizard were:
option | value |
---|---|
Connection | QEMU/KVM |
Installation method | Import existing disk image |
Architecture options | x86_64 |
Storage path | /path/to/my/W10.qcow2 |
Operating System | Microsoft Windows 10 |
Memory | 8GB |
CPU | 4 |
Name | win10 |
Network selection | NAT |
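For the record, something like the following `virt-install` invocation should roughly correspond to the wizard settings above. I haven't actually run it, so treat it as a sketch (the `--os-variant` and `--graphics` values in particular are assumptions):

```sh
# import the existing qcow2 image instead of running a Windows installer (untested sketch)
virt-install \
    --connect qemu:///system \
    --name win10 \
    --memory 8192 \
    --vcpus 4 \
    --os-variant win10 \
    --import \
    --disk path=/path/to/my/W10.qcow2,format=qcow2 \
    --network network=default \
    --graphics spice
```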
I also selected the Customize installation before install option, which opens the more detailed VM configuration window.
In there, I removed a couple of devices that I do not need on a build machine.
device | comment |
---|---|
Tablet | Proxmox VE performance tweaks |
Sound (ich9) | no need for sound when building |
Console 1 | |
USB Redirector 1 | |
USB Redirector 2 | |
We also want to be able to use some paravirtualisation features and monitoring, so I added the following new "hardware":
device | config | reason |
---|---|---|
Storage | type: CDROM device | installing drivers via CD-ROM |
Controller | type: VirtIO Serial | needed for the guest agent channel |
Channel | name: org.qemu.guest_agent.0 | monitoring the live VM |
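Whether the new devices really ended up in the VM definition can be quickly double-checked from the host:

```sh
# look for the VirtIO serial controller and the guest-agent channel in the VM definition
virsh dumpxml win10 | grep -E 'virtio-serial|guest_agent'
```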
We don't need to insert an ISO image into our virtual CD-ROM drive just yet.
Now let's start the VM. Since this is the first boot of the VM with "new" hardware, it might take a bit longer than usual.
Networking
The VM has a virtualized `e1000e` network card, but for whatever reason, the VM is unable to connect to the network.
The Device Manager claims that all systems are operational, but the network is still unusable.
I finally found a (German) forum post that hinted at a possible solution: downgrading the virtual chipset!
There's no option to specify the chipset in `virt-manager`, so a low-level edit to the VM definition is required.
Using `virsh edit win10` (after the VM has been powered down), we get a text editor for modifying the XML definition of the VM.
There we locate the basic definition in the `<os>` section and change the machine type of the virtual machine:

```diff
-<type arch="x86_64" machine="pc-q35-8.2">hvm</type>
+<type arch="x86_64" machine="pc-q35-6.0">hvm</type>
```
After that, the `e1000e` NIC becomes usable in the VM.
(In the forum, some people say that they were able to change the chipset back to the original value afterwards. For me, this doesn't work, and the `e1000e` card only works with `pc-q35-6.0`.)
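To see which machine type the VM currently uses without opening the editor again:

```sh
# print the machine type from the VM definition
virsh dumpxml win10 | grep -o 'machine="[^"]*"'
```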
Fine-tuning the VM
Once the VM is running, we can fine-tune it a bit to (hopefully) improve performance.
Usually the virtualization software will emulate "real" hardware for the VM, but the less emulation is required, the faster the system will be.
For the CPU we use the default: Copy host CPU configuration (`host-passthrough`).
For the graphics card and storage controller, we can use the `virtio` drivers.
While the `virtio` drivers come with Linux, we have to manually install them for Windows, following the Proxmox VE docs.
I picked the ISO for the latest stable release, which at the time of writing is 0.1.240.
Inserting the ISO into the VM's CD-ROM drive, I ran the `virtio-win-guest-tools.exe` to install all the goodies.
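If you prefer inserting the CD from the host rather than via `virt-manager`, `virsh` can do that as well. The target name (`sdb` here) is just a guess, so check `virsh domblklist` first, and the ISO path is obviously a placeholder:

```sh
# find the target name of the virtual CD-ROM drive, then insert the virtio-win ISO
virsh domblklist win10
virsh change-media win10 sdb /path/to/virtio-win-0.1.240.iso --insert
```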
We can now switch some hardware to VM-optimized variants.
Video
That's the easiest one: we can simply switch the Video model from QXL to Virtio.
I've disabled 3D acceleration.
Frankly, I do not worry too much about the virtual video card: the VM is going to run headless most of the time, and is used as a typical CPU-hungry buildbot... no real use for graphics.
Network
That should be a no-brainer: switching the NIC Device model from `e1000e` to `virtio`.
For reasons unknown to me, it doesn't work at all, so I'm currently stuck with `e1000e`.
Storage
Removing the default SATA drive and replacing it with a VirtIO Disk doesn't work out of the box, as Windows will then fail to find the boot disk (that contains the `virtio` drivers).
Some StackExchange post shed some light on how to fix this by booting the VM into Safe mode.
- While the system still uses the SATA drive, configure it to boot into Safe mode on the next reboot.
- Within an elevated command prompt, type:

  ```
  bcdedit /set "{current}" safeboot minimal
  ```

- Power down the VM
- Change the bus of the disk to VirtIO
- Power up the VM (it should start into Safe mode)
- Disable Safe mode by running the following within an elevated command prompt:

  ```
  bcdedit /deletevalue "{current}" safeboot
  ```

- Reboot
QEMU guest agent
The `virtio` goodies also included the guest agent, which allows the hypervisor to query some extra information from a running instance.
Since we have already set up the relevant channel, nothing more needs to be done here.
Just in case this was omitted above, here's what needs to be added:
key | value |
---|---|
Hardware | Channel |
Name | org.qemu.guest_agent.0 |
Device Type | UNIX socket (unix) |
Auto socket | yes |
To check whether everything works, we can query the `guestinfo` from the hypervisor:

```sh
$ virsh guestinfo win10 --os
os.id              : mswindows
os.name            : Microsoft Windows
os.pretty-name     : Windows 10 Enterprise 2016 LTSB
os.version         : Microsoft Windows 10
os.version-id      : 10
os.machine         : x86_64
os.variant         : client
os.variant-id      : client
os.kernel-release  : 14393
os.kernel-version  : 10.0

$
```
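Beyond the `--os` summary, the guest agent can also be poked directly; a simple liveness check from the host looks like this:

```sh
# ping the guest agent inside the VM; an empty "return" object means it responded
virsh qemu-agent-command win10 '{"execute":"guest-ping"}'
```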
Conclusion
That's about it: A Win10 VM that used to run on VirtualBox is now successfully running on libvirt/KVM.
This is not yet a CI runner, but we are getting closer. In the next few days I'm going to add two more blog posts:
- Doing fast clones with libvirt/KVM
- Using libvirt/KVM for running fully isolated jobs with `gitlab-runner`