Fast cloning of a libvirt/KVM Virtual Machine (Python)

May 14, 2024 · 11 min read · virtualization ci code ·

Yesterday's post discussed our first version of virt-clonefast, implemented in POSIX shell. For better error handling and option parsing, here's an implementation in our favourite programming language ...

Python.

To be honest, for my blog post on Fast cloning of a libvirt/KVM Virtual Machine I first did some basic research with shell-snippets (planning to do a first implementation in sh), but then implemented the entire script in Python. Only for the blog post I came back to the shell implementation, which turned out surprisingly small and simple.

However, I still have my Python implementation, and I do trust it somewhat more. Here's the whys and hows.

Why?

I love shell-scripts, but for more complicated stuff that includes parsing input of various sources, I usually prefer Python.

First of all, I somehow imagined my variant for virt-clone to have similar (if not the same) flags as the original version. getopts for shell scripts is great, but it is much more convenient to use Python's argparse module: when you get older, long options become increasingly cool.

More relevant, virt-clonefast needs a list of the QCOW2-able disk images used by a VM. The shell implementation parses the output of virsh domblklist, which comes with a bit of decoration (so it is not easily machine-readable) and lacks a considerable amount of information (e.g. whether a given image is indeed a QCOW2 disk). As a consequence, the shell implementation is riddled with assumptions.

virt-clone itself is implemented in Python and uses the python-libvirt bindings. It seemed like a good idea to do the same.

How?

Connecting to libvirt

In order to talk to a hypervisor, we first need to connect to it:

import libvirt

conn = libvirt.open()

libvirt.open() takes an optional connection URI (e.g. qemu:///system). If None (or no argument) is given, it uses the LIBVIRT_DEFAULT_URI envvar.

There's an alternative libvirt.openReadOnly() that opens a connection with limited permissions. This is good enough if we just want to query the parameters of a given VM, but obviously we cannot create new VMs in readonly mode.

We can then easily get a handle to any VM (or domain in libvirt lingo):

# iterate over all domains
for d in conn.listAllDomains():
    print(d.name())

# get a specific domain by name
name = "debian12"
dom = conn.lookupByName(name)

Because we want to allow the user to specify the domain in whatever format they want, we use a little wrapper to look up the VM:

def getDomain(conn: libvirt.virConnect, name: str) -> libvirt.virDomain:
    # try the lookup by name first, then by UUID
    lookups = [conn.lookupByName, conn.lookupByUUIDString]
    for lookup in lookups:
        try:
            return lookup(name)
        except libvirt.libvirtError:
            pass
    # domain does not exist, but call the lookup again for the exception
    return lookups[0](name)

Querying VM info

I had high hopes that the Python API would provide a convenient way to access all kinds of information about a VM. It turns out that this is not the case: the best (only?) way to inspect a VM is to retrieve its definition as an XML string, and then work with the XML DOM. I'm not exactly a big fan of such a workflow (but hey, it's been a while since I've been working with XML and a refresher won't hurt).

To ease the pain a bit, I decided to wrap the VM information into a Domain class (domain is libvirt lingo for a VM), which is just a thin wrapper around the XML DOM:

from xml.dom import minidom

class Domain:
    def __init__(self, dom: libvirt.virDomain):
        xml = dom.XMLDesc()
        self.xml = minidom.parseString(xml)

    def toXML(self) -> str:
        return self.xml.toxml()

    def __str__(self):
        return self.toXML()

dom = getDomain(conn, name)
domain = Domain(dom)
print(domain)

Basic cloning attributes

A Domain instance is initialized with XML from an actual domain, but we want to modify it so the XML can be used to create a cloned VM (leaving aside the disk duplication for now).

<domain type='kvm'>
  <name>debian12</name>
  <uuid>d5152f84-f02c-4b38-b3ad-b00328e2e06f</uuid>
  <devices>
    [...]
    <interface type='network'>
      <mac address='52:54:00:d6:63:30'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
  </devices>
</domain>

For a clone, we need to change the name (which ought to be unique). Each VM also has an associated UUID (presumably to allow VMs to have the same visible name), but we can leave this empty (that is: remove the uuid tag) and let libvirt generate one for us. Also, most VMs will have a network interface attached to them, which should have a unique MAC address. Again we can let libvirt generate a unique value for us (by removing the address from the mac tag), or we could set the MAC address to some user-supplied value.

class Domain:
    # ...
    def convertToClone(self, newname: str, mac_address: str | None = None):
        domain = None
        for name in self.xml.getElementsByTagName("name"):
            if name.parentNode.parentNode != self.xml:
                continue
            for n in name.childNodes:
                n.data = newname
                domain = name.parentNode
                break
            break

        # clear UUID
        for uuid in domain.getElementsByTagName("uuid"):
            if uuid.parentNode != domain:
                continue
            domain.removeChild(uuid)

        # reset MAC address
        for mac in domain.getElementsByTagName("mac"):
            if mac.parentNode.tagName != "interface":
                continue
            if mac.hasAttribute("address"):
                mac.removeAttribute("address")
            if mac_address:
                mac.setAttribute("address", mac_address)

With this, we can already perform some simple cloning (using the same disk images as the reference VM):

oldname = "debian12"
newname = "deb12-clone"

dom = getDomain(conn, oldname)
domain = Domain(dom)
domain.convertToClone(newname)
conn.defineXML(domain.toXML())

Disk cloning

Of course, we do need to create (shallow) copies of our COW disk images.

Which disks are to be cloned can be derived from the disk tags in the XML definition:

<domain>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/debian12.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    [...]
  </devices>
  [...]
</domain>

The relevant info is spread across several attributes and tags (disk@type, disk@device, driver@type, source@file and target@dev), so we simplify that to a list of flat dictionaries (one per disk):

class Domain:
    # ...
    def getDisks(self) -> list[dict[str, str]]:
        disks = []
        for diskType in self.xml.getElementsByTagName("disk"):
            disk = {}
            for diskNode in diskType.childNodes:
                name = diskNode.nodeName
                # skip text and comment nodes ("#text", "#comment")
                if name[0:1] == "#":
                    continue
                disk[name] = {
                    diskNode.attributes[attr].name: diskNode.attributes[attr].value
                    for attr in diskNode.attributes.keys()
                }
            disks.append(
                {
                    "type": diskType.getAttribute("type"),
                    "device": diskType.getAttribute("device"),
                    "file": disk.get("source", {}).get("file"),
                    "driver": disk.get("driver", {}).get("type"),
                    "target": disk.get("target", {}).get("dev"),
                }
            )
        return disks
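As a quick sanity check of this flattening, the following sketch runs the same idea against a trimmed version of the sample disk definition, without needing libvirt. The function flatten_disks is a hypothetical stand-alone variant of getDisks(), and the empty CD-ROM drive in the sample is an assumption added for illustration.

```python
from xml.dom import minidom

# trimmed sample; the second <disk> is a made-up empty CD-ROM drive
SAMPLE = """<domain>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/debian12.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
    </disk>
  </devices>
</domain>"""


def flatten_disks(xml_text: str) -> list:
    """Hypothetical stand-alone variant of Domain.getDisks()."""
    doc = minidom.parseString(xml_text)
    disks = []
    for disk in doc.getElementsByTagName("disk"):
        # collect the attributes of each child element, keyed by tag name
        children = {
            node.tagName: dict(node.attributes.items())
            for node in disk.childNodes
            if node.nodeType == node.ELEMENT_NODE
        }
        disks.append(
            {
                "type": disk.getAttribute("type"),
                "device": disk.getAttribute("device"),
                "file": children.get("source", {}).get("file"),
                "driver": children.get("driver", {}).get("type"),
                "target": children.get("target", {}).get("dev"),
            }
        )
    return disks


disks = flatten_disks(SAMPLE)
for d in disks:
    print(d)
```

Note that a disk without a source tag ends up with "file" set to None, which is something the caller has to be prepared for.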

For now, we can only do our shallow clones on file-based disk-devices that use the qcow2 driver. cdrom-devices can be ignored (they are read-only, and can be shared). Any other disks should raise an error.

The following gives us a srcdisks dictionary that maps target devices (e.g. vda) to qcow2-images, throwing an error if some unsupported device is used by the VM.

def getClonableDisks(domain: Domain) -> dict[str, str]:
    disks = {}
    for idx, d in enumerate(domain.getDisks()):
        if d["device"] == "cdrom":
            continue
        if d["device"] != "disk":
            raise SystemExit(
                "Disk#%d is an unsupported device '%s'" % (idx, d["device"])
            )
        if d["type"] != "file":
            raise SystemExit("Disk#%d is not file-based" % (idx,))
        # check if the disk is qcow2 based
        driver = d["driver"]
        if driver != "qcow2":
            raise SystemExit(
                "Disk#%d is of type '%s' (only 'qcow2' is supported)"
                % (idx, driver)
            )
        disks[d["target"]] = d["file"]
    return disks

srcdisks = getClonableDisks(domain)

The actual shallow copying of the COW images is implemented by the following cloneQCOW2() function. It is just a wrapper around the qemu-img cmdline tool, with some additional logic to create a unique output filename (which is returned by the function on success):

import os
import subprocess
import sys

def cloneQCOW2(source: str, target: str | None = None) -> str:
    # check if source exists and can be opened (otherwise raise a standard error)
    with open(source) as f:
        pass

    if not target:
        # use source filename to calculate target-file
        target = source
    elif os.path.isdir(target) or not os.path.basename(target):
        # target is just a directory, assume source-file within the dir
        target = os.path.join(target, os.path.basename(source))

    # ensure that output directory exists
    os.makedirs(os.path.dirname(target), exist_ok=True)

    # get a non-existing filename
    base, ext = os.path.splitext(target)
    i = ""
    while True:
        target = "%s%s%s" % (base, i, ext)
        try:
            # "x" fails if the file already exists
            targetfd = open(target, "x")
            break
        except FileExistsError:
            pass
        if not i:
            i = 0
        # negative counter: formats as "-1", "-2", ... (the "-" is the separator)
        i -= 1
    targetfd.close()

    # finally duplicate the image
    subprocess.run(
        [
            "qemu-img", "create",
            "-f", "qcow2",  # target format
            "-b", source,   # backing file
            "-F", "qcow2",  # backing format
            target,
        ],
        check=True,
        stdout=sys.stderr,
    )

    return target
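The filename search deserves a second look: opening a file with mode "x" fails if the file already exists, so the existence check and the creation happen in one step, and the negative counter doubles as the "-" separator. Here is a minimal sketch of just that part, run in a temporary directory; the helper name reserve_unique is made up for this example.

```python
import os
import tempfile


def reserve_unique(target: str) -> str:
    """Create an empty file at target, or at target-1, target-2, ... if taken."""
    base, ext = os.path.splitext(target)
    counter = 0
    while True:
        # counter 0 means "no suffix"; negative values format as "-1", "-2", ...
        candidate = "%s%s%s" % (base, counter or "", ext)
        try:
            # mode "x" raises FileExistsError instead of overwriting
            with open(candidate, "x"):
                return candidate
        except FileExistsError:
            counter -= 1


with tempfile.TemporaryDirectory() as tmpdir:
    wanted = os.path.join(tmpdir, "debian12.qcow2")
    # ask three times for the same name
    names = [os.path.basename(reserve_unique(wanted)) for _ in range(3)]

print(names)
```

Asking three times for the same name yields debian12.qcow2, debian12-1.qcow2 and debian12-2.qcow2.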

With this and the above dictionary of to-be-cloned disks, we can clone all necessary disks and store the new target device to image mapping in the cloneddisks dict:

def cloneDisks(disks: dict, outputdir: str) -> dict[str, str]:
    return {k: cloneQCOW2(v, outputdir) for k, v in disks.items()}


cloneddisks = cloneDisks(srcdisks, outputdir)

Finally we need to change the source tag of each disk to point to the new image file. The changeDiskSourceFile() function sets the source@file attribute of the disk identified via the target@dev attribute:

class Domain:
    # ...
    def changeDiskSourceFile(self, target_device: str, source_file: str):
        for diskType in self.xml.getElementsByTagName("disk"):
            # find the disk that is attached as the given target device
            device = None
            for target in diskType.getElementsByTagName("target"):
                if target.getAttribute("dev") == target_device:
                    device = True
            if not device:
                continue
            for source in diskType.getElementsByTagName("source"):
                if source.getAttribute("file"):
                    source.attributes["file"].value = source_file
                return

# ...
for k, v in cloneddisks.items():
    domain.changeDiskSourceFile(k, v)

It turns out that this doesn't work at all.

Carefully comparing the XML VM definitions (obtained with virsh dumpxml) between a working VM (created with virt-manager) and the broken one, we notice that for the working VM there's an additional backingStore tag in the disk definition that declares the backing file:

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/debian12-shallow.qcow2' index='1'/>
  <backingStore type='file' index='3'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/debian12.qcow2'/>
    <backingStore/>
  </backingStore>
</disk>

So we extend the changeDiskSourceFile() function to also provide this additional information:

class Domain:
    # ...
    def changeDiskSourceFile(self, target_device: str, source_file: str, backing_file: str | None = None):
        def addChild(parent, tagname):
            el = parent.ownerDocument.createElement(tagname)
            parent.appendChild(el)
            return el

        for diskType in self.xml.getElementsByTagName("disk"):
            device = None
            for target in diskType.getElementsByTagName("target"):
                if target.getAttribute("dev") == target_device:
                    device = True
            if not device:
                continue
            for source in diskType.getElementsByTagName("source"):
                if source.getAttribute("file"):
                    source.attributes["file"].value = source_file
                # add backingstore if required
                if backing_file:
                    backingStore = source.parentNode.getElementsByTagName(
                        "backingStore"
                    )
                    if backingStore:
                        backingStore = backingStore[0]
                    else:
                        backingStore = addChild(source.parentNode, "backingStore")
                    # iterate over a copy, as we modify the child list
                    for el in list(backingStore.childNodes):
                        backingStore.removeChild(el)
                    backingStore.setAttribute("type", "file")
                    addChild(backingStore, "format").setAttribute("type", "qcow2")
                    addChild(backingStore, "source").setAttribute("file", backing_file)
                return True
        return False
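To check the resulting structure without defining a VM, the backingStore handling can be tried on a single disk element. This is only a sketch: add_backing_store and child_elements are hypothetical helpers, the clone's file name is made up, and the sample XML is a trimmed version of the disk definition shown earlier.

```python
from xml.dom import minidom

# trimmed disk definition (illustration only)
SAMPLE = """<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/debian12.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>"""


def child_elements(parent, tagname):
    """Return the direct child elements of parent with the given tag name."""
    return [
        n for n in parent.childNodes
        if n.nodeType == n.ELEMENT_NODE and n.tagName == tagname
    ]


def add_backing_store(disk, source_file: str, backing_file: str) -> None:
    """Hypothetical helper: point disk at source_file, backed by backing_file."""
    doc = disk.ownerDocument
    # point the disk at the new overlay image
    for source in child_elements(disk, "source"):
        source.setAttribute("file", source_file)
    # drop any stale <backingStore> before describing the new one
    for old in child_elements(disk, "backingStore"):
        disk.removeChild(old)
    backing = doc.createElement("backingStore")
    backing.setAttribute("type", "file")
    fmt = doc.createElement("format")
    fmt.setAttribute("type", "qcow2")
    src = doc.createElement("source")
    src.setAttribute("file", backing_file)
    backing.appendChild(fmt)
    backing.appendChild(src)
    disk.appendChild(backing)


doc = minidom.parseString(SAMPLE)
disk = doc.documentElement
add_backing_store(
    disk,
    "/var/lib/libvirt/images/deb12-clone.qcow2",
    "/var/lib/libvirt/images/debian12.qcow2",
)
print(disk.toxml())
```

Restricting the lookups to direct children avoids accidentally rewriting the source tag nested inside an existing backingStore.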

Now the VM boots from the COW disk copies!

Ephemeral VMs

Instead of defining a persistent VM with conn.defineXML(), we can create and start a VM with conn.createXML(). This VM is ephemeral, as it will be destroyed once the VM powers down. Unfortunately, the (shallow) disk images will not be destroyed, as libvirt considers them externally managed.

If the disk images are created within a managed storage pool (e.g. by creating the shallow clones in the same directory as their reference images), we can simply refresh the pool to make the new images known (after creating the copies):

def rescanStoragePools(conn: libvirt.virConnect):
    for p in conn.listAllStoragePools():
        p.refresh()

rescanStoragePools(conn)

With this, libvirt considers the new images as managed, and will clean them up if we undefine the VM with virsh undefine --remove-all-storage.

It does not automatically remove the storage volumes for ephemeral VMs (created via conn.createXML()) though. However, we can achieve the same if we delete the volumes while they are being used. Since they are in use, the data will not be removed immediately, but rather when the files go out of use (that is: once the VM is shut down):

def cleanupDiskImages(conn: libvirt.virConnect, diskimages: list):
    images = set(diskimages)
    for p in conn.listAllStoragePools():
        for v in p.listAllVolumes():
            if v.path() in images:
                v.delete()
                images.discard(v.path())
            if not images:
                break
        if not images:
            break

cleanupDiskImages(conn, cloneddisks.values())
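The "removed once it goes out of use" behaviour is plain POSIX file semantics rather than anything libvirt-specific, assuming the pool is directory-based so that deleting a volume boils down to unlinking a file. A small sketch using only the standard library (the file name and contents are placeholders, and this is meant for Linux or other POSIX systems):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "disk.img")
    with open(path, "wb") as f:
        f.write(b"pretend this is a disk image")

    # a process (think: the running VM) holds the file open ...
    holder = open(path, "rb")
    # ... while somebody else deletes it
    os.unlink(path)

    # the name is gone from the directory, but the data is still there
    gone_from_directory = not os.path.exists(path)
    still_readable = holder.read()
    holder.close()  # only now can the data actually be released

print(gone_from_directory, still_readable)
```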

If we also want to clean up images for temporary VMs that were stored outside of existing storage pools (e.g. we specified an output directory on a RAM-disk), we need to create an ephemeral storage pool first, which can then be destroyed (while the VM is running):

import uuid

def makeStoragePoolXML(name: str, path: str) -> str:
    def addText(parent, text):
        txt = minidom.Text()
        txt.data = text
        parent.appendChild(txt)
        return txt

    raw_xml = """<pool type='dir'><name/><target><path/></target></pool>"""
    xml = minidom.parseString(raw_xml)
    for n in xml.getElementsByTagName("name"):
        addText(n, name)
        break
    for p in xml.getElementsByTagName("path"):
        addText(p, path)
        break
    return xml.toxml()

def makeStoragePool(conn: libvirt.virConnect, path: str) -> libvirt.virStoragePool | None:
    poolname = "tmp-%s" % uuid.uuid4()
    try:
        return conn.storagePoolCreateXML(
            makeStoragePoolXML(poolname, path)
        )
    except libvirt.libvirtError:
        # couldn't create pool, presumably because outdir is already in some other pool
        return None

def cleanupStoragePool(pool: libvirt.virStoragePool | None) -> None:
    # only destroy the pool once all its volumes have been deleted
    if pool and not pool.listAllVolumes():
        pool.destroy()


pool = makeStoragePool(conn, outdir)
conn.createXML(domain.toXML())
cleanupDiskImages(conn, cloneddisks.values())
cleanupStoragePool(pool)

Putting it all together

With all the functions and classes defined above, the core of our virt-clonefast implementation looks like this:

def main(srcdomain: str, dstdomain: str,
         outdir: str | None = None, start: bool = False,
         connectURI: str | None = None, MACaddress: str | None = None):
    with libvirt.open(connectURI) as conn:
        domain = Domain(getDomain(conn, srcdomain))
        domain.convertToClone(dstdomain, MACaddress)

        srcdisks = getClonableDisks(domain)
        cloneddisks = cloneDisks(srcdisks, outdir)
        pool = None
        if start and outdir:
            pool = makeStoragePool(conn, outdir)
        rescanStoragePools(conn)
        for k, v in cloneddisks.items():
            domain.changeDiskSourceFile(k, v, srcdisks[k])

        if start:
            conn.createXML(domain.toXML())
            cleanupDiskImages(conn, cloneddisks.values())
            cleanupStoragePool(pool)
        else:
            conn.defineXML(domain.toXML())
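A command-line front end for main() could look roughly like the following argparse sketch. The option names are assumptions made for this example (they are not taken from the actual virt-clonefast script or from virt-clone); only the parameter names of main() come from the code above.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical option parser; dest names match the parameters of main()."""
    parser = argparse.ArgumentParser(
        prog="virt-clonefast",
        description="Create a shallow (copy-on-write) clone of a libvirt VM.",
    )
    parser.add_argument("srcdomain", help="name or UUID of the VM to clone")
    parser.add_argument("dstdomain", help="name of the new VM")
    parser.add_argument("--outdir", help="directory for the cloned disk images")
    parser.add_argument(
        "--start", action="store_true",
        help="start the clone right away as an ephemeral VM",
    )
    parser.add_argument(
        "--connect", dest="connectURI", metavar="URI",
        help="libvirt connection URI (e.g. qemu:///system)",
    )
    parser.add_argument(
        "--mac", dest="MACaddress", metavar="MAC",
        help="MAC address for the clone's network interface",
    )
    return parser.parse_args(argv)


# parse an example command line instead of sys.argv
args = parse_args(["debian12", "deb12-clone", "--start", "--outdir", "/dev/shm/vms"])
print(vars(args))
```

Because the dest names mirror the signature of main(), the parsed result could be passed on as main(**vars(args)).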

The full source code for virt-clonefast can be found at https://git.iem.at/zmoelnig/gitlab-libvirt-executor