Fast cloning of a libvirt/KVM Virtual Machine (Python)
Yesterday's post discussed our first version of virt-clonefast, implemented in POSIX shell.
For better error handling and option parsing, here's an implementation in our favourite programming language ... Python.
To be honest, for my blog post on Fast cloning of a libvirt/KVM Virtual Machine I first did some basic research with shell snippets (planning to do a first implementation in sh), but then implemented the entire script in Python.
Only for the blog post did I come back to the shell implementation, which turned out surprisingly small and simple.
However, I still have my Python implementation, and I do trust it somewhat more. Here are the whys and hows.
Why?
I love shell scripts, but for more complicated stuff that includes parsing input from various sources, I usually prefer Python.
First of all, I somehow imagined my variant of virt-clone to have similar (if not the same) flags as the original version.
getopts for shell scripts is great, but it is much more convenient to use Python's argparse module: when you get older, long options become increasingly cool.
More importantly, virt-clonefast needs a list of the QCOW2 disk images used by a VM.
The shell implementation parses the output of virsh domblklist, which comes with a bit of decoration (so it is not easily machine-readable) and lacks a considerable amount of information (e.g. whether a given image is indeed a QCOW2 disk).
As a consequence, the shell implementation is riddled with assumptions.
virt-clone itself is implemented in Python and uses the python-libvirt bindings.
It seemed like a good idea to do the same.
How?
Connecting to libvirt
In order to talk to a hypervisor, we first need to connect to it:
import libvirt

conn = libvirt.open()
libvirt.open() takes an optional connection URI (e.g. qemu:///system).
If None (or no argument) is given, libvirt falls back to a default, e.g. the one set in the LIBVIRT_DEFAULT_URI environment variable.
There's an alternative, libvirt.openReadOnly(), that opens a connection with limited permissions.
This is good enough if we just want to query the parameters of a given VM, but obviously we cannot create new VMs in read-only mode.
We can then easily get a handle to any VM (or domain in libvirt lingo):
# iterate over all domains
for d in conn.listAllDomains():
    print(d.name())

# get a specific domain by name
name = "debian12"
dom = conn.lookupByName(name)
Because we want to allow the user to specify the domain either by name or by UUID, we use a little wrapper to look up the VM:
def getDomain(conn: libvirt.virConnect, name: str) -> libvirt.virDomain:
    lookups = [conn.lookupByName, conn.lookupByUUIDString]
    for lookup in lookups:
        try:
            return lookup(name)
        except libvirt.libvirtError:
            # this lookup method did not match, try the next one
            pass
    # name does not exist, but call it again for the exception
    return lookups[0](name)
Querying VM info
I had high hopes that the Python API would provide a convenient way to access all kinds of information about a VM. It turns out that this is not the case: the best (only?) way to inspect a VM is to retrieve its definition as an XML string, and then work with the XML DOM. I'm not exactly a big fan of such a workflow (but hey, it's been a while since I last worked with XML and a refresher won't hurt).
To ease the pain a bit, I decided to wrap the VM information into a Domain class (domain is libvirt lingo for a VM), which is just a thin wrapper around the XML DOM:
from xml.dom import minidom


class Domain:
    def __init__(self, dom: libvirt.virDomain):
        # fetch the VM definition as an XML string and parse it into a DOM
        xml = dom.XMLDesc()
        self.xml = minidom.parseString(xml)

    def toXML(self) -> str:
        return self.xml.toxml()

    def __str__(self):
        return self.toXML()


dom = getDomain(conn, name)
domain = Domain(dom)
print(domain)
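To give a rough idea of what working with the XML DOM looks like without a running hypervisor, here is a minimal sketch that pulls the name and UUID out of a definition. The XML string is a trimmed stand-in for what dom.XMLDesc() returns, and getText() is a hypothetical helper, not part of libvirt or minidom.

```python
from typing import Optional
from xml.dom import minidom

# Trimmed stand-in for the string returned by dom.XMLDesc() (assumption).
SAMPLE_XML = """<domain type='kvm'>
  <name>debian12</name>
  <uuid>d5152f84-f02c-4b38-b3ad-b00328e2e06f</uuid>
  <devices>
    <interface type='network'>
      <mac address='52:54:00:d6:63:30'/>
    </interface>
  </devices>
</domain>"""


def getText(doc: minidom.Document, tagname: str) -> Optional[str]:
    """Return the text of the first <tagname> directly below the root element."""
    root = doc.documentElement
    for el in root.getElementsByTagName(tagname):
        if el.parentNode is not root:
            continue  # skip nested elements that share the tag name
        return "".join(
            n.data for n in el.childNodes if n.nodeType == n.TEXT_NODE
        )
    return None


doc = minidom.parseString(SAMPLE_XML)
print(getText(doc, "name"))
print(getText(doc, "uuid"))
```

The parent check matters because getElementsByTagName() searches the whole subtree, not just the direct children.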
Basic cloning attributes
A Domain instance is initialized with XML from an actual domain, but we want to modify it so the XML can be used to create a cloned VM (leaving aside the disk duplication for now).
<domain type='kvm'>
  <name>debian12</name>
  <uuid>d5152f84-f02c-4b38-b3ad-b00328e2e06f</uuid>
  <devices>
    [...]
    <interface type='network'>
      <mac address='52:54:00:d6:63:30'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
  </devices>
</domain>
For a clone, we need to change the name (which ought to be unique).
Each VM has an associated UUID (presumably to allow VMs to have the same visible name), but we can leave this empty (that is: remove the uuid tag) and let libvirt generate one for us.
Also, most VMs will have a network interface attached to them, which should have a unique MAC address.
Again we can let libvirt generate a unique value for us (by removing the mac tag), or we could set the MAC address to some user-supplied value.
class Domain:
    # ...
    def convertToClone(self, newname: str, mac_address: str | None = None):
        domain = None
        # rename: only the <name> directly below the top-level <domain> counts
        for name in self.xml.getElementsByTagName("name"):
            if name.parentNode.parentNode != self.xml:
                continue
            for n in name.childNodes:
                n.data = newname
                domain = name.parentNode
                break
            break

        # clear UUID
        for uuid in domain.getElementsByTagName("uuid"):
            if uuid.parentNode != domain:
                continue
            domain.removeChild(uuid)

        # reset MAC address
        for mac in domain.getElementsByTagName("mac"):
            if mac.parentNode.tagName != "interface":
                continue
            if mac.hasAttribute("address"):
                mac.removeAttribute("address")
            if mac_address:
                mac.setAttribute("address", mac_address)
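If we go the user-supplied route, we need a well-formed address to pass as mac_address. A minimal sketch for generating a candidate: randomMAC() is a hypothetical helper, and the default 52:54:00 prefix is the one conventionally used for QEMU/KVM guests (it also shows up in the sample XML above). The sketch does not check that the address is actually unused on the network.

```python
import random


def randomMAC(prefix=(0x52, 0x54, 0x00)) -> str:
    """Return a random MAC address string that starts with the given prefix octets."""
    if len(prefix) > 6:
        raise ValueError("a MAC address has only 6 octets")
    # fill the remaining octets with random values
    octets = list(prefix) + [
        random.randint(0x00, 0xFF) for _ in range(6 - len(prefix))
    ]
    return ":".join("%02x" % o for o in octets)


mac = randomMAC()
print(mac)
```

The result could then be handed over as domain.convertToClone(newname, mac).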
With this, we can already perform some simple cloning (using the same disk images as the reference VM):
oldname = "debian12"
newname = "deb12-clone"

dom = getDomain(conn, oldname)
domain = Domain(dom)
domain.convertToClone(newname)
conn.defineXML(domain.toXML())
Disk cloning
Of course, we do need to create (shallow) copies of our COW disk images.
Which disks are to be cloned can be derived from the disk tags in the XML definition:
<domain>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/debian12.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    [...]
  </devices>
  [...]
</domain>
The relevant info is spread across several attributes and tags (disk@type, disk@device, driver@type, source@file and target@dev), so we simplify each disk to a flat dictionary:
class Domain:
    # ...
    def getDisks(self) -> list[dict[str, str | None]]:
        disks = []
        for diskType in self.xml.getElementsByTagName("disk"):
            # collect the attributes of all child elements, keyed by tag name
            disk = {}
            for diskNode in diskType.childNodes:
                name = diskNode.nodeName
                if name[0:1] == "#":
                    # skip text and comment nodes ("#text", "#comment")
                    continue
                disk[name] = {
                    diskNode.attributes[attr].name: diskNode.attributes[attr].value
                    for attr in diskNode.attributes.keys()
                }
            disks.append(
                {
                    "type": diskType.getAttribute("type"),
                    "device": diskType.getAttribute("device"),
                    "file": disk.get("source", {}).get("file"),
                    "driver": disk.get("driver", {}).get("type"),
                    "target": disk.get("target", {}).get("dev"),
                }
            )
        return disks
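To make the shape of the result concrete, here is a self-contained sketch that applies the same flattening to a literal XML snippet. flattenDisks() is a hypothetical standalone variant of getDisks(), and the second (cdrom) disk in the sample is made up for illustration.

```python
from xml.dom import minidom

# Sample definition; the cdrom entry is an invented example without a source file.
SAMPLE_XML = """<domain>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/debian12.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
    </disk>
  </devices>
</domain>"""


def flattenDisks(xml_text: str) -> list:
    disks = []
    for disk in minidom.parseString(xml_text).getElementsByTagName("disk"):
        # map each child element's tag name to a dict of its attributes
        children = {
            node.tagName: dict(node.attributes.items())
            for node in disk.childNodes
            if node.nodeType == node.ELEMENT_NODE
        }
        disks.append(
            {
                "type": disk.getAttribute("type"),
                "device": disk.getAttribute("device"),
                "file": children.get("source", {}).get("file"),
                "driver": children.get("driver", {}).get("type"),
                "target": children.get("target", {}).get("dev"),
            }
        )
    return disks


flat = flattenDisks(SAMPLE_XML)
for d in flat:
    print(d)
```

A disk without a source element simply ends up with "file" set to None, which is why the later checks look at "device" before anything else.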
For now, we can only do our shallow clones on file-based disk devices that use the qcow2 driver.
cdrom devices can be ignored (they are read-only, and can be shared).
Any other disks should raise an error.
The following gives us a srcdisks dictionary that maps target devices (e.g. vda) to qcow2 images, throwing an error if the VM uses an unsupported device.
def getClonableDisks(domain: Domain) -> dict[str, str]:
    disks = {}
    for idx, d in enumerate(domain.getDisks()):
        if d["device"] == "cdrom":
            # read-only media can be shared between the VMs
            continue
        if d["device"] != "disk":
            raise SystemExit(
                "Disk#%d is an unsupported device '%s'" % (idx, d["device"])
            )
        if d["type"] != "file":
            raise SystemExit("Disk#%d is not file-based" % (idx,))
        # check if the disk is qcow2 based
        driver = d["driver"]
        if driver != "qcow2":
            raise SystemExit(
                "Disk#%d is of type '%s' (only 'qcow2' is supported)"
                % (idx, driver)
            )
        disks[d["target"]] = d["file"]
    return disks


srcdisks = getClonableDisks(domain)
The actual shallow copying of the COW images is implemented by the following cloneQCOW2() function.
It is just a wrapper around the qemu-img command-line tool, with some additional logic to create a unique output filename (which is returned by the function on success):
import os
import subprocess
import sys


def cloneQCOW2(source: str, target: str | None = None) -> str:
    # check if source exists and can be opened (otherwise raise a standard error)
    with open(source):
        pass

    if not target:
        # use source filename to calculate target-file
        target = source
    elif os.path.isdir(target) or not os.path.basename(target):
        # target is just a directory, assume source-file within the dir
        target = os.path.join(target, os.path.basename(source))

    # ensure that output directory exists
    outdir = os.path.dirname(target)
    if outdir:
        os.makedirs(outdir, exist_ok=True)

    # get a non-existing filename
    base, ext = os.path.splitext(target)
    i = ""
    while True:
        target = "%s%s%s" % (base, i, ext)
        try:
            # mode "x" fails if the file already exists
            targetfd = open(target, "x")
            break
        except FileExistsError:
            pass
        if not i:
            i = 0
        # i counts down (-1, -2, ...), so we try base-1.ext, base-2.ext, ...
        i -= 1
    targetfd.close()

    # finally duplicate the image
    subprocess.run(
        [
            "qemu-img", "create",
            "-f", "qcow2",  # target format
            "-b", source,  # backing file
            "-F", "qcow2",  # backing format
            target,
        ],
        check=True,
        stdout=sys.stderr,
    )

    return target
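The filename probing can be tried out on its own, without qemu-img. The following sketch isolates that part; uniquePath() is a hypothetical helper name, and it uses an explicit counter instead of the negative-number trick, but should produce the same names.

```python
import os
import tempfile


def uniquePath(target: str) -> str:
    """Create an empty file at target (or target-1, target-2, ...) and return its path."""
    base, ext = os.path.splitext(target)
    suffix = ""
    counter = 0
    while True:
        candidate = "%s%s%s" % (base, suffix, ext)
        try:
            # mode "x" creates the file exclusively and fails if it already exists
            with open(candidate, "x"):
                return candidate
        except FileExistsError:
            counter += 1
            suffix = "-%d" % counter


with tempfile.TemporaryDirectory() as tmpdir:
    wanted = os.path.join(tmpdir, "debian12.qcow2")
    names = [os.path.basename(uniquePath(wanted)) for _ in range(3)]

print(names)
# → ['debian12.qcow2', 'debian12-1.qcow2', 'debian12-2.qcow2']
```

Creating the placeholder with mode "x" (rather than checking os.path.exists() first) avoids a race between two cloning processes picking the same name.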
With this and the above dictionary of to-be-cloned disks, we can clone all necessary disks and store the new mapping from target device to image in the cloneddisks dict:
def cloneDisks(disks: dict, outputdir: str) -> dict[str, str]:
    return {k: cloneQCOW2(v, outputdir) for k, v in disks.items()}


cloneddisks = cloneDisks(srcdisks, outputdir)
Finally we need to change the source tag of the disk to point to the new image file.
The changeDiskSourceFile() function sets the source@file attribute of the disk identified via the target@dev attribute:
class Domain:
    # ...
    def changeDiskSourceFile(self, target_device: str, source_file: str):
        for diskType in self.xml.getElementsByTagName("disk"):
            # find the disk that is attached as target_device (e.g. "vda")
            device = None
            for target in diskType.getElementsByTagName("target"):
                if target.getAttribute("dev") == target_device:
                    device = True
            if not device:
                continue
            for source in diskType.getElementsByTagName("source"):
                if source.getAttribute("file"):
                    source.attributes["file"].value = source_file
                    return


# ...
for k, v in cloneddisks.items():
    domain.changeDiskSourceFile(k, v)
It turns out that this doesn't work at all.
Carefully comparing the XML VM definitions (obtained with virsh dumpxml) between a working VM (created with virt-manager) and the broken one, we notice that the working VM has an additional backingStore tag in its disk definition that declares the backing file:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/debian12-shallow.qcow2' index='1'/>
  <backingStore type='file' index='3'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/debian12.qcow2'/>
    <backingStore/>
  </backingStore>
</disk>
So we extend the changeDiskSourceFile() function to also provide this additional information:
class Domain:
    # ...
    def changeDiskSourceFile(
        self, target_device: str, source_file: str, backing_file: str | None = None
    ) -> bool:
        def addChild(parent, tagname):
            # create the element via the document, so it gets a proper owner
            el = self.xml.createElement(tagname)
            parent.appendChild(el)
            return el

        for diskType in self.xml.getElementsByTagName("disk"):
            device = None
            for target in diskType.getElementsByTagName("target"):
                if target.getAttribute("dev") == target_device:
                    device = True
            if not device:
                continue
            for source in diskType.getElementsByTagName("source"):
                if source.getAttribute("file"):
                    source.attributes["file"].value = source_file
                    # add backingstore if required
                    if backing_file:
                        backingStore = source.parentNode.getElementsByTagName(
                            "backingStore"
                        )
                        if backingStore:
                            backingStore = backingStore[0]
                        else:
                            backingStore = addChild(source.parentNode, "backingStore")
                        # iterate over a copy, as we modify the list of children
                        for el in list(backingStore.childNodes):
                            backingStore.removeChild(el)
                        backingStore.setAttribute("type", "file")
                        addChild(backingStore, "format").setAttribute("type", "qcow2")
                        addChild(backingStore, "source").setAttribute("file", backing_file)
                    return True
        return False
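The backingStore handling can also be looked at in isolation. This is a minimal sketch that operates on a literal disk element rather than a full domain; addBackingStore() is a hypothetical helper and the file paths are just examples.

```python
from xml.dom import minidom

# A single <disk> element that already points to the (example) shallow clone.
DISK_XML = """<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/debian12-1.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>"""


def addBackingStore(disk, backing_file: str, backing_format: str = "qcow2"):
    """Append a <backingStore> declaration for backing_file to the disk element."""
    doc = disk.ownerDocument
    store = doc.createElement("backingStore")
    store.setAttribute("type", "file")
    fmt = doc.createElement("format")
    fmt.setAttribute("type", backing_format)
    src = doc.createElement("source")
    src.setAttribute("file", backing_file)
    store.appendChild(fmt)
    store.appendChild(src)
    disk.appendChild(store)
    return store


doc = minidom.parseString(DISK_XML)
disk = doc.documentElement
store = addBackingStore(disk, "/var/lib/libvirt/images/debian12.qcow2")
print(disk.toxml())
```

Note that after this step the disk contains two source elements (its own and the one inside backingStore), so any code that searches for "source" below a disk has to be careful about which one it picks.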
Now the VM boots from the COW disk copies!
Ephemeral VMs
Instead of defining a persistent VM with conn.defineXML(), we can create and start a VM with conn.createXML().
This VM is ephemeral: its definition vanishes once the VM powers down.
Unfortunately, the (shallow) disk images will not be removed along with it, as libvirt considers them externally managed.
If the disk images are created within a managed storage pool (e.g. by creating the shallow clones in the same directory as their reference images), we can simply refresh the pool to make the new images known (after creating the copies):
def rescanStoragePools(conn: libvirt.virConnect):
    for p in conn.listAllStoragePools():
        p.refresh()


rescanStoragePools(conn)
With this, libvirt considers the new images as managed, and will clean them up if we undefine the VM with virsh undefine --remove-all-storage.
It does not automatically remove the storage volumes for ephemeral VMs (created via conn.createXML()), though.
However, we can achieve the same if we delete the volumes while they are being used.
Since they are in use, the data will not be removed immediately, but rather when the files go out of use (that is: once the VM is shut down):
def cleanupDiskImages(conn: libvirt.virConnect, diskimages: list):
    images = set(diskimages)
    for p in conn.listAllStoragePools():
        for v in p.listAllVolumes():
            if v.path() in images:
                v.delete()
                images.discard(v.path())
            if not images:
                break
        if not images:
            break


cleanupDiskImages(conn, cloneddisks.values())
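The deferred removal relies on ordinary POSIX file semantics, at least for directory-based pools on a local filesystem (that is an assumption; other pool types may behave differently). A tiny sketch, independent of libvirt, in which an open file handle plays the role of the running VM:

```python
import os
import tempfile

# create a scratch file that stands in for a disk image
fd, path = tempfile.mkstemp(suffix=".qcow2")
with os.fdopen(fd, "w+b") as image:
    image.write(b"disk data")
    image.flush()

    # delete the file while it is still open
    os.unlink(path)
    gone = not os.path.exists(path)

    # the name is gone, but the data stays readable through the open handle
    image.seek(0)
    data = image.read()

# only now (handle closed) does the kernel release the storage
print(gone, data)
```

On a POSIX system the name disappears from the directory right away, while the contents remain accessible until the last handle is closed.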
If we also want to clean up images for temporary VMs that were stored outside of existing storage pools (e.g. because we specified an output directory on a RAM disk), we need to create an ephemeral storage pool first, which can then be destroyed (while the VM is running):
def makeStoragePoolXML(name: str, path: str) -> str:
    def addText(parent, text):
        txt = parent.ownerDocument.createTextNode(text)
        parent.appendChild(txt)
        return txt

    raw_xml = """<pool type='dir'><name/><target><path/></target></pool>"""
    xml = minidom.parseString(raw_xml)
    for n in xml.getElementsByTagName("name"):
        addText(n, name)
        break
    for p in xml.getElementsByTagName("path"):
        addText(p, path)
        break
    return xml.toxml()


def makeStoragePool(conn: libvirt.virConnect, path: str) -> libvirt.virStoragePool | None:
    import uuid

    poolname = "tmp-%s" % uuid.uuid4()
    try:
        return conn.storagePoolCreateXML(makeStoragePoolXML(poolname, path))
    except libvirt.libvirtError:
        # couldn't create pool, presumably because outdir is already in some other pool
        return None


def cleanupStoragePool(pool: libvirt.virStoragePool | None) -> None:
    # only tear down the temporary pool once all its volumes are gone
    if pool and not pool.listAllVolumes():
        pool.destroy()


pool = makeStoragePool(conn, outdir)
conn.createXML(domain.toXML())
cleanupDiskImages(conn, cloneddisks.values())
cleanupStoragePool(pool)
Putting it all together
With all the functions and classes defined above, the core of our virt-clonefast implementation looks like this:
def main(srcdomain: str, dstdomain: str,
         outdir: str | None = None, start: bool = False,
         connectURI: str | None = None, MACaddress: str | None = None):
    with libvirt.open(connectURI) as conn:
        domain = Domain(getDomain(conn, srcdomain))
        domain.convertToClone(dstdomain, MACaddress)

        srcdisks = getClonableDisks(domain)
        cloneddisks = cloneDisks(srcdisks, outdir)
        pool = None
        if start and outdir:
            pool = makeStoragePool(conn, outdir)
        rescanStoragePools(conn)
        for k, v in cloneddisks.items():
            domain.changeDiskSourceFile(k, v, srcdisks[k])

        if start:
            # ephemeral VM: start it, then schedule its images for removal
            conn.createXML(domain.toXML())
            cleanupDiskImages(conn, cloneddisks.values())
            cleanupStoragePool(pool)
        else:
            # persistent VM
            conn.defineXML(domain.toXML())
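What is still missing is the option parsing that motivated Python in the first place. A minimal argparse sketch that produces exactly the keyword arguments main() expects: the flag names are my assumption here (loosely modelled on virt-clone's --original, --name, --mac and --connect), and --outdir and --start are made up for this example, so the real script may well differ.

```python
import argparse


def parseArgs(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="virt-clonefast",
        description="Create a shallow (copy-on-write) clone of a VM.",
    )
    # dest= maps each flag onto the parameter names used by main()
    parser.add_argument("-o", "--original", dest="srcdomain", required=True,
                        help="name or UUID of the VM to clone")
    parser.add_argument("-n", "--name", dest="dstdomain", required=True,
                        help="name of the new VM")
    parser.add_argument("-m", "--mac", dest="MACaddress", default=None,
                        help="MAC address for the clone (default: let libvirt pick)")
    parser.add_argument("--connect", dest="connectURI", default=None,
                        help="libvirt connection URI")
    parser.add_argument("--outdir", default=None,
                        help="directory for the cloned disk images")
    parser.add_argument("--start", action="store_true",
                        help="start an ephemeral clone instead of defining a persistent one")
    return parser.parse_args(argv)


args = parseArgs(["--original", "debian12", "--name", "deb12-clone", "--start"])
print(vars(args))
# with main() from above in scope, the call would be: main(**vars(args))
```

Because the dest names match the parameters of main(), the parsed namespace can be passed on without any further glue code.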
The full source code for virt-clonefast can be found at https://git.iem.at/zmoelnig/gitlab-libvirt-executor