Live migration from a QCOW2 source
Live migration fails with some QCOW2 source images and succeeds with others. This may be related to QCOW2 features that live migration does not support, but I haven't been able to identify a pattern.
- qemu-img info cirros-0.4.0-x86_64-disk.img
file format: qcow2
virtual size: 44 MiB (46137344 bytes)
disk size: 12.1 MiB
Format specific information:
lazy refcounts: false
refcount bits: 16
- cat source-qcow2.json
- rbd migration prepare --import-only --source-spec "`cat source-qcow2.json`" rbd/testing
- rbd migration execute rbd/testing
2022-11-29T08:14:31.048+0000 7f76c67fc700 -1 librbd::deep_copy::ObjectCopyRequest: 0x7f76a40624e0 handle_read: failed to read from source object: (22) Invalid argument
2022-11-29T08:14:31.048+0000 7f76c67fc700 -1 librbd::io::CopyupRequest: 0x7f76a4062720 handle_deep_copy: rbd_data.218d26fd56497.000000000000000a error encountered during deep-copy: (22) Invalid argument
2022-11-29T08:14:31.048+0000 7f76c67fc700 -1 librbd::io::ObjectRequest: 0x7f76a8013350 handle_copyup: rbd_data.218d26fd56497.000000000000000a failed to copyup object: (22) Invalid argument
2022-11-29T08:14:31.048+0000 7f76c67fc700 -1 librbd::MigrateRequest: 0x7f76b8026420 handle_migrate_objects: failed to migrate objects: (22) Invalid argument
2022-11-29T08:14:31.048+0000 7f76c67fc700 -1 librbd::MigrateRequest: 0x7f76b8026420 should_complete: encountered error: (22) Invalid argument
There are several issues:
1. librbd does not detect that a QCOW2 image is actually zlib-compressed. Nothing in the QCOW2 header indicates that zlib compression is in use. The only indication is a bit in the L2 table entries that marks a cluster as compressed, and L2 tables are parsed lazily: not when the image is opened, but only when data is actually read from it.
This means that migration prepare succeeds, and the compression can only be detected in migration execute, when data is actually read from the QCOW2 image. Even then, this bit is not checked, so parsing fails when it tries to seek to an offset of roughly 1<<62.
2. As noted above, parsing fails in migration execute and returns EINVAL. However, for some reason Operations<I>::migrate explicitly ignores this kind of error, so from the user's perspective the operation appears to have succeeded.
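
To check ahead of time whether a given image is affected by the first issue, something along the following lines might help. This is only a rough sketch, not librbd code: it assumes the on-disk layout from the QEMU qcow2 specification (big-endian header, L1 table of 8-byte entries, bit 62 of an L2 entry marking a compressed cluster) and an image without extended L2 entries. The function name count_compressed_clusters is made up for illustration.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
L1E_OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1 entry: L2 table offset
L2E_COMPRESSED = 1 << 62              # bit 62 of an L2 entry: compressed cluster


def count_compressed_clusters(f):
    """Walk the L1/L2 tables of a QCOW2 image and count compressed clusters.

    Hypothetical helper. Assumes standard 8-byte L2 entries (no extended
    L2 entries) and only looks at the active L1 table, not at snapshots.
    """
    f.seek(0)
    # The first 48 bytes of the header hold every field needed here.
    (magic, _version, _backing_off, _backing_len, cluster_bits,
     _size, _crypt, l1_size, l1_offset) = struct.unpack(">4sIQIIQIIQ", f.read(48))
    if magic != QCOW2_MAGIC:
        raise ValueError("not a QCOW2 image")

    cluster_size = 1 << cluster_bits
    f.seek(l1_offset)
    l1_table = struct.unpack(f">{l1_size}Q", f.read(8 * l1_size))

    compressed = 0
    for l1_entry in l1_table:
        l2_offset = l1_entry & L1E_OFFSET_MASK
        if l2_offset == 0:
            continue  # no L2 table allocated for this part of the image
        f.seek(l2_offset)
        # Each L2 table occupies exactly one cluster.
        l2_table = struct.unpack(f">{cluster_size // 8}Q", f.read(cluster_size))
        compressed += sum(1 for entry in l2_table if entry & L2E_COMPRESSED)
    return compressed
```

Opening the image in binary mode, e.g. open("cirros-0.4.0-x86_64-disk.img", "rb"), and passing it to this function should return a non-zero count if the image contains compressed clusters. Under the assumptions above, such an image would be expected to hit the EINVAL in migration execute.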