Bug #63966: deleting snapshot from disk with unfinished migration causes corruption / qemu crash - Ceph - Ceph


Added by Nikola Ciprich 4 months ago.

Status: New
Priority: Normal
Category: librbd
Severity: 2 - major
Affected Versions: v15.2.14
ceph-qa-suite: rbd

Description

We've discovered a scenario in which data can get corrupted (and the VM using the RBD volume crashes).

Steps to reproduce:

virsh create VM.xml
rbd snap create sas/WINdisk@snap1
virsh shutdown VM
rbd migration prepare sas/WINdisk sata/WINdisk_mig
virsh create VM_edited.xml
rbd snap create sata/WINdisk_mig@snap2
rbd snap rm sata/WINdisk_mig@snap1

A few seconds later, the qemu process crashes:
/usr/src/redhat/BUILD/ceph-15.2.14/src/librbd/deep_copy/ObjectCopyRequest.cc: In function 'void librbd::deep_copy::ObjectCopyRequest<ImageCtxT>::compute_zero_ops() [with ImageCtxT = librbd::ImageCtx]' thread 7fe2121fd700 time 2023-12-14T07:08:02.506675+0100
/usr/src/redhat/BUILD/ceph-15.2.14/src/librbd/deep_copy/ObjectCopyRequest.cc: 943: FAILED ceph_assert(dst_may_exist_it != m_dst_object_may_exist.end())
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x143) [0x7fe222da9213]
2: (()+0x26040e) [0x7fe222da940e]
3: (()+0x1ee529) [0x7fe230d42529]
4: (()+0x1f30e6) [0x7fe230d470e6]
5: (()+0x1f421a) [0x7fe230d4821a]
6: (()+0x1f5575) [0x7fe230d49575]
7: (()+0x1f4934) [0x7fe230d48934]
8: (()+0xc7f09) [0x7fe230c1bf09]
9: (()+0xa8223) [0x7fe2367eb223]
10: (()+0x66ba9) [0x7fe2367a9ba9]
11: (Finisher::finisher_thread_entry()+0x18d) [0x7fe222e5bc6d]
12: (()+0x7ea5) [0x7fe231788ea5]
13: (clone()+0x6d) [0x7fe22f7d9b0d]
2023-12-14 06:08:02.849+0000: shutting down, reason=crashed

I haven't tried reproducing this with newer versions yet; I'll try and report back. If I can provide more information, please let me know.
BR
nikola ciprich
