Bug #36580

jewel: Snapshots changing "underneath"

Added by Christian Theune over 5 years ago. Updated almost 5 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We're in the process of finally upgrading our old Hammer cluster to Jewel. I know that Jewel is already EOL and we'll be updating to the next newer version soon as well; however, we have a blocking issue that we can't move past without fixing.

Our backups leverage RBD snapshots. Backup scrubbing flagged broken images because the backup was not identical to the origin in some places.

I was able to pin this down to the snapshot contents changing between exports:

root@patty /srv/backy # rbd export rbd.hdd/test04.root@backy-7FxpY53tLdfLjEuahVxTBN - | md5sum
Exporting image: 100% complete...done.
c9e398b525a6d4c6f906e8af296c9e2f  -
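Since a snapshot is supposed to be read-only, exporting it twice must yield identical checksums; a mismatch would confirm the snapshot is changing "underneath". A minimal sketch of that check, where `compare_sums` is a hypothetical helper and the `rbd export` invocation mirrors the one in the report:

```shell
#!/bin/sh
# Sketch: verify that two consecutive exports of the same snapshot match.
# compare_sums is a hypothetical helper, not part of the rbd CLI.
compare_sums() {
    # $1, $2: md5 hex digests of two consecutive exports
    if [ "$1" = "$2" ]; then
        echo "stable"
    else
        echo "CHANGED"
    fi
}

# Real usage (requires a Ceph cluster; shown for illustration only):
#   sum1=$(rbd export rbd.hdd/test04.root@backy-7FxpY53tLdfLjEuahVxTBN - | md5sum | cut -d' ' -f1)
#   sum2=$(rbd export rbd.hdd/test04.root@backy-7FxpY53tLdfLjEuahVxTBN - | md5sum | cut -d' ' -f1)
#   compare_sums "$sum1" "$sum2"

# Self-contained demonstration, using the digest from the transcript above:
compare_sums c9e398b525a6d4c6f906e8af296c9e2f c9e398b525a6d4c6f906e8af296c9e2f
```

Running this repeatedly against a live snapshot (and again after restarting the attached QEMU process) would narrow down when the content drifts.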

This is the cluster we use in development to test our upgrade, and it has seen a couple of weird intermediate states (like accidentally downgrading half the cluster to Hammer). The snapshot seems to become stable once I restart the QEMU process, and a friend who uses the same mechanisms on Jewel has not reported an issue like this.

I wonder how to track this down further and repair the cluster. We'll be rolling this out to our staging environment shortly and I'll check there again, but maybe something comes to mind that you could recommend we check ...

PS: I can't select "10.2.11" as the affected version.


Files

debug-snapshot.log (109 KB) debug-snapshot.log Christian Theune, 10/25/2018 10:24 AM
osd.0.log (496 KB) osd.0.log Christian Theune, 10/31/2018 07:46 AM
