Bug #19486

Rebalancing can propagate corrupt copy of replicated object

Added by Mark Houghton about 4 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: Backfill/Recovery
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With 4 OSDs in a replication pool, with the replication count set to 3, I stored an object and found copies on osd0, osd1 and osd3.

I manually changed the primary copy (on osd0) to simulate corruption.

osd0 - corrupt copy (primary)
osd1 - good copy
osd2 -
osd3 - good copy

I then ran "ceph osd out 3", taking out one of the good replicas, and waited for Ceph to rebalance. Afterwards, I had copies on osd0, osd1 and osd2, as expected.

I had hoped that Ceph would choose a good copy as the canonical replica. Instead, it chose the corrupted primary copy and created the new copy from that. So I ended up with:

osd0 - corrupt copy
osd1 - good copy
osd2 - corrupt copy
osd3 - out

Now I have two corrupt copies when previously I had only one. If Ceph rebalances again before anyone notices the corruption and repairs it, I could well end up with 3 corrupt copies.
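
The propagation described above can be illustrated with a toy model. This is only a sketch of the observed behaviour, not Ceph's recovery code: the function name `rebalance_from_primary` and the byte strings are hypothetical, and the model simply assumes the new replica is copied from the primary without any integrity check.

```python
def rebalance_from_primary(copies, primary, out_osd, new_osd):
    """Toy model (assumption): drop the replica on out_osd and create a
    new replica on new_osd by copying whatever the primary holds."""
    remaining = {osd: data for osd, data in copies.items() if osd != out_osd}
    remaining[new_osd] = remaining[primary]  # no integrity check on the source
    return remaining


good = b"original object data"
corrupt = b"corrupted object data"

# State after manually corrupting the primary copy on osd0.
copies = {"osd0": corrupt, "osd1": good, "osd3": good}

# "ceph osd out 3" followed by rebalancing onto osd2.
after = rebalance_from_primary(copies, primary="osd0", out_osd="osd3", new_osd="osd2")

corrupt_osds = sorted(osd for osd, data in after.items() if data != good)
print(corrupt_osds)  # ['osd0', 'osd2']
```

In this model, a second rebalance that removes osd1 would leave no good copy at all.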

If I run a scrub and a repair, Ceph correctly identifies the corrupt copies (as shown by "data_digest_mismatch" in the output from "rados list-inconsistent-obj") and restores them from the single good copy. Rebalancing should perform a similar integrity check on each copy before choosing one as the canonical copy.
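
One way to picture the requested check is a digest comparison across replicas before a recovery source is chosen. The sketch below is an assumption-laden illustration, not Ceph's scrub or recovery logic: `pick_recovery_source` is a hypothetical helper, `zlib.crc32` stands in for Ceph's data digest, and a strict majority vote is used as the selection rule.

```python
import zlib
from collections import Counter


def pick_recovery_source(copies):
    """Hypothetical helper: return (source_osd, mismatched_osds).

    Picks a replica whose digest is shared by a strict majority of copies.
    Returns (None, all_osds) when no strict majority exists.
    """
    if not copies:
        raise ValueError("no copies to compare")
    # zlib.crc32 is a stand-in digest (assumption), not Ceph's own checksum.
    digests = {osd: zlib.crc32(data) for osd, data in copies.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, sorted(copies)
    agreeing = sorted(osd for osd, d in digests.items() if d == digest)
    mismatched = sorted(osd for osd, d in digests.items() if d != digest)
    return agreeing[0], mismatched


good = b"original object data"
corrupt = b"corrupted object data"

# State before "ceph osd out 3": one corrupt primary, two good replicas.
source, mismatched = pick_recovery_source(
    {"osd0": corrupt, "osd1": good, "osd3": good}
)
print(source, mismatched)
```

With the layout from this report, the check would select a good replica as the source and flag osd0. A majority vote is only useful before the corruption has spread: once two of three copies are corrupt it would pick the wrong one, so comparing each copy against a stored checksum would be more robust.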

History

#1 Updated by Sage Weil about 4 years ago

  • Status changed from New to 12

Yes. The new scrub tools (in progress) will give you more control over which copy is propagated. And bluestore's checksums will make it clear which one is bad. Until then, there isn't much to be done here!

#2 Updated by Mark Houghton about 4 years ago

Thanks. I thought it might be the case that Bluestore would fix or improve this, but I haven't found a way to test that because I'm not sure how to simulate corrupting one copy of an object in Bluestore - I can't just edit the file when there's no filesystem. Can you confirm what Ceph would do if using Bluestore in this situation?

Are there any tickets I can track for the new scrub tools you mentioned?

#3 Updated by Greg Farnum almost 4 years ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Backfill/Recovery
  • Component(RADOS) OSD added

That is an interesting point about BlueStore; it will detect corruption but not manual edits...

#4 Updated by Patrick Donnelly over 1 year ago

  • Status changed from 12 to New
