Bug #19486
Rebalancing can propagate corrupt copy of replicated object
Description
With 4 OSDs in a replication pool, with the replication count set to 3, I stored an object and found copies on osd0, osd1 and osd3.
I manually changed the primary copy (on osd0) to simulate corruption.
osd0 - corrupt copy (primary)
osd1 - good copy
osd2 - (no copy)
osd3 - good copy
Next, I ran "ceph osd out 3" to take out one of the good replicas, and waited for Ceph to rebalance. Afterwards, I had copies on osd0, osd1 and osd2, as expected.
I had hoped that Ceph would have chosen the good copy as the canonical replica. Instead, it chose the corrupted primary copy, and created the new copy from that. So I ended up with:
osd0 - corrupt copy
osd1 - good copy
osd2 - corrupt copy
osd3 - out
Now I have two corrupt copies when previously I had only one. If Ceph rebalances again before anyone notices the corruption and repairs it, I could well end up with 3 corrupt copies.
If I run a scrub and a repair, Ceph correctly identifies the corrupt copies (as shown by "data_digest_mismatch" in the output from "rados list-inconsistent-obj") and restores them from the single good copy. Rebalancing should perform a similar integrity check on each copy before choosing one as the canonical copy.
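To make the request concrete, here is a toy model in Python of the two behaviours. It is only a sketch, not Ceph code: the function names, the use of SHA-256 as the digest, and the "strict majority of digests" rule are all assumptions made for illustration, and it assumes the copy on osd3 is still readable while that OSD drains.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    # Stand-in digest; the real data digest used by scrub may differ.
    return hashlib.sha256(data).hexdigest()


def backfill_from_primary(replicas, primary, target):
    """Copy the primary's bytes to the new OSD without any check
    (the behaviour observed in this report)."""
    result = dict(replicas)
    result[target] = result[primary]
    return result


def pick_majority_source(replicas):
    """Return an OSD whose copy matches the most common digest,
    or None when no digest is held by a strict majority."""
    digests = {osd: digest(data) for osd, data in replicas.items()}
    best, count = Counter(digests.values()).most_common(1)[0]
    if count * 2 <= len(replicas):
        return None
    return sorted(osd for osd, d in digests.items() if d == best)[0]


def backfill_with_check(replicas, target):
    """Hypothetical alternative: compare digests before choosing a source."""
    source = pick_majority_source(replicas)
    if source is None:
        raise ValueError("no majority digest; refusing to pick a source")
    result = dict(replicas)
    result[target] = result[source]
    return result


good = b"original object data"
bad = b"manually edited object data"
before = {"osd0": bad, "osd1": good, "osd3": good}

naive = backfill_from_primary(before, "osd0", "osd2")
checked = backfill_with_check(before, "osd2")
print("naive copy on osd2 is good:", naive["osd2"] == good)
print("checked copy on osd2 is good:", checked["osd2"] == good)
```

With only two surviving copies that disagree, this rule cannot decide and returns None, which is where a stored checksum would be needed to tell which copy is bad.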
History
#1 Updated by Sage Weil almost 7 years ago
- Status changed from New to 12
Yes. The new scrub tools (in progress) will give you more control over which copy is propagated. And bluestore's checksums will make it clear which one is bad. Until then, there isn't much to be done here!
#2 Updated by Mark Houghton almost 7 years ago
Thanks. I thought it might be the case that Bluestore would fix or improve this, but I haven't found a way to test that because I'm not sure how to simulate corrupting one copy of an object in Bluestore - I can't just edit the file when there's no filesystem. Can you confirm what Ceph would do if using Bluestore in this situation?
Are there any tickets I can track for the new scrub tools you mentioned?
#3 Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category changed from OSD to Backfill/Recovery
- Component(RADOS) OSD added
That is an interesting point about BlueStore; it will detect corruption but not manual edits...
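The distinction can be sketched with a toy checksummed store. This is not BlueStore code: the class, its method names, and the use of zlib.crc32 as a stand-in checksum are assumptions for illustration. The idea is that a checksum recorded at write time catches changes made beneath the store, while an edit that goes through the normal write path simply records a new, matching checksum.

```python
import zlib


class ChecksummedStore:
    """Toy object store that keeps a checksum next to each object
    (hypothetical model, not BlueStore's implementation)."""

    def __init__(self):
        self._data = {}
        self._csum = {}

    def write(self, name, data: bytes):
        # The checksum is recomputed on every write through this path.
        self._data[name] = data
        self._csum[name] = zlib.crc32(data)

    def read(self, name) -> bytes:
        data = self._data[name]
        if zlib.crc32(data) != self._csum[name]:
            raise IOError(f"checksum mismatch on {name}")
        return data

    def flip_bit_on_media(self, name, byte_index=0):
        """Simulate silent media corruption: the bytes change,
        the stored checksum does not."""
        raw = bytearray(self._data[name])
        raw[byte_index] ^= 0x01
        self._data[name] = bytes(raw)


store = ChecksummedStore()
store.write("obj", b"original")

# A manual edit through the write path is indistinguishable from a valid write.
store.write("obj", b"tampered")
print(store.read("obj"))

# Corruption beneath the store is caught on the next read.
store.flip_bit_on_media("obj")
try:
    store.read("obj")
except IOError as err:
    print("detected:", err)
```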
#4 Updated by Patrick Donnelly over 4 years ago
- Status changed from 12 to New