Bug #19486
Rebalancing can propagate corrupt copy of replicated object
Status: Open
Description
With 4 OSDs in a replicated pool and the replication count set to 3, I stored an object and found copies on osd0, osd1 and osd3.
I then manually modified the primary copy (on osd0) to simulate corruption:
osd0 - corrupt copy (primary)
osd1 - good copy
osd2 - no copy
osd3 - good copy
I then ran "ceph osd out 3", taking out one of the good replicas, and waited for Ceph to rebalance. Afterwards I had copies on osd0, osd1 and osd2, as expected.
I had hoped that Ceph would choose a good copy as the canonical replica. Instead, it chose the corrupted primary copy and created the new copy from that, so I ended up with:
osd0 - corrupt copy
osd1 - good copy
osd2 - corrupt copy
osd3 - out
Now I have two corrupt copies where previously I had only one. If Ceph rebalances again before anyone notices and repairs the corruption, I could well end up with 3 corrupt copies.
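The propagation described above can be modeled with a toy sketch. This is not Ceph's actual recovery code; the `rebalance` function and the replica map are purely illustrative, showing how a blind copy-from-primary turns one corrupt replica into two:

```python
def rebalance(replicas, removed, target, primary):
    """Hypothetical sketch: drop the out'ed OSD's copy and recreate it
    on `target` from the primary, with no integrity check."""
    new = dict(replicas)
    del new[removed]
    new[target] = replicas[primary]  # blind copy from the primary
    return new

# The scenario in this report: primary (osd0) corrupt, osd1/osd3 good.
replicas = {"osd0": b"CORRUPTED", "osd1": b"good", "osd3": b"good"}

after = rebalance(replicas, removed="osd3", target="osd2", primary="osd0")
corrupt = sorted(osd for osd, data in after.items() if data == b"CORRUPTED")
print(corrupt)  # → ['osd0', 'osd2']
```

The corruption count goes from one to two purely because the source of the new replica is chosen by role (primary) rather than by content.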
If I run a scrub and a repair, Ceph correctly identifies the corrupt copies (shown as "data_digest_mismatch" in the output of "rados list-inconsistent-obj") and restores them from the single good copy. Rebalancing should perform a similar integrity check on each copy before choosing one as the canonical source.
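The kind of check being suggested could look something like the following. This is only a sketch of the idea, not a proposal for Ceph's internals: `pick_canonical` is a hypothetical helper that votes on content digests across replicas, so a lone corrupt primary loses to the matching good copies:

```python
import hashlib
from collections import Counter

def pick_canonical(replicas):
    """Pick canonical replicas by majority vote on content digests,
    rather than blindly trusting the primary copy."""
    digests = {osd: hashlib.sha256(data).hexdigest()
               for osd, data in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(osd for osd, d in digests.items() if d == majority_digest)

# The scenario in this report: osd0 (primary) corrupt, osd1/osd3 agree.
replicas = {
    "osd0": b"CORRUPTED",
    "osd1": b"original object data",
    "osd3": b"original object data",
}
print(pick_canonical(replicas))  # → ['osd1', 'osd3']
```

With replication count 3 and a single corrupted replica, a digest majority always identifies the good copies; a tie (e.g. with only two surviving replicas) would still need a stored checksum such as the data_digest to break it.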