Bug #35542
closed
Backfill and recovery should validate all checksums
Description
From the thread "Copying without crc check when peering may lack reliability" on ceph-devel, it appears that backfill does not validate a read object against the checksums that may be stored in hobject. We should not silently transfer data which will be detected as bad on read — we should detect it as bad and behave rationally in response! (Crash out on primary, trigger recovery, something.)
Next we put an object into the pool.
    # cat txt
    123
    # rados -p test put test_copy txt
    # rados -p test get test_copy -
    123
Then we take OSD.0 down and change its copy of the object test_copy.
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy get-bytes
    123
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy set-bytes 120txt
Next we start OSD.0 and trigger data migration.
    # ceph osd pool set test crush_rule root1_rule
Finally we try to get the object with rados and ceph-objectstore-tool.
    # rados -p test get test_copy -
    error getting test/test_copy: (5) Input/output error
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 test_copy get-bytes
    120
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 test_copy get-bytes
    120
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 test_copy get-bytes
    120
The data of test_copy on OSD.3, OSD.4, and OSD.5 came from OSD.0, which has the silent data corruption.
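The behaviour asked for in this description can be sketched roughly as below. This is a minimal illustration, not Ceph code: `StoredObject`, `DigestMismatch`, and `read_for_push` are hypothetical names, and `zlib.crc32` is only a stand-in for whatever checksum the OSD actually records. The idea is that the source of a backfill or recovery push verifies the full-object digest, when one exists, before any bytes leave the node.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


class DigestMismatch(Exception):
    """Raised when object data does not match its recorded full-object digest."""


@dataclass
class StoredObject:
    # Hypothetical stand-in for an on-disk object plus its metadata.
    name: str
    data: bytes
    # None models the case where no full-object digest was ever recorded.
    data_digest: Optional[int] = None


def read_for_push(obj: StoredObject) -> bytes:
    """Return the object's bytes for a push, refusing to send data that
    would later fail the same digest check on a client read."""
    if obj.data_digest is not None:
        actual = zlib.crc32(obj.data)
        if actual != obj.data_digest:
            raise DigestMismatch(
                f"{obj.name}: digest {actual:#010x} != recorded {obj.data_digest:#010x}"
            )
    return obj.data
```

What the caller does on `DigestMismatch` (mark the copy missing, pull from another replica, or abort) is left open here, just as it is in the description.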
Updated by Greg Farnum over 5 years ago
Oh, this may just be 12.2.5 being broken? In which case we can close.
Updated by Greg Farnum over 5 years ago
Nope, 12.2.6 was the one that didn't handle checksums properly. So this looks like a real issue, although I think we also tried to move away from the double-checksumming...?
Updated by Sage Weil over 5 years ago
I'm unclear what checksum is not being checked. There is only sometimes a full object checksum that we can validate against--unclear from this whether one was set. The bluestore checksum is always checked, but it won't be "corrupt" here because ceph-objectstore-tool writes via bluestore and the crc at that layer will be updated to remain consistent.
Updated by Greg Farnum over 5 years ago
Sage Weil wrote:
I'm unclear what checksum is not being checked. There is only sometimes a full object checksum that we can validate against--unclear from this whether one was set. The bluestore checksum is always checked, but it won't be "corrupt" here because ceph-objectstore-tool writes via bluestore and the crc at that layer will be updated to remain consistent.
Sure, but if on a read we're getting EIO then there must in fact be the full-object checksum. And since it was just recovered on backfill, that means the read is being more careful than the backfill op was. It's easy for me as a developer to see how that happened, but from a user perspective it's quite confusing and it doesn't fit our general "as-safe-as-possible" strategy.
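The asymmetry discussed in these comments can be shown with a toy model. It is a sketch under stated assumptions, not a description of Ceph internals: `ToyOSD` and all of its methods are hypothetical, and `zlib.crc32` stands in for both checksum layers. The store-level checksum is refreshed on every write, including a write made through a tool, while the full-object digest is set only on the client write path and is checked only on client reads.

```python
import errno
import zlib


class ToyOSD:
    """Hypothetical model of one OSD with two checksum layers."""

    def __init__(self):
        self.data = {}         # object name -> bytes
        self.store_crc = {}    # store-level checksum, updated on every write
        self.full_digest = {}  # full-object digest, set only by client writes

    def _store_write(self, name, data):
        self.data[name] = data
        self.store_crc[name] = zlib.crc32(data)

    def _store_read(self, name):
        data = self.data[name]
        if zlib.crc32(data) != self.store_crc[name]:
            raise OSError(errno.EIO, "store checksum mismatch")
        return data

    def client_write(self, name, data):
        self._store_write(name, data)
        self.full_digest[name] = zlib.crc32(data)

    def tool_set_bytes(self, name, data):
        # Writes through the store, so the store checksum stays consistent,
        # but the full-object digest is left untouched and becomes stale.
        self._store_write(name, data)

    def client_read(self, name):
        data = self._store_read(name)
        digest = self.full_digest.get(name)
        if digest is not None and zlib.crc32(data) != digest:
            raise OSError(errno.EIO, "full-object digest mismatch")
        return data

    def backfill_to(self, name, target):
        # Only the store-level check runs here, so a stale digest and the
        # altered data are copied to the target together.
        target._store_write(name, self._store_read(name))
        if name in self.full_digest:
            target.full_digest[name] = self.full_digest[name]
```

In this model, calling `client_write`, then `tool_set_bytes`, then `backfill_to` leaves the target holding the altered bytes, and `client_read` on the target fails with EIO. That matches the sequence reported in the description: the read path is stricter than the backfill path.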
Updated by David Zafman over 4 years ago
- Status changed from New to Won't Fix
Bluestore makes this unnecessary and it is only possible on a pull of the complete object.