Bug #35542
closed
Backfill and recovery should validate all checksums
Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
Scrub/Repair
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
From the thread "Copying without crc check when peering may lack reliability" on ceph-devel, it appears that backfill does not validate a read object against the checksums that may be stored in hobject. We should not silently transfer data that will be detected as bad on read; we should detect it as bad and behave rationally in response! (Crash out on the primary, trigger recovery, something.)
Next we put an object into the pool.
-> # cat txt
123
-> # rados -p test put test_copy txt
-> # rados -p test get test_copy -
123
Then we take OSD.0 down and change its copy of the object test_copy.
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy get-bytes
123
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy set-bytes 120txt
Next we start OSD.0 and do data migration.
-> # ceph osd pool set test crush_rule root1_rule
Finally we try to get the object with rados and ceph-objectstore-tool.
-> # rados -p test get test_copy -
error getting test/test_copy: (5) Input/output error
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 test_copy get-bytes
120
The data of test_copy on OSD.3, OSD.4, and OSD.5 comes from OSD.0, which has the silent data corruption.
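The behaviour requested here can be illustrated with a toy model. This is only a sketch and is not Ceph code: `StoredObject`, `write_object`, `backfill_copy`, and `ChecksumMismatch` are hypothetical names, and `zlib.crc32` stands in for whatever digest the OSD actually records. It mirrors the reproduction above: the bytes on the source are changed without updating the stored digest, and the copy either propagates them silently or refuses, depending on whether the digest is checked before transfer.

```python
import zlib
from dataclasses import dataclass


class ChecksumMismatch(Exception):
    """Raised when an object's bytes no longer match its recorded digest."""


@dataclass
class StoredObject:
    data: bytes
    digest: int  # checksum recorded when the object was written


def write_object(data: bytes) -> StoredObject:
    # Record the digest at write time, alongside the data.
    return StoredObject(data=data, digest=zlib.crc32(data))


def backfill_copy(source: StoredObject, verify: bool) -> StoredObject:
    # Copy an object to a new replica, optionally validating it first.
    if verify and zlib.crc32(source.data) != source.digest:
        raise ChecksumMismatch("source copy fails its stored digest")
    return StoredObject(data=source.data, digest=source.digest)


# Write "123", then change the bytes on the source without updating the
# digest (the role played by set-bytes in the reproduction).
obj = write_object(b"123")
obj.data = b"120"

# Without verification the bad bytes are propagated silently.
replica = backfill_copy(obj, verify=False)
print(replica.data)

# With verification the corruption is caught before any data is transferred.
try:
    backfill_copy(obj, verify=True)
except ChecksumMismatch as exc:
    print("refused:", exc)
```

What the caller does after the refusal (mark the source copy as bad, recover from another replica, or abort) is exactly the policy question this ticket raises; the sketch only shows where the check would sit.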