Project

General

Profile

Actions

Bug #35542

closed

Backfill and recovery should validate all checksums

Added by Greg Farnum over 5 years ago. Updated over 4 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
Scrub/Repair
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From the thread "Copying without crc check when peering may lack reliability" on ceph-devel, it appears that backfill does not validate a read object against the checksums that may be stored in hobject. We should not silently transfer data which will be detected as bad on read — we should detect it as bad and behave rationally in response! (Crash out on primary, trigger recovery, something.)

Next we put an object into the pool.

-> # cat txt
123
-> # rados -p test put test_copy txt
-> # rados -p test get test_copy -
123

Then we make OSD.0 down, and change its data of object test_copy.

-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0
test_copy get-bytes
123
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0
test_copy set-bytes 120txt

Next we start OSD.0 and do data migration.

-> # ceph osd pool set test crush_rule root1_rule

Finally we try to get the object by rados and ceph-objectstore-tool

-> # rados -p test get test_copy -
error getting test/test_copy: (5) Input/output error
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3
test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4
test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5
test_copy get-bytes
120

The data of test_copy on OSD.3 OSD.4 OSD.5 is from OSD.0 which has the
silent data corruption.
Actions

Also available in: Atom PDF