Bug #20863
CRC error does not mark PG as inconsistent or queue for repair
Status: Duplicate
Priority: Normal
Assignee: -
Category: Administration/Usability
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 5 - suggestion
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
While testing bitrot detection, it was found that the cluster health remains HEALTH_OK even after an OSD process has detected a CRC mismatch and returned an error to the client.
Steps to reproduce:
# ceph -v
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
# cat testobject
somedata
# rados --cluster mn --pool mn_test01 put testobject ./testobject
# ceph osd map mn_test01 testobject
osdmap e22984 pool 'mn_test01' (16) object 'testobject' -> pg 16.98824931 (16.31) -> up ([20,44], p20) acting ([20,44], p20)
# systemctl stop ceph-osd@20
# echo CORRUPTED > /var/lib/ceph/osd/mn-20/current/16.31_head/testobject__head_98824931__10
# getfattr -d -e hex var/lib/ceph/osd/mn-20/current/16.31_head/testobject__head_98824931__10
# file: var/lib/ceph/osd/mn-20/current/16.31_head/testobject__head_98824931__10
user.ceph._=0x0f08ef00000004032b000000000000000a000000746573746f626a656374feffffffffffffff314982980000000000100000000000000006031c0000001000000000000000ffffffff0000000000000000ffffffffffffffff000000000100000000000000c859000000000000000000000000000002021500000008663a0600000000000100000000000000000000000900000000000000e4437f59c293ca04020215000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000034000000e4437f591b89e204e08fe5f3ffffffff
user.ceph.snapset=0x02021900000000000000000000000100000000000000000000000000000000
user.cephos.spill_out=0x3000
# systemctl start ceph-osd@20
# rados --cluster mn --pool mn_test01 get testobject testobject
error getting mn_test01/testobject: (5) Input/output error
# grep ERR /var/log/ceph/mn-osd.20.log
2017-07-31 18:23:48.437679 7f496d418700 -1 log_channel(cluster) log [ERR] : 16.31 full-object read crc 0x2259dfb0 != expected 0xf3e58fe0 on 16:8c924119:::testobject:head
# ceph -s
...
health HEALTH_OK
...
Running a deep-scrub on the PG afterwards does trigger HEALTH_ERR, and a PG repair then successfully repairs the damaged file.
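The gap between the read path and deep scrub can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Ceph's implementation. The class `ToyPG`, its methods and `cluster_health` are all hypothetical. `zlib.crc32` (CRC-32) stands in for the CRC-32C digest Ceph actually uses. Corruption is modelled as a single flipped bit, which CRC-32 always detects. In the model, the read path returns EIO on a digest mismatch but leaves the PG unflagged, so health stays HEALTH_OK. Only deep scrub sets the inconsistent flag that surfaces as HEALTH_ERR.

```python
import errno
import zlib
from dataclasses import dataclass

# Toy model of the behaviour described in this report, NOT Ceph code.
# All names are hypothetical; zlib.crc32 (CRC-32) stands in for Ceph's CRC-32C.


@dataclass
class ToyPG:
    pgid: str
    replicas: dict            # OSD id -> object bytes stored on that OSD
    expected_digest: int      # digest recorded when the object was written
    inconsistent: bool = False

    def read(self, primary: int):
        """Client read on the primary: a digest mismatch returns -EIO,
        but the PG is NOT flagged (the gap this bug reports)."""
        data = self.replicas[primary]
        if zlib.crc32(data) != self.expected_digest:
            return -errno.EIO
        return data

    def deep_scrub(self):
        """Deep scrub checks every replica and flags the PG on any mismatch."""
        bad = [osd for osd, data in self.replicas.items()
               if zlib.crc32(data) != self.expected_digest]
        if bad:
            self.inconsistent = True
        return bad

    def repair(self):
        """Overwrite every replica with a copy whose digest matches."""
        good = next((d for d in self.replicas.values()
                     if zlib.crc32(d) == self.expected_digest), None)
        if good is None:
            raise RuntimeError(f"no intact replica in PG {self.pgid}")
        for osd in self.replicas:
            self.replicas[osd] = good
        self.inconsistent = False


def cluster_health(pgs):
    """HEALTH_ERR if any PG is flagged inconsistent, else HEALTH_OK."""
    return "HEALTH_ERR" if any(pg.inconsistent for pg in pgs) else "HEALTH_OK"


original = b"somedata\n"
bitrot = bytes([original[0] ^ 0x01]) + original[1:]   # one flipped bit on osd.20
pg = ToyPG("16.31", {20: bitrot, 44: original}, zlib.crc32(original))

print(pg.read(primary=20), cluster_health([pg]))   # -5 HEALTH_OK
print(pg.deep_scrub(), cluster_health([pg]))       # [20] HEALTH_ERR
pg.repair()
print(pg.read(primary=20), cluster_health([pg]))   # b'somedata\n' HEALTH_OK
```

In this model, the behaviour the subject line asks for would amount to having `read()` set `inconsistent` (or queue a repair) as soon as it detects the mismatch, instead of waiting for the next deep scrub.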
Other strange issues:
- Removing the xattrs from /var/lib/ceph/osd/mn-20/current/16.31_head/testobject__head_98824931__10 (by editing the file with vi) makes the client read fail with "error getting mn_test01/testobject: (2) No such file or directory", without any errors in the OSD log files.
- Appending garbage to the end of /var/lib/ceph/osd/mn-20/current/16.31_head/testobject__head_98824931__10 does not trigger a CRC checksum error (perhaps because the xattr stores the object size, and that size is not checked against the real file size?).
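The reporter's guess about appended garbage can be sketched the same way. The sketch assumes, without confirmation in this report, that a full-object read only covers the object size recorded in the metadata xattr. `full_object_read_ok` is a hypothetical helper, and `zlib.crc32` again stands in for CRC-32C.

```python
import zlib

# Hypothetical model of the reporter's hypothesis, NOT Ceph's implementation:
# the object's metadata records `size` and a data digest, and a full-object
# read only covers the first `size` bytes of the backing file.
# zlib.crc32 (CRC-32) stands in for the CRC-32C digest Ceph uses.


def full_object_read_ok(file_bytes: bytes, recorded_size: int, recorded_digest: int) -> bool:
    """Return True if the digest of the bytes the read covers matches the record."""
    covered = file_bytes[:recorded_size]  # bytes past recorded_size are never read
    return zlib.crc32(covered) == recorded_digest


original = b"somedata\n"
size, digest = len(original), zlib.crc32(original)

# A single flipped bit inside the object (bitrot); CRC-32 always detects this.
bitrot = bytes([original[0] ^ 0x01]) + original[1:]
print(full_object_read_ok(bitrot, size, digest))                 # False
# Garbage appended past `size` is outside the checked range: no CRC error.
print(full_object_read_ok(original + b"garbage", size, digest))  # True
```

If the hypothesis holds, detecting appended data would need a separate comparison of the recorded size with the on-disk file size, which a digest over the first `size` bytes cannot provide.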
Updated by Greg Farnum over 6 years ago
- Project changed from Ceph to RADOS
- Subject changed from CRC error while reading an object does not mark PG as inconsistent to CRC error does not mark PG as inconsistent or queue for repair
- Category changed from OSD to Administration/Usability
- Component(RADOS) OSD added
Updated by David Zafman over 6 years ago
- Is duplicate of Feature #19657: An EIO from a single device should not be a client-visible failure. added
Updated by David Zafman over 6 years ago
This will be available in Luminous, see http://tracker.ceph.com/issues/19657