Bug #24875

closed

OSD: still returning EIO instead of recovering objects on checksum errors

Added by Greg Farnum almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: High
Assignee: David Zafman
Category: Scrub/Repair
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic, luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A report came in on the mailing list of an MDS journal which couldn't be read and was throwing errors:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.00000000:head

And indeed, searching for that log message turns it up in PrimaryLogPG::do_read() and do_sparse_read() (and also in struct FillInVerifyExtent). Wherever the message is logged, the function returns -EIO, and do_osd_ops() (which is the only caller) turns that into a direct client return.
There's a comment "try repair later" which makes me think the author expected the EIO to get turned into a read-repair, but tracing back through git history there's no indication of any work done to enable that in this path.
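
To make the difference between the two behaviours concrete, here is a minimal sketch in Python. It is not the Ceph code: `StoredObject`, `read_reported()` and `read_with_repair()` are hypothetical names invented for illustration, and `zlib.crc32` only stands in for the crc32c that Ceph uses. The first function mirrors what is described above (a full-object CRC mismatch goes straight back as -EIO); the second shows roughly what a read-repair path might do instead, assuming another replica with a matching checksum is available.

```python
import errno
import zlib
from dataclasses import dataclass


@dataclass
class StoredObject:
    """Hypothetical stand-in for one replica: payload plus the digest recorded at write time."""
    data: bytes
    expected_crc: int


def checksum(data: bytes) -> int:
    # Assumption: zlib.crc32 is used purely as a stand-in; Ceph uses crc32c.
    return zlib.crc32(data) & 0xFFFFFFFF


def read_reported(obj: StoredObject):
    """Behaviour described in this report: a CRC mismatch is returned to the client as -EIO."""
    if checksum(obj.data) != obj.expected_crc:
        return -errno.EIO, b""
    return 0, obj.data


def read_with_repair(primary: StoredObject, replicas: list):
    """Hypothetical read-repair: on mismatch, look for a good copy, fix the local one, serve the read."""
    if checksum(primary.data) == primary.expected_crc:
        return 0, primary.data
    for replica in replicas:
        if checksum(replica.data) == primary.expected_crc:
            primary.data = replica.data  # repair the local copy from the good replica
            return 0, primary.data
    # No copy matches the recorded digest: give up with -EIO rather than retrying.
    return -errno.EIO, b""


if __name__ == "__main__":
    good = b"journal header"
    corrupt = StoredObject(b"journal hexder", checksum(good))
    print(read_reported(corrupt)[0] == -errno.EIO)
    print(read_with_repair(corrupt, [StoredObject(good, checksum(good))])[0] == 0)
```

The final -EIO in `read_with_repair()` is a deliberate choice in this sketch: if no copy matches the recorded digest, the read fails once instead of being retried.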


Related issues 3 (0 open, 3 closed)

Related to RADOS - Bug #25084: Attempt to read object that can't be repaired loops forever (Resolved, David Zafman, 07/24/2018)
Copied to RADOS - Backport #25226: mimic: OSD: still returning EIO instead of recovering objects on checksum errors (Resolved, David Zafman)
Copied to RADOS - Backport #25227: luminous: OSD: still returning EIO instead of recovering objects on checksum errors (Resolved, David Zafman)
