Bug #24875

OSD: still returning EIO instead of recovering objects on checksum errors

Added by Greg Farnum almost 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
David Zafman
Category:
Scrub/Repair
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
mimic, luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A report came in on the mailing list of an MDS journal which couldn't be read and was throwing errors:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.00000000:head

And indeed, when you search for that log message it pops up in PrimaryLogPG::do_read() and do_sparse_read() (and also struct FillInVerifyExtent). When it pops up, the function returns -EIO, and do_osd_ops() (which is the only caller) turns that into a direct client return.
There's a comment "try repair later" which makes me think the author expected the EIO to get turned into a read-repair, but tracing back through git history there's no indication of any work done to enable that in this path.
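
To make the problem concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the actual Ceph code: ObjectInfo, compute_crc() and the function bodies are simplified stand-ins for object_info_t, the bufferlist crc32c, and PrimaryLogPG::do_read()/do_osd_ops(). The point it illustrates is that a full-object digest mismatch becomes -EIO and is handed straight back to the client, with nothing acting on the "try repair later" intent.

// Minimal stand-in sketch (not the actual Ceph code) of the failing
// read path described above: a full-object digest mismatch becomes
// -EIO, and the caller hands it straight back to the client.
#include <cerrno>
#include <cstdint>
#include <iostream>
#include <vector>

// hypothetical stand-in for the stored object metadata (object_info_t)
struct ObjectInfo {
  uint32_t data_digest;   // whole-object crc recorded at write time
};

// placeholder checksum; the real code uses crc32c over a bufferlist
static uint32_t compute_crc(const std::vector<uint8_t>& data) {
  uint32_t crc = 0;
  for (uint8_t b : data) crc = crc * 131u + b;
  return crc;
}

// models the read path: log the mismatch and return -EIO
static int do_read(const ObjectInfo& oi, const std::vector<uint8_t>& data) {
  uint32_t crc = compute_crc(data);
  if (oi.data_digest != crc) {
    std::cerr << "full-object read crc 0x" << std::hex << crc
              << " != expected 0x" << oi.data_digest << std::dec << "\n";
    return -EIO;  // "try repair later" -- but nothing does
  }
  return 0;
}

// models do_osd_ops(): the error is returned directly to the client,
// with no read-repair attempted on this path
static int do_osd_ops(const ObjectInfo& oi, const std::vector<uint8_t>& data) {
  return do_read(oi, data);
}

int main() {
  ObjectInfo oi{0x9ef2b41b};              // digest from the log line above
  std::vector<uint8_t> corrupted{1, 2, 3};
  std::cout << "client sees r = " << do_osd_ops(oi, corrupted) << "\n";
  return 0;
}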


Related issues 3 (0 open, 3 closed)

Related to RADOS - Bug #25084: Attempt to read object that can't be repaired loops forever (Resolved, David Zafman, 07/24/2018)

Copied to RADOS - Backport #25226: mimic: OSD: still returning EIO instead of recovering objects on checksum errors (Resolved, David Zafman)
Copied to RADOS - Backport #25227: luminous: OSD: still returning EIO instead of recovering objects on checksum errors (Resolved, David Zafman)
Actions #1

Updated by David Zafman almost 6 years ago

The do_sparse_read() path doesn't attempt to repair a checksum error. Could that be the real issue?

The do_read() path looks fine since it calls rep_repair_primary_object() whether the EIO came from the disk or the crc check.

      if (oi.data_digest != crc) {
        osd->clog->error() << info.pgid << std::hex
                           << " full-object read crc 0x" << crc
                           << " != expected 0x" << oi.data_digest
                           << std::dec << " on " << soid;
        r = -EIO; // try repair later
      }
    }
    if (r == -EIO) {
      r = rep_repair_primary_object(soid, ctx->op);
    }
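
For comparison, here is a hedged sketch of what giving the sparse-read path the same treatment could look like. This is not the merged patch: read_sparse_extents_and_verify() and the stub bodies are hypothetical placeholders, and only rep_repair_primary_object() corresponds to a real PrimaryLogPG member. The idea is simply to funnel a checksum or disk -EIO from do_sparse_read() through the same repair hook before it can surface to the client.

// Hedged sketch only (not the merged patch): give the sparse-read path the
// same "-EIO => try repair" handling that do_read() has in the snippet above.
#include <cerrno>
#include <iostream>

// hypothetical placeholder: pretend the sparse read hit a crc mismatch
static int read_sparse_extents_and_verify() { return -EIO; }

// hypothetical stub for PrimaryLogPG::rep_repair_primary_object(); in the
// real OSD this queues recovery of the object so the op can be retried
static int rep_repair_primary_object() { return -EAGAIN; }

static int do_sparse_read() {
  int r = read_sparse_extents_and_verify();
  if (r == -EIO) {
    // mirror the do_read() path: convert the EIO into a repair attempt
    // instead of returning it straight to the client
    r = rep_repair_primary_object();
  }
  return r;
}

int main() {
  std::cout << "do_sparse_read() -> " << do_sparse_read() << "\n";
  return 0;
}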
Actions #2

Updated by Greg Farnum almost 6 years ago

Ah, the error was reported on luminous, which doesn't do the repair, and I guess I missed it on master. Sorry for the mis-diagnosis.

(Looks like the MDS doesn't use sparse-read, but we should definitely still fix that path too!)

Actions #3

Updated by Josh Durgin almost 6 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Dan van der Ster almost 6 years ago

Is this the relevant fix? https://github.com/ceph/ceph/commit/4667280f8afe6cd68dfffea61d7530581f3dd0eb

Alessandro's OSDs are bluestore, and he doesn't get any bluestore block checksum errors. So can the recorded crc be wrong even when the bluestore data is correct?

Will the above patch correct this type of crc error?

Also, we deep-scrubbed the PG and it didn't find any inconsistent objects.

Actions #5

Updated by Dan van der Ster almost 6 years ago

FTR, this crc issue is probably due to an incomplete backport to 12.2.6 of the skip_digest changes for bluestore:

[12:56:59]  <dvanders> regarding the 12.2.6 cephfs crc errors, could it be `b519a0b1c1 osd/PrimaryLogPG: do not generate data digest for BlueStore by default`  or one of the other omap/data_digest changes that landed in 12.2.6 ?
[12:59:29]  <dvanders> Seems similar to https://tracker.ceph.com/issues/23871 ... which was fixed in mimic but not luminous: `fe5038c7f9 osd/PrimaryLogPG: clear data digest on WRITEFULL if skip_data_digest`
[14:10:16]  <sage>    dvanders: yeah does seem similar, but i'm not sure why it would manifest during a .5 to .6 upgrade.  looking...
[14:10:37]  <sage>    dvanders: it's definitely bluestore-only?
[14:11:20]  <dvanders>    i didn't try filestore, but all the clusters i've seen were bluestore
[14:14:56]  <sage>    ah, it's becaues teh other skip_digest handlng code was just backporting/changed
[14:14:59]  <sage>    backported/changed

This issue is related to https://tracker.ceph.com/issues/23871
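
For context, the mechanism those commits describe can be sketched roughly as below. This is a simplified reconstruction from the commit titles quoted above, not the actual patch; ObjectInfo and handle_writefull() are hypothetical stand-ins for object_info_t and the WRITEFULL handling in PrimaryLogPG. With BlueStore, skip_data_digest means the OSD should stop tracking a whole-object crc, so a full overwrite has to clear any previously stored digest; if a backport misses that step, the stale digest survives the overwrite and later full-object reads report a crc mismatch even though the on-disk data is fine.

// Simplified reconstruction (not the actual patch) of the skip_data_digest
// behaviour referenced above: on a full overwrite, a BlueStore OSD that
// skips whole-object digests must clear any previously stored digest.
#include <cstdint>
#include <iostream>
#include <optional>

// hypothetical stand-in for object_info_t
struct ObjectInfo {
  std::optional<uint32_t> data_digest;  // whole-object crc, if tracked
};

// hypothetical stand-in for the WRITEFULL handling in PrimaryLogPG
static void handle_writefull(ObjectInfo& oi, bool skip_data_digest,
                             uint32_t new_digest) {
  if (skip_data_digest) {
    oi.data_digest.reset();       // BlueStore checksums blocks itself
  } else {
    oi.data_digest = new_digest;  // FileStore-style: track the new digest
  }
}

int main() {
  ObjectInfo oi{0x9ef2b41b};      // digest recorded before the upgrade
  handle_writefull(oi, /*skip_data_digest=*/true, 0x976aefc5);
  std::cout << (oi.data_digest ? "stale digest kept (false EIO risk)"
                               : "digest cleared (no false EIO)") << "\n";
  return 0;
}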

Actions #6

Updated by David Zafman almost 6 years ago

  • Related to Bug #25084: Attempt to read object that can't be repaired loops forever added
Actions #7

Updated by David Zafman almost 6 years ago

  • Status changed from New to 12
  • Assignee set to David Zafman
Actions #8

Updated by David Zafman almost 6 years ago

  • Backport set to mimic, luminous
Actions #9

Updated by David Zafman almost 6 years ago

  • Copied to Backport #25226: mimic: OSD: still returning EIO instead of recovering objects on checksum errors added
Actions #10

Updated by David Zafman almost 6 years ago

  • Copied to Backport #25227: luminous: OSD: still returning EIO instead of recovering objects on checksum errors added
Actions #12

Updated by David Zafman almost 6 years ago

  • Status changed from 12 to In Progress
Actions #13

Updated by Kefu Chai almost 6 years ago

  • Status changed from In Progress to Pending Backport
Actions #14

Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved