Bug #41383

scrub object count mismatch on device_health_metrics pool

Added by Sage Weil over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

jenglisch on IRC reports multiple scrub errors on the metrics pool (an error was found, repaired, and then reappeared a few days later).

2019-08-19 11:23:09.794 7fc4645f7700  0 log_channel(cluster) log [DBG] : 54.0 scrub starts
2019-08-19 11:23:09.852 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 scrub : stat mismatch, got 410/411 objects, 0/0 clones, 410/411 dirty, 410/411 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-08-19 11:23:09.852 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 scrub 1 errors
2019-08-19 11:25:05.772 7fc4625f3700  0 log_channel(cluster) log [DBG] : 1.f72 scrub starts
2019-08-19 11:25:15.566 7fc4625f3700  0 log_channel(cluster) log [DBG] : 1.f72 scrub ok
2019-08-19 13:14:53.350 7fc4645f7700  0 log_channel(cluster) log [DBG] : 54.0 repair starts
2019-08-19 13:14:53.481 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 repair : stat mismatch, got 410/411 objects, 0/0 clones, 410/411 dirty, 410/411 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-08-19 13:14:53.481 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 repair 1 errors, 1 fixed

The version is 14.2.2.
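
To make it easier to see which counters disagree in a "stat mismatch" line like the ones in the log above, here is a minimal Python sketch that extracts the pairs whose two values differ. The function name `mismatched_stats` and the regular expression are illustrative assumptions based only on the line format shown in this report; they are not part of Ceph, and the sketch does not say which side of each pair is the scrubbed value and which is the recorded one.

```python
import re

# Matches pairs such as "410/411 objects" or "0/0 hit_set_archive bytes".
# Assumption: the format is inferred from the log lines quoted in this report.
PAIR_RE = re.compile(r"(\d+)/(\d+) ([a-z_ ]+?)(?:,|\.|$)")


def mismatched_stats(line):
    """Return {counter_name: (first, second)} for pairs whose values differ."""
    _, sep, stats = line.partition("stat mismatch, got ")
    if not sep:
        # Not a stat mismatch line (for example "scrub ok").
        return {}
    return {
        name.strip(): (int(first), int(second))
        for first, second, name in PAIR_RE.findall(stats)
        if first != second
    }


line = (
    "54.0 scrub : stat mismatch, got 410/411 objects, 0/0 clones, "
    "410/411 dirty, 410/411 omap, 0/0 pinned, 0/0 hit_set_archive, "
    "0/0 whiteouts, 0/0 bytes, 0/0 manifest objects, "
    "0/0 hit_set_archive bytes."
)
print(mismatched_stats(line))
# {'objects': (410, 411), 'dirty': (410, 411), 'omap': (410, 411)}
```

For the line reported here, only the objects, dirty, and omap counters differ, each by one, which is consistent with a single object being miscounted.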


Related issues

Copied to RADOS - Backport #42739: nautilus: scrub object count mismatch on device_health_metrics pool Resolved

History

#1 Updated by Greg Farnum over 4 years ago

Could this be caused by the empty object names that the device health manager was inappropriately creating? See the thread "[ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error".

That would be fixed on the RADOS side by https://github.com/ceph/ceph/pull/27929 and backported for nautilus 14.2.3 (but not 14.2.2, which this report is from). The manager plugin issue was discussed but perhaps not fixed back in May.

#2 Updated by Greg Farnum over 4 years ago

  • Status changed from 12 to Need More Info

#3 Updated by Sage Weil over 4 years ago

  • Status changed from Need More Info to Fix Under Review
  • Backport set to nautilus
  • Pull request ID set to 31474

exercise an abundance of caution!

#4 Updated by Sage Weil over 4 years ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #42739: nautilus: scrub object count mismatch on device_health_metrics pool added

#6 Updated by Nathan Cutler about 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
