Bug #41383
closed
scrub object count mismatch on device_health_metrics pool
Description
jenglisch on IRC reports multiple scrub errors on the device_health_metrics pool (an error is reported, repaired, and then reappears a few days later).
2019-08-19 11:23:09.794 7fc4645f7700 0 log_channel(cluster) log [DBG] : 54.0 scrub starts
2019-08-19 11:23:09.852 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 scrub : stat mismatch, got 410/411 objects, 0/0 clones, 410/411 dirty, 410/411 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-08-19 11:23:09.852 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 scrub 1 errors
2019-08-19 11:25:05.772 7fc4625f3700 0 log_channel(cluster) log [DBG] : 1.f72 scrub starts
2019-08-19 11:25:15.566 7fc4625f3700 0 log_channel(cluster) log [DBG] : 1.f72 scrub ok
2019-08-19 13:14:53.350 7fc4645f7700 0 log_channel(cluster) log [DBG] : 54.0 repair starts
2019-08-19 13:14:53.481 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 repair : stat mismatch, got 410/411 objects, 0/0 clones, 410/411 dirty, 410/411 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-08-19 13:14:53.481 7fc4645f7700 -1 log_channel(cluster) log [ERR] : 54.0 repair 1 errors, 1 fixed
Version is 14.2.2.
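For anyone checking whether their own cluster logs show the same pattern, the "stat mismatch" lines in the log excerpt above can be picked apart mechanically. The following is a minimal sketch, not part of Ceph: the function name parse_stat_mismatch and the regular expression are hypothetical, and it assumes the message keeps the "N/M counter name" layout seen in this report. The two numbers in each pair are returned in the order they appear, without assuming which side is the scrubbed count and which is the recorded one.

```python
import re

# One "<left>/<right> <counter name>" pair, e.g. "410/411 objects".
# The layout is assumed from the log excerpt in this report.
PAIR_RE = re.compile(r"(\d+)/(\d+) ([a-z_ ]+?)(?=,|\.|$)")


def parse_stat_mismatch(line):
    """Return {counter: (left, right)} for the counters whose two values differ.

    Hypothetical helper for illustration; returns an empty dict when the
    line is not a "stat mismatch" message.
    """
    marker = "stat mismatch, got "
    start = line.find(marker)
    if start == -1:
        return {}
    body = line[start + len(marker):]
    mismatches = {}
    for left, right, name in PAIR_RE.findall(body):
        if left != right:
            mismatches[name.strip()] = (int(left), int(right))
    return mismatches


if __name__ == "__main__":
    sample = (
        "54.0 scrub : stat mismatch, got 410/411 objects, 0/0 clones, "
        "410/411 dirty, 410/411 omap, 0/0 pinned, 0/0 hit_set_archive, "
        "0/0 whiteouts, 0/0 bytes, 0/0 manifest objects, "
        "0/0 hit_set_archive bytes."
    )
    for counter, (left, right) in parse_stat_mismatch(sample).items():
        print(f"{counter}: {left} vs {right}")
```

On the lines in this report, only the objects, dirty, and omap counters differ, each by one, which is consistent with a single object being counted on one side but not the other.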
Updated by Greg Farnum over 4 years ago
This may be caused by the empty object names that the device health manager was inappropriately creating. See the thread "[ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error".
That would be fixed on the RADOS side by https://github.com/ceph/ceph/pull/27929, which was backported to Nautilus 14.2.3 (but not to 14.2.2, the version this report is from). The manager plugin issue was discussed back in May but perhaps not fixed.
Updated by Greg Farnum over 4 years ago
- Status changed from 12 to Need More Info
Updated by Sage Weil over 4 years ago
- Status changed from Need More Info to Fix Under Review
- Backport set to nautilus
- Pull request ID set to 31474
exercise an abundance of caution!
Updated by Sage Weil over 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42739: nautilus: scrub object count mismatch on device_health_metrics pool added
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".