Support #54315

1 fsck error per osd during nautilus -> octopus upgrade (S3 cluster)

Added by Dan van der Ster 12 months ago. Updated 12 months ago.

Status:
New
Priority:
Normal
Target version:
-
% Done:
0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

At the end of the conversion to per-pool omap, around half of our OSDs reported 1 fsck error each, but the log didn't show what the error was:

2022-02-15T16:02:16.554+0100 7fdfde8d8f00  0 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_check_objects partial offload, done myself 7925084 of 7942492 objects, threads 2
2022-02-15T16:02:16.678+0100 7fdfde8d8f00  1 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_on_open checking shared_blobs
2022-02-15T16:02:16.693+0100 7fdfde8d8f00  1 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_on_open checking pool_statfs
2022-02-15T16:17:37.407+0100 7fdfde8d8f00  1 bluestore(/var/lib/ceph/osd/ceph-1247) _fsck_on_open <<<FINISH>>> with 1 errors, 318 warnings, 319 repaired, 0 remaining in 1672.130946 seconds

Full log is posted: ceph-post-file: 82f661a7-b10f-4a80-acaf-37f1268f275e

History

#1 Updated by Dan van der Ster 12 months ago

The only place we found in the fsck code that increments `errors` but doesn't log via `derr` is this:

        if (vstatfs.is_empty()) {
          // we don't consider that as an error since empty pool statfs
          // are left in DB for now
          dout(20) << "fsck inf: found empty stray Pool StatFS record for pool id 0x" 
                    << std::hex << pool_id << std::dec << dendl;
          if (repairer) {
            // but we need to increment error count in case of repair
            // to have proper counters at the end
            // (as repairer increments recovery counter anyway).
            ++errors;
          }
        } else {
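
That would match the symptom: with `repairer` set, the empty stray statfs case bumps `errors` but only logs at debug level 20, so nothing appears at the default log level. Below is a minimal sketch (plain C++, not Ceph code; the variable names are made up for illustration) of how the final summary can then report more errors than the log ever printed:

    // Minimal sketch of the same accounting pattern: a silently-repaired
    // condition still bumps the error counter, so the summary line can
    // report 1 error even though no "fsck error:" line was logged.
    #include <iostream>

    int main() {
        int errors = 0;
        int repaired = 0;
        bool repair_mode = true;               // fsck ran with repair, as during the upgrade
        bool empty_stray_statfs_found = true;  // hypothetical stand-in for the check above

        if (empty_stray_statfs_found) {
            // The real code logs this only at dout(20), invisible at the
            // default log level, yet still counts it when repairing:
            if (repair_mode) {
                ++errors;    // counted so totals line up with the repair counter...
                ++repaired;  // ...but no derr line is ever emitted for it
            }
        }

        // Analogous to the "<<<FINISH>>> with 1 errors, ... repaired" line.
        std::cout << "<<<FINISH>>> with " << errors << " errors, "
                  << repaired << " repaired" << std::endl;
        return 0;
    }

If this is the path being hit, the counted error would correspond to a benign empty stray pool statfs record (which, per the code comment, is deliberately left in the DB for now) rather than actual corruption.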

#2 Updated by Dan van der Ster 12 months ago

  • Affected Versions v15.2.15 added
