Bug #48212

closed

pool last_epoch_clean floor is stuck after pg merging

Added by Dan van der Ster over 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 100%
Source: Community (user)
Backport: nautilus octopus pacific
Regression: No
Severity: 3 - minor

Description

We just merged a pool (id 36) from 1024 to 64 PGs, and after the merge completed, the cluster's osdmaps were no longer being trimmed.

I traced this to the merged pool's last_epoch_clean floor (in OSDMonitor.cc), which was stuck at the epoch before the merge started (e163735).

# ceph report | grep committed
    "osdmap_first_committed": 163735,
    "osdmap_last_committed": 168376,
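To quantify how far trimming has fallen behind, the two committed epochs can be subtracted. This is a minimal sketch, assuming `jq` is installed (the report excerpt below already uses it); `untrimmed_epochs` is a hypothetical helper name, and it reads `ceph report` JSON on stdin.

```shell
# Hypothetical helper: print the span between the newest and oldest
# osdmap epochs the monitors still hold. Reads `ceph report` JSON on stdin.
untrimmed_epochs() {
  jq '.osdmap_last_committed - .osdmap_first_committed'
}

# Usage against a live cluster:
#   ceph report 2>/dev/null | untrimmed_epochs
```

With the values above (168376 and 163735), this prints 4641, and the number keeps growing for as long as trimming stays stuck.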

# ceph report | jq .osdmap_clean_epochs.last_epoch_clean
  "osdmap_clean_epochs": {
    "min_last_epoch_clean": 168375,
    "last_epoch_clean": {
      "per_pool": [
        {
          "poolid": 3,
          "floor": 168375
        },
...
        {
          "poolid": 35,
          "floor": 168375
        },
        {
          "poolid": 36,
          "floor": 163735
        }
      ]
    },
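One way to spot the pool holding back trimming is to compare each per-pool floor with `min_last_epoch_clean` from the same report. The sketch below is a hypothetical helper (`lagging_pools`) that assumes `jq` and the JSON layout shown above. Among the pools shown here, it would flag only pool 36.

```shell
# Hypothetical helper: list pools whose last_epoch_clean floor is below the
# reported min_last_epoch_clean. Reads `ceph report` JSON on stdin.
lagging_pools() {
  jq -r '
    .osdmap_clean_epochs as $c
    | $c.last_epoch_clean.per_pool[]
    | select(.floor < $c.min_last_epoch_clean)
    | "pool \(.poolid) floor \(.floor)"
  '
}

# Usage against a live cluster:
#   ceph report 2>/dev/null | lagging_pools
```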

As a workaround, I restarted the mon leader, after which the pool 36 last_epoch_clean floor caught up with the other pools and the osdmaps were trimmed.
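The workaround can be scripted roughly as follows. This is a sketch under stated assumptions: `mon_leader` is a hypothetical helper, `jq` is installed, and the monitors run as package-based `ceph-mon@<id>` systemd units whose id matches the leader name (other deployment types use different unit names).

```shell
# Hypothetical helper: print the current monitor leader's name from
# `ceph quorum_status -f json` output on stdin.
mon_leader() {
  jq -r '.quorum_leader_name'
}

# Usage (assumption: run the restart on the leader's own host, and the
# systemd unit is named ceph-mon@<leader name>):
#   leader="$(ceph quorum_status -f json | mon_leader)"
#   systemctl restart "ceph-mon@${leader}"
```

Restarting the leader forces a new election, so another monitor briefly takes over. Afterwards, `osdmap_first_committed` should start advancing again.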

Is this a bug, or was I perhaps too impatient after the merge? (Maybe all the merged PGs need a deep scrub or something similar before the l_e_c catches up?)

Thanks!


Files

kvm7_mon_usage.png (20.9 KB), David Herselman, 01/09/2021 11:59 AM
ceph-mon_free.png (71.3 KB), David Herselman, 01/11/2021 07:58 AM

Related issues: 3 (0 open, 3 closed)

Copied to RADOS - Backport #51568: pacific: pool last_epoch_clean floor is stuck after pg merging (Resolved, Cory Snyder)
Copied to RADOS - Backport #51569: octopus: pool last_epoch_clean floor is stuck after pg merging (Resolved)
Copied to RADOS - Backport #52644: nautilus: pool last_epoch_clean floor is stuck after pg merging (Rejected, Konstantin Shalygin)