Bug #23117

PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been exceeded once

Added by Oliver Freyermuth almost 3 years ago. Updated 7 months ago.

Severity: 2 - major
Component(RADOS): Monitor, OSD

In the following setup:
  • 6 OSD hosts
  • Each host with 32 disks = 32 OSDs
  • Pool with 2048 PGs, erasure-coded (k=4, m=2), CRUSH failure domain: host

When (re)installing the 6th host and creating its first OSD, PG overdose protection briefly kicks in: with 6 hosts, k+m = 6 shards per PG and failure domain host,
every PG must place one shard on the 6th host, so all 2048 PGs initially map to that single OSD.
As a result, the PGs enter the "activating" state and get stuck there.
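The arithmetic behind the overdose can be sketched in Python. This is a minimal sketch, not Ceph code: it counts only this one pool, assumes CRUSH spreads the shards evenly over the host's OSDs, and uses mon_max_pg_per_osd = 200 and osd_max_pg_per_osd_hard_ratio = 2 purely as assumed example values (defaults differ between releases). The OSD's hard limit is taken as mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio.

```python
# Minimal sketch (not Ceph code) of the per-OSD PG load on the reinstalled host.
# Assumed example values; the real defaults depend on the Ceph release.
MON_MAX_PG_PER_OSD = 200       # assumption
HARD_RATIO = 2.0               # assumption (osd_max_pg_per_osd_hard_ratio)
HARD_LIMIT = MON_MAX_PG_PER_OSD * HARD_RATIO

# Setup from this report: one EC pool, k=4, m=2, failure domain host, 6 hosts.
PG_NUM, K, M, HOSTS = 2048, 4, 2, 6


def pgs_per_osd_on_new_host(osds_on_host: int) -> float:
    """Average PG shards per OSD on the 6th host with `osds_on_host` OSDs.

    Since k + m equals the number of hosts and the failure domain is host,
    every PG places exactly one shard on every host, so the new host must
    hold PG_NUM shards in total (assumes an even CRUSH spread).
    """
    if K + M != HOSTS:
        raise ValueError("sketch only covers k + m == number of hosts")
    return PG_NUM / osds_on_host


for n in (1, 2, 8, 32):
    load = pgs_per_osd_on_new_host(n)
    verdict = "above hard limit" if load > HARD_LIMIT else "within limit"
    print(f"{n:2d} OSD(s) on host 6: {load:6.0f} PGs/OSD, {verdict}")
```

With these assumed values, the first OSD alone would be asked to hold all 2048 shards, about five times the 400-PG hard limit, while 32 OSDs bring the load down to about 64 each.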

However, even after all 32 OSDs on the 6th host have been added, the PGs remain stuck in "activating" and the data stays unavailable.
This situation does not resolve by itself.

This issue can be avoided by setting:

osd_max_pg_per_osd_hard_ratio = 32

before redeploying a host, which effectively turns off overdose protection.
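Why a ratio of 32 is enough here can be checked with a short calculation. This is a minimal sketch, assuming only this pool's 2048 PGs land on the first new OSD and treating mon_max_pg_per_osd as an input; 200 and 250 below are assumed example values, not taken from this cluster, and `min_hard_ratio` is a hypothetical helper.

```python
def min_hard_ratio(pgs_on_first_osd: int, mon_max_pg_per_osd: int) -> int:
    """Smallest integer osd_max_pg_per_osd_hard_ratio that puts the hard
    limit (mon_max_pg_per_osd * ratio) strictly above the PG count that a
    lone, freshly created OSD would receive.

    Hypothetical helper; PGs of other pools on the same OSD would raise
    the count and therefore the required ratio.
    """
    return pgs_on_first_osd // mon_max_pg_per_osd + 1


for mon_max in (200, 250):                  # assumed example values
    needed = min_hard_ratio(2048, mon_max)
    print(f"mon_max_pg_per_osd={mon_max}: ratio >= {needed} needed; "
          f"ratio 32 gives a limit of {mon_max * 32}")
```

Under either assumed value, a ratio of 32 leaves the hard limit well above 2048, which matches the observation that the problem does not occur with this setting.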

For one example PG in the stuck state:
# ceph pg dump all | grep 2.7f6
dumped all
2.7f6     38086                  0    38086         0       0 2403961148 1594     1594           activating+undersized+degraded+remapped 2018-02-24 19:50:01.654185 39755'134350  39946:274873  [153,6,42,95,115,167]        153 [153,NONE,42,95,115,167]            153 39559'109078 2018-02-24 04:01:57.991376     36022'53756 2018-02-22 18:03:40.386421             0 
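The line above lists osd.6 in slot 1 of the up set but NONE in the same slot of the acting set: shard 1 has no acting OSD, although CRUSH maps it to osd.6 on the reinstalled host. A minimal Python sketch that extracts this from such a line; it assumes the column layout of the line shown (state in the 10th field, up and acting sets as the first two bracketed lists), which may differ between releases, and `missing_shards` is a hypothetical helper.

```python
import re

# The `ceph pg dump` line for PG 2.7f6 quoted above.
LINE = (
    "2.7f6     38086                  0    38086         0       0 2403961148 1594     1594"
    "           activating+undersized+degraded+remapped 2018-02-24 19:50:01.654185"
    " 39755'134350  39946:274873  [153,6,42,95,115,167]        153"
    " [153,NONE,42,95,115,167]            153 39559'109078 2018-02-24 04:01:57.991376"
    "     36022'53756 2018-02-22 18:03:40.386421             0"
)


def missing_shards(line: str):
    """Return (pgid, state, [(shard, up_osd), ...]) for acting slots shown as NONE.

    For an EC pool the position in the up/acting list is the shard id.
    Assumes the column layout of the line above (state is the 10th field).
    """
    fields = line.split()
    pgid, state = fields[0], fields[9]
    up_txt, acting_txt = re.findall(r"\[([^\]]*)\]", line)[:2]
    up, acting = up_txt.split(","), acting_txt.split(",")
    missing = [(shard, int(up[shard]))
               for shard, osd in enumerate(acting) if osd == "NONE"]
    return pgid, state, missing


print(missing_shards(LINE))
```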

I have uploaded OSD logs from all involved OSDs:
  • c3953bf7-b482-4705-a7a3-df354453a933 for osd.6 (which was reinstalled, so this log may be irrelevant)
  • 833c07e2-09ff-409c-b68f-1a87e7bfc353 for osd.4, the first OSD recreated on the reinstalled host, so it should have been hit by overdose protection
  • cb146d33-e6cb-4c84-8b15-543728bbc5dd for osd.42
  • f716a2d1-e7ef-46d7-b4fc-dfc440e6fe59 for osd.95
  • fc7ec27a-82c9-4fb4-94dc-5dd64335e3b4 for osd.115
  • 51213f5f-1b91-42b0-8c0c-8acf3622195f for osd.153
  • 3d67f227-4dba-4c93-9fe1-7951d3d32f30 for osd.167

I have also uploaded the ceph.conf of osd001, the reinstalled OSD host:
All other OSD hosts have

osd_max_pg_per_osd_hard_ratio = 32

set (which prevents the issue).

Additionally, I have uploaded all OSD logs of the reinstalled osd001 machine:
(this includes the osd.4 and osd.6 logs already linked above).


#1 Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS
  • Category set to Administration/Usability
  • Priority changed from Normal to High
  • Component(RADOS) Monitor, OSD added

#2 Updated by Gaudenz Steinlin over 1 year ago

We also hit this problem on a cluster with replicated pools (size 3) whose CRUSH rule mapped them to only 3 hosts. We reinstalled one host as part of a migration from FileStore to BlueStore, and during the reinstallation we removed all of that host's OSDs from the cluster (ceph osd purge). When the first BlueStore OSD was added, every PG tried to create a replica on it and PG overdose protection (osd_max_pg_per_osd_hard_ratio) kicked in. The affected PGs then stayed in "activating". They did not re-peer when we added further OSDs, even though that would have brought the number of PGs on every OSD below the hard limit. They only left the activating state after we manually restarted all OSDs on that host.

IMHO this is a bug: the PGs should have restarted peering once additional OSDs were added.
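The expectation in this comment, that withheld PGs should be reconsidered once more OSDs take over part of the load, can be illustrated with a toy model. This is a hypothetical sketch, not how the Ceph OSD is implemented: `ToyOsd`, the 600 PGs, the hard limit of 400 and the even/odd placement are all made up for illustration. With `retry=False` it mimics the reported behaviour, where PGs refused by the first OSD stay refused; with `retry=True` they are re-evaluated after the map change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOsd:
    """Toy model of one OSD with a hard PG limit (NOT Ceph's implementation)."""
    hard_limit: int
    pgs: set = field(default_factory=set)       # PGs this OSD has accepted
    withheld: set = field(default_factory=set)  # PGs refused by the hard limit

    def request(self, pg: int) -> None:
        # Refuse (withhold) the PG once the OSD is at its hard limit.
        if len(self.pgs) >= self.hard_limit:
            self.withheld.add(pg)
        else:
            self.pgs.add(pg)

    def on_map_change(self, mapped_here: set, retry: bool) -> None:
        # Forget PGs that the (toy) placement now maps to other OSDs.
        self.pgs &= mapped_here
        self.withheld &= mapped_here
        if retry:  # expected behaviour: re-evaluate the withheld PGs
            pending, self.withheld = self.withheld, set()
            for pg in sorted(pending):
                self.request(pg)


def simulate(retry: bool, num_pgs: int = 600, hard_limit: int = 400) -> int:
    """Return how many PGs remain withheld on the first OSD after a 2nd joins."""
    osd = ToyOsd(hard_limit)
    for pg in range(num_pgs):        # first OSD on the host: every PG maps here
        osd.request(pg)
    # Second OSD joins; toy placement: even-numbered PGs stay on the first OSD.
    osd.on_map_change({pg for pg in range(num_pgs) if pg % 2 == 0}, retry)
    return len(osd.withheld)


print(simulate(retry=False))  # PGs left stuck without re-evaluation
print(simulate(retry=True))   # none once withheld PGs are retried
```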

#3 Updated by Gaudenz Steinlin over 1 year ago

Ceph version was 13.2.5 on the reinstalled host and 13.2.4 on the other hosts.

#4 Updated by Neha Ojha 7 months ago

  • Priority changed from High to Normal

We should try to make it more obvious when this limit is hit. I thought we had added something to the cluster log about this; that needs to be verified.
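Until the limit is surfaced more clearly, an operator-side check could compare each OSD's PG count with the hard limit. A minimal sketch, assuming the JSON printed by `ceph osd df -f json` has a top-level `nodes` list whose entries carry `name` and `pgs` (verify this on your release); the limit values and the sample data below are made up, and `osds_at_hard_limit` is a hypothetical helper.

```python
import json


def osds_at_hard_limit(osd_df_json: str, mon_max_pg_per_osd: int,
                       hard_ratio: float) -> list:
    """Return (name, pgs) for OSDs whose PG count reaches the hard limit.

    Assumes the layout of `ceph osd df -f json`: a top-level "nodes" list
    whose entries have "name" and "pgs" keys (check this on your release).
    """
    limit = mon_max_pg_per_osd * hard_ratio
    nodes = json.loads(osd_df_json)["nodes"]
    return [(n["name"], n["pgs"]) for n in nodes if n["pgs"] >= limit]


# Made-up sample shaped like the first-OSD situation described in this report.
sample = json.dumps({"nodes": [
    {"name": "osd.4", "pgs": 2048},
    {"name": "osd.42", "pgs": 64},
]})
print(osds_at_hard_limit(sample, mon_max_pg_per_osd=200, hard_ratio=2))
```

In practice the JSON text would come from running `ceph osd df -f json` on the cluster, with the limit values read from the cluster's own configuration.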

