Bug #23117

PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been exceeded once

Added by Oliver Freyermuth about 6 years ago. Updated over 1 year ago.

Status:
Fix Under Review
Priority:
High
Assignee:
Prashant D
Category:
Administration/Usability
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor, OSD
Pull request ID:
44962
Crash signature (v1):
Crash signature (v2):

Description

In the following setup:
  • 6 OSD hosts
  • Each host with 32 disks = 32 OSDs
  • Pool with 2048 PGs, EC, k=4, m=2, crush failure domain host

When reinstalling the 6th host and creating the first OSD on it, PG overdose protection kicks in right away, since every PG needs to place a shard on the 6th host.
As a result, PGs enter the "activating" state and get stuck there.

However, even after all 32 OSDs have been added back on the 6th host, the PGs remain stuck in "activating" and the data stays unavailable, even though the additional OSDs should have brought the number of PGs per OSD back below the limit.
This situation does not resolve by itself.

This issue can be resolved by setting:

osd_max_pg_per_osd_hard_ratio = 32

before the redeployment of a host, thus effectively turning off overdose protection.
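
A minimal sketch of how this workaround can be applied; the [osd] section placement and the runtime injection are assumptions about the deployment, not taken from this report:

# ceph.conf on the OSD hosts, set before redeploying the host
[osd]
osd_max_pg_per_osd_hard_ratio = 32

# optionally also injected into already-running OSDs (does not persist
# across daemon restarts, so the ceph.conf entry is still needed):
# ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 32'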

For one example PG in the stuck state:
# ceph pg dump all | grep 2.7f6
dumped all
2.7f6     38086                  0    38086         0       0 2403961148 1594     1594           activating+undersized+degraded+remapped 2018-02-24 19:50:01.654185 39755'134350  39946:274873  [153,6,42,95,115,167]        153 [153,NONE,42,95,115,167]            153 39559'109078 2018-02-24 04:01:57.991376     36022'53756 2018-02-22 18:03:40.386421             0 
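
For completeness, the stuck PGs can also be inspected with standard commands (not specific to this report; shown as a hedged example):

# list all PGs stuck in an inactive state (includes "activating")
ceph pg dump_stuck inactive

# query the peering state of one affected PG
ceph pg 2.7f6 query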

I have uploaded OSD logs from all involved OSDs:
  • c3953bf7-b482-4705-a7a3-df354453a933 for osd.6 (which was reinstalled, so maybe this is irrelevant)
  • 833c07e2-09ff-409c-b68f-1a87e7bfc353 for osd.4, which was the first OSD reinstalled on the new OSD host, so it should have been affected by overdose protection
  • cb146d33-e6cb-4c84-8b15-543728bbc5dd for osd.42
  • f716a2d1-e7ef-46d7-b4fc-dfc440e6fe59 for osd.95
  • fc7ec27a-82c9-4fb4-94dc-5dd64335e3b4 for osd.115
  • 51213f5f-1b91-42b0-8c0c-8acf3622195f for osd.153
  • 3d67f227-4dba-4c93-9fe1-7951d3d32f30 for osd.167

I have also uploaded the ceph.conf of osd001 which was the reinstalled OSD host:
64744f9a-e136-40f9-a392-4a6f1b34a74e
All other OSD hosts have

osd_max_pg_per_osd_hard_ratio = 32

set (which prevents the issue).

Additionally, I have uploaded all OSD logs of the reinstalled osd001 machine:
38ddd08f-6c66-4a88-8e83-f4eff0ae5d10
(so this includes osd.4 and osd.6 already linked above).


Related issues: 2 (1 open, 1 closed)

Related to RADOS - Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly (New)

Is duplicate of RADOS - Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped (Duplicate)

Actions #1

Updated by Greg Farnum about 6 years ago

  • Project changed from Ceph to RADOS
  • Category set to Administration/Usability
  • Priority changed from Normal to High
  • Component(RADOS) Monitor, OSD added
Actions #2

Updated by Gaudenz Steinlin almost 5 years ago

We also hit this problem on a cluster which had replicated pools with a replication factor of 3 and a CRUSH rule which mapped those pools to only 3 hosts. We reinstalled one host as part of a migration from filestore to bluestore. During the reinstallation we removed all the OSDs on the host from the cluster (ceph osd purge). When adding the first bluestore OSD, all PGs tried to create a replica on this OSD and PG overdose protection (osd_max_pg_per_osd_hard_ratio) kicked in. The affected PGs then stayed in the "activating" state. They did not peer again when additional OSDs were added (which would have brought the number of PGs on all OSDs below the hard limit). They only left the activating state when we manually restarted all OSDs on the host, as sketched below.
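
A sketch of that restart workaround, assuming a systemd-managed deployment (unit names may differ between distributions and releases):

# on the affected OSD host: restart all OSD daemons at once
systemctl restart ceph-osd.target

# or restart a single OSD, e.g. osd.4
systemctl restart ceph-osd@4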

IMHO this is a bug and PGs should have restarted peering after adding additional OSDs.

Actions #3

Updated by Gaudenz Steinlin almost 5 years ago

Ceph version was 13.2.5 on the reinstalled host and 13.2.4 on the other hosts.

Actions #4

Updated by Neha Ojha almost 4 years ago

  • Priority changed from High to Normal

We should try to make it more obvious when this limit is hit. I thought we added something in the cluster logs about this, need to verify.

Actions #5

Updated by Ross Martyn about 3 years ago

We also hit this issue last week on Ceph version 12.2.11.

The cluster is configured with a replication factor of 3; the issue was hit during the addition of a few OSDs that are much larger (3x) than the previous ones. The original disks had ~100 PGs per OSD. We hit the 750 hard limit on three OSDs, and all three needed restarting before they would activate. One PG transitioned to unknown, but it was also fixed by restarting the OSD process.

We also feel this is a bug, and PGs should have peered once the number of PGs had dropped below the hard limit.
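
For context, the hard limit mentioned above is the product of two options; the 750 figure is consistent with mon_max_pg_per_osd = 250 and osd_max_pg_per_osd_hard_ratio = 3, but the actual values on that cluster are an assumption:

# PGs-per-OSD hard limit above which the OSD withholds PG creation
# hard limit = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio
#            = 250 * 3 = 750  (assumed values for that cluster)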

Actions #6

Updated by Neha Ojha about 3 years ago

  • Priority changed from Normal to Urgent
Actions #7

Updated by Vikhyat Umrao about 3 years ago

I am aware of one place where we do log the withholding of PG creation: the following log message in the OSD logs.
https://github.com/ceph/ceph/pull/22839/files
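
A quick way to check for that message on an OSD host; the exact log wording is assumed here (grepping for "withhold" is a guess based on the description above and may need adjusting per release):

# search the local OSD logs for withheld PG creation messages
grep -i withhold /var/log/ceph/ceph-osd.*.log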

Actions #8

Updated by Neha Ojha over 2 years ago

  • Assignee set to Prashant D
Actions #9

Updated by Neha Ojha over 2 years ago

  • Related to Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly added
Actions #10

Updated by Neha Ojha over 2 years ago

  • Priority changed from Urgent to High
Actions #11

Updated by Prashant D about 2 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Prashant D about 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 44962
Actions #13

Updated by Vikhyat Umrao over 1 year ago

  • Is duplicate of Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped added
Actions #14

Updated by Vikhyat Umrao over 1 year ago

  • Status changed from Fix Under Review to Duplicate
Actions #15

Updated by Vikhyat Umrao over 1 year ago

  • Status changed from Duplicate to Fix Under Review