Bug #23117
PGs stuck in "activating" after osd_max_pg_per_osd_hard_ratio has been exceeded once
Description
- 6 OSD hosts
- Each host with 32 disks = 32 OSDs
- Pool with 2048 PGs, EC, k=4, m=2, crush failure domain host
When (re)installing the 6th host and creating the first OSD on it, PG overdose protection kicks in immediately,
since all PGs need to have shards on the 6th host.
For this reason, PGs enter "activating" state and get stuck there.
However, even when all 32 OSDs have been added on the 6th host, the PGs remain stuck in activating and the data stays unavailable.
This situation does not resolve by itself.
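For context, a rough sketch of the arithmetic behind the overdose protection trigger (assuming the documented relation hard limit = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio and the common defaults of 250 and 3; exact defaults vary by release):
hard limit per OSD = mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio = 250 * 3 = 750
PG shards the 6th host must hold = 2048 (one shard per PG, crush failure domain host)
PG shards mapped to the first (and only) OSD on that host ~= 2048 > 750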
This issue can be resolved by setting:
osd_max_pg_per_osd_hard_ratio = 32
before the redeployment of a host, thus effectively turning off overdose protection. For one example PG in the stuck state:
# ceph pg dump all | grep 2.7f6
dumped all
2.7f6 38086 0 38086 0 0 2403961148 1594 1594 activating+undersized+degraded+remapped 2018-02-24 19:50:01.654185 39755'134350 39946:274873 [153,6,42,95,115,167] 153 [153,NONE,42,95,115,167] 153 39559'109078 2018-02-24 04:01:57.991376 36022'53756 2018-02-22 18:03:40.386421 0
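To list every PG stuck this way (not taken from the ticket; just the standard query one could use):
# ceph pg dump pgs_brief | grep activating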
I have uploaded OSD logs from all involved OSDs:
- c3953bf7-b482-4705-a7a3-df354453a933 for osd.6 (which was reinstalled, so maybe this is irrelevant)
- 833c07e2-09ff-409c-b68f-1a87e7bfc353 for osd.4, which was the first OSD reinstalled on the new OSD host, so it should have been affected by overdose protection
- cb146d33-e6cb-4c84-8b15-543728bbc5dd for osd.42
- f716a2d1-e7ef-46d7-b4fc-dfc440e6fe59 for osd.95
- fc7ec27a-82c9-4fb4-94dc-5dd64335e3b4 for osd.115
- 51213f5f-1b91-42b0-8c0c-8acf3622195f for osd.153
- 3d67f227-4dba-4c93-9fe1-7951d3d32f30 for osd.167
I have also uploaded the ceph.conf of osd001 which was the reinstalled OSD host:
64744f9a-e136-40f9-a392-4a6f1b34a74e
All other OSD hosts have
osd_max_pg_per_osd_hard_ratio = 32
set (which prevents the issue).
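For reference, this is how the workaround would look in ceph.conf; the section placement is my assumption, the uploaded file may differ:
[osd]
osd_max_pg_per_osd_hard_ratio = 32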
Additionally, I have uploaded all OSD logs of the reinstalled osd001 machine:
38ddd08f-6c66-4a88-8e83-f4eff0ae5d10
(so this includes osd.4 and osd.6 already linked above).
History
#1 Updated by Greg Farnum over 5 years ago
- Project changed from Ceph to RADOS
- Category set to Administration/Usability
- Priority changed from Normal to High
- Component(RADOS) Monitor, OSD added
#2 Updated by Gaudenz Steinlin over 4 years ago
We also hit this problem with a cluster which had replicated pools with a replication factor of 3 and a CRUSH rule which mapped those pools to only 3 hosts. We reinstalled one host as part of a migration from filestore to bluestore. During the reinstallation we removed all the OSDs on the host from the cluster (ceph osd purge). When adding the first bluestore OSD, all PGs tried to create a replica on this OSD and PG overdose protection (osd_max_pg_per_osd_hard_ratio) kicked in. The affected PGs then stayed in "activating" state. They did not peer again when adding additional OSDs (which would have brought the number of PGs on all OSDs below the hard limit). They only left the activating state when we manually restarted all OSDs on the host (see the sketch below).
IMHO this is a bug and PGs should have restarted peering after adding additional OSDs.
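A sketch of the manual restart described above, assuming systemd-managed OSDs with the stock ceph-osd units (not taken from this report):
# systemctl restart ceph-osd.target    (on the affected host, restarts all OSDs there)
# systemctl restart ceph-osd@4         (alternatively, restart a single OSD such as osd.4)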
#3 Updated by Gaudenz Steinlin over 4 years ago
Ceph version was 13.2.5 on the reinstalled host and 13.2.4 on the other hosts.
#4 Updated by Neha Ojha over 3 years ago
- Priority changed from High to Normal
We should try to make it more obvious when this limit is hit. I thought we added something to the cluster logs about this; need to verify.
#5 Updated by Ross Martyn over 2 years ago
We also hit this issue last week on Ceph version 12.2.11.
The cluster is configured with a replication factor of 3; the issue was hit during the addition of a few OSDs that are much larger (3x) than the previous ones. The original disks had ~100 PGs per OSD. We hit the 750 hard limit on three OSDs, and all three needed restarting before they would activate. One PG transitioned to unknown but was also fixed by the restart of the OSD process.
We also feel this is a bug and PGs should have peered once the number of PGs had dropped below the hard limit.
#6 Updated by Neha Ojha over 2 years ago
- Priority changed from Normal to Urgent
#7 Updated by Vikhyat Umrao over 2 years ago
I am aware of one place where we log that PG creation is being withheld: the following log message in the OSD logs.
https://github.com/ceph/ceph/pull/22839/files
#8 Updated by Neha Ojha about 2 years ago
- Assignee set to Prashant D
#9 Updated by Neha Ojha almost 2 years ago
- Related to Bug #48298: hitting mon_max_pg_per_osd right after creating OSD, then decreases slowly added
#10 Updated by Neha Ojha over 1 year ago
- Priority changed from Urgent to High
#11 Updated by Prashant D over 1 year ago
- Status changed from New to In Progress
#12 Updated by Prashant D over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 44962
#13 Updated by Vikhyat Umrao about 1 year ago
- Duplicates Bug #57185: EC 4+2 PG stuck in activating+degraded+remapped added
#14 Updated by Vikhyat Umrao about 1 year ago
- Status changed from Fix Under Review to Duplicate
#15 Updated by Vikhyat Umrao about 1 year ago
- Status changed from Duplicate to Fix Under Review