Bug #38930
ceph osd safe-to-destroy wrongly approves any out osd
Status:
Duplicate
Priority:
High
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
With v12.2.11, we found that ceph osd safe-to-destroy wrongly reports every OSD that is marked out as safe to destroy.
For example, osd.461 has 43 PGs. While the OSD is still up and in, the output is as expected:
# ceph osd ok-to-stop 461
OSD(s) 461 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 43 PGs are likely to be degraded (but remain available) as a result.
# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 have 86 pgs currently mapped to them
Now we stop the OSD, and 43 PGs are degraded:
# systemctl stop ceph-osd@461
43 active+undersized+degraded
At this point the safe-to-destroy output is still somewhat ok:
# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 last reported they still store some PG data, and not all PGs are active+clean; we cannot be sure they aren't still needed.
Now we mark the OSD out, and it immediately becomes safe to destroy:
[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd out 461
marked out osd.461.
[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd safe-to-destroy 461
OSD(s) 461 are safe to destroy without reducing data durability.
IMHO it would be better if it reported EBUSY whenever any PGs related to the OSD are degraded.
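Until the command behaves that way, an operator-side guard along these lines might help. This is only a sketch of a conservative workaround, not the fix proposed for Ceph itself. It assumes that ceph status --format json exposes a pgmap.pgs_by_state list with state_name and count fields; the function names are hypothetical. It is stricter than the suggestion above: it refuses whenever any PG in the cluster is not exactly active+clean, because once the OSD is out the degraded PGs are no longer mapped to it.

```python
import json
import subprocess


def all_pgs_active_clean(status: dict) -> bool:
    """Return True only if every PG state reported is exactly active+clean.

    `status` is the parsed output of `ceph status --format json`.
    Assumption: PG states are listed under pgmap.pgs_by_state.
    Deliberately strict: states such as active+clean+scrubbing also fail.
    """
    by_state = status.get("pgmap", {}).get("pgs_by_state", [])
    if not by_state:
        # No PG state information at all: treat as unknown, so not safe.
        return False
    return all(entry.get("state_name") == "active+clean" for entry in by_state)


def cluster_status() -> dict:
    """Fetch the cluster status as a dict (needs a working ceph CLI)."""
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def guarded_safe_to_destroy(osd_id: int) -> bool:
    """Hypothetical wrapper: trust safe-to-destroy only on a clean cluster."""
    if not all_pgs_active_clean(cluster_status()):
        return False
    # A non-zero exit status (e.g. EBUSY) means the OSD is not safe to destroy.
    result = subprocess.run(["ceph", "osd", "safe-to-destroy", str(osd_id)])
    return result.returncode == 0
```

With the cluster in the state shown above (43 PGs active+undersized+degraded), guarded_safe_to_destroy(461) would return False before safe-to-destroy is even consulted.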