Bug #38930


ceph osd safe-to-destroy wrongly approves any out osd

Added by Dan van der Ster about 5 years ago. Updated almost 5 years ago.

Status: Duplicate
Priority: High
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 2 - major
Component(RADOS): OSD

Description

With v12.2.11, we found that ceph osd safe-to-destroy wrongly reports that any out OSD is safe to destroy.

E.g., osd.461 has 43 PGs, and the output is correct so far:

# ceph osd ok-to-stop 461
OSD(s) 461 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 43 PGs are likely to be degraded (but remain available) as a result.

# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 have 86 pgs currently mapped to them

Now we stop the OSD, and 43 PGs are degraded:

# systemctl stop ceph-osd@461

             43    active+undersized+degraded

At this point the safe-to-destroy output is still reasonable:

# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 last reported they still store some PG data, and not all PGs are active+clean; we cannot be sure they aren't still needed.

Now we mark the OSD out, and it immediately becomes safe to destroy:

[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd out 461
marked out osd.461. 

[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd safe-to-destroy 461
OSD(s) 461 are safe to destroy without reducing data durability.

IMHO it would be better if it kept returning EBUSY while any PGs related to the OSD are still degraded.


Related issues: 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #39099: Give recovery for inactive PGs a higher priority (Resolved, David Zafman, 04/03/2019)

