Bug #38930


ceph osd safe-to-destroy wrongly approves any out osd

Added by Dan van der Ster about 5 years ago. Updated almost 5 years ago.

Status: Duplicate
Priority: High
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With v12.2.11, we found that ceph osd safe-to-destroy wrongly reports that all out OSDs are safe to destroy.

E.g.: osd.461 has 43 PGs, and this is good so far:

# ceph osd ok-to-stop 461
OSD(s) 461 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 43 PGs are likely to be degraded (but remain available) as a result.

# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 have 86 pgs currently mapped to them

Now we stop the OSD, and 43 PGs are degraded:

# systemctl stop ceph-osd@461

             43    active+undersized+degraded

At this point the safe-to-destroy output is still somewhat ok:

# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 last reported they still store some PG data, and not all PGs are active+clean; we cannot be sure they aren't still needed.

Now we mark the OSD out, and it immediately becomes safe to destroy:

[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd out 461
marked out osd.461. 

[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd safe-to-destroy 461
OSD(s) 461 are safe to destroy without reducing data durability.

IMHO it would be better if it reported EBUSY whenever there are degraded PGs related to the OSD.
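
A minimal sketch of that stricter check (illustrative only; the struct and function names here are assumptions, not the actual OSDMonitor code). The idea is that even when the OSD is marked out and no longer reports stats, the command should keep returning EBUSY while any PG that the OSD recently held is still degraded:

#include <vector>

struct PGInfoSketch {
  bool degraded;                  // e.g. active+undersized+degraded
  std::vector<int> recent_acting; // OSDs that recently held this PG's data
};

// Return false (i.e. EBUSY) while any degraded PG is still related to osd_id.
bool safe_to_destroy_suggested(int osd_id,
                               const std::vector<PGInfoSketch>& pgs) {
  for (const auto& pg : pgs) {
    if (!pg.degraded)
      continue;
    for (int o : pg.recent_acting) {
      if (o == osd_id)
        return false;  // PG data on this OSD may still be needed
    }
  }
  return true;
}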


Related issues: 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #39099: Give recovery for inactive PGs a higher priority (Resolved, David Zafman, 04/03/2019)

Actions #1

Updated by Greg Farnum about 5 years ago

Hmm, maybe the pg_map is purged of any OSD marked out? Although you can have OSDs that are up but out, so that shouldn't be the case.

Are you sure this wasn't just a victim of bad timing and the PGs all went clean in that interval? ;)

Actions #2

Updated by David Zafman about 5 years ago

  • Assignee set to David Zafman
Actions #3

Updated by David Zafman about 5 years ago

  • Related to Bug #39099: Give recovery for inactive PGs a higher priority added
Actions #4

Updated by Sage Weil about 5 years ago

  • Status changed from New to In Progress
  • Assignee changed from David Zafman to Sage Weil
  • Priority changed from Normal to High

Okay, reproduced this with vstart. When I mark an OSD out, I get

Error EBUSY: OSD(s) 0 last reported they still store some PG data, and not all PGs are active+clean; we cannot be sure they aren't still needed.

If I stop osd.0 and try again, then I get

OSD(s) 0 are safe to destroy without reducing data durability.

I think the problem is that we're relying on osd_stat_t's num_pgs, but that is cleared out if the OSD is down. So I think if the OSD is down we need a different check... which is basically whether any PGs are degraded, perhaps? There isn't a good way to tell what PGs are sitting on the down OSD and whether we might want them. Maybe we should include a full spg_t list in osd_stat_t?
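
A rough sketch of that failure mode (simplified, hypothetical types; not the real PGMap/OSDMonitor code). The check trusts the num_pgs value the OSD last reported via osd_stat_t, but that stat is gone once the down OSD is marked out, so the check falls through to "safe":

#include <optional>

struct OsdStatSketch {
  int num_pgs = 0;  // PG count as last reported by the OSD itself
};

// stats is empty once a down OSD has been marked out and its stat purged
bool safe_to_destroy_old(const std::optional<OsdStatSketch>& stats,
                         bool all_pgs_active_clean) {
  if (stats && stats->num_pgs > 0)
    return false;  // EBUSY: OSD still has PGs mapped to it
  if (stats && !all_pgs_active_clean)
    return false;  // EBUSY: OSD last reported it still stores PG data
  return true;     // no stats at all -> wrongly reported as safe to destroy
}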

Actions #5

Updated by David Zafman about 5 years ago

The message below reports too many PGs. It counts acting + up from pg_count as if the acting set and up set were disjoint, but in an active+clean PG they contain the same OSDs, so the count is doubled:

OSD ### have ## pgs currently mapped to them
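
An illustrative sketch of the double counting (hypothetical structures, not the actual code): summing the acting and up mappings separately counts an active+clean PG twice, because its acting set and up set contain the same OSDs:

#include <set>
#include <vector>

struct PgMappingSketch {
  std::set<int> up;      // OSDs the PG is mapped to by CRUSH
  std::set<int> acting;  // OSDs currently serving the PG
};

int mapped_pg_count_buggy(int osd, const std::vector<PgMappingSketch>& pgs) {
  int n = 0;
  for (const auto& pg : pgs) {
    if (pg.acting.count(osd)) n++;  // counted once here...
    if (pg.up.count(osd)) n++;      // ...and again here when up == acting
  }
  return n;  // 86 for osd.461 even though only 43 PGs map to it
}

int mapped_pg_count_fixed(int osd, const std::vector<PgMappingSketch>& pgs) {
  int n = 0;
  for (const auto& pg : pgs) {
    if (pg.acting.count(osd) || pg.up.count(osd)) n++;  // each PG once
  }
  return n;
}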

Actions #6

Updated by David Zafman about 5 years ago

  • Assignee changed from Sage Weil to David Zafman
Actions #7

Updated by David Zafman about 5 years ago

  • Pull request ID set to 27503

The fix checks for a down OSD when not all PGs are active+clean, and no longer trusts num_pgs, which is 0 after a down OSD is marked out.

This is the result:

$ ceph osd safe-to-destroy 1
Error EAGAIN: OSD 1 have no reported stats, and not all PGs are active+clean; we cannot draw any conclusions.
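
A hedged sketch of the corrected logic described in this update (simplified and illustrative; the actual change is in pull request 27503). When a down OSD has no reported stats, the check no longer treats num_pgs == 0 as proof of safety and instead refuses to draw a conclusion unless every PG is active+clean:

enum class Verdict { SAFE, EBUSY, EAGAIN };

Verdict safe_to_destroy_fixed(bool osd_has_stats, int reported_num_pgs,
                              bool all_pgs_active_clean) {
  if (!osd_has_stats) {
    // Down+out OSD: without stats we cannot prove anything either way.
    return all_pgs_active_clean ? Verdict::SAFE : Verdict::EAGAIN;
  }
  if (reported_num_pgs > 0 || !all_pgs_active_clean)
    return Verdict::EBUSY;
  return Verdict::SAFE;
}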

Actions #8

Updated by David Zafman about 5 years ago

  • Backport set to luminous, mimic, nautilus
Actions #9

Updated by David Zafman almost 5 years ago

  • Status changed from In Progress to Pending Backport
Actions #10

Updated by David Zafman almost 5 years ago

  • Status changed from Pending Backport to Duplicate

We can backport pull request https://github.com/ceph/ceph/pull/27503 for http://tracker.ceph.com/issues/39099 which includes this change. Reopen this if we need to do it separately.

Actions #11

Updated by Nathan Cutler almost 5 years ago

  • Backport deleted (luminous, mimic, nautilus)
Actions #12

Updated by Nathan Cutler almost 5 years ago

  • Is duplicate of Bug #39099: Give recovery for inactive PGs a higher priority added
Actions #13

Updated by Nathan Cutler almost 5 years ago

  • Related to deleted (Bug #39099: Give recovery for inactive PGs a higher priority)