Bug #38930


ceph osd safe-to-destroy wrongly approves any out osd

Added by Dan van der Ster about 5 years ago. Updated almost 5 years ago.

Status: Duplicate
Priority: High
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With v12.2.11, we found that ceph osd safe-to-destroy wrongly reports that all out OSDs are safe to destroy.

E.g.: osd.461 has 43 PGs, and this is good so far:

# ceph osd ok-to-stop 461
OSD(s) 461 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 43 PGs are likely to be degraded (but remain available) as a result.

# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 have 86 pgs currently mapped to them

Now we stop the OSD, and 43 PGs are degraded:

# systemctl stop ceph-osd@461

             43    active+undersized+degraded

At this point the safe-to-destroy output is still somewhat ok:

# ceph osd safe-to-destroy 461
Error EBUSY: OSD(s) 461 last reported they still store some PG data, and not all PGs are active+clean; we cannot be sure they aren't still needed.

Now we mark the OSD out, and it immediately becomes safe to destroy:

[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd out 461
marked out osd.461. 

[11:51][root@cepherin-mon-7cb9b591e1 (production:ceph/erin/mon*2:leader) ~]# ceph osd safe-to-destroy 461
OSD(s) 461 are safe to destroy without reducing data durability.

IMHO it would be better if it reported EBUSY whenever there are degraded PGs related to the OSD.
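
A minimal sketch of that stricter check (illustrative only; the struct and function names here are assumptions, not the actual OSDMonitor code). The idea is that even when the OSD is marked out and no longer reports stats, the command should keep returning EBUSY while any PG that the OSD recently held is still degraded:

#include <vector>

struct PGInfoSketch {
  bool degraded;                  // e.g. active+undersized+degraded
  std::vector<int> recent_acting; // OSDs that recently held this PG's data
};

// Return false (i.e. EBUSY) while any degraded PG is still related to osd_id.
bool safe_to_destroy_suggested(int osd_id,
                               const std::vector<PGInfoSketch>& pgs) {
  for (const auto& pg : pgs) {
    if (!pg.degraded)
      continue;
    for (int o : pg.recent_acting) {
      if (o == osd_id)
        return false;  // PG data on this OSD may still be needed
    }
  }
  return true;
}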


Related issues: 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #39099: Give recovery for inactive PGs a higher priority (Resolved, David Zafman, 04/03/2019)

Actions #1

Updated by Greg Farnum about 5 years ago

Hmm, maybe the pg_map is purged of any OSD marked out? Although you can have OSDs that are up but out, so that shouldn't be the case.

Are you sure this wasn't just a victim of bad timing and the PGs all went clean in that interval? ;)

Actions #2

Updated by David Zafman about 5 years ago

  • Assignee set to David Zafman
Actions #3

Updated by David Zafman about 5 years ago

  • Related to Bug #39099: Give recovery for inactive PGs a higher priority added
Actions #4

Updated by Sage Weil about 5 years ago

  • Status changed from New to In Progress
  • Assignee changed from David Zafman to Sage Weil
  • Priority changed from Normal to High

Okay, reproduced this with vstart. When I mark an OSD out, I get

Error EBUSY: OSD(s) 0 last reported they still store some PG data, and not all PGs are active+clean; we cannot be sure they aren't still needed.

If I stop osd.0 and try again, then I get

OSD(s) 0 are safe to destroy without reducing data durability.

I think the problem is that we're relying on osd_stat_t's num_pgs, but that is cleared out if the OSD is down. So I think if the OSD is down we need a different check... which is basically whether any PGs are degraded, perhaps? There isn't a good way to tell what PGs are sitting on the down OSD and whether we might want them. Maybe we should include a full spg_t list in osd_stat_t?
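
A rough sketch of that failure mode (simplified, hypothetical types; not the real PGMap/OSDMonitor code). The check trusts the num_pgs value the OSD last reported via osd_stat_t, but that stat is gone once the down OSD is marked out, so the check falls through to "safe":

#include <optional>

struct OsdStatSketch {
  int num_pgs = 0;  // PG count as last reported by the OSD itself
};

// stats is empty once a down OSD has been marked out and its stat purged
bool safe_to_destroy_old(const std::optional<OsdStatSketch>& stats,
                         bool all_pgs_active_clean) {
  if (stats && stats->num_pgs > 0)
    return false;  // EBUSY: OSD still has PGs mapped to it
  if (stats && !all_pgs_active_clean)
    return false;  // EBUSY: OSD last reported it still stores PG data
  return true;     // no stats at all -> wrongly reported as safe to destroy
}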

Actions #5

Updated by David Zafman about 5 years ago

The message below reports too many PGs. It counts acting + up from pg_count as if the acting set and up set were disjoint, but in an active+clean PG they contain the same OSDs, so the count is doubled:

OSD ### have ## pgs currently mapped to them
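
An illustrative sketch of the double counting (hypothetical structures, not the actual code): summing the acting and up mappings separately counts an active+clean PG twice, because its acting set and up set contain the same OSDs:

#include <set>
#include <vector>

struct PgMappingSketch {
  std::set<int> up;      // OSDs the PG is mapped to by CRUSH
  std::set<int> acting;  // OSDs currently serving the PG
};

int mapped_pg_count_buggy(int osd, const std::vector<PgMappingSketch>& pgs) {
  int n = 0;
  for (const auto& pg : pgs) {
    if (pg.acting.count(osd)) n++;  // counted once here...
    if (pg.up.count(osd)) n++;      // ...and again here when up == acting
  }
  return n;  // 86 for osd.461 even though only 43 PGs map to it
}

int mapped_pg_count_fixed(int osd, const std::vector<PgMappingSketch>& pgs) {
  int n = 0;
  for (const auto& pg : pgs) {
    if (pg.acting.count(osd) || pg.up.count(osd)) n++;  // each PG once
  }
  return n;
}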

Actions #6

Updated by David Zafman about 5 years ago

  • Assignee changed from Sage Weil to David Zafman
Actions #7

Updated by David Zafman about 5 years ago

  • Pull request ID set to 27503

The fix checks for a down OSD when not all PGs are active+clean, and no longer trusts num_pgs, which is 0 after a down OSD is marked out.

This is the result:

$ ceph osd safe-to-destroy 1
Error EAGAIN: OSD 1 have no reported stats, and not all PGs are active+clean; we cannot draw any conclusions.
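
A hedged sketch of the corrected logic described in this update (simplified and illustrative; the actual change is in pull request 27503). When a down OSD has no reported stats, the check no longer treats num_pgs == 0 as proof of safety and instead refuses to draw a conclusion unless every PG is active+clean:

enum class Verdict { SAFE, EBUSY, EAGAIN };

Verdict safe_to_destroy_fixed(bool osd_has_stats, int reported_num_pgs,
                              bool all_pgs_active_clean) {
  if (!osd_has_stats) {
    // Down+out OSD: without stats we cannot prove anything either way.
    return all_pgs_active_clean ? Verdict::SAFE : Verdict::EAGAIN;
  }
  if (reported_num_pgs > 0 || !all_pgs_active_clean)
    return Verdict::EBUSY;
  return Verdict::SAFE;
}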

Actions #8

Updated by David Zafman about 5 years ago

  • Backport set to luminous, mimic, nautilus
Actions #9

Updated by David Zafman almost 5 years ago

  • Status changed from In Progress to Pending Backport
Actions #10

Updated by David Zafman almost 5 years ago

  • Status changed from Pending Backport to Duplicate

We can backport pull request https://github.com/ceph/ceph/pull/27503 for http://tracker.ceph.com/issues/39099 which includes this change. Reopen this if we need to do it separately.

Actions #11

Updated by Nathan Cutler almost 5 years ago

  • Backport deleted (luminous, mimic, nautilus)
Actions #12

Updated by Nathan Cutler almost 5 years ago

  • Is duplicate of Bug #39099: Give recovery for inactive PGs a higher priority added
Actions #13

Updated by Nathan Cutler almost 5 years ago

  • Related to deleted (Bug #39099: Give recovery for inactive PGs a higher priority)