Project

General

Profile

Bug #22673

osd checks out-of-date osdmap for DESTROYED flag on start

Added by J Mozdzen 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
Start date:
01/12/2018
Due date:
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):

Description

When trying an in-place migration of a filestore to bluestore OSD, we encountered a situation where ceph-osd would report "osdmap says I am destroyed, exiting" and then abort start-up, although the OSD had been recreated via "ceph osd new". Analyzing it's log file and available osdmap epochs, it does look like ceph-osd is accessing an old epoch:

--- cut here ---
  1. dumping the mentioned osdmap epochs reports the following "modified" timestamps:
  2. 110587: modified 2018-01-09 23:43:13.202381
  3. 110591: modified 2018-01-10 00:30:44.850078 #
  4. 110587 is old and correctly reports the osd as "destroyed" (as a result of "ceph osd destroy"), while the current osdmap 110591 reports the OSD
  5. as "autoout,exists,new" because of a recently issued "ceph osd new".
  6. The OSD start occurs after the latest osdmap modification. Epoch 110587 was current when the OSD was previously started, which failed due to errors.)

2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0 pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 1 osd.999 110587 log_to_monitors {default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init, starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am destroyed, exiting
--
cut here ---

The cluster was under no high load during all that, so shouldn't the osd catch up to the current osdmap epoch on start?

Version reported by ceph-osd: ceph version v12.2.2-356-gb87ca3c12e (b87ca3c12e27c5950edb7970044c318365dd91a8) luminous (stable)

Command sequence that lead to this:

- ceph osd destroy 999 --yes-i-really-mean-it
- (tearing down the old filestore OSD)
- ceph-volume lvm zap /dev/sdzz
- (2018-01-09 23:54:01) ceph-volume lvm create --bluestore --osd-id 999 --data /dev/sdzz
- (start of ceph-osd failed because of missing auth, known problem with ceph-volume)
- ceph auth add osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' i /var/lib/ceph/osd/ceph-999/keyring
restart failed because of "osdmap says I am destroyed, exiting" (which was correct at that time)
- (2018-01-10 00:30:44) ceph osd new $(cat /var/lib/ceph/osd/ceph-999/fs_id) 999 (leading to osdmap epoch 110591 with the OSD marked as "autoout,exists,new"

In an earlier attempt during that command sequence, where the OSD was still marked as destroyed but auth was already set up ("ceph-volume lvm create", "ceph auth add osd.999", but no "ceph osd new ..."), the ceph-osd log revealed that different osdmaps where checked when starting the OSD:

--- cut here ---
  1. 109892: much too old to offer any details
  2. 110587: modified 2018-01-09 23:43:13.202381

2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 1 osd.999 0 log_to_monitors {default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init, starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am destroyed, exiting
--
cut here ---

I cannot tell where it got epoch 109892 from, which looks "ancient". 110587 was the then-current epoch, so the OSD seems to catch up on start, at least under some conditions.

Why is the start at 00:31:15 still using epoch 110587, rather than catching up to then-current 110591 (or, if this is a timing issue, at least to osdmap 110590, modified 2018-01-10 00:13:05.824314, so 18 minutes old)?

Am I seeing a bug or is this caused by operator error?


Related issues

Copied to RADOS - Backport #22761: luminous: osd checks out-of-date osdmap for DESTROYED flag on start Resolved

History

#1 Updated by Sage Weil 3 months ago

  • Project changed from bluestore to RADOS
  • Status changed from New to Verified
  • Priority changed from Normal to Urgent
  • Backport set to luminous

This is a real bug, should be straightforward to fix. Thanks for the report!

#2 Updated by Sage Weil 3 months ago

It looks like the _preboot destroyed check should go after we catch up on maps.

#3 Updated by Sage Weil 3 months ago

  • Status changed from Verified to Need Review

note: you can work around this by waiting a bit until some osd maps trim from the monitor.

https://github.com/ceph/ceph/pull/19988

#4 Updated by Sage Weil 3 months ago

  • Subject changed from bluestore osd uses out-of-date osdmap on start to osd checks out-of-date osdmap for DESTROYED flag on start

#5 Updated by Kefu Chai 3 months ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #22761: luminous: osd checks out-of-date osdmap for DESTROYED flag on start added

#7 Updated by Nathan Cutler 3 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF