Bug #7237

closed

replaced osd is silently prevented from joining the cluster

Added by Alexandre Oliva about 10 years ago. Updated about 7 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The disk holding one of my OSDs died. After replacing the disk and creating a new osd fs on it, the osd would start and get osdmaps, but it wouldn't join the cluster. No information about it was present in the monitor or osd logs, so I thought it was just slow to join, but after a while it became clear that was not it. This used to work, but the recent fix for bug #6605 makes the mon reject a replacement osd with a different uuid.

I looked for docs on how to enable a replacement osd to join with a different uuid, but I couldn't find any. I thought “ceph osd lost” might do it, but I didn't actually try that; I'd rather stick with the original uuid, so that my earlier snapshots of that osd (preserved elsewhere) would still be usable. I ended up editing the osd\usuperblock object, but I wish there were an easier way to do this.

So I guess this bug report calls for 3 changes:

  1. document, maybe under “single ceph-osd failure”, what should be done after recreating an osd from scratch, for it to join the cluster (ceph osd lost, or create it anew with the same uuid)
  2. get the mon to tell the osd when it rejects the boot request, so that the osd can report the error and suggest what to do
  3. introduce an option to ceph-osd for the user to specify the uuid, and/or have the osd ask the monitor for the uuid of the named osd
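
To illustrate the "same uuid" route in the list above, here is a minimal sketch of looking up the uuid the cluster still records for a named osd, so that it can be reused when the osd is recreated. It assumes the JSON form of the osd dump (for example the output of `ceph osd dump --format json`) carries an "osds" list whose entries have "osd" and "uuid" keys. The function name `find_osd_uuid` and the sample data are made up for illustration and are not part of Ceph.

```python
import json


def find_osd_uuid(osd_dump_json, osd_id):
    """Return the uuid recorded for osd_id in a JSON osd dump, or None.

    Assumption: the dump has an "osds" list whose entries carry
    "osd" (numeric id) and "uuid" keys.
    """
    dump = json.loads(osd_dump_json)
    for entry in dump.get("osds", []):
        if entry.get("osd") == osd_id:
            return entry.get("uuid")
    return None


# Made-up sample standing in for real cluster output.
SAMPLE_OSD_DUMP = json.dumps({
    "osds": [
        {"osd": 0, "uuid": "11111111-1111-1111-1111-111111111111"},
        {"osd": 1, "uuid": "22222222-2222-2222-2222-222222222222"},
    ]
})

if __name__ == "__main__":
    # Look up the uuid the osdmap still holds for osd.1.
    print(find_osd_uuid(SAMPLE_OSD_DUMP, 1))
```

How the recovered uuid is then handed to the new osd is exactly what item 3 asks for, so that step is left out of the sketch.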
Actions #1

Updated by Sage Weil about 7 years ago

  • Status changed from New to Closed