Bug #41784

open

ceph-volume lvm create leaves half-built OSDs lying around

Added by Matthew Vernon over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

We keep finding partially created OSDs: they are not attached to any
host and are down and out, but they still count towards the total number
of OSDs. We never saw this with ceph-disk. On investigation, the cause is
that ceph-volume lvm create registers the OSD (ID and auth at least) too
early in the process and is then unable to roll back cleanly, because the
bootstrap-osd credential isn't allowed to remove OSDs.
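
For anyone wanting to spot these leftovers, here is a minimal sketch that lists OSDs not placed under any host. It assumes the JSON from `ceph osd tree --format json` has a top-level "stray" list holding such OSDs; that output shape is an assumption worth checking against your release, and `find_stray_osds` is a hypothetical helper name, not part of ceph-volume.

```python
import json


def find_stray_osds(tree):
    """Return the names of OSDs that are not placed under any CRUSH bucket.

    `tree` is the parsed JSON of `ceph osd tree --format json`.
    Assumption: unattached OSDs are listed under a top-level "stray" key.
    """
    stray = tree.get("stray", [])
    return sorted(node["name"] for node in stray if node.get("type") == "osd")


if __name__ == "__main__":
    # Hand-written sample, not real cluster output.
    sample = json.loads(
        '{"nodes": [{"id": 0, "name": "osd.0", "type": "osd"}],'
        ' "stray": [{"id": 828, "name": "osd.828", "type": "osd"}]}'
    )
    print(find_stray_osds(sample))
```

Each name it reports would still have to be purged by hand using a credential that is allowed to remove OSDs.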

As an example (very truncated):

Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
Running command: vgcreate --force --yes ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
stderr: Device /dev/sdbh not found (or ignored by filtering).
Unable to add physical volume '/dev/sdbh' to volume group 'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.828 --yes-i-really-mean-it
stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
2019-09-10 15:07:53.397334 7fbca2caf700 0 librados: client.admin authentication error (95) Operation not supported

This is annoying to have to clear up, and it seems to me that it could be
avoided by either:

i) having ceph-volume (attempt to) set up the LVM volumes etc. before
creating the new OSD ID
or
ii) adjusting the bootstrap-osd credential to grant it the ability to purge OSDs

i) seems like clearly the better answer...?
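
To make option i) concrete, here is a rough sketch of the ordering I have in mind. It is not ceph-volume's actual code: `create_osd`, `setup_lvm`, `register_osd` and `teardown_lvm` are hypothetical names, and the real tool may need the OSD ID earlier (for example to tag the logical volume), which this sketch ignores.

```python
def create_osd(setup_lvm, register_osd, teardown_lvm):
    """Do the local, reversible work first; touch the cluster last.

    setup_lvm():        creates the VG/LV and returns a handle; may raise.
    register_osd(vol):  allocates the OSD ID and auth in the cluster.
    teardown_lvm(vol):  removes the VG/LV again (local only).
    All three callables are hypothetical stand-ins for illustration.
    """
    # If this raises, nothing exists cluster-side, so no purge is needed.
    volume = setup_lvm()
    try:
        return register_osd(volume)
    except Exception:
        # Rolling back here only needs local privileges, not the ability
        # to remove OSDs from the cluster.
        teardown_lvm(volume)
        raise
```

With this ordering, the vgcreate failure in the log above would abort before `osd new` is ever issued, so no OSD ID would be left behind.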
