Bug #53846

ceph-volume should ignore /dev/rbd* devices

Added by Tim Serong over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If RBD devices are mapped on Ceph cluster nodes (as they may be if you're running an iSCSI gateway, for example), then ceph-volume inventory will list those RBD devices, and quite possibly list them as "available". This causes a couple of problems:

1) Because /dev/rbd0 appears in the list of available devices, the orchestrator will actually try to deploy OSDs on top of those RBD devices. Luckily, this will fail, because the various LVM invocations will die with "Device /dev/rbd0 excluded by a filter", but really we shouldn't even be trying to do this in the first place. Let's not rely on luck ;-)
2) It's possible for /dev/rbd* devices to be locked/stuck in such a way that when ceph-volume invokes blkid, it hangs indefinitely (the process ends up in D-state). This can actually block the entire orchestrator, because the orchestrator calls out to cephadm periodically to inventory devices, and the latter tries to acquire a lock, which it can't get because a prior invocation is stuck running ceph-volume inventory.

I suggest we make ceph-volume completely ignore /dev/rbd* when doing a device inventory. I know we had a similar discussion on dev@ceph.io regarding ceph-volume listing, or not listing, GPT devices (see https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/N3TK4IO2QYHXIZMQTZ4AMPU5BE56J5MP/#T7UM53WCW2MDD62DDH6KLI4EZXKBXZBY), but the difference here is that mapped RBD volumes really aren't part of the host's own inventory, so IMO they should be excluded.
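For illustration only, a minimal sketch of the kind of filtering being proposed here. The helper name and candidate list are hypothetical; this is not the change that was eventually merged, just the idea of dropping /dev/rbd* before any probing happens:

# Hypothetical sketch: drop mapped RBD devices from the candidate list up
# front, so tools such as blkid are never invoked against /dev/rbd* at all.
# Not the actual ceph-volume code.

def filter_out_rbd(device_paths):
    """Return only paths that do not refer to mapped RBD block devices."""
    return [dev for dev in device_paths if not dev.startswith('/dev/rbd')]

if __name__ == '__main__':
    candidates = ['/dev/vdb', '/dev/vdc', '/dev/rbd0', '/dev/rbd1']
    print(filter_out_rbd(candidates))  # -> ['/dev/vdb', '/dev/vdc']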


Related issues (2): 0 open, 2 closed

Copied to ceph-volume - Backport #53961: octopus: ceph-volume should ignore /dev/rbd* devices (Resolved, Guillaume Abrioux)
Copied to ceph-volume - Backport #53962: pacific: ceph-volume should ignore /dev/rbd* devices (Resolved, Guillaume Abrioux)
#1

Updated by Michael Fritch over 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Michael Fritch
  • Pull request ID set to 44604
#2

Updated by Michael Fritch over 2 years ago

Attempting to open, run blkid on, etc. a stale RBD device (one whose backing mapping is gone) that has not been unmapped will cause c-v to hang in an uninterruptible "D" state.
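For context, a rough sketch of why this hang is so hard to work around from the caller's side (a hypothetical wrapper, not ceph-volume's actual code): even with a deadline, the caller can only give up, because a child stuck in D-state cannot be killed until the kernel releases it, which is why skipping /dev/rbd* entirely is the more robust approach.

import subprocess
import time

def blkid_probe(dev, timeout=5.0):
    """Best-effort 'blkid -p <dev>' that gives up after `timeout` seconds.

    A stale RBD mapping can leave blkid in uninterruptible (D) sleep; such a
    child ignores even SIGKILL until the kernel releases it, so all the
    caller can do is stop waiting and treat the device as unusable.
    """
    proc = subprocess.Popen(['blkid', '-p', dev],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:          # finished normally
            return proc.stdout.read().decode()
        time.sleep(0.1)
    proc.kill()   # has no effect while the process is stuck in D-state
    return None   # caller moves on; the child may linger until unmapped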

#3

Updated by Michael Fritch over 2 years ago

can occur during inventory:

root      114583  0.0  0.3 144488 29188 ?        Ss    2021   0:00  \_ /usr/bin/python3.6 /usr/sbin/ceph-volume inventory --format=json --filter-for-batch
root      114616  0.0  0.0  11988  1036 ?        D     2021   0:00      \_ /usr/sbin/blkid -p /dev/rbd0

and lvm batch:

root       48440  0.0  0.3 144492 29504 ?        Ss   Jan10   0:00  \_ /usr/bin/python3.6 /usr/sbin/ceph-volume lvm batch --no-auto /dev/rbd0 /dev/vdb /dev/vdc /dev/vdd /dev/vde /dev/vdf --yes --no-systemd
root       48477  0.0  0.0  11988   936 ?        D    Jan10   0:00      \_ /usr/sbin/blkid -p /dev/rbd0

This results in a cephadm instance never releasing its global lock, which in turn causes stuck/stale operations from the ceph orchestrator.
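To illustrate the lock contention described above, a simplified flock-based sketch; the lock path and helper are made up for the example and are not cephadm's actual implementation.

import fcntl

def try_acquire(lock_path='/run/cephadm-demo.lock'):   # hypothetical path
    """Try to take an exclusive, non-blocking flock on lock_path.

    If a previous invocation still holds the lock (for example because its
    'ceph-volume inventory' child is stuck in D-state on /dev/rbd0), every
    later invocation lands here and backs off, so orchestrator operations
    that depend on a fresh inventory stall.
    """
    fd = open(lock_path, 'w')
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd          # keep the file object open to hold the lock
    except BlockingIOError:
        fd.close()
        return None

if try_acquire() is None:
    print('another invocation still holds the lock; giving up')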

#4

Updated by Guillaume Abrioux over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Guillaume Abrioux over 2 years ago

  • Copied to Backport #53961: octopus: ceph-volume should ignore /dev/rbd* devices added
#6

Updated by Guillaume Abrioux over 2 years ago

  • Copied to Backport #53962: pacific: ceph-volume should ignore /dev/rbd* devices added
#7

Updated by Guillaume Abrioux over 2 years ago

  • Status changed from Pending Backport to Resolved
