Bug #42692

open

ceph-volume lvm zap handles drive failure badly when device node vanishes

Added by gerald yang over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
5 - suggestion
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Sometimes when OSD drives are starting to fail, they will "disappear" from the system - the kernel logs that it is removing the logical enclosure, and then the relevant /dev/sdXX device entry disappears.

ceph-volume lvm handles this situation badly - the OSD no longer appears in the ceph-volume lvm list output, and if you try to remove the device, it fails:
root@test:~# ceph-volume lvm zap --destroy --osd-id 828
--> RuntimeError: Unable to find any LV for zapping OSD: 828

Further, the LVM layers make it harder to find which /dev/sdXX device was involved, and you also have to work out which /dev/dm-XX entry is affected and remove it by hand.

In the above case, ceph-volume uses --osd-id to work out which LV/VG to remove; on that code path it needs to access the vanished device to gather information, fails, and so prints the error message above.

ceph-volume could be improved to find the affected LV/VG and /dev/dm-XX entries via the LVM utilities alone, without requiring the vanished device node.
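ceph-volume stores metadata such as ceph.osd_id in LVM tags, and those tags remain queryable through lvs even after the backing /dev/sdXX node is gone, so the OSD-to-LV mapping could in principle be recovered without touching the device. A minimal sketch of that lookup, assuming `lvs --reportformat json -o lv_name,vg_name,lv_tags` output of the shape below (the JSON here is illustrative sample data, not captured from a real cluster):

```python
import json

# Illustrative sample of `lvs --reportformat json -o lv_name,vg_name,lv_tags`
# output; on a real system this would come from running lvs via subprocess.
SAMPLE_LVS_JSON = """
{
  "report": [
    {
      "lv": [
        {
          "lv_name": "osd-block-aaaa",
          "vg_name": "ceph-vg-1",
          "lv_tags": "ceph.osd_id=828,ceph.type=block"
        },
        {
          "lv_name": "osd-block-bbbb",
          "vg_name": "ceph-vg-2",
          "lv_tags": "ceph.osd_id=829,ceph.type=block"
        }
      ]
    }
  ]
}
"""

def parse_tags(tag_string):
    """Turn 'ceph.osd_id=828,ceph.type=block' into a dict."""
    tags = {}
    for item in tag_string.split(","):
        if "=" in item:
            key, _, value = item.partition("=")
            tags[key] = value
    return tags

def find_lv_for_osd(lvs_json, osd_id):
    """Return (vg_name, lv_name) for the LV tagged with the given OSD id,
    or None if no LV carries a matching ceph.osd_id tag."""
    report = json.loads(lvs_json)
    for group in report["report"]:
        for lv in group.get("lv", []):
            tags = parse_tags(lv.get("lv_tags", ""))
            if tags.get("ceph.osd_id") == str(osd_id):
                return lv["vg_name"], lv["lv_name"]
    return None

print(find_lv_for_osd(SAMPLE_LVS_JSON, 828))  # ('ceph-vg-1', 'osd-block-aaaa')
```

Once the VG/LV pair is known, the stale device-mapper node could be cleared with `dmsetup remove` on the corresponding mapping, without ever needing the original /dev/sdXX entry.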

