Bug #37492


ceph-volume lvm is unreliable if the system contains broken disks

Added by Paul Emmerich over 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Almost all ceph-volume lvm commands invoke some LVM command that accesses all disks. A simple example is lvs, which always reads LVM metadata from all disks, even if you explicitly specify only the LV you are interested in. Almost all of the ceph-volume lvm subcommands make at least one call to lvs.
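
To illustrate: even a targeted query triggers a full device scan, though LVM's inline --config option can restrict which devices are scanned using the standard lvm.conf filter syntax. The VG/LV name (ceph-block-0/osd-block-0) and device path below are hypothetical, and this is only a sketch of a possible mitigation, not something ceph-volume currently does:

    # Queries a single LV, but LVM still scans every PV on the system:
    lvs ceph-block-0/osd-block-0

    # Possible mitigation: pass an inline device filter so LVM only
    # looks at the named device (paths are examples only):
    lvs --config 'devices { filter = [ "a|/dev/sdb|", "r|.*|" ] }' ceph-block-0/osd-block-0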

Now the problem is that lvs can hang indefinitely if a disk is in a sufficiently broken state, until you remove the disk or reboot the server. That is, of course, rare. More commonly, it hangs for a minute or is just really slow. So ceph-volume can become unusable if an unrelated disk is broken.
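
As a crude stopgap when a disk is suspected of wedging lvs, the call can at least be bounded with coreutils timeout (a generic shell workaround, not part of ceph-volume):

    # Give lvs 30 seconds before killing it; a non-zero exit then
    # hints at a hanging device rather than a missing LV:
    timeout 30 lvs || echo "lvs hung or failed, suspect a broken disk"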

ceph-disk didn't have this problem because it didn't access all other disks when operating on one disk.

I don't think there is a feasible way around this since it's unfortunately an LVM problem :(

#1

Updated by Jan Fajerski over 4 years ago

We're currently looking at avoiding calls that list all LVs in a system by relying on LVM's select feature (lvs -S <filter>). We are unsure, however, whether that actually solves issues like this (it is not easy to reproduce in a small lab). Any chance you could test that when you have the opportunity? Something like comparing "lvs" vs "lvs -S lv_name=<name>" for a known undamaged LV.
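
A minimal comparison along those lines, assuming a healthy LV called <name> exists; running each command a few times helps smooth out caching effects:

    # Full listing: reads LVM metadata from every disk:
    time lvs

    # Selected listing: the open question is whether this still
    # scans all devices or only the relevant ones:
    time lvs -S lv_name=<name>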
