Bug #37492


ceph-volume lvm is unreliable if the system contains broken disks

Added by Paul Emmerich over 5 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Almost all ceph-volume lvm commands invoke some LVM command that accesses all disks. A simple example is lvs, which always reads LVM metadata from all disks, even if you explicitly specify only the LV you are interested in. Almost all of the ceph-volume lvm subcommands make at least one call to lvs.
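
To illustrate: even a targeted query triggers a full device scan, though LVM's inline --config option can restrict which devices are scanned using the standard lvm.conf filter syntax. The VG/LV name (ceph-block-0/osd-block-0) and device path below are hypothetical, and this is only a sketch of a possible mitigation, not something ceph-volume currently does:

    # Queries a single LV, but LVM still scans every PV on the system:
    lvs ceph-block-0/osd-block-0

    # Possible mitigation: pass an inline device filter so LVM only
    # looks at the named device (paths are examples only):
    lvs --config 'devices { filter = [ "a|/dev/sdb|", "r|.*|" ] }' ceph-block-0/osd-block-0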

Now the problem is that lvs can hang indefinitely if a disk is in a sufficiently broken state, until you remove the disk or reboot the server. That is, of course, rare. More commonly, it hangs for a minute or is just really slow. So ceph-volume can become unusable if an unrelated disk is broken.
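
As a crude stopgap when a disk is suspected of wedging lvs, the call can at least be bounded with coreutils timeout (a generic shell workaround, not part of ceph-volume):

    # Give lvs 30 seconds before killing it; a non-zero exit then
    # hints at a hanging device rather than a missing LV:
    timeout 30 lvs || echo "lvs hung or failed, suspect a broken disk"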

ceph-disk didn't have this problem because it didn't access all other disks when operating on one disk.

I don't think there is a feasible way around this since it's unfortunately an LVM problem :(

#1

Updated by Jan Fajerski over 4 years ago

We're currently looking at avoiding calls that list all LVs in a system by relying on LVM's select feature (lvs -S <filter>). We are unsure, however, whether that actually solves issues like this (it is not easy to reproduce in a small lab). Any chance you could test that when you have the opportunity? Something like comparing "lvs" vs "lvs -S lv_name=<name>" for a known undamaged LV.
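
A minimal comparison along those lines, assuming a healthy LV called <name> exists; running each command a few times helps smooth out caching effects:

    # Full listing: reads LVM metadata from every disk:
    time lvs

    # Selected listing: the open question is whether this still
    # scans all devices or only the relevant ones:
    time lvs -S lv_name=<name>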
