Bug #37492
open
ceph-volume lvm is unreliable if the system contains broken disks
Description
Almost all ceph-volume lvm commands invoke some lvm command which accesses all disks. A simple example is just lvs, which will always read LVM data from all disks (even if you explicitly specify only the LV you are interested in). Almost all the lvm commands make at least one call to lvs.
Now the problem is that lvs can hang indefinitely if you have a disk in a badly broken state, until you remove the disk or reboot the server. This is, of course, rare. More commonly it hangs for a minute or is just really slow. So ceph-volume can become unusable if an unrelated disk is broken.
ceph-disk didn't have this problem because it didn't access all other disks when operating on one disk.
I don't think there is a feasible way around this since it's unfortunately an LVM problem :(
Updated by Jan Fajerski over 4 years ago
We're currently looking at avoiding calls that list all LVs in a system by relying on LVM's select feature (lvs -S <filter>). We are unsure, however, whether that actually solves issues like this one (it is not easy to reproduce in a small lab). Any chance you could test that when you have the opportunity? Something like comparing "lvs" vs "lvs -S lv_name=<name>" for a known undamaged LV.
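One way to run that comparison is sketched below in Python. This is only an illustration, not anything taken from ceph-volume: the helper names time_command and compare_lvs and the 30 second default timeout are made up for this example. It times each command and gives up after the timeout instead of blocking forever. Note that a process stuck in uninterruptible I/O may not actually die when killed, so the script deliberately does not wait for it after the timeout.

```python
import subprocess
import time


def time_command(argv, timeout=30.0):
    """Run argv and return (seconds_elapsed, outcome).

    outcome is "ok", "exit <code>", "timeout", or "not found".
    """
    start = time.monotonic()
    try:
        proc = subprocess.Popen(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return time.monotonic() - start, "not found"
    try:
        returncode = proc.wait(timeout=timeout)
        outcome = "ok" if returncode == 0 else "exit %d" % returncode
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait() again, because a process blocked
        # on a broken disk may never exit.
        proc.kill()
        outcome = "timeout"
    return time.monotonic() - start, outcome


def compare_lvs(lv_name, timeout=30.0):
    """Time plain `lvs` against `lvs -S lv_name=<name>` (hypothetical helper)."""
    commands = {
        "lvs": ["lvs"],
        "lvs -S": ["lvs", "-S", "lv_name=" + lv_name],
    }
    return {label: time_command(argv, timeout) for label, argv in commands.items()}
```

Run as root on the affected host, compare_lvs("<name>") returns the elapsed time and outcome for both variants. If both time out or take similarly long, the select filter probably does not avoid scanning the broken disk.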