Bug #37502
closed
lvm batch potentially creates multi-pv volume groups
Added by Jan Fajerski over 5 years ago.
Updated over 3 years ago.
Description
Both the bluestore and filestore MixedStrategy create one volume group when multiple free SSDs are detected. This can create scenarios where a single bad SSD device takes down significantly more OSDs than necessary.
Consider a machine with 2 SSDs and 10 spinners. A batch call with all drives will create one VG spanning both SSDs and place the wal/db volumes on this VG. If one SSD goes bad, the single VG becomes inaccessible and in turn all OSDs on the machine go down.
The better implementation would be to create one VG per PV/device.
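The difference can be sketched with plain LVM commands. This is only an illustration, not what ceph-volume runs internally; the device and VG names are hypothetical, and DRY_RUN keeps the script side-effect free:

```shell
#!/usr/bin/env bash
# Sketch: shared VG vs. one VG per device. Hypothetical names; no side effects.
set -euo pipefail
DRY_RUN="echo"
SSDS=(/dev/sdb /dev/sdc)

# Current batch behaviour: a single VG spanning both SSDs.
# If either PV fails, the whole VG (and every wal/db LV on it) is inaccessible.
$DRY_RUN vgcreate ceph-journals "${SSDS[@]}"

# Proposed behaviour: one VG per PV, so a bad SSD only takes down
# the OSDs whose wal/db lives on that one device.
for dev in "${SSDS[@]}"; do
    $DRY_RUN vgcreate "ceph-journals-$(basename "$dev")" "$dev"
done
```

With the per-device layout, losing /dev/sdb leaves ceph-journals-sdc (and the OSDs backed by it) untouched.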
I guess the advantage of the current implementation is that a single vg is easier to manage?
That is correct, we did try to implement this with one VG per backing device and it was incredibly difficult. Doing a single VG allows a far simpler implementation (although quite complex still).
So in the case of, e.g., a 24:4 spinner-to-NVMe setup, is it expected that a single NVMe failure takes the whole OSD host out of business?
In that case, is it possible to fall back to the previous non-LVM deployment method?
Martin, you can create the LVs in any way that might be preferable for you, and then pass those onto ceph-volume (no batch):
ceph-volume lvm create --data /path/to/data-lv --block.db /path/to/data-db
The caveat being that it will involve more work to sort that out (and whatever failure domain you need).
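The manual workflow being suggested might look like the following sketch. All device, VG, and LV names and the size are hypothetical, and DRY_RUN keeps it side-effect free:

```shell
#!/usr/bin/env bash
# Sketch of the manual alternative to `lvm batch`: create the db LV yourself,
# then hand it to ceph-volume directly. Hypothetical names; no side effects.
set -euo pipefail
DRY_RUN="echo"

# One VG per SSD keeps the failure domain to a single device.
$DRY_RUN vgcreate ceph-db-sdb /dev/sdb

# Carve a db LV out of that VG for one OSD (size is an example).
$DRY_RUN lvcreate -L 30G -n db-osd0 ceph-db-sdb

# Pass the data device and the db LV to ceph-volume (no batch).
$DRY_RUN ceph-volume lvm create --data /dev/sdd --block.db ceph-db-sdb/db-osd0
```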
Alfredo Deza wrote:
Martin, you can create the LVs in any way that might be preferable for you, and then pass those onto ceph-volume (no batch):
[...]
The caveat being that it will involve more work to sort that out (and whatever failure domain you need).
Thanks for the quick reply!
So if I understand this correctly, we must not use batch mode in production environments, since it creates a single-point-of-failure VG, and we therefore lose the value of ceph-volume-managed LVM?
Instead, we should create the VGs and LVs ourselves through our own process and then run "ceph-volume lvm create" sequentially for each OSD?
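Such a sequential process could be sketched as a loop that pre-creates one VG per SSD, spreads db LVs over them round-robin, and calls ceph-volume once per OSD. Every device, VG, and LV name and the size are hypothetical, and DRY_RUN keeps the sketch side-effect free:

```shell
#!/usr/bin/env bash
# Sketch of a self-managed, sequential deployment in place of `lvm batch`.
# Hypothetical devices/names/sizes; DRY_RUN avoids any real changes.
set -euo pipefail
DRY_RUN="echo"
SSDS=(/dev/sdb /dev/sdc)
SPINNERS=(/dev/sdd /dev/sde /dev/sdf /dev/sdg)

# One VG per SSD so each failure domain is a single device.
for ssd in "${SSDS[@]}"; do
    $DRY_RUN vgcreate "ceph-db-$(basename "$ssd")" "$ssd"
done

# Round-robin the spinners over the SSD-backed VGs, one OSD at a time.
for i in "${!SPINNERS[@]}"; do
    vg="ceph-db-$(basename "${SSDS[$((i % ${#SSDS[@]}))]}")"
    $DRY_RUN lvcreate -L 30G -n "db-osd$i" "$vg"
    $DRY_RUN ceph-volume lvm create --data "${SPINNERS[$i]}" --block.db "$vg/db-osd$i"
done
```

Here a failure of one SSD would only affect the OSDs whose db LV lives on that SSD's VG, rather than all OSDs on the host.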
- Description updated (diff)
I started looking into this last week and am already testing some changes we could propose in order to create one VG per OSD. Any further ideas/thoughts on this we should take into consideration now, while we are still testing/proposing these changes?
Andrew and I chatted about this at some point. Andrew had the idea to push the LV creation into the code of the create subcommand. This would then also create the PV and VG if not already present. This seems like a good approach if you don't already have plans.
- Status changed from New to Fix Under Review
- Pull request ID set to 34740
- Status changed from Fix Under Review to Resolved