Bug #37502

lvm batch potentially creates multi-pv volume groups

Added by Jan Fajerski 9 months ago. Updated 20 days ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date: 12/03/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Both the bluestore and the filestore MixedStrategy create one volume group when multiple free SSDs are detected. This can lead to scenarios where a single bad SSD takes down significantly more OSDs than necessary.

Consider a machine with 2 SSDs and 10 spinners. A batch call with all drives will create one vg spanning both SSDs and place the wal/db volumes on this vg. If one SSD goes bad, the single vg becomes inaccessible and in turn all OSDs on the machine go down.

The better implementation would be to create one vg per pv/device.
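To illustrate the difference (a sketch only; device and vg names here are hypothetical, not the exact names batch generates):

# current batch behaviour: one vg spanning both SSDs
vgcreate ceph-block-dbs /dev/sdb /dev/sdc    # losing either PV makes every db/wal LV unavailable

# proposed: one vg per device, so a bad SSD only affects the LVs on that device
vgcreate ceph-block-dbs-0 /dev/sdb
vgcreate ceph-block-dbs-1 /dev/sdc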

History

#1 Updated by Jan Fajerski 9 months ago

I guess the advantage of the current implementation is that a single vg is easier to manage?

#2 Updated by Alfredo Deza 9 months ago

That is correct. We did try to implement this with one VG per backing device and it was incredibly difficult. Using a single VG allows a far simpler implementation (though still quite complex).

#3 Updated by Martin Weiss 5 months ago

So in case we have a ratio of e.g. 24:4 for a spinner vs. NVMe setup, is it expected that a single NVMe failure takes the whole OSD host out of business?
In that case, is it possible to fall back to the previous non-LVM deployment method?

#4 Updated by Alfredo Deza 5 months ago

Martin, you can create the LVs in any way that is preferable for you, and then pass those on to ceph-volume (no batch):

ceph-volume lvm create --data /path/to/data-lv --block.db /path/to/data-db

The caveat being that it will involve more work to sort that out (and whatever failure domain you need).
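For example (a sketch only; device names, vg/lv names and sizes are hypothetical, and --block.db is the bluestore form of the option):

pvcreate /dev/nvme0n1
vgcreate ceph-db-0 /dev/nvme0n1
lvcreate -n db-osd0 -L 60G ceph-db-0
ceph-volume lvm create --bluestore --data /dev/sda --block.db ceph-db-0/db-osd0

Since each vg sits on a single SSD, a failed SSD only takes down the OSDs whose db LVs live in that vg.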

#5 Updated by Martin Weiss 5 months ago

Alfredo Deza wrote:

Martin, you can create the LVs in any way that is preferable for you, and then pass those on to ceph-volume (no batch):

[...]

The caveat being that it will involve more work to sort that out (and whatever failure domain you need).

Thanks for the quick reply!

So if I understand this right, we must not use batch mode in production environments, as it creates a single-point-of-failure VG, and because of that we cannot make use of the LVM management that ceph-volume handles?

Instead, we should step through a self-created process for VG and LV creation and then run "ceph-volume lvm create" sequentially?
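A sketch of such a sequential workflow for the 24:4 example above (hypothetical device names; one vg per NVMe, six db LVs each):

for i in 0 1 2 3; do
    vgcreate ceph-db-$i /dev/nvme${i}n1
    for j in 0 1 2 3 4 5; do
        lvcreate -n db-$i-$j -L 60G ceph-db-$i
    done
done
# then one "ceph-volume lvm create" per spinner, pairing it with a db LV:
# ceph-volume lvm create --bluestore --data /dev/sdX --block.db ceph-db-N/db-N-M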

#6 Updated by Jan Fajerski 4 months ago

  • Description updated (diff)

#7 Updated by Daniel Oliveira 21 days ago

I started looking into this last week and am already testing some changes we could propose in order to create one VG per OSD. Are there any further ideas/thoughts on this that we should take into consideration now, while still testing/proposing these changes?

#8 Updated by Jan Fajerski 20 days ago

Andrew and I chatted about this at some point. Andrew had the idea to push the lv creation into the code of the create subcommand. This would then also create the pv and vg if they are not already present. This seems like a good approach if you don't already have other plans.
