Bug #23001
ceph-volume should destroy vgs and lvs on OSD creation failure
Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
root@reesi001:~# ceph-volume lvm prepare --bluestore --data /dev/sda --journal /dev/journals/lvol0
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 43d2cd4c-af19-4e0f-bb2e-816cb4c5bcf4
Running command: vgcreate --force --yes ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a /dev/sda
 stdout: Volume group "ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a" successfully created
Running command: lvcreate --yes -l 100%FREE -n osd-block-43d2cd4c-af19-4e0f-bb2e-816cb4c5bcf4 ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a
 stdout: Logical volume "osd-block-43d2cd4c-af19-4e0f-bb2e-816cb4c5bcf4" created.
Running command: ceph-authtool --gen-print-key
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.95 --yes-i-really-mean-it
 stderr: purged osd.95
--> RuntimeError: "ceph" user is not available in the current system

root@reesi001:~# lvdisplay /dev/ceph*
  --- Logical volume ---
  LV Path                /dev/ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a/osd-block-43d2cd4c-af19-4e0f-bb2e-816cb4c5bcf4
  LV Name                osd-block-43d2cd4c-af19-4e0f-bb2e-816cb4c5bcf4
  VG Name                ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a
  LV UUID                PWZkYb-odTe-mhOt-fCp9-1qvc-fL1n-dsfn5Y
  LV Write Access        read/write
  LV Creation host, time reesi001, 2018-02-14 11:21:36 -0500
  LV Status              available
  # open                 0
  LV Size                3.64 TiB
  Current LE             953861
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:12
If creating a new OSD fails and there's no data on the drive, wouldn't it make sense to remove the logical volume and volume group that were created, so the device(s) can be reused?
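For reference, a minimal sketch of the rollback being suggested, assuming the VG was created exclusively for the failed OSD (true in the transcript above, since ceph-volume created the ceph-<uuid> VG itself); rollback_lvm is a hypothetical helper, not existing ceph-volume code:

# Hypothetical sketch, not actual ceph-volume rollback code.
# Removes the LV and VG left behind by a failed `ceph-volume lvm prepare`
# so the raw device can be reused. Assumes no other OSD uses this VG.
import subprocess

def rollback_lvm(vg_name, lv_name):
    lv_path = '/dev/{}/{}'.format(vg_name, lv_name)
    # -f skips the interactive "Do you really want to remove..." prompt
    subprocess.check_call(['lvremove', '-f', lv_path])
    # vgremove only succeeds once the VG contains no more LVs
    subprocess.check_call(['vgremove', '-f', vg_name])

rollback_lvm(
    'ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a',
    'osd-block-43d2cd4c-af19-4e0f-bb2e-816cb4c5bcf4',
)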
Updated by David Galloway about 6 years ago
I get the following when attempting to reuse a device that already has an LV and VG:
[2018-02-14 11:19:39,536][ceph_volume.process][INFO ] Running command: ceph-authtool --gen-print-key
[2018-02-14 11:19:39,564][ceph_volume.process][INFO ] stdout AQCbYYRaMyt+IRAAqHLEjNhqfUwL2nqZUOOcPA==
[2018-02-14 11:19:39,564][ceph_volume.process][INFO ] Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 123a232f-c500-424b-8886-0b5152440fe7
[2018-02-14 11:19:39,850][ceph_volume.process][INFO ] stdout 95
[2018-02-14 11:19:39,851][ceph_volume.process][INFO ] Running command: lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sda
[2018-02-14 11:19:39,856][ceph_volume.process][INFO ] stdout NAME="sda" KNAME="sda" MAJ:MIN="8:0" FSTYPE="LVM2_member" MOUNTPOINT="" LABEL="" UUID="2veNK8-aBMq-g6HU-B184-ShOc-BDsj-AVHCzV" RO="0" RM="0" MODEL="ST4000NM0025 " SIZE="3.7T" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2018-02-14 11:19:39,856][ceph_volume.process][INFO ] Running command: lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sda
[2018-02-14 11:19:39,862][ceph_volume.process][INFO ] stdout NAME="sda" KNAME="sda" MAJ:MIN="8:0" FSTYPE="LVM2_member" MOUNTPOINT="" LABEL="" UUID="2veNK8-aBMq-g6HU-B184-ShOc-BDsj-AVHCzV" RO="0" RM="0" MODEL="ST4000NM0025 " SIZE="3.7T" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2018-02-14 11:19:39,863][ceph_volume.process][INFO ] Running command: vgs --noheadings --separator=";" -o vg_name,pv_count,lv_count,snap_count,vg_attr,vg_size,vg_free
[2018-02-14 11:19:39,878][ceph_volume.process][INFO ] stdout ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a";"1";"1";"0";"wz--n-";"3.64t";"0
[2018-02-14 11:19:39,879][ceph_volume.process][INFO ] stdout journals";"1";"12";"0";"wz--n-";"372.60g";"616.00m
[2018-02-14 11:19:39,879][ceph_volume.process][INFO ] stdout osd";"1";"0";"0";"wz--n-";"365.15g";"365.15g
[2018-02-14 11:19:39,879][ceph_volume.process][INFO ] Running command: vgcreate --force --yes ceph-6848c821-6673-41e3-a91e-dc5d61434728 /dev/sda
[2018-02-14 11:19:39,894][ceph_volume.process][INFO ] stderr Physical volume '/dev/sda' is already in volume group 'ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a'
  Unable to add physical volume '/dev/sda' to volume group 'ceph-6848c821-6673-41e3-a91e-dc5d61434728'.
[2018-02-14 11:19:39,894][ceph_volume.devices.lvm.prepare][ERROR ] lvm prepare was unable to complete
[2018-02-14 11:19:39,895][ceph_volume.devices.lvm.prepare][INFO ] will rollback OSD ID creation
[2018-02-14 11:19:39,895][ceph_volume.process][INFO ] Running command: ceph osd purge osd.95 --yes-i-really-mean-it
[2018-02-14 11:19:40,320][ceph_volume.process][INFO ] stderr purged osd.95
[2018-02-14 11:19:40,335][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/main.py", line 152, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/main.py", line 38, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 365, in main
    self.safe_prepare(args)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 216, in safe_prepare
    self.prepare(args)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 282, in prepare
    block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", line 196, in prepare_device
    api.create_vg(vg_name, arg)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/api/lvm.py", line 209, in create_vg
    name] + list(devices)
  File "/usr/lib/python2.7/dist-packages/ceph_volume/process.py", line 138, in run
    raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 5
Updated by Alfredo Deza almost 6 years ago
- Status changed from New to 12
We can't destroy a vg/lv automatically because it is entirely possible to have other OSDs living in lvs that come from the same vg.
However, what would have been better in this situation is to report back when LVM found this:
[2018-02-14 11:19:39,894][ceph_volume.process][INFO ] stderr Physical volume '/dev/sda' is already in volume group 'ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a'
A pre-check that catches this would be good.
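As an illustration, a pre-check along these lines could surface the problem before vgcreate ever runs (a sketch only; check_device_free is a hypothetical helper, not part of the ceph_volume API):

# Hypothetical pre-check sketch, not existing ceph-volume code.
# Ask LVM whether the device is already a PV inside a VG and fail early
# with a readable message instead of letting vgcreate error out mid-prepare.
import subprocess

def check_device_free(device):
    try:
        out = subprocess.check_output([
            'pvs', '--noheadings', '--separator', ';',
            '-o', 'pv_name,vg_name', device,
        ])
    except subprocess.CalledProcessError:
        # pvs exits non-zero when the device is not a PV: nothing to check
        return
    for line in out.decode('utf-8').splitlines():
        if not line.strip():
            continue
        pv_name, vg_name = [field.strip() for field in line.split(';')]
        if vg_name:
            raise RuntimeError(
                '%s is already in volume group %s; remove it first '
                '(vgremove/pvremove) or use an unused device' % (pv_name, vg_name)
            )

check_device_free('/dev/sda')

Before any automatic vgremove, a similar check would also have to confirm the VG holds no other LVs, e.g. via the lv_count field that the vgs call in the log above already reports.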