Bug #23645

closed

Hot-plugged disk might not work with ceph-volume

Added by Vasu Kulkarni about 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From the list:

SATA HDDs; this happens on a running server, without a reboot.
Due to a hardware problem, vibration, human error, or anything else, the SATA host loses its connection with the drive and /dev/sda disappears. Then the operator unplugs and replugs it. Without LVM, it can reappear as the same node (/dev/sda) or as a different one; it does not matter, the OSD will be started.
But in the case of LVM, /dev/dm-0 still holds the LVM objects and the sda node, so the disk gets the next letter (/dev/sdt, for example), but lvm can't create lv with same uid, so lsblk does not see the logical volume on this disk.
Yes, everything is fixed after a reboot, but I don't think that is a solution.

Workaround as done by user:
And I have to perform a list of manual actions to start osd:

- remove device mapper device:
sudo dmsetup remove /dev/dm-8

- disable new block device and rescan scsi to make lvm volume appear:
echo 1 | sudo tee /sys/block/sdb/device/delete
echo "- - -" | sudo tee /sys/class/scsi_host/host0/scan

- maybe umount the osd directory (I'm not sure if it is required):
sudo umount /var/lib/ceph/osd/ceph-12

- list osd disks to get lv name (osd fsid):
sudo ceph-volume lvm list

- And finally start osd:
sudo ceph-volume lvm trigger 12-92b66a98-1c35-40a8-bf5b-ac123c366166
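
The manual steps above can be collected into one function. This is only a sketch of the sequence described in this report, not something ceph-volume ships: the names run, recover_osd and DRY_RUN are hypothetical, all arguments are placeholders, and the OSD fsid still has to be looked up first with "ceph-volume lvm list". With DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# recover_osd DM_DEV NEW_DISK SCSI_HOST OSD_ID OSD_FSID
# Hypothetical helper: replays the workaround steps from this report.
recover_osd() {
    dm_dev=$1 new_disk=$2 scsi_host=$3 osd_id=$4 osd_fsid=$5

    # Remove the stale device mapper device.
    run sudo dmsetup remove "$dm_dev"

    # Drop the re-added block device, then rescan the SCSI host
    # so that the LVM volume can appear again.
    run sh -c "echo 1 | sudo tee /sys/block/$new_disk/device/delete"
    run sh -c "echo '- - -' | sudo tee /sys/class/scsi_host/$scsi_host/scan"

    # Unmount the OSD directory (the reporter was not sure this is required).
    run sudo umount "/var/lib/ceph/osd/ceph-$osd_id"

    # Start the OSD.
    run sudo ceph-volume lvm trigger "$osd_id-$osd_fsid"
}
```

For the values in this report the call would be: recover_osd /dev/dm-8 sdb host0 12 92b66a98-1c35-40a8-bf5b-ac123c366166. Review the printed commands before setting DRY_RUN=0.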


Files

23645.txt (50 KB), Aleksei Gutikov, 05/02/2018 09:19 AM
Actions #1

Updated by Alfredo Deza about 6 years ago

  • Status changed from New to Need More Info

This is still not very clear to me. You mention a plug/unplug of disks
that makes the device path change, but
then that "lvm can't create lv with same uid". So is this before the
OSD is running? Or where exactly in the process is this?

In any case, you could just refresh LVM's cache by running: vgscan

The docs explain this better:

LVM runs the vgscan command automatically at system startup and at other times during LVM operation, such as when you execute a vgcreate command or when LVM detects an inconsistency. You may need to run the vgscan command manually when you change your hardware configuration, causing new devices to be visible to the system that were not present at system bootup.
This may be necessary, for example, when you add new disks to the system on a SAN or hotplug a new disk that has been labeled as a physical volume.

If you run that, do you have issues still?

If the problem goes away with vgscan, we could just add it to the unit
that activates/starts the OSD. But without some confirmation on your end, we can't really try to fix this.
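
The idea of running vgscan before activation could look roughly like the wrapper below. It is a sketch only, not the actual ceph-volume unit, and it was not confirmed in this thread that vgscan resolves the problem. The names run, activate_with_rescan and DRY_RUN are hypothetical; with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# activate_with_rescan OSD_ID OSD_FSID
# Hypothetical helper: rescan volume groups, then activate the OSD.
activate_with_rescan() {
    osd_id=$1 osd_fsid=$2

    # Let LVM rescan for volume groups before activation.
    run sudo vgscan

    # Same activation command the reporter used in the workaround.
    run sudo ceph-volume lvm trigger "$osd_id-$osd_fsid"
}
```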

Actions #2

Updated by Alfredo Deza almost 6 years ago

  • Status changed from Need More Info to Can't reproduce
Actions #3

Updated by Aleksei Gutikov almost 6 years ago

Please find attached the output of some useful commands (lsblk, ceph-volume, vgscan, etc.).

What happened (in my understanding):

1) OSD was running on /dev/dm-2 on /dev/sdk with serial WD-WMC6M0D3AZMS

1.1) OSD crashed after io error.

2) /dev/sdk disappeared and this disk appeared as /dev/sdl (see output of sudo ls /sys/block/dm-2/slaves/ -lah)

2.1) logical volume /dev/ceph-fda7bc3b-0047-45c8-8f16-5a8764664a9f/osd-block-52032eb1-698d-409c-94ee-385c76825638
created on /dev/sdl

2.2) /dev/dm-2 now uses /dev/sdl

2.3) OSD did not start within the timeout (ceph osd metadata not updated)

3) /dev/sdl disappeared and this disk appeared as /dev/sdj

3.1) logical volume was not created, /dev/dm-2 still holds /dev/sdl

3.2) lvm does not see the logical volume on /dev/sdj

3.3) OSD tries to use /dev/dm-2 but gets an IO error (see dmesg and OSD log)

4) Can be fixed by a reboot or with the sequence of commands I listed in the first email (see also the attachment)
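
The stale state in step 3.1 (a dm device still holding a disk node that is gone) can be checked for by comparing the entries under /sys/block/<dm>/slaves/ with the nodes that exist under /dev. This is a minimal sketch: list_missing_slaves is a hypothetical name, and it assumes that a vanished disk no longer has a node under /dev while the dm device still lists it as a slave, as described above. The two optional arguments exist only so the function can be pointed at other directories.

```shell
#!/bin/sh
# list_missing_slaves DM_NAME [SYSFS_ROOT] [DEV_ROOT]
# Hypothetical helper: prints each slave of the dm device that has
# no matching node under DEV_ROOT (default /dev).
list_missing_slaves() {
    dm_name=$1
    sysfs=${2:-/sys}
    dev_root=${3:-/dev}

    for slave in "$sysfs/block/$dm_name/slaves"/*; do
        # Skip the unexpanded glob when the slaves directory is empty.
        [ -e "$slave" ] || [ -L "$slave" ] || continue
        name=$(basename "$slave")
        if [ ! -e "$dev_root/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For the situation described here, "list_missing_slaves dm-2" would be expected to print sdl if /dev/sdl no longer exists.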

Actions #4

Updated by Alfredo Deza almost 6 years ago

  • Status changed from Can't reproduce to New
Actions #5

Updated by Alfredo Deza almost 6 years ago

Thanks Aleksei for providing extra information. Would it be possible for you to try another thing here?

sudo lvchange --refresh vg/lv

According to the LVM docs, it should help update the device mapper to point to the right device.
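
For an LV given by its /dev/VG/LV path, as in the earlier comment, the vg/lv argument can be derived by stripping the /dev/ prefix. A small sketch, with the hypothetical names run, refresh_lv and DRY_RUN; with DRY_RUN=1 (the default here) the command is only printed. Whether the refresh actually fixes the stale mapping was not confirmed in this thread.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# refresh_lv LV_PATH
# Hypothetical helper: expects a path of the form /dev/VG/LV
# (not /dev/mapper/...) and refreshes that logical volume.
refresh_lv() {
    lv_path=$1
    # Turn /dev/VG/LV into VG/LV.
    vg_lv=${lv_path#/dev/}
    run sudo lvchange --refresh "$vg_lv"
}
```

Example: refresh_lv /dev/ceph-fda7bc3b-0047-45c8-8f16-5a8764664a9f/osd-block-52032eb1-698d-409c-94ee-385c76825638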

Actions #6

Updated by Aleksei Gutikov almost 6 years ago

Sure, I'll reproduce it again and try this command.

Actions #7

Updated by Alfredo Deza almost 6 years ago

Any luck trying out the suggestion?

Actions #8

Updated by Alfredo Deza almost 6 years ago

  • Status changed from New to Need More Info

Aleksei, any luck?

Actions #9

Updated by Alfredo Deza over 5 years ago

  • Status changed from Need More Info to Closed