Feature #11881
ceph-disk support for multipath (closed)
Description
Need review¶
- ceph : https://github.com/ceph/ceph/pull/5699
- ceph-qa-suite : https://github.com/ceph/ceph-qa-suite/pull/546
- teuthology : https://github.com/ceph/teuthology/pull/606
What needs to be fixed in ceph-disk¶
We have these udev rules that call ceph-disk when they see GPT labels indicating the partition is used for ceph OSD data or journal. When connecting a device with multipath, what probably happens is that the original device shows up twice (once for each path), and then there is a dm device that we should be consuming. Instead, we're triggering ceph-disk on both of the underlying devices (and not on the dm one).
We can make ceph-disk ignore a device if it appears to be a slave (i.e., it appears as /sys/block/*/slaves/$DEV for some other device). This may race if the dm setup is also udev driven.
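The slave check described above can be sketched in Python, the language ceph-disk is written in. This is only an illustration, not the code that was merged: the function names and the sys_block parameter are hypothetical, and it assumes the caller passes the kernel name of the whole disk (e.g. sda), since that is what shows up under slaves/.

```python
import os


def find_masters(dev_name, sys_block="/sys/block"):
    """Return the block devices that list dev_name under their slaves/ directory."""
    masters = []
    try:
        candidates = sorted(os.listdir(sys_block))
    except OSError:
        # sysfs not available (or a wrong path was given): report no masters.
        return masters
    for candidate in candidates:
        slave_link = os.path.join(sys_block, candidate, "slaves", dev_name)
        # Entries under slaves/ are symlinks; lexists also accepts dangling ones.
        if os.path.lexists(slave_link):
            masters.append(candidate)
    return masters


def is_slave(dev_name, sys_block="/sys/block"):
    """True if some other block device (e.g. a dm device) sits on top of dev_name."""
    return bool(find_masters(dev_name, sys_block))
```

As noted above, the result is only a snapshot: if the dm device is created after the check runs, the path device will still look unused.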
Which other udev mechanism do we need to use?¶
- DM_MULTIPATH_DEVICE_PATH udev environment variable. This is set by 62-multipath.rules for all devices that will become paths in a multipath device, and for all their partitions. There are a couple of gotchas, however.
- It won't get set the very first time a piece of storage hardware is seen by the system. With find_multipaths enabled, there's no way for multipath to know ahead of time if a new piece of storage will be multipathable when it first appears.
- If the storage gets discovered in the initramfs, multipath won't be able to update its list of multipathable devices there. This means it will fail to get correctly labelled in the initramfs every time, until the initramfs is remade. That's why it is advisable to remake the initramfs after adding new hardware to a multipath system (there are ways of working around this if that's incredibly important).
But aside from new hardware (which probably won't be labelled for
ceph-disk the first time it appears anyway), this should work to keep
you from messing with multipath's path devices.
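A minimal sketch of how a Python tool could look at that variable, assuming udevadm is available and assuming the multipath rules set the value to 1 for path devices. The helper names are hypothetical and this is not how ceph-disk itself was changed.

```python
import subprocess


def parse_udev_properties(text):
    """Turn KEY=VALUE lines (udevadm --query=property output) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_multipath_path(props):
    """True if udev flagged the device as a path of a multipath device."""
    # Assumption: the multipath udev rules use the value "1" for path devices.
    return props.get("DM_MULTIPATH_DEVICE_PATH") == "1"


def query_udev_properties(dev):
    """Ask udevadm for the properties of a device node such as /dev/sda."""
    out = subprocess.check_output(
        ["udevadm", "info", "--query=property", "--name=" + dev])
    return parse_udev_properties(out.decode("utf-8", "replace"))
```

A caller would skip the device when is_multipath_path(query_udev_properties("/dev/sda")) is true, with the caveats listed above about new hardware and the initramfs.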
Set up multipath without actually having any multipath hardware¶
- modprobe scsi_debug vpd_use_hostno=0 add_host=2 dev_size_mb=100
will get you 2 100MB scsi_debug devices that multipath will happily set
itself up on. But I don't know if this will help you, since you can't
choose what's on the device when it gets discovered. The device usually
(always?) comes up with the same WWID, so after the first time, the
multipath udev rules will flag it correctly, but I doubt that the
95-ceph-osd.rules would pick it up.
With an unused scsi device¶
- multipath <scsi_device>
Multipath will make a multipath device with only one path on top of it,
and will remember that it is supposed to be multipathed in the future.
Once you're all done with it, run:
- multipath -f <multipath_device>
to remove the multipath device, and
- multipath -w <scsi_device>
to wipe the information that it's supposed to be a multipath device (so
that it won't automatically get multipathed the next time you start up).
If the unused SCSI device is a normal disk (say /dev/sdb), this allows testing everything except the case where a second device (say /dev/sdf) appears as another path to the same disk.
Testing with multiple devices¶
Set up an iSCSI session, then duplicate it by:
- Getting the sessionid: iscsiadm -m session -P1 | grep SID
- Create a new session with the same data: iscsiadm -m session -r <sessionid> -o new
This should create another iSCSI device for each existing session.
multipathd will notice that there are two paths to the same disk
and automatically create a multipath device on it.
Additional information¶
Updated by Loïc Dachary almost 9 years ago
- Description updated (diff)
- Status changed from 12 to In Progress
- Assignee set to Loïc Dachary
Updated by Loïc Dachary over 8 years ago
Would it be an option to have ceph-disk prepare /dev/mapper/mpath create a file /var/lib/ceph/osd/ceph-0/multipath to record that the OSD is intended to be used via multipath? If udev then runs ceph-disk to activate such an OSD on an event unrelated to a multipath device, ceph-disk would do nothing.
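The marker-file idea could look roughly like the following. This is a hypothetical sketch to make the proposal concrete: the function names are made up, and the caller is assumed to already know whether the triggering device is a multipath device.

```python
import os

# Name of the marker file proposed above, relative to the OSD data directory.
MARKER_NAME = "multipath"


def record_multipath(osd_dir):
    """Called at prepare time: remember that this OSD must be used via multipath."""
    with open(os.path.join(osd_dir, MARKER_NAME), "w"):
        pass  # an empty file is enough, only its existence matters


def should_activate(osd_dir, device_is_multipath):
    """Decide whether an activate request for this OSD should proceed."""
    wants_multipath = os.path.exists(os.path.join(osd_dir, MARKER_NAME))
    # Do nothing only when the OSD was prepared on a multipath device but the
    # event came from something else (typically one of the path devices).
    return device_is_multipath or not wants_multipath
```

One drawback of this design is that the marker lives inside the OSD data directory, so the partition has to be mounted before the decision can be made.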
Alternatively, we could have a set of UUIDs ( https://github.com/ceph/ceph/blob/master/src/ceph-disk#L80 ) meaning "this is accessed via multipath, skip if not multipath!". That would make it possible for https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules to decide whether a device should be used without calling ceph-disk or mounting the device.
Updated by Loïc Dachary over 8 years ago
To find out if mpatha as found in /dev/mapper is managed by multipath:
test "$(dmsetup info --columns --noheadings --options subsystem --select name=mpatha)" = "mpath"
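The same test can be wrapped in Python. This is a sketch under the assumption that dmsetup is installed and that the caller may run it (usually as root). The function names and the run parameter are hypothetical; run exists only so the command can be replaced in tests.

```python
import subprocess


def dm_subsystem(name, run=subprocess.check_output):
    """Return the subsystem dmsetup reports for a device-mapper name."""
    out = run(["dmsetup", "info", "--columns", "--noheadings",
               "--options", "subsystem", "--select", "name=" + name])
    # The output is empty when no device matches the selection.
    return out.decode("utf-8", "replace").strip()


def is_managed_by_multipath(name, run=subprocess.check_output):
    """True if the device-mapper device belongs to the mpath subsystem."""
    return dm_subsystem(name, run) == "mpath"
```

For example, is_managed_by_multipath("mpatha") mirrors the shell test above.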
Updated by Loïc Dachary over 8 years ago
- File prepare-sda-udev.log prepare-sda-udev.log added
- File prepare-mpatha-udev.log prepare-mpatha-udev.log added
- File prepare-activate-vdb-udev.log prepare-activate-vdb-udev.log added
On an OpenStack instance with a /dev/vdb disk, I did something like:
[centos@try ~]$ sudo multipath -f /dev/mapper/mpatha
[centos@try ~]$ sudo rmmod scsi_debug
[centos@try ~]$ sudo modprobe scsi_debug vpd_use_hostno=0 add_host=1 dev_size_mb=200
[centos@try ~]$ sudo ceph-disk --verbose prepare /dev/mapper/mpatha
This was done for /dev/vdb, /dev/sda (from scsi_debug) and /dev/mapper/mpatha, and the logs of
sudo systemctl stop systemd-udevd
sudo /usr/lib/systemd/systemd-udevd --debug
are attached for each of them. /dev/vdb required an additional activate, /dev/sda auto-activated, and /dev/mapper/mpatha did not activate and failed to trigger the necessary udev rules.
- ceph-disk was hacked to accept /dev/mapper/mpatha
- /lib/udev/rules.d/60-ceph-partuuid-workaround.rules changed to not blacklist dm-* and create the part-uuid
Updated by Loïc Dachary over 8 years ago
On Ubuntu 14.04, the scsi_debug kernel module is provided by the linux-image-extra-3.13.0-61-generic package.
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Loïc Dachary over 8 years ago
To manually try from sources:
$ sudo ceph-deploy install --testing try
$ git clone -b wip-11881-multipath http://github.com/dachary/ceph
$ cd ceph
$ sudo ln -sf $(pwd)/src/ceph-disk /usr/sbin/ceph-disk
$ sudo ln -sf $(pwd)/udev/60-ceph-partuuid-workaround.rules /lib/udev/rules.d/60-ceph-partuuid-workaround.rules
$ sudo ln -sf $(pwd)/udev/95-ceph-osd.rules /lib/udev/rules.d/95-ceph-osd.rules
Updated by Loïc Dachary over 8 years ago
Running the ceph-deploy suite will use ceph-disk on various distributions
teuthology-openstack --verbose --key-name myself --key-filename ~/Downloads/myself --suite ceph-deploy --suite-branch wip-11881-multipath --email loic@dachary.org --distro ubuntu --filter=ubuntu_14.04 --ceph wip-11881-multipath
- OpenStack ceph-deploy related pull request : https://github.com/ceph/ceph-qa-suite/pull/536
Updated by Chaitanya Huilgol over 8 years ago
Loic,
Here are a couple of issues we have seen in our multipath experiments.
1. The partuuid links are not very reliable for multipath devices, as these are usually hotplugged. We have seen the parttypeuuid and partuuid links getting reset to the underlying device nodes, because the order in which udev calls the rule scripts is not fixed.
2. The holders-based partition detection may not hold true in the case of dm-crypt on mpath (or any other layering).
We are working on a port of ceph-disk that uses pyudev (the udev database). The changes are rather extensive and I am not sure how much of that can be consumed here.
Once I am done with the changes, I will commit them to my GitHub. You can then have a look and advise.
Thanks
Chaitanya
Chaitanya.huilgol@sandisk.com
Updated by Loïc Dachary over 8 years ago
> The holders based partition detection may not hold true in case of dm-crypt on mpath (or any other layering)
Yes. The dm-crypt logic should be tested when the underlying device is dm-*; that should be addressed in a separate pull request.
Updated by Loïc Dachary over 8 years ago
> the parttuid links are not very reliable for multipath device as these are usually hotplug. We have seen the parttypeuuid and partuuid links getting reset to underlying device nodes as the order of udev calling the rule scripts is not fixed
I would be very interested to have a way to reproduce that. As far as I understand, all should be fine because:
- either partuuid etc. are handled by the native udev rules, in which case the ceph udev rule cannot override them because it does not have precedence
- or the ceph udev rules (workaround) create the symlinks and there is no conflict
Am I missing something?
Updated by Loïc Dachary over 8 years ago
> Once I am done with the changes, I will commit it in my github. Probably you can have a look and advice.
Extremely interesting! Would you mind pushing the code to a public location right now so I can take a look? Even if it is not finished, it might be a great source of inspiration. It is also a lot easier to comment on an extensive change during the early stages of development.
Updated by Chaitanya Huilgol over 8 years ago
Actually we were unable to create the parttype uuid links using the udev rules.
The part-uuid and type-uuid do not seem to be populated on the 'add' event for DM mpath devices. blkid does not return these values either.
Does this work for you on real scsi devices? (not the scsi_debug ones)
We ended up adding a rule to call a script on mpath add which would read the GPT partition and create links for the device. But this was not reliable and the links we created with the script would get reset by the rule invocation for the underlying block device.
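For illustration, reading the partition type and unique GUIDs straight from a GPT, without blkid, could look like the sketch below. It is not the script mentioned above: the function name is hypothetical, it assumes a primary GPT header at LBA 1, and it does not verify the header or table CRCs. The sector_size parameter is there because the logical sector size matters for locating the header (512 by default, 4096 on 4K-sector devices).

```python
import struct
import uuid

GPT_SIGNATURE = b"EFI PART"


def read_gpt_partition_guids(stream, sector_size=512):
    """Return (partition number, type GUID, unique GUID) for each used GPT entry.

    stream is a seekable binary file object positioned on a whole disk,
    for example open("/dev/mapper/mpatha", "rb").
    """
    stream.seek(sector_size)  # the primary GPT header lives in LBA 1
    header = stream.read(92)
    if len(header) < 92 or header[:8] != GPT_SIGNATURE:
        raise ValueError("no GPT header found at LBA 1")
    # Offsets from the GPT header layout: entries LBA, entry count, entry size.
    (entries_lba,) = struct.unpack_from("<Q", header, 72)
    num_entries, entry_size = struct.unpack_from("<II", header, 80)
    if entry_size < 32:
        raise ValueError("implausible GPT entry size: %d" % entry_size)
    stream.seek(entries_lba * sector_size)
    partitions = []
    for index in range(num_entries):
        entry = stream.read(entry_size)
        if len(entry) < 32:
            break  # truncated table
        # GUIDs are stored in mixed-endian form, which bytes_le decodes.
        type_guid = uuid.UUID(bytes_le=entry[0:16])
        if type_guid.int == 0:
            continue  # an all-zero type GUID marks an unused slot
        part_guid = uuid.UUID(bytes_le=entry[16:32])
        partitions.append((index + 1, str(type_guid), str(part_guid)))
    return partitions
```

A script run from a udev rule could compare the returned type GUIDs against the ceph ones and create the corresponding links. The race with the rules for the underlying block devices described above would still apply.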
Updated by Loïc Dachary over 8 years ago
> We ended up adding a rule to call a script on mpath add which would read the GPT partition and create links for the device
I'm writing tests to verify that everything works as expected. I've also noticed that udev events are not triggered when multipath creates the dm devices, and I've not yet come up with a workaround. Would you mind publishing your script somewhere for me to take a look? It would help a lot.
Thanks!
Updated by Loïc Dachary over 8 years ago
Partitioning via multipath on Ubuntu 14.04 fails in a strange way; I reported the bug at https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688
Updated by Loïc Dachary over 8 years ago
Integration tests for multipath now pass on CentOS 7, and non-regression tests pass on Ubuntu 14.04. They should all pass on both once unrelated issues are resolved.
A workaround is added (explicit call to ceph-disk activate) until the CentOS activation bug http://tracker.ceph.com/issues/12786 is fixed.
The multipath tests do not run on Ubuntu because of the multipath / device mapper bug https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688 and it has not been tested on Debian.
Updated by Loïc Dachary over 8 years ago
teuthology-openstack --verbose --upload --key-name myself --key-filename ~/Downloads/myself --suite ceph-disk --suite-branch wip-11881-multipath --distro ubuntu --filter=centos_7 --ceph wip-11881-multipath debug/remove-me.yaml
teuthology-openstack --upload --verbose --key-name myself --key-filename ~/Downloads/myself --suite ceph-disk --suite-branch wip-11881-multipath --distro ubuntu --filter=ubuntu_14 --ceph wip-11881-multipath debug/remove-me.yaml
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to Resolved
Updated by Chaitanya Huilgol over 8 years ago
Loic,
Sorry for the delay, here is the git repository with all the changes (this is Giant code; however, the changes are generic).
The README provides a few details on the changes and the issues we have hit. Some of our SSDs have 4K sector sizes, and this requires some special handling while creating partitions.
https://github.com/chaitanyahuilgol/ceph-disk-udev.git
Also, the udev rule for activation of OSDs is in a different rules file. For some reason we are not able to determine the partition uuids on a DM device via blkid and hence we have resorted to reading these from the disk ourselves.
Let me know your thoughts.
Regards,
Chaitanya
Updated by Christian Hüning over 8 years ago
I really appreciate this ticket and all the work gone into it, since I just ran into issues with multipathing and Ceph.
Can you tell me whether the support for multipathing will be included in the 0.94.4 release of Hammer?
Updated by Loïc Dachary over 8 years ago
It is a new feature and I don't think there is a plan to backport it to Hammer.
Updated by Christian Hüning over 8 years ago
So this'll be in 9.x.x?
So is there any version of Ceph to use right now that is production ready (stable) and supports device mapper multipathing?
Updated by Loïc Dachary over 8 years ago
It will be part of the Infernalis stable release. It is not currently available in a stable release.
Updated by Christian Hüning over 8 years ago
OK. Any idea when Infernalis stable might be out?
Updated by Loïc Dachary over 8 years ago
Infernalis is in feature freeze and release-critical bugs are being fixed. The release will be published when all release-critical fixes have been dealt with and the integration tests are stable.
Updated by Nathan Cutler about 7 years ago
- Related to Bug #19489: ceph-disk: failing to activate osd with multipath added