Feature #11881

closed

ceph-disk support for multipath

Added by Loïc Dachary almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Need review

What needs to be fixed in ceph-disk

We have these udev rules that call ceph-disk when they see GPT labels indicating the partition is used for ceph OSD data or journal. When connecting a device with multipath, what probably happens is that the original device shows up twice (once for each path), and then there is a dm device that we should be consuming. Instead, we're triggering ceph-disk on both of the underlying devices (and not on the dm one).

We can make ceph-disk ignore a device if it appears to be a slave (i.e.,
it appears as /sys/block/*/slaves/$DEV for some other device). This may race if the dm setup is also udev driven.
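
The slave check described above could look roughly like the following sketch. The function name is_slave_device and its optional second argument (an alternate sysfs root) are illustrative and not part of ceph-disk; the second argument exists only so the check can be tried against a fake directory tree. As noted, the answer may be wrong if the dm device has not been set up yet when the event fires.

```shell
# Sketch only: is_slave_device and its second argument are hypothetical names.
# Returns 0 if DEV (a kernel name such as "sda") appears as
# <sysfs>/block/*/slaves/DEV for some other block device, 1 otherwise.
is_slave_device() {
    dev=$1
    sysfs_root=${2:-/sys}
    for entry in "$sysfs_root"/block/*/slaves/"$dev"; do
        # If nothing matches, the glob is left as a literal string, so
        # check that the path really exists (entries are usually symlinks).
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            return 0
        fi
    done
    return 1
}
```

A caller would skip the device when is_slave_device returns 0, for example: is_slave_device sda && exit 0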

What other udev mechanism do we need to use?

  • DM_MULTIPATH_DEVICE_PATH udev environment variable. This is set by 62-multipath.rules for all devices that will become paths in a multipath device, and all their partitions. There are a couple gotchas, however.
    • It won't get set the very first time a piece of storage hardware is seen by the system. With find_multipaths enabled, there's no way for multipath to know ahead of time if a new piece of storage will be multipathable when it first appears.
    • If the storage gets discovered in the initramfs, multipath won't be able to update its list of multipathable devices there. This means it will fail to get correctly labelled in the initramfs every time, until the initramfs is remade. That's why it is advisable to remake the initramfs after adding new hardware to a multipath system (there are ways of working around this if that's incredibly important).

But aside from new hardware (which probably won't be labelled for
ceph-disk the first time it appears anyway), this should work to keep
you from messing with multipath's path devices.
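
A program started from a udev rule sees the udev properties in its environment, so the DM_MULTIPATH_DEVICE_PATH check could be sketched as below. The helper names are hypothetical, and the sketch assumes that 62-multipath.rules uses the value "1" to mark a path device; it prints a decision rather than calling ceph-disk.

```shell
# Sketch only: is_multipath_path and decide_action are hypothetical names.
# Assumes this runs from a udev rule, so udev properties are available in
# the environment, and that path devices are marked with the value "1".
is_multipath_path() {
    [ "${DM_MULTIPATH_DEVICE_PATH:-0}" = "1" ]
}

# Print what a udev-triggered wrapper around ceph-disk would do.
decide_action() {
    if is_multipath_path; then
        echo "skip"
    else
        echo "activate"
    fi
}
```

The two gotchas listed above still apply: on the first appearance of new hardware, or in a stale initramfs, the variable is not set and this check would answer "activate".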

Set up multipath without actually having any multipath hardware

  • modprobe scsi_debug vpd_use_hostno=0 add_host=2 dev_size_mb=100

will get you 2 100MB scsi_debug devices that multipath will happily set
itself up on. But I don't know if this will help you, since you can't
choose what's on the device when it gets discovered. The device usually
(always?) comes up with the same WWID, so after the first time, the
multipath udev rules will flag it correctly, but I doubt that the
95-ceph-osd.rules would pick it up.

With an unused scsi device

  • multipath <scsi_device>

Multipath will make a multipath device with only one path on top of it,
and will remember that it is supposed to be multipathed in the future.

Once you're all done with it, run:

  • multipath -f <multipath_device>

to remove the multipath device, and

  • multipath -w <scsi_device>

to wipe the information that it's supposed to be a multipath device (so
that it won't automatically get multipathed the next time you start up).

If the unused scsi device is a normal disk (say /dev/sdb), this allows testing everything except the case where a second device (say /dev/sdf) appears as another path to the same disk.

Testing with multiple devices

Set up an iscsi session, then duplicate it by:

  • Getting the sessionid: iscsiadm -m session -P1 | grep SID
  • Create a new session with the same data: iscsiadm -m session -r <sessionid> -o new

This should create another iscsi device for each existing
session, and multipathd will notice that there are two paths to the
same disk and automatically create a multipath device on it.
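
The two steps above can be combined in a small script. This is a sketch under assumptions: the function names are made up, and it assumes that the -P1 output carries each session id on a line of the form "SID: <number>". Only the parsing helper runs without an iSCSI setup; duplicate_sessions needs root and a logged-in session.

```shell
# Sketch only: list_session_ids and duplicate_sessions are hypothetical names.
# Reads "iscsiadm -m session -P1" output on stdin and prints one session id
# per line. Assumes each id appears on a line of the form "SID: <number>".
list_session_ids() {
    awk '$1 == "SID:" { print $2 }'
}

# Create one additional session for every existing session.
duplicate_sessions() {
    iscsiadm -m session -P1 | list_session_ids | while read -r sid; do
        # stdin is redirected so the command cannot consume the id list
        iscsiadm -m session -r "$sid" -o new </dev/null
    done
}
```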

Additional information

See also https://github.com/ceph/ceph/pull/1229


Files

prepare-sda-udev.log (134 KB) Loïc Dachary, 08/14/2015 10:07 PM
prepare-mpatha-udev.log (72.3 KB) Loïc Dachary, 08/14/2015 10:07 PM
prepare-activate-vdb-udev.log (53.7 KB) Loïc Dachary, 08/14/2015 10:07 PM

Related issues 1 (0 open, 1 closed)

Related to devops - Bug #8160: multipath-tools does not co-exist with ceph (Duplicate, 04/19/2014)

Actions #1

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #2

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #3

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #4

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #5

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #6

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #7

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
Actions #8

Updated by Loïc Dachary almost 9 years ago

  • Description updated (diff)
  • Status changed from 12 to In Progress
  • Assignee set to Loïc Dachary
Actions #9

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #10

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #11

Updated by Loïc Dachary over 8 years ago

Would it be an option to have ceph-disk prepare /dev/mapper/mpath create the file /var/lib/ceph/osd/ceph-0/multipath, to record that the OSD is intended to be used via multipath? If udev then runs ceph-disk to activate such an OSD on an event unrelated to a multipath device, ceph-disk would do nothing.

Alternatively we could have a set of UUIDs ( https://github.com/ceph/ceph/blob/master/src/ceph-disk#L80 ) meaning "this is accessed via multipath, skip if not multipath!". That would make it possible for https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules to decide whether a device should be used or not without calling ceph-disk or mounting the device.
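
For the first option (the marker file), the activation decision might be sketched as follows. The function name and its arguments are hypothetical. Because the marker lives inside the OSD data directory, it can only be read once the partition is mounted, which is the cost the UUID alternative avoids.

```shell
# Sketch only: should_activate and its arguments are hypothetical.
# $1: mounted OSD data directory, e.g. /var/lib/ceph/osd/ceph-0
# $2: "yes" if the triggering udev event is for a multipath device, else "no"
# Returns 0 when activation should proceed, 1 when it should be skipped.
should_activate() {
    osd_dir=$1
    event_is_multipath=$2
    # An OSD prepared via multipath is ignored on non-multipath events.
    if [ -e "$osd_dir/multipath" ] && [ "$event_is_multipath" != "yes" ]; then
        return 1
    fi
    return 0
}
```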

Actions #12

Updated by Loïc Dachary over 8 years ago

To find out if mpatha as found in /dev/mapper is managed by multipath:

test $(dmsetup info --columns --noheading --options subsystem --select name=mpatha) = "mpath" 
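
The same test can be wrapped in a reusable function, sketched below with hypothetical names. The output is stripped of whitespace before comparison, on the assumption that the column output may be padded; an unknown device name yields an empty string and therefore a non-zero return.

```shell
# Sketch only: dm_subsystem and is_multipath_managed are hypothetical names.
# Prints the device-mapper subsystem of the named device with all whitespace
# removed (the column output may be padded; that is an assumption).
dm_subsystem() {
    dmsetup info --columns --noheading --options subsystem \
        --select "name=$1" 2>/dev/null | tr -d '[:space:]'
}

# Returns 0 if the named dm device belongs to the "mpath" subsystem.
is_multipath_managed() {
    [ "$(dm_subsystem "$1")" = "mpath" ]
}
```

For example: is_multipath_managed mpatha && echo "mpatha is managed by multipath"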

Actions #13

Updated by Loïc Dachary over 8 years ago

On an OpenStack instance with a /dev/vdb disk, I did something like:

[centos@try ~]$ sudo multipath -f /dev/mapper/mpatha
[centos@try ~]$ sudo rmmod scsi_debug
[centos@try ~]$ sudo modprobe scsi_debug vpd_use_hostno=0 add_host=1 dev_size_mb=200
[centos@try ~]$ sudo ceph-disk --verbose prepare /dev/mapper/mpatha

This was done for /dev/vdb, /dev/sda (from scsi_debug) and /dev/mapper/mpatha, and the logs of

sudo systemctl stop systemd-udevd
sudo /usr/lib/systemd/systemd-udevd --debug

are attached for each of them. /dev/vdb required an additional activate, /dev/sda auto-activated, and /dev/mapper/mpatha did not activate and failed to trigger the necessary udev rules.

  • ceph-disk was hacked to accept /dev/mapper/mpatha
  • /lib/udev/rules.d/60-ceph-partuuid-workaround.rules changed to not blacklist dm-* and create the part-uuid
Actions #14

Updated by Loïc Dachary over 8 years ago

The scsi_debug kernel module on Ubuntu 14.04 is provided by the linux-image-extra-3.13.0-61-generic package.

Actions #15

Updated by Loïc Dachary over 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #16

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #17

Updated by Loïc Dachary over 8 years ago

To manually try from sources:

$ sudo ceph-deploy install --testing try
$ git clone -b wip-11881-multipath http://github.com/dachary/ceph
$ cd ceph
$ sudo ln -sf $(pwd)/src/ceph-disk /usr/sbin/ceph-disk
$ sudo ln -sf $(pwd)/udev/60-ceph-partuuid-workaround.rules /lib/udev/rules.d/60-ceph-partuuid-workaround.rules
$ sudo ln -sf $(pwd)/udev/95-ceph-osd.rules /lib/udev/rules.d/95-ceph-osd.rules

Actions #18

Updated by Loïc Dachary over 8 years ago

Running the ceph-deploy suite will use ceph-disk on various distributions:

teuthology-openstack --verbose --key-name myself --key-filename ~/Downloads/myself --suite ceph-deploy --suite-branch wip-11881-multipath --email loic@dachary.org --distro ubuntu --filter=ubuntu_14.04 --ceph wip-11881-multipath 

Actions #19

Updated by Chaitanya Huilgol over 8 years ago

Loic,

A couple of issues we have seen in our multipath experiments:
1. The partuuid links are not very reliable for multipath devices, as these are usually hotplugged. We have seen the parttypeuuid and partuuid links getting reset to the underlying device nodes, because the order in which udev calls the rule scripts is not fixed.
2. The holders-based partition detection may not hold true in the case of dm-crypt on mpath (or any other layering).

We are working on a port of ceph-disk that uses pyudev (the udev database). The changes are rather extensive and I am not sure how much of that can be consumed here.
Once I am done with the changes, I will commit them to my github. You could then have a look and advise.

Thanks
Chaitanya

Actions #20

Updated by Loïc Dachary over 8 years ago

The holders based partition detection may not hold true in case of dm-crypt on mpath (or any other layering)

Yes. The dm-crypt logic should be tested when the underlying device is dm-*; that should be addressed in a separate pull request.

Actions #21

Updated by Loïc Dachary over 8 years ago

the partuuid links are not very reliable for multipath device as these are usually hotplug. We have seen the parttypeuuid and partuuid links getting reset to underlying device nodes as the order of udev calling the rule scripts is not fixed

I would be very interested to have a way to reproduce that. As far as I understand all should be fine because:

  • either partuuid etc. are handled by the native udev rules, in which case the ceph udev rule cannot override them because it does not have precedence
  • or the ceph udev rules (workaround) create the symlinks and there is no conflict

Am I missing something ?

Actions #22

Updated by Loïc Dachary over 8 years ago

Once I am done with the changes, I will commit it in my github. Probably you can have a look and advice.

Extremely interesting ! Would you mind pushing the code to a public location right now so I can take a look ? Even if it is not finished, it might be a great source of inspiration. It is also a lot easier to comment on an extensive change during the early stages of development.

Actions #23

Updated by Chaitanya Huilgol over 8 years ago

Actually we were unable to create the parttype uuid links using the udev rules.
The part-uuid and type-uuid do not seem to be populated on the 'add' event for DM mpath devices. blkid does not seem to return these values either.
Does this work for you on real scsi devices? (not the scsi_debug ones)
We ended up adding a rule to call a script on mpath add which would read the GPT partition and create links for the device. But this was not reliable and the links we created with the script would get reset by the rule invocation for the underlying block device.

Actions #24

Updated by Loïc Dachary over 8 years ago

We ended up adding a rule to call a script on mpath add which would read the GPT partition and create links for the device

I'm writing tests to verify all works as expected. I've also noticed that udev events were not called when multipath creates the dm devices and I've not yet come up with a workaround. Would you mind publishing your script somewhere for me to take a look ? It would help a lot.

Thanks !

Actions #25

Updated by Loïc Dachary over 8 years ago

Partitioning via multipath on Ubuntu 14.04 fails in a strange way, reported bug at https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688

Actions #26

Updated by Loïc Dachary over 8 years ago

Integration tests for multipath now pass on CentOS 7, and non-regression tests pass on Ubuntu 14.04. They should all pass on both once unrelated issues are resolved.

A workaround is added (explicit call to ceph-disk activate) until the CentOS activation bug http://tracker.ceph.com/issues/12786 is fixed.

The multipath tests do not run on Ubuntu because of the multipath / device mapper bug https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1488688 and it has not been tested on Debian.

Actions #27

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #28

Updated by Loïc Dachary over 8 years ago

teuthology-openstack --verbose --upload --key-name myself --key-filename ~/Downloads/myself --suite ceph-disk --suite-branch wip-11881-multipath --distro ubuntu --filter=centos_7 --ceph wip-11881-multipath debug/remove-me.yaml
teuthology-openstack --upload --verbose --key-name myself --key-filename ~/Downloads/myself --suite ceph-disk --suite-branch wip-11881-multipath --distro ubuntu --filter=ubuntu_14 --ceph wip-11881-multipath debug/remove-me.yaml
Actions #29

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #30

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
Actions #31

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions #32

Updated by Chaitanya Huilgol over 8 years ago

Loic,

Sorry for the delay, here is the git repository with all the changes (this is Giant code; however, the changes are generic).
The README provides a few details on the changes and the issues we have hit. Some of our SSDs have 4K sector sizes and this requires some special handling while creating partitions.

https://github.com/chaitanyahuilgol/ceph-disk-udev.git

Also, the udev rule for activation of OSDs is in a different rules file. For some reason we are not able to determine the partition uuids on a DM device via blkid and hence we have resorted to reading these from the disk ourselves.

Let me know your thoughts.

Regards,
Chaitanya

Actions #33

Updated by Christian Hüning over 8 years ago

I really appreciate this ticket and all the work gone into it, since I just ran into issues with multipathing and Ceph.
Can you tell me whether the support for multipathing will be included in the 0.94.4 release of HAMMER?

Actions #34

Updated by Loïc Dachary over 8 years ago

It is a new feature and I don't think there is a plan to backport it to Hammer.

Actions #35

Updated by Christian Hüning over 8 years ago

So this'll be in 9.x.x ?
So is there any version of CEPH to use right now, which is production ready (stable) and supports device mapper multipathing?

Actions #36

Updated by Loïc Dachary over 8 years ago

It will be part of the Infernalis stable release. It is not currently available in a stable release.

Actions #37

Updated by Christian Hüning over 8 years ago

Ok. any idea when Infernalis stable might be out?

Actions #38

Updated by Loïc Dachary over 8 years ago

Infernalis is in feature freeze and release-critical bugs are being fixed. The release will be published when all release-critical fixes have been dealt with and the integration tests are stable.

Actions #39

Updated by Nathan Cutler about 7 years ago

  • Related to Bug #19489: ceph-disk: failing to activate osd with multipath added