Bug #10375
closed
OSD journal partition link could be broken after offlining the disk via storcli; one has to manually create the link to be able to start the OSD.
Description
1. offline the disk via storcli
/opt/MegaRAID/storcli/storcli64 /c0 /e20 /s6 set offline
2. verify OSD goes down
3. online the disk again
/opt/MegaRAID/storcli/storcli64 /c0 /e20 /s6 set online
4. start the osd
/etc/init.d/ceph start osd.<id>
5. verify in /var/log/messages that the OSD cannot be started due to a mount failure:
Oct 21 21:55:24 svl-csl-b-ceph2-003 bash: 2014-10-21 21:55:24.030076 7f5dc44a87c0 -1 filestore(/var/lib/ceph/osd/ceph-129) mount failed to open journal /var/lib/ceph/osd/ceph-129/journal: (2) No such file or directory
6. check the soft link of the journal partition and confirm it is broken:
ls -l /var/lib/ceph/osd/ceph-129/journal
lsblk | grep 129 (to find the disk for the OSD)
cd /dev/disk/by-partuuid
ls -l | grep <journal partition, e.g. sdi2> and verify it is MISSING
7. create the partition link, e.g.:
ln -s ../../sdi2 84d73a48-5a26-4f95-80a2-09b5b0871ebf
8. restart OSD and verify it is running:
/etc/init.d/ceph start osd.129
tail -n50 /var/log/messages:
Oct 21 23:58:07 svl-csl-b-ceph2-003 systemd: Starting /usr/bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 129 --pid-file /var/run/ceph/osd.129.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Oct 21 23:58:07 svl-csl-b-ceph2-003 systemd: Started /usr/bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 129 --pid-file /var/run/ceph/osd.129.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Oct 21 23:58:07 svl-csl-b-ceph2-003 bash: starting osd.129 at :/0 osd_data /var/lib/ceph/osd/ceph-129 /var/lib/ceph/osd/ceph-129/journal
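The check in step 6 can be sketched as a small shell function. This is only an illustration and not part of the original report: `check_journal_link` is a hypothetical name, and the sketch assumes the journal is a symlink into /dev/disk/by-partuuid, as in the steps above.

```shell
# Hypothetical helper: report whether an OSD journal symlink resolves.
# Prints "ok", "broken", or "not-a-symlink" together with the path.
check_journal_link() {
    link="$1"
    # The journal is expected to be a symlink; anything else is reported.
    if [ ! -L "$link" ]; then
        echo "not-a-symlink: $link"
        return 2
    fi
    target=$(readlink "$link")
    # -e follows the symlink, so it fails when the target is missing.
    if [ -e "$link" ]; then
        echo "ok: $link -> $target"
        return 0
    fi
    echo "broken: $link -> $target"
    return 1
}
```

For the OSD in this report it would be called as `check_journal_link /var/lib/ceph/osd/ceph-129/journal`. A "broken" result names the missing target, which is the by-partuuid entry that step 7 recreates by hand.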
Updated by Sage Weil over 9 years ago
It sounds like udev is failing to create the link. This is a problem with either the megaraid driver or with udev.
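If udev is indeed the culprit, one possible alternative to the manual `ln -s` is to ask udev to re-process the block devices. The sketch below is an assumption and was not tested against this setup: `retrigger_block_udev` and the `RUN` dry-run variable are hypothetical names, while `udevadm trigger` and `udevadm settle` are standard udev commands. Whether this actually restores the by-partuuid link in this case is not confirmed by the report.

```shell
# Hypothetical helper: ask udev to replay "add" events for block devices
# so that /dev/disk/by-partuuid links are regenerated, then wait for the
# event queue to drain. Needs root to take effect.
# Set RUN=echo for a dry run that only prints the commands.
retrigger_block_udev() {
    ${RUN:-} udevadm trigger --action=add --subsystem-match=block &&
    ${RUN:-} udevadm settle
}
```

After a real run, `ls -l /dev/disk/by-partuuid` should show whether the journal partition link has come back.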
Updated by Sage Weil over 9 years ago
- Status changed from New to Rejected
Tupper, is this reproducible? Which kernel? Doesn't sound like a ceph problem.