Bug #19781
OSDs fail to start after reboot
Description
I have 12 OSDs per server, and after a reboot only about 7 of them come up.
The others are not even mounted:
$ mount | grep ceph
/dev/sdh1 on /var/lib/ceph/osd/hlm1-ceph01-30 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdf1 on /var/lib/ceph/osd/hlm1-ceph01-28 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdg1 on /var/lib/ceph/osd/hlm1-ceph01-29 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sda1 on /var/lib/ceph/osd/hlm1-ceph01-23 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdj1 on /var/lib/ceph/osd/hlm1-ceph01-32 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sde1 on /var/lib/ceph/osd/hlm1-ceph01-27 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdb1 on /var/lib/ceph/osd/hlm1-ceph01-24 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
The systemd services are failed:
ruben@hlm1-pod12-ceph05: ~$ sudo systemctl --state=failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● ceph-disk@dev-nvme0n1p13.service loaded failed failed Ceph disk activation: /dev/nvme0n1p13
● ceph-disk@dev-nvme0n1p15.service loaded failed failed Ceph disk activation: /dev/nvme0n1p15
● ceph-disk@dev-nvme0n1p16.service loaded failed failed Ceph disk activation: /dev/nvme0n1p16
● ceph-disk@dev-nvme0n1p5.service loaded failed failed Ceph disk activation: /dev/nvme0n1p5
● ceph-disk@dev-nvme0n1p6.service loaded failed failed Ceph disk activation: /dev/nvme0n1p6
● ceph-disk@dev-nvme0n1p7.service loaded failed failed Ceph disk activation: /dev/nvme0n1p7
● ceph-disk@dev-nvme0n1p8.service loaded failed failed Ceph disk activation: /dev/nvme0n1p8
● ceph-disk@dev-nvme0n1p9.service loaded failed failed Ceph disk activation: /dev/nvme0n1p9
● ceph-disk@dev-sdb1.service loaded failed failed Ceph disk activation: /dev/sdb1
● ceph-disk@dev-sdc1.service loaded failed failed Ceph disk activation: /dev/sdc1
● ceph-disk@dev-sdd1.service loaded failed failed Ceph disk activation: /dev/sdd1
● ceph-disk@dev-sdf1.service loaded failed failed Ceph disk activation: /dev/sdf1
● ceph-disk@dev-sdi1.service loaded failed failed Ceph disk activation: /dev/sdi1
● ceph-disk@dev-sdj1.service loaded failed failed Ceph disk activation: /dev/sdj1
● ceph-disk@dev-sdk1.service loaded failed failed Ceph disk activation: /dev/sdk1
● ceph-disk@dev-sdl1.service loaded failed failed Ceph disk activation: /dev/sdl1
● ceph-osd@25.service loaded failed failed Ceph object storage daemon
● ceph-osd@26.service loaded failed failed Ceph object storage daemon
● ceph-osd@31.service loaded failed failed Ceph object storage daemon
● ceph-osd@33.service loaded failed failed Ceph object storage daemon
● ceph-osd@34.service loaded failed failed Ceph object storage daemon
If I remember correctly this used to work reasonably reliably, so perhaps this is a regression in 10.2.6 or 10.2.7.
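As a hedged aside (not part of the original report): the failed ceph-disk@ units above can be mapped back to their device paths for manual retry, since systemd escapes '/' as '-' in template instance names (ceph-disk@dev-sdb1.service corresponds to /dev/sdb1). A minimal sketch, assuming the unit naming shown in the listing above; `systemd-escape -u` is the canonical unescaping tool, but plain `tr` suffices for these names:

```shell
# unescape_unit: map a ceph-disk systemd unit name back to its device path.
# systemd escapes '/' as '-' in template instance names, so
# ceph-disk@dev-sdb1.service corresponds to /dev/sdb1.
unescape_unit() {
  inst=${1#ceph-disk@}     # strip the template prefix
  inst=${inst%.service}    # strip the unit suffix
  printf '/%s\n' "$(printf '%s' "$inst" | tr '-' '/')"
}

# Each failed unit could then be retried by hand, e.g. (Jewel-era tooling):
#   ceph-disk trigger "$(unescape_unit ceph-disk@dev-sdb1.service)"
unescape_unit ceph-disk@dev-sdb1.service        # -> /dev/sdb1
unescape_unit ceph-disk@dev-nvme0n1p5.service   # -> /dev/nvme0n1p5
```

Note this simple `tr` mapping only holds for device names without literal hyphens; `systemd-escape -u` handles the general case.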
History
#1 Updated by Ruben Kerkhof almost 7 years ago
#2 Updated by Sage Weil almost 3 years ago
- Status changed from New to Closed