Bug #19781

OSDs fail to start after reboot

Added by Ruben Kerkhof almost 7 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have 12 OSDs per server, and after a reboot only 7 or so come up.
The other ones are not even mounted:

$ mount | grep ceph
/dev/sdh1 on /var/lib/ceph/osd/hlm1-ceph01-30 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdf1 on /var/lib/ceph/osd/hlm1-ceph01-28 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdg1 on /var/lib/ceph/osd/hlm1-ceph01-29 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sda1 on /var/lib/ceph/osd/hlm1-ceph01-23 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdj1 on /var/lib/ceph/osd/hlm1-ceph01-32 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sde1 on /var/lib/ceph/osd/hlm1-ceph01-27 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
/dev/sdb1 on /var/lib/ceph/osd/hlm1-ceph01-24 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)
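
Until the underlying issue is fixed, the missing OSDs can be brought up by hand with ceph-disk, which is the tool the failed activation units wrap. A minimal sketch, assuming the jewel-era ceph-disk tooling and using /dev/sdc1 (one of the failed devices below) as an example:

$ sudo ceph-disk activate /dev/sdc1   # mount the data partition and start that one OSD
$ sudo ceph-disk activate-all         # or activate every prepared-but-inactive OSD partition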

The corresponding systemd units are in a failed state:
ruben@hlm1-pod12-ceph05: ~$ sudo systemctl --state=failed
UNIT LOAD ACTIVE SUB DESCRIPTION
loaded failed failed Ceph disk activation: /dev/nvme0n1p13
loaded failed failed Ceph disk activation: /dev/nvme0n1p15
loaded failed failed Ceph disk activation: /dev/nvme0n1p16
loaded failed failed Ceph disk activation: /dev/nvme0n1p5
loaded failed failed Ceph disk activation: /dev/nvme0n1p6
loaded failed failed Ceph disk activation: /dev/nvme0n1p7
loaded failed failed Ceph disk activation: /dev/nvme0n1p8
loaded failed failed Ceph disk activation: /dev/nvme0n1p9
loaded failed failed Ceph disk activation: /dev/sdb1
loaded failed failed Ceph disk activation: /dev/sdc1
loaded failed failed Ceph disk activation: /dev/sdd1
loaded failed failed Ceph disk activation: /dev/sdf1
loaded failed failed Ceph disk activation: /dev/sdi1
loaded failed failed Ceph disk activation: /dev/sdj1
loaded failed failed Ceph disk activation: /dev/sdk1
loaded failed failed Ceph disk activation: /dev/sdl1
loaded failed failed Ceph object storage daemon
loaded failed failed Ceph object storage daemon
loaded failed failed Ceph object storage daemon
loaded failed failed Ceph object storage daemon
loaded failed failed Ceph object storage daemon
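
Each "Ceph disk activation" entry corresponds to an instance of the ceph-disk@.service template (the daemon entries to ceph-osd@.service), where systemd escapes the device path into the instance name, so /dev/sdb1 becomes dev-sdb1. A sketch for pulling the per-unit failure details, assuming that naming:

$ sudo systemctl status 'ceph-disk@dev-sdb1.service'           # exit status and last few log lines
$ sudo journalctl -u 'ceph-disk@dev-sdb1.service' --no-pager   # full activation log for the unit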

If I remember correctly this used to work reasonably reliably, so perhaps this is a regression in 10.2.6 or 10.2.7.
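
If this is an activation race at boot, retrying the failed units once the system is up usually brings the stragglers online. A sketch, again using /dev/sdb1 as the example device and assuming jewel's ceph-disk, where --sync runs the activation in the foreground instead of handing it back to systemd:

$ sudo systemctl restart 'ceph-disk@dev-sdb1.service'   # retry activation for one device
$ sudo ceph-disk trigger --sync /dev/sdb1               # or re-run the trigger udev uses, synchronously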

log.txt - journald log (954 KB) Ruben Kerkhof, 04/26/2017 12:22 PM

History

#1 Updated by Ruben Kerkhof almost 7 years ago

#2 Updated by Sage Weil almost 3 years ago

  • Status changed from New to Closed
