Bug #15596 (closed)
OSDs not mounted at boot, ceph-disk says "Error: unrecognized partition type None"
Description
I have yet to get anything to start without major intervention since upgrading from infernalis to jewel 10.2.0.
systemctl start ceph does nothing.
To start the mon I must run:
systemctl start ceph-mon@elara
To start an OSD I must repeat the following for every disk:
mount -t xfs /dev/disk/by-partlabel/Ceph_OSD.0.XFSdata /srv/ceph/osd/osd.0 -o noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M
systemctl start ceph-osd@0
Then finally, to start the MDS, I have to run:
systemctl start ceph-mds@fs1
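For reference, the per-OSD workaround can be wrapped in a small script. This is just a sketch consolidating the commands above; the partlabel scheme, mountpoint layout, and mount options come from my setup and will differ on other clusters.

```shell
#!/bin/sh
# Sketch of the manual per-OSD workaround described above.
# Partlabel scheme, mountpoints, and mount options are specific
# to this cluster's setup.

osd_dev() {   # print the by-partlabel device for OSD $1
    echo "/dev/disk/by-partlabel/Ceph_OSD.$1.XFSdata"
}

osd_mnt() {   # print the mountpoint for OSD $1
    echo "/srv/ceph/osd/osd.$1"
}

start_osd() { # mount the data partition, then start the systemd unit
    mount -t xfs "$(osd_dev "$1")" "$(osd_mnt "$1")" \
        -o noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M
    systemctl start "ceph-osd@$1"
}

# e.g.: for id in 0 1 2; do start_osd "$id"; done
```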
I have a feeling that ceph-disk is misbehaving and causing the whole startup process to fail.
I don't know exactly where to start; please let me know what logs you need and I'll provide them promptly.
on previous versions of ceph (firefly-infernalis) I've been using the same config and never had this problem. The info shown below is from a system that began life with infernalis on debian jewell. Except for a new problem creating pidfiles because of my config+running ceph as non-root (which has been resolved), infernalis worked perfectly on this particular config.
@root@elara:/etc/ceph# systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/lib/systemd/system/ceph-osd.service; disabled)
Active: failed (Result: start-limit) since Sat 2016-04-23 19:10:00 MDT; 44min ago
Process: 4141 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Process: 4089 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
Main PID: 4141 (code=exited, status=1/FAILURE)
Apr 23 19:10:00 elara systemd[1]: Unit ceph-osd@0.service entered failed state.
Apr 23 19:10:00 elara systemd[1]: ceph-osd@0.service start request repeated too quickly, refusing to start.
Apr 23 19:10:00 elara systemd[1]: Failed to start Ceph object storage daemon.
Apr 23 19:10:00 elara systemd[1]: Unit ceph-osd@0.service entered failed state.
Apr 23 19:25:07 elara systemd[1]: ceph-osd@0.service start request repeated too quickly, refusing to start.
Apr 23 19:25:07 elara systemd[1]: Failed to start Ceph object storage daemon.
root@elara:/etc/ceph# systemctl status ceph-disk@0
● ceph-disk@0.service - Ceph disk activation: /0
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static)
Active: inactive (dead)
root@elara:/etc/ceph# systemctl start ceph-disk@0
Job for ceph-disk@0.service failed. See 'systemctl status ceph-disk@0.service' and 'journalctl -xn' for details.
root@elara:/etc/ceph# systemctl status ceph-disk@0
● ceph-disk@0.service - Ceph disk activation: /0
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static)
Active: failed (Result: exit-code) since Sat 2016-04-23 19:55:08 MDT; 4s ago
Process: 10705 ExecStart=/bin/sh -c flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f (code=exited, status=1/FAILURE)
Main PID: 10705 (code=exited, status=1/FAILURE)
Apr 23 19:55:08 elara sh[10705]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4964, in run
Apr 23 19:55:08 elara sh[10705]:     main(sys.argv[1:])
Apr 23 19:55:08 elara sh[10705]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4915, in main
Apr 23 19:55:08 elara sh[10705]:     args.func(args)
Apr 23 19:55:08 elara sh[10705]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4347, in main_trigger
Apr 23 19:55:08 elara sh[10705]:     raise Error('unrecognized partition type %s' % parttype)
Apr 23 19:55:08 elara sh[10705]: ceph_disk.main.Error: Error: unrecognized partition type None
Apr 23 19:55:08 elara systemd[1]: ceph-disk@0.service: main process exited, code=exited, status=1/FAILURE
Apr 23 19:55:08 elara systemd[1]: Failed to start Ceph disk activation: /0.
Apr 23 19:55:08 elara systemd[1]: Unit ceph-disk@0.service entered failed state.
root@elara:/etc/ceph#
@
I could not find anyone else reporting this, but I can't be the only one experiencing this problem as a jewel early adopter.
Thanks a bunch for your help guys!
Updated by Heath Jepson almost 8 years ago
Correction: I said Debian Jewell, I meant Debian Jessie.
Coffee hasn't kicked in yet.
Updated by Nathan Cutler almost 8 years ago
Out of curiosity, can you check whether the ceph-mon.target and ceph-osd.target units are enabled?
systemctl is-enabled ceph-mon.target
systemctl is-enabled ceph-osd.target
My guess is they aren't. I have a PR open to fix this.
Updated by Heath Jepson almost 8 years ago
This is what I get:
root@elara:~# systemctl is-enabled ceph-mon.target
Failed to get unit file state for ceph-mon.target: No such file or directory
root@elara:~# systemctl is-enabled ceph-osd.target
Failed to get unit file state for ceph-osd.target: No such file or directory
For some reason, I have the feeling that the root of the problem (or at least, a clue) lies in ceph-disk:
Apr 23 19:55:08 elara sh[10705]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4964, in run
Apr 23 19:55:08 elara sh[10705]:     main(sys.argv[1:])
Apr 23 19:55:08 elara sh[10705]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4915, in main
Apr 23 19:55:08 elara sh[10705]:     args.func(args)
Apr 23 19:55:08 elara sh[10705]:   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4347, in main_trigger
Apr 23 19:55:08 elara sh[10705]:     raise Error('unrecognized partition type %s' % parttype)
Apr 23 19:55:08 elara sh[10705]: ceph_disk.main.Error: Error: unrecognized partition type None
Apr 23 19:55:08 elara systemd[1]: ceph-disk@0.service: main process exited, code=exited, status=1/FAILURE
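As I understand it, ceph-disk identifies partitions by their GPT partition type GUID rather than by partlabel, so one plausible reason for the "None" is that hand-made partitions like mine don't carry one of the Ceph type codes. A sketch of that kind of check (this is illustrative, not ceph-disk's actual code; the GUIDs in the mapping are the published Ceph type codes, sgdisk comes from the gdisk package, and the device/partition numbers are examples):

```shell
# Inspect the GPT type GUID of partition 1 on /dev/sda (example device):
#   sgdisk --info=1 /dev/sda | grep 'Partition GUID code'

# Illustrative lookup: map a GPT type GUID to a ceph role; anything
# unrecognized yields "None", matching the error in the traceback above.
ptype_role() {
    case "$1" in
        4fbd7e29-9d25-41b8-afd0-062c0ceff05d) echo "osd data" ;;
        45b0969e-9b03-4f30-b4c6-b4b80ceff106) echo "journal"  ;;
        *)                                    echo "None"     ;;
    esac
}

ptype_role 4fbd7e29-9d25-41b8-afd0-062c0ceff05d   # osd data
ptype_role 0fc63daf-8483-4772-8e79-3d69d8477de4   # generic Linux GUID -> None
```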
Updated by Heath Jepson almost 8 years ago
Is .target what you wanted, or did you mean to have me run the following?
root@elara:~# systemctl is-enabled ceph-osd@.service
disabled
root@elara:~# systemctl is-enabled ceph-mon@.service
disabled
root@elara:~# systemctl is-enabled ceph-disk@.service
static
root@elara:~# systemctl is-enabled ceph-mds@.service
disabled
Updated by Heath Jepson almost 8 years ago
Or this:
root@elara:~# systemctl is-enabled ceph.target
disabled
sorry about the multiple edits, I keep getting new ideas the longer I think about it.
Updated by Nathan Cutler almost 8 years ago
- Subject changed from OSDs not mounted at boot, MON does not start at boot, worked in infernalis to OSDs not mounted at boot, ceph-disk says "Error: unrecognized partition type None"
Hi Heath. We have a PR open https://github.com/ceph/ceph/pull/8714 to fix the disabled targets, but for now you should enable them all:
systemctl enable ceph.target
systemctl enable ceph-mds.target
systemctl enable ceph-mon.target
systemctl enable ceph-osd.target
I'm changing the bug title to reflect that this is about the ceph-disk issue you're having.
Updated by Heath Jepson almost 8 years ago
Thanks. Let me know what info you need to troubleshoot the ceph-disk problem.
Updated by Greg Farnum almost 7 years ago
- Status changed from New to Can't reproduce
Haven't heard anything about this and it's been a year...