Bug #15596

closed

OSDs not mounted at boot, ceph-disk says "Error: unrecognized partition type None"

Added by Heath Jepson almost 8 years ago. Updated almost 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have yet to get anything to start without major intervention since upgrading from infernalis to jewel 10.2.0.

systemctl start ceph does nothing.

To start the mon, I must run: systemctl start ceph-mon@elara

To start an OSD, I must repeat the following for every disk:
mount -t xfs /dev/disk/by-partlabel/Ceph_OSD.0.XFSdata /srv/ceph/osd/osd.0 -o noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M
systemctl start ceph-osd@0
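
(A throwaway loop saves some typing here; just a sketch assuming every disk follows the same Ceph_OSD.<id>.XFSdata partlabel pattern:)

for id in 0 1 2; do   # example ids, substitute the real OSD ids
    mount -t xfs /dev/disk/by-partlabel/Ceph_OSD.${id}.XFSdata /srv/ceph/osd/osd.${id} \
        -o noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M
    systemctl start ceph-osd@${id}
done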

Then, finally, I have to run this to start the MDS:
systemctl start ceph-mds@fs1

I have a feeling that ceph-disk is misbehaving and causing the whole startup process to fail.

I don't know exactly where to start; please let me know what logs you need and I'll get them promptly.

On previous versions of ceph (firefly through infernalis) I've been using the same config and never had this problem. The info shown below is from a system that began life with infernalis on Debian Jewell. Except for a new problem creating pidfiles caused by my config plus running ceph as non-root (which has since been resolved), infernalis worked perfectly on this particular config.

@root@elara:/etc/ceph# systemctl status ceph-osd@0
ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/lib/systemd/system/ceph-osd.service; disabled)
Active: failed (Result: start-limit) since Sat 2016-04-23 19:10:00 MDT; 44min ago
Process: 4141 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Process: 4089 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
Main PID: 4141 (code=exited, status=1/FAILURE)

Apr 23 19:10:00 elara systemd[1]: Unit ceph-osd@0.service entered failed state.
Apr 23 19:10:00 elara systemd[1]: ceph-osd@0.service start request repeated too quickly, refusing to start.
Apr 23 19:10:00 elara systemd[1]: Failed to start Ceph object storage daemon.
Apr 23 19:10:00 elara systemd[1]: Unit ceph-osd@0.service entered failed state.
Apr 23 19:25:07 elara systemd[1]: ceph-osd@0.service start request repeated too quickly, refusing to start.
Apr 23 19:25:07 elara systemd[1]: Failed to start Ceph object storage daemon.
root@elara:/etc/ceph# systemctl status ceph-disk@0
ceph-disk@0.service - Ceph disk activation: /0
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static)
Active: inactive (dead)
root@elara:/etc/ceph# systemctl start ceph-disk@0
Job for ceph-disk@0.service failed. See 'systemctl status ceph-disk@0.service' and 'journalctl -xn' for details.
root@elara:/etc/ceph# systemctl status ceph-disk@0
ceph-disk@0.service - Ceph disk activation: /0
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static)
Active: failed (Result: exit-code) since Sat 2016-04-23 19:55:08 MDT; 4s ago
Process: 10705 ExecStart=/bin/sh -c flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f (code=exited, status=1/FAILURE)
Main PID: 10705 (code=exited, status=1/FAILURE)

Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4964, in run
Apr 23 19:55:08 elara sh[10705]: main(sys.argv[1:])
Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4915, in main
Apr 23 19:55:08 elara sh[10705]: args.func(args)
Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4347, in main_trigger
Apr 23 19:55:08 elara sh[10705]: raise Error('unrecognized partition type %s' % parttype)
Apr 23 19:55:08 elara sh[10705]: ceph_disk.main.Error: Error: unrecognized partition type None
Apr 23 19:55:08 elara systemd[1]: ceph-disk@0.service: main process exited, code=exited, status=1/FAILURE
Apr 23 19:55:08 elara systemd[1]: Failed to start Ceph disk activation: /0.
Apr 23 19:55:08 elara systemd[1]: Unit ceph-disk@0.service entered failed state.
root@elara:/etc/ceph#
@

I could not find anyone else reporting this, but I can't be the only one experiencing this problem as a jewel early adopter.

Thanks a bunch for your help guys!

Actions #1

Updated by Heath Jepson almost 8 years ago

Correction: I said Debian Jewell, but I meant Debian Jessie.

Coffee hasn't kicked in yet.

Actions #2

Updated by Nathan Cutler almost 8 years ago

Out of curiosity, can you check if the ceph-mon.target and ceph-osd.target units are enabled?

systemctl is-enabled ceph-mon.target
systemctl is-enabled ceph-osd.target

My guess is they aren't. I have a PR open to fix this.

Actions #3

Updated by Heath Jepson almost 8 years ago

This is what I get:

root@elara:~# systemctl is-enabled ceph-mon.target
Failed to get unit file state for ceph-mon.target: No such file or directory
root@elara:~# systemctl is-enabled ceph-osd.target
Failed to get unit file state for ceph-osd.target: No such file or directory
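
To check what unit files actually got installed (guessing at the usual Debian location):

ls /lib/systemd/system/ceph*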

For some reason, I have the feeling that the root of the problem (or at least, a clue) lies in ceph-disk:

Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4964, in run
Apr 23 19:55:08 elara sh[10705]: main(sys.argv[1:])
Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4915, in main
Apr 23 19:55:08 elara sh[10705]: args.func(args)
Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4347, in main_trigger
Apr 23 19:55:08 elara sh[10705]: raise Error('unrecognized partition type %s' % parttype)
Apr 23 19:55:08 elara sh[10705]: ceph_disk.main.Error: Error: unrecognized partition type None
Apr 23 19:55:08 elara systemd[1]: ceph-disk@0.service: main process exited, code=exited, status=1/FAILURE
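
If it helps, my (possibly wrong) reading of the ceph-disk source is that trigger decides what to do from the GPT partition type GUID, not the partlabel, so "None" would mean udev/blkid isn't reporting one for these partitions. Something like this should show what blkid sees (device name is just an example; substitute a real OSD data partition):

blkid -p -o udev /dev/sda1 | grep ID_PART_ENTRY_TYPE
# a partition prepared by ceph-disk should carry the OSD data type GUID,
# 4fbd7e29-9d25-41b8-afd0-062c0ceff05d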

Actions #4

Updated by Heath Jepson almost 8 years ago

Is .target what you wanted, or did you mean for me to run the following?

root@elara:~# systemctl is-enabled ceph-osd@.service
disabled
root@elara:~# systemctl is-enabled ceph-mon@.service
disabled
root@elara:~# systemctl is-enabled ceph-disk@.service
static
root@elara:~# systemctl is-enabled ceph-mds@.service
disabled

Actions #5

Updated by Heath Jepson almost 8 years ago

Or this:

root@elara:~# systemctl is-enabled ceph.target
disabled

Sorry about the multiple edits; I keep getting new ideas the longer I think about it.

Actions #6

Updated by Nathan Cutler almost 8 years ago

  • Subject changed from OSDs not mounted at boot, MON does not start at boot, worked in infernalis to OSDs not mounted at boot, ceph-disk says "Error: unrecognized partition type None"

Hi Heath. We have a PR open (https://github.com/ceph/ceph/pull/8714) to fix the disabled targets, but for now you should enable them all:


systemctl enable ceph.target
systemctl enable ceph-mds.target
systemctl enable ceph-mon.target
systemctl enable ceph-osd.target
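
Depending on how the wants-links got set up, you may also need to enable the per-daemon instances so they're pulled in at boot, e.g. (instance names taken from your report):

systemctl enable ceph-mon@elara
systemctl enable ceph-osd@0
systemctl enable ceph-mds@fs1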

I'm changing the bug title to reflect that this is about the ceph-disk issue you're having.

Actions #7

Updated by Heath Jepson almost 8 years ago

Thanks. Let me know what info you need to troubleshoot the ceph-disk problem.

Actions #8

Updated by Greg Farnum almost 7 years ago

  • Status changed from New to Can't reproduce

Haven't heard anything about this and it's been a year...
