Bug #18305

open

ceph-osd systemd unit files incomplete

Added by Fabian Grünbichler over 7 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

tested with the Debian Jewel 10.2.5 packages on up-to-date Debian Jessie; this probably applies to other versions/distros using systemd as well.

the Debian packages install the following systemd units (following the ceph-deploy quick start guide):
- ceph-create-keys@.service
- ceph.target
- ceph-mds@.service
- ceph-mds.target
- ceph-mon@.service
- ceph-mon.target
- ceph-disk@.service
- ceph-osd@.service
- ceph-osd.target

under systemd, the activation of OSDs is triggered by the udev rules in "/lib/udev/rules.d/95-ceph-osd.rules", which call "ceph-disk trigger" on the OSD partition. this in turn starts the "ceph-disk@partition" service, which activates the journal and OSD partitions with "ceph-disk activate[-journal]", which in turn starts the OSD daemon.

this whole chain will not work on a lot of systems because udev rules may not trigger for devices already connected at boot. this is inherently racy: udev does detect/trigger on devices if they are initialized late enough in the boot process, but not if they are finished early. it does work (by accident?) at the moment because the old init script ("/etc/init.d/ceph") is still active even under systemd (it is not masked when installing the package), and the very last action the init script takes is calling "ceph-disk activate-all" (which in turn [re-]starts the OSD daemons).

I propose masking the "ceph" init script with a new systemd one-shot service "ceph.service" that simply calls "ceph-disk activate-all". this would ensure that OSDs are brought up on boot even if they are located on devices which do not trigger the udev rules on boot, without impacting systems running under SysV init:

----%<----
[Unit]
Description=Ceph activate all disks task

[Service]
ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all
Type=oneshot

[Install]
WantedBy=ceph.target
---->%--

I tested this under Debian Jessie, and it seems to work as intended.
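
to reproduce the test, the unit above has to end up in a directory that systemd reads. the following is a minimal sketch, not part of the packages: the function name "write_ceph_activate_unit" is made up for illustration, and "/etc/systemd/system" is assumed as the local unit directory. it writes exactly the unit shown above; as far as I know, a native unit with the same name takes precedence over the unit generated from the init script, which is the masking effect intended here.

```shell
#!/bin/sh
# Hypothetical helper: write the proposed ceph.service unit into a directory.
# The default target directory is an assumption (the usual local unit path).
write_ceph_activate_unit() {
    dest="${1:-/etc/systemd/system}"
    # Quoted heredoc delimiter: the unit text is written verbatim.
    cat > "$dest/ceph.service" <<'EOF'
[Unit]
Description=Ceph activate all disks task

[Service]
ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all
Type=oneshot

[Install]
WantedBy=ceph.target
EOF
}

# Typical use (as root), then let systemd pick up and enable the unit:
#   write_ceph_activate_unit /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable ceph.service
```

after a reboot, "systemctl status ceph.service" should show whether the one-shot task ran.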

journal excerpt from current default setup (notice how the ceph-osd service fails initially because the OSD partition is not mounted because udev did not trigger, and is only saved by the init script):
----%<----
Dec 20 09:30:54 deb-ceph-01 systemd1: Starting Ceph object storage daemon...
Dec 20 09:30:54 deb-ceph-01 systemd1: Started Ceph cluster key creator task.
Dec 20 09:30:54 deb-ceph-01 systemd1: Starting Ceph cluster monitor daemon...
Dec 20 09:30:54 deb-ceph-01 systemd1: Started Ceph cluster monitor daemon.
Dec 20 09:30:54 deb-ceph-01 systemd1: Starting LSB: Start Ceph distributed file system daemons at boot time...
Dec 20 09:30:54 deb-ceph-01 ceph-mon471: starting mon.deb-ceph-01 rank 0 at 10.0.0.81:6789/0 mon_data /var/lib/ceph/mon/ceph-deb-ceph-01 fsi
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh470: 2016-12-20 09:30:54.875521 7f5c34041700 1 auth: unable to find a keyring on /var/lib/c
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh470: 2016-12-20 09:30:54.875561 7f5c34041700 -1 monclient(hunting): ERROR: missing keyring,
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh470: 2016-12-20 09:30:54.875568 7f5c34041700 0 librados: osd.0 initialization error (2) No
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh470: Error connecting to cluster: ObjectNotFound
Dec 20 09:30:54 deb-ceph-01 systemd1: Started Ceph object storage daemon.
Dec 20 09:30:54 deb-ceph-01 ceph-osd852: 2016-12-20 09:30:54.973906 7f0cfd16b800 -1 ** ERROR: unable to open OSD superblock on /var/lib/cep
Dec 20 09:30:55 deb-ceph-01 systemd1: : main process exited, code=exited, status=1/FAILURE
Dec 20 09:30:55 deb-ceph-01 systemd1: Unit entered failed state.
Dec 20 09:30:55 deb-ceph-01 systemd1: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:30:55 deb-ceph-01 systemd1: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:30:55 deb-ceph-01 systemd1: Starting Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 systemd1: holdoff time over, scheduling restart.
Dec 20 09:30:55 deb-ceph-01 systemd1: Stopping Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 systemd1: Starting Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 ceph-osd-prestart.sh879: create-or-move updated item name 'osd.0' weight 0.0107 at location {host=deb-ceph-01,ro
Dec 20 09:30:55 deb-ceph-01 systemd1: Started Ceph object storage daemon.
Dec 20 09:30:55 deb-ceph-01 systemd1: Started LSB: Start Ceph distributed file system daemons at boot time.
Dec 20 09:30:55 deb-ceph-01 ceph-osd928: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Dec 20 09:30:55 deb-ceph-01 ceph-osd928: 2016-12-20 09:30:55.691271 7f73dc471800 -1 osd.0 55 log_to_monitors {default=true}
---->%--
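
the keyring and superblock errors in the excerpt above are what an unmounted OSD data directory looks like to the daemon. a rough way to spot this state by hand is sketched below. it assumes that each OSD's data directory is "<base>/ceph-<id>" (as in the "osd_data" paths logged above) and that a mounted, activated OSD directory contains a file named "keyring"; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list OSD data directories that look unmounted.
# Assumption: an activated OSD directory contains a "keyring" file, so a
# directory without one is treated as "partition not mounted".
inactive_osd_dirs() {
    base="${1:-/var/lib/ceph/osd}"
    for dir in "$base"/ceph-*; do
        # Skip the literal pattern when nothing matches, and non-directories.
        [ -d "$dir" ] || continue
        [ -e "$dir/keyring" ] || printf '%s\n' "$dir"
    done
}

# Usage: prints one path per OSD directory that still needs activation.
#   inactive_osd_dirs
```

this is only meaningful before the init script (or a replacement service) has run "ceph-disk activate-all", since afterwards the directories are mounted either way.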

journal excerpt with above ceph.service unit:
----%<----
Dec 20 09:17:48 deb-ceph-02 systemd1: Starting Activate all Ceph disks...
Dec 20 09:17:48 deb-ceph-02 systemd1: Started Ceph cluster key creator task.
Dec 20 09:17:48 deb-ceph-02 systemd1: Starting Ceph cluster monitor daemon...
Dec 20 09:17:48 deb-ceph-02 systemd1: Started Ceph cluster monitor daemon.
Dec 20 09:17:48 deb-ceph-02 systemd1: Starting Ceph object storage daemon...
Dec 20 09:17:48 deb-ceph-02 ceph-mon466: starting mon.deb-ceph-02 rank 1 at 10.0.0.82:6789/0 mon_data /var/lib/ceph/mon/ceph-deb-ceph-02 fsi
Dec 20 09:17:48 deb-ceph-02 systemd1: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:17:48 deb-ceph-02 systemd1: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:17:48 deb-ceph-02 ceph-osd-prestart.sh468: create-or-move updated item name 'osd.1' weight 0.0291 at location {host=deb-ceph-02,ro
Dec 20 09:17:48 deb-ceph-02 systemd1: Started Ceph object storage daemon.
Dec 20 09:17:48 deb-ceph-02 systemd1: Started Activate all Ceph disks.
Dec 20 09:17:49 deb-ceph-02 ceph-osd847: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Dec 20 09:17:49 deb-ceph-02 ceph-osd847: 2016-12-20 03:17:49.073587 7fd8fdbfe800 1 osd.1 52 log_to_monitors {default=true}
---->%--
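
the visible difference between the two excerpts is that the first contains a "Unit entered failed state" message and the second does not. to compare boots on other machines, a small filter like the following can be used. this is a sketch only: the function name is made up, and it simply counts matching lines in whatever journal text it receives on stdin.

```shell
#!/bin/sh
# Hypothetical helper: count "entered failed state" messages on stdin.
count_unit_failures() {
    # grep -c prints the count; "|| true" hides grep's non-zero exit
    # status when the count is 0.
    grep -c 'entered failed state' || true
}

# Usage: count failures logged during the current boot.
#   journalctl -b | count_unit_failures
```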


Files

ceph-log (9.94 KB): journal output of ceph 10.2.9 on Debian 9.1. Fabian Grünbichler, 09/25/2017 11:34 AM
ceph-log2 (7.83 KB): journal output of ceph 10.2.9 on Debian 9.1, with PR 17904 manually applied and ceph-osd@0 disabled before rebooting. Fabian Grünbichler, 09/25/2017 01:55 PM