Bug #18305

ceph-osd systemd unit files incomplete

Added by Fabian Grünbichler over 2 years ago. Updated almost 2 years ago.

Status:
Verified
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
Start date:
12/20/2016
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

tested with the Debian Jewel 10.2.5 packages on up-to-date Debian Jessie; this probably applies to other versions/distros using systemd as well.

the Debian packages install the following systemd units (following the ceph-deploy quick start guide):
- ceph-create-keys@.service
- ceph.target
- ceph-mds@.service
- ceph-mds.target
- ceph-mon@.service
- ceph-mon.target
- ceph-disk@.service
- ceph-osd@.service
- ceph-osd.target

under systemd, the activation of OSDs is triggered by the udev rules in "/lib/udev/rules.d/95-ceph-osd.rules", which call "ceph-disk trigger" on the OSD partition. that in turn starts the "ceph-disk@partition" service, which activates the journal and OSD partitions with "ceph-disk activate[-journal]", which in turn starts the corresponding "ceph-osd@" service.

this whole chain will not work on a lot of systems because udev rules may not trigger for devices already connected at boot (this is inherently racy: udev does detect/trigger on devices that are initialized late enough in the boot process, but not on devices that are finished early). it does work (by accident?) at the moment because the old init script ("/etc/init.d/ceph") is still active even under systemd (it is not masked when installing the package), and the very last action the init script takes is calling "ceph-disk activate-all" (which in turn [re-]starts the "ceph-osd@" services).
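for context: "ceph-disk activate-all" does not depend on udev at all. as far as I can tell from the Jewel-era ceph-disk code, it simply scans "/dev/disk/by-parttypeuuid" for partitions carrying the well-known Ceph OSD data partition type GUID, which is why it catches OSDs whose devices never triggered the rules. a minimal shell sketch of that scan (a local staging directory stands in for /dev/disk/by-parttypeuuid so the snippet can run anywhere, and the partition UUID in the symlink name is made up):

```shell
# Sketch of the scan "ceph-disk activate-all" performs (Jewel-era behaviour).
# Assumption: a staging directory stands in for /dev/disk/by-parttypeuuid so
# this can be demonstrated without real OSD partitions.
DIR=./by-parttypeuuid                           # real path: /dev/disk/by-parttypeuuid
OSD_GUID=4fbd7e29-9d25-41b8-afd0-062c0ceff05d   # well-known Ceph OSD data partition type GUID
mkdir -p "$DIR"
# fake symlink entry: <type-guid>.<partition-uuid> (partition UUID made up)
touch "$DIR/$OSD_GUID.11111111-2222-3333-4444-555555555555"

found=0
for link in "$DIR/$OSD_GUID".*; do
    [ -e "$link" ] || continue                  # glob matched nothing
    found=$((found + 1))
    echo "would activate: $link"                # the real tool runs "ceph-disk activate" here
done
echo "OSD data partitions found: $found"
```

on a real host the loop body would resolve the symlink and run "ceph-disk activate" on the device instead of echoing, so the scan works purely off the partition table, not off udev events.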

I propose masking the "ceph" init script with a new systemd oneshot service "ceph.service" that simply calls "ceph-disk activate-all". this would ensure that OSDs are brought up on boot even if located on devices which do not trigger the udev rules at boot, without impacting systems running under SysV init:

----%<----
[Unit]
Description=Ceph activate all disks task

[Service]
ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all
Type=oneshot

[Install]
WantedBy=ceph.target
---->%----

I tested this under Debian Jessie, and it seems to work as intended.
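for completeness, the installation steps as a sketch (assuming Debian paths; the unit is staged in the working directory here so the snippet can run anywhere, while on a real host it belongs at /etc/systemd/system/ceph.service, written as root):

```shell
# Stage the proposed oneshot unit. On a real host this file would be written
# to /etc/systemd/system/ceph.service as root.
UNIT=./ceph.service
cat > "$UNIT" <<'EOF'
[Unit]
Description=Ceph activate all disks task

[Service]
ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all
Type=oneshot

[Install]
WantedBy=ceph.target
EOF

# On a real host, then (not run here):
#   systemctl daemon-reload
#   systemctl enable ceph.service

grep -q '^ExecStart=/usr/sbin/ceph-disk --log-stdout activate-all$' "$UNIT" \
    && echo "unit staged"
```

note that no explicit mask is needed: a native unit named "ceph.service" under /etc/systemd/system takes precedence over the unit systemd-sysv-generator creates from /etc/init.d/ceph, which is what effectively masks the init script under systemd while leaving SysV init systems untouched.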

journal excerpt from current default setup (notice how the ceph-osd service fails initially because the OSD partition is not mounted because udev did not trigger, and is only saved by the init script):
----%<----
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Started Ceph cluster key creator task.
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Starting Ceph cluster monitor daemon...
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Started Ceph cluster monitor daemon.
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Starting LSB: Start Ceph distributed file system daemons at boot time...
Dec 20 09:30:54 deb-ceph-01 ceph-mon[471]: starting mon.deb-ceph-01 rank 0 at 10.0.0.81:6789/0 mon_data /var/lib/ceph/mon/ceph-deb-ceph-01 fsid e6f60473-dc86-4cb7-ad09-30e69d92b09c
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: 2016-12-20 09:30:54.875521 7f5c34041700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: 2016-12-20 09:30:54.875561 7f5c34041700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: 2016-12-20 09:30:54.875568 7f5c34041700  0 librados: osd.0 initialization error (2) No such file or directory
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: Error connecting to cluster: ObjectNotFound
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Started Ceph object storage daemon.
Dec 20 09:30:54 deb-ceph-01 ceph-osd[852]: 2016-12-20 09:30:54.973906 7f0cfd16b800 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory
Dec 20 09:30:55 deb-ceph-01 systemd[1]: ceph-osd@0.service: main process exited, code=exited, status=1/FAILURE
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Unit ceph-osd@0.service entered failed state.
Dec 20 09:30:55 deb-ceph-01 systemd[1]: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:30:55 deb-ceph-01 systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 systemd[1]: ceph-osd@0.service holdoff time over, scheduling restart.
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Stopping Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 ceph-osd-prestart.sh[879]: create-or-move updated item name 'osd.0' weight 0.0107 at location {host=deb-ceph-01,root=default} to crush map
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Started Ceph object storage daemon.
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Started LSB: Start Ceph distributed file system daemons at boot time.
Dec 20 09:30:55 deb-ceph-01 ceph-osd[928]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Dec 20 09:30:55 deb-ceph-01 ceph-osd[928]: 2016-12-20 09:30:55.691271 7f73dc471800 -1 osd.0 55 log_to_monitors {default=true}
---->%----

journal excerpt with above ceph.service unit:
----%<----
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Starting Activate all Ceph disks...
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Ceph cluster key creator task.
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Starting Ceph cluster monitor daemon...
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Ceph cluster monitor daemon.
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:17:48 deb-ceph-02 ceph-mon[466]: starting mon.deb-ceph-02 rank 1 at 10.0.0.82:6789/0 mon_data /var/lib/ceph/mon/ceph-deb-ceph-02 fsid e6f60473-dc86-4cb7-ad09-30e69d92b09c
Dec 20 09:17:48 deb-ceph-02 systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:17:48 deb-ceph-02 systemd[1]: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:17:48 deb-ceph-02 ceph-osd-prestart.sh[468]: create-or-move updated item name 'osd.1' weight 0.0291 at location {host=deb-ceph-02,root=default} to crush map
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Ceph object storage daemon.
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Activate all Ceph disks.
Dec 20 09:17:49 deb-ceph-02 ceph-osd[847]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Dec 20 09:17:49 deb-ceph-02 ceph-osd[847]: 2016-12-20 03:17:49.073587 7fd8fdbfe800 -1 osd.1 52 log_to_monitors {default=true}
---->%----

ceph-log - journal output of ceph 10.2.9 on Debian 9.1 (9.94 KB) Fabian Grünbichler, 09/25/2017 11:34 AM

ceph-log2 - journal output of ceph 10.2.9 on Debian 9.1, with PR 17904 manually applied and ceph-osd@0 disabled before rebooting (7.83 KB) Fabian Grünbichler, 09/25/2017 01:55 PM

History

#1 Updated by Fabian Grünbichler over 2 years ago

note that the fix for http://tracker.ceph.com/issues/17889 should make this issue more obvious, as the ceph-osd@ services will not be started unconditionally at boot anymore

#2 Updated by Fabian Grünbichler over 2 years ago

logs again with correct formatting:

Dec 20 09:30:54 deb-ceph-01 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Started Ceph cluster key creator task.
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Starting Ceph cluster monitor daemon...
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Started Ceph cluster monitor daemon.
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Starting LSB: Start Ceph distributed file system daemons at boot time...
Dec 20 09:30:54 deb-ceph-01 ceph-mon[471]: starting mon.deb-ceph-01 rank 0 at 10.0.0.81:6789/0 mon_data /var/lib/ceph/mon/ceph-deb-ceph-01 fsid e6f60473-dc86-4cb7-ad09-30e69d92b09c
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: 2016-12-20 09:30:54.875521 7f5c34041700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: 2016-12-20 09:30:54.875561 7f5c34041700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: 2016-12-20 09:30:54.875568 7f5c34041700  0 librados: osd.0 initialization error (2) No such file or directory
Dec 20 09:30:54 deb-ceph-01 ceph-osd-prestart.sh[470]: Error connecting to cluster: ObjectNotFound
Dec 20 09:30:54 deb-ceph-01 systemd[1]: Started Ceph object storage daemon.
Dec 20 09:30:54 deb-ceph-01 ceph-osd[852]: 2016-12-20 09:30:54.973906 7f0cfd16b800 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory
Dec 20 09:30:55 deb-ceph-01 systemd[1]: ceph-osd@0.service: main process exited, code=exited, status=1/FAILURE
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Unit ceph-osd@0.service entered failed state.
Dec 20 09:30:55 deb-ceph-01 systemd[1]: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:30:55 deb-ceph-01 systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 systemd[1]: ceph-osd@0.service holdoff time over, scheduling restart.
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Stopping Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:30:55 deb-ceph-01 ceph-osd-prestart.sh[879]: create-or-move updated item name 'osd.0' weight 0.0107 at location {host=deb-ceph-01,root=default} to crush map
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Started Ceph object storage daemon.
Dec 20 09:30:55 deb-ceph-01 systemd[1]: Started LSB: Start Ceph distributed file system daemons at boot time.
Dec 20 09:30:55 deb-ceph-01 ceph-osd[928]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Dec 20 09:30:55 deb-ceph-01 ceph-osd[928]: 2016-12-20 09:30:55.691271 7f73dc471800 -1 osd.0 55 log_to_monitors {default=true}
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Starting Activate all Ceph disks...
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Ceph cluster key creator task.
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Starting Ceph cluster monitor daemon...
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Ceph cluster monitor daemon.
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Starting Ceph object storage daemon...
Dec 20 09:17:48 deb-ceph-02 ceph-mon[466]: starting mon.deb-ceph-02 rank 1 at 10.0.0.82:6789/0 mon_data /var/lib/ceph/mon/ceph-deb-ceph-02 fsid e6f60473-dc86-4cb7-ad09-30e69d92b09c
Dec 20 09:17:48 deb-ceph-02 systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:17:48 deb-ceph-02 systemd[1]: [/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Dec 20 09:17:48 deb-ceph-02 ceph-osd-prestart.sh[468]: create-or-move updated item name 'osd.1' weight 0.0291 at location {host=deb-ceph-02,root=default} to crush map
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Ceph object storage daemon.
Dec 20 09:17:48 deb-ceph-02 systemd[1]: Started Activate all Ceph disks.
Dec 20 09:17:49 deb-ceph-02 ceph-osd[847]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Dec 20 09:17:49 deb-ceph-02 ceph-osd[847]: 2016-12-20 03:17:49.073587 7fd8fdbfe800 -1 osd.1 52 log_to_monitors {default=true}

#3 Updated by Greg Farnum over 2 years ago

  • Assignee set to Loic Dachary

Loic, is this still a thing or did it get resolved in some of the other changes?

#4 Updated by Fabian Grünbichler about 2 years ago

@Greg:

$ ceph --version
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

$ apt show ceph-base
Package: ceph-base
Source: ceph
Version: 10.2.7-1~bpo80+1
Maintainer: Ceph Maintainers <ceph-maintainers@lists.ceph.com>
Installed-Size: 321 MB
Depends: binutils, ceph-common (= 10.2.7-1~bpo80+1), cryptsetup-bin | cryptsetup, debianutils, findutils, gdisk, grep, logrotate, python, python-argparse | libpython2.7-stdlib, python-pkg-resources, sdparm | hdparm, xfsprogs, libboost-iostreams1.55.0, libboost-random1.55.0, libboost-system1.55.0, libboost-thread1.55.0, libc6 (>= 2.16), libcephfs1, libgcc1 (>= 1:4.1.1), libnspr4 (>= 2:4.9-2~), libnss3 (>= 2:3.13.4-2~), libstdc++6 (>= 4.9)
Recommends: btrfs-tools, ceph-mds (= 10.2.7-1~bpo80+1), librados2 (= 10.2.7-1~bpo80+1), libradosstriper1 (= 10.2.7-1~bpo80+1), librbd1 (= 10.2.7-1~bpo80+1), ntp | time-daemon
Breaks: ceph (<< 10), ceph-test (<< 0.94-1322), python-ceph (<< 0.92-1223)
Replaces: ceph (<< 10), ceph-common (<< 0.78-500), ceph-test (<< 0.94-1322), python-ceph (<< 0.92-1223)
Homepage: http://ceph.com/
Priority: optional
Section: admin
Download-Size: 51.6 MB
APT-Manual-Installed: no
APT-Sources: http://download.ceph.com/debian-jewel/ jessie/main amd64 Packages
Description: common ceph daemon libraries and management tools
 Ceph is a massively scalable, open-source, distributed
 storage system that runs on commodity hardware and delivers object,
 block and file system storage.
 .
 This package contains the libraries and management tools that are common among
 the three Ceph server daemons (ceph-mon, ceph-osd, ceph-mds). These tools are
 necessary for creating, running, and administering a Ceph storage cluster.

N: There is 1 additional record. Please use the '-a' switch to see it

$ dpkg -L ceph-base | grep "systemd/system/\|init\.d/" 
/lib/systemd/system/ceph-create-keys@.service
/etc/init.d/ceph

$ for pkg in ceph ceph-base ceph-common ceph-mon ceph-osd; do echo "$pkg:";dpkg -L $pkg | grep "systemd/system/\|init.d/"; done
ceph:
ceph-base:
/lib/systemd/system/ceph-create-keys@.service
/etc/init.d/ceph
ceph-common:
/lib/systemd/system/ceph.target
/etc/init.d/rbdmap
ceph-mon:
/lib/systemd/system/ceph-mon.target
/lib/systemd/system/ceph-mon@.service
ceph-osd:
/lib/systemd/system/ceph-disk@.service
/lib/systemd/system/ceph-osd@.service
/lib/systemd/system/ceph-osd.target

$ sudo systemctl cat ceph.service
# /run/systemd/generator.late/ceph.service
# Automatically generated by systemd-sysv-generator

[Unit]
SourcePath=/etc/init.d/ceph
Description=LSB: Start Ceph distributed file system daemons at boot time
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
After=remote-fs.target nss-lookup.target network-online.target time-sync.target
Wants=network-online.target
Conflicts=shutdown.target

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SysVStartPriority=4
ExecStart=/etc/init.d/ceph start
ExecStop=/etc/init.d/ceph stop

also see http://marc.info/?l=ceph-devel&m=149276646923341 and following.

AFAICT the main thing that has changed since originally reporting this is that disabling the init script does more harm now - OSDs are (rightfully) no longer permanently enabled, so if udev does not start them, no one will.

#5 Updated by Loic Dachary almost 2 years ago

  • Status changed from New to Verified

this is inherently racy: udev does detect/trigger on devices that are initialized late enough in the boot process, but not on devices that are finished early

I don't understand how that is possible, it would be great if you could explain. If a device initializes early during the boot process, the corresponding events will be kept in memory and sent when udev is ready to act on them, i.e. when the local file systems are mounted and the files containing the udev rules are available.

It has been reported by a number of sources that this sometimes is not the case, but to this date we don't have a reproducer, which is embarrassing. Do you have a way to reproduce this issue?

#6 Updated by Fabian Grünbichler almost 2 years ago

Loic Dachary wrote:

this is inherently racy: udev does detect/trigger on devices that are initialized late enough in the boot process, but not on devices that are finished early

I don't understand how that is possible, it would be great if you could explain. If a device initializes early during the boot process, the corresponding events will be kept in memory and sent when udev is ready to act on them, i.e. when the local file systems are mounted and the files containing the udev rules are available.

It has been reported by a number of sources that this sometimes is not the case, but to this date we don't have a reproducer, which is embarrassing. Do you have a way to reproduce this issue?

seems like it's no longer reproducible using current Jewel or Luminous (on Debian Stretch). still very noisy and a bit delayed (the first attempt to start fails because the OSD path is not mounted, but then the hold-off time is over and it successfully restarts). it seems the OSD units are now only sometimes runtime-enabled?

#7 Updated by Nathan Cutler almost 2 years ago

seems like it's no longer reproducible using current Jewel or Luminous (on Debian Stretch). still very noisy and a bit delayed (the first attempt to start fails because the OSD path is not mounted, but then the hold-off time is over and it successfully restarts). it seems the OSD units are now only sometimes runtime-enabled?

Sounds like https://github.com/ceph/ceph/pull/17904 will fix it, once it is backported. Fabian, can you apply that patch manually to your OSD nodes?

#8 Updated by Fabian Grünbichler almost 2 years ago

Nathan Cutler wrote:

seems like it's no longer reproducible using current Jewel or Luminous (on Debian Stretch). still very noisy and a bit delayed (the first attempt to start fails because the OSD path is not mounted, but then the hold-off time is over and it successfully restarts). it seems the OSD units are now only sometimes runtime-enabled?

Sounds like https://github.com/ceph/ceph/pull/17904 will fix it, once it is backported. Fabian, can you apply that patch manually to your OSD nodes?

I think that does not help for 10.2.9 - if ceph-osd@0 is not enabled, ceph-disk activate thinks the host is SysV-init based ;)

$ systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

#9 Updated by Loic Dachary almost 2 years ago

  • Assignee deleted (Loic Dachary)
