cephadm: osd unit.run creates /var/run/ceph/$FSID too late, so OSD may not start after reboot
The OSD unit.run file currently has the following form:
    /usr/bin/podman run [...] -v /var/run/ceph/$FSID:/var/run/ceph:z [...] ceph-volume lvm activate [...]
    /usr/bin/install -d -m0770 -o 167 -g 167 /var/run/ceph/$FSID
    # osd.$ID
    /usr/bin/podman run [...] ceph-osd [...]
i.e. it first invokes podman [...] ceph-volume lvm activate, then creates /var/run/ceph/$FSID, then starts the OSD container. The problem is that the podman [...] ceph-volume lvm activate call bind-mounts /var/run/ceph/$FSID, so it fails if that directory doesn't exist yet (you'll see something like 'Error: error checking path "/var/run/ceph/8f5be3a6-f1bb-11ea-9130-525400a64977": stat /var/run/ceph/8f5be3a6-f1bb-11ea-9130-525400a64977: no such file or directory' in the journal, and your OSD won't start).
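The ordering problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: a temporary directory stands in for /var/run/ceph, and the hypothetical activate_stub function stands in for the podman [...] ceph-volume lvm activate call by imitating the "does the bind-mount source exist" check. Creating the directory before the activate step is one plausible remedy suggested by the title; it is not confirmed here as what any particular patch does.

```shell
#!/bin/sh
# Simulation only: RUN_ROOT stands in for /var/run/ceph, and activate_stub
# stands in for "podman run [...] ceph-volume lvm activate [...]".
RUN_ROOT=$(mktemp -d)
FSID=8f5be3a6-f1bb-11ea-9130-525400a64977
RUN_DIR="$RUN_ROOT/$FSID"

# Imitate the failure mode from the report: the step fails when the
# bind-mount source directory is missing.
activate_stub() {
    if [ ! -d "$RUN_DIR" ]; then
        echo "Error: error checking path \"$RUN_DIR\": no such file or directory" >&2
        return 1
    fi
    echo "activate ok"
}

# Current order: activate runs before the directory exists, so it fails.
activate_stub
before=$?

# Reordered: create the directory first, then activate.
# (The real unit.run also passes -o 167 -g 167, which needs root; omitted here.)
install -d -m0770 "$RUN_DIR"
activate_stub
after=$?

echo "before=$before after=$after"
rm -rf "$RUN_ROOT"
```

With the directory created first, the stubbed activate step succeeds regardless of whether any other daemon has started on the node.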
I assume most users have never experienced this, because every other ceph daemon's unit.run file also creates /var/run/ceph/$FSID, so if any other ceph daemon (including the crash daemon, which usually runs on all nodes) starts first, the directory already exists and the OSDs start up just fine. During upgrades, however, it's entirely possible to adopt a bunch of OSDs without starting any other services on that node yet (including the crash service), reboot, and then have all the OSDs on that node fail to start. Ouch.
#1 Updated by Tim Serong about 2 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 37046
If you've got a node running OSDs and no other services, this is trivially reproducible: first run ceph orch rm crash to get rid of the crash daemon, then reboot the OSD node.
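After the reboot, one way to confirm that the failure is this particular problem is to search the journal for the error quoted in the description. The following is a rough sketch; find_missing_rundir_errors is a hypothetical helper, not part of cephadm, and the sample line is built from the error text in this report so the filter can be tried without a Ceph node.

```shell
#!/bin/sh
# Hypothetical helper (not part of cephadm): keep only lines that report
# a missing /var/run/ceph/<fsid> bind-mount source.
find_missing_rundir_errors() {
    grep -F 'error checking path "/var/run/ceph/'
}

# Sample line built from the error message quoted in this report.
sample='Error: error checking path "/var/run/ceph/8f5be3a6-f1bb-11ea-9130-525400a64977": stat /var/run/ceph/8f5be3a6-f1bb-11ea-9130-525400a64977: no such file or directory'

# Feed one matching and one unrelated line through the filter and count hits.
matches=$(printf '%s\n' "$sample" "unrelated journal line" \
    | find_missing_rundir_errors | wc -l | tr -d ' ')
echo "matching lines: $matches"

# On a real node, after the reboot:
#   journalctl -b --no-pager | find_missing_rundir_errors
```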