Bug #23083
ceph-mgr fails to start after a system reboot on Ubuntu 16.04
Status: Resolved
Priority: Normal
Assignee: -
Category: ceph-mgr
Target version: -
% Done: 0%
Backport: luminous
Regression: No
Severity: 3 - minor
Description
While upgrading a Luminous v12.2.2 system to 12.2.3 I noticed that on all three Monitors the Mgr daemon wouldn't start.
I tried a couple of times, but it just wouldn't start after a reboot. I turned logging to debug_mgr=10 and tried again:
2018-02-22 11:29:43.543272 7fcd70a02700 4 mgr handle_mgr_map active in map: 0 active is 1034134
2018-02-22 11:29:43.544208 7fcd6cff4700 -1 received signal: Terminated from PID: 1 task name: /sbin/init UID: 0
2018-02-22 11:29:43.544231 7fcd6cff4700 -1 mgr handle_signal *** Got signal Terminated ***
2018-02-22 11:29:43.544235 7fcd6cff4700 4 mgr shutdown Shutting down
2018-02-22 11:29:43.546097 7fcd6cff4700 10 mgr shutdown waiting for module dashboard to shutdown
2018-02-22 11:29:43.546112 7fcd6cff4700 20 mgr Gil Switched to new thread state 0x564791f54260
2018-02-22 11:29:43.546454 7fcd6cff4700 4 mgr[dashboard] Stopping server...
2018-02-22 11:29:43.550298 7fcd6cff4700 4 mgr[dashboard] Stopped server
2018-02-22 11:29:43.550327 7fcd6cff4700 20 mgr ~Gil Destroying new thread state 0x564791f54260
2018-02-22 11:29:43.550335 7fcd6cff4700 10 mgr shutdown module dashboard shutdown
2018-02-22 11:29:43.550338 7fcd6cff4700 10 mgr shutdown joining thread for module dashboard
2018-02-22 11:29:43.557250 7fcd5eea5700 4 mgr[dashboard] Engine done.
2018-02-22 11:29:43.557280 7fcd5eea5700 20 mgr ~Gil Destroying new thread state 0x564791f52c60
2018-02-22 11:29:43.557427 7fcd6cff4700 10 mgr shutdown joined thread for module dashboard
2018-02-22 11:29:43.557446 7fcd6cff4700 20 mgr Gil Switched to new thread state 0x564791f54260
2018-02-22 11:29:43.557449 7fcd6cff4700 20 mgr ~Gil Destroying new thread state 0x564791f54260
2018-02-22 11:29:43.557467 7fcd6cff4700 20 mgr Gil Switched to new thread state 0x564791f54260
2018-02-22 11:29:43.557469 7fcd6cff4700 20 mgr ~Gil Destroying new thread state 0x564791f54260
2018-02-22 11:29:43.557474 7fcd6cff4700 20 mgr Gil Switched to new thread state 0x564791f54260
2018-02-22 11:29:43.557475 7fcd6cff4700 20 mgr ~Gil Destroying new thread state 0x564791f54260
2018-02-22 11:29:43.557477 7fcd6cff4700 20 mgr Gil Switched to new thread state 0x564791f54260
2018-02-22 11:29:43.557480 7fcd6cff4700 20 mgr ~Gil Destroying new thread state 0x564791f54260
2018-02-22 11:29:43.557482 7fcd6cff4700 20 mgr Gil Switched to new thread state 0x564791f54260
2018-02-22 11:29:43.557484 7fcd6cff4700 20 mgr ~Gil Destroying new thread state 0x564791f54260
2018-02-22 11:30:49.327560 7fa18c91d680 0 set uid:gid to 64045:64045 (ceph:ceph)
2018-02-22 11:30:49.327574 7fa18c91d680 0 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable), process (unknown), pid 1524
2018-02-22 11:30:49.330552 7fa18c91d680 0 pidfile_write: ignore empty --pid-file
2018-02-22 11:30:49.598648 7f6b62bf5680 0 set uid:gid to 64045:64045 (ceph:ceph)
2018-02-22 11:30:49.598661 7f6b62bf5680 0 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable), process (unknown), pid 1689
2018-02-22 11:30:49.598809 7f6b62bf5680 0 pidfile_write: ignore empty --pid-file
2018-02-22 11:30:49.791143 7fa2e38cc680 0 set uid:gid to 64045:64045 (ceph:ceph)
2018-02-22 11:30:49.791157 7fa2e38cc680 0 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable), process (unknown), pid 1728
2018-02-22 11:30:49.791340 7fa2e38cc680 0 pidfile_write: ignore empty --pid-file
Here you can see that the Mgr was shut down for the reboot, and that after the reboot systemd tried to start it three times within the same second (PIDs 1524, 1689 and 1728).
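To make the restart attempts easier to spot in a longer log, the startup banners can be pulled out with a short script. This is only a sketch: the function name `find_start_attempts` and the regular expression are my own, and the pattern is inferred from the three banner lines in the log above rather than from any documented Ceph log format.

```python
import re

# Matches the version banner the daemon logs at startup, as seen in this report:
# "<date> <time> <thread id> <level> ceph version <ver> (...), process ..., pid <n>"
# The layout is an assumption based on the log excerpt, not a documented format.
START_BANNER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\S+\s+-?\d+\s+"
    r"ceph version (?P<version>\S+) .*\bpid (?P<pid>\d+)\s*$"
)


def find_start_attempts(lines):
    """Return (timestamp, version, pid) for every startup banner found."""
    attempts = []
    for line in lines:
        match = START_BANNER.match(line.strip())
        if match:
            attempts.append(
                (match.group("ts"), match.group("version"), int(match.group("pid")))
            )
    return attempts


# Lines copied from the log in this report.
sample = [
    "2018-02-22 11:30:49.327574 7fa18c91d680 0 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable), process (unknown), pid 1524",
    "2018-02-22 11:30:49.330552 7fa18c91d680 0 pidfile_write: ignore empty --pid-file",
    "2018-02-22 11:30:49.598661 7f6b62bf5680 0 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable), process (unknown), pid 1689",
    "2018-02-22 11:30:49.791157 7fa2e38cc680 0 ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable), process (unknown), pid 1728",
]

for ts, version, pid in find_start_attempts(sample):
    print(ts, version, pid)
```

Run against the full mgr log, three banners less than half a second apart would match the "Start request repeated too quickly" message in the systemd output.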
systemd also shows that the mgr failed to start:
root@mon03:~# systemctl status ceph-mgr@mon03
ceph-mgr@mon03.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: enabled)
   Active: inactive (dead) (Result: exit-code) since Thu 2018-02-22 11:30:50 CET; 2min 29s ago
  Process: 1728 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=255)
 Main PID: 1728 (code=exited, status=255)

Feb 22 11:30:49 mon03.ceph.dz57927.ams02.cldin.net systemd[1]: ceph-mgr@mon03.service: Unit entered failed state.
Feb 22 11:30:49 mon03.ceph.dz57927.ams02.cldin.net systemd[1]: ceph-mgr@mon03.service: Failed with result 'exit-code'.
Feb 22 11:30:50 mon03.ceph.dz57927.ams02.cldin.net systemd[1]: ceph-mgr@mon03.service: Service hold-off time over, scheduling restart.
Feb 22 11:30:50 mon03.ceph.dz57927.ams02.cldin.net systemd[1]: Stopped Ceph cluster manager daemon.
Feb 22 11:30:50 mon03.ceph.dz57927.ams02.cldin.net systemd[1]: ceph-mgr@mon03.service: Start request repeated too quickly.
Feb 22 11:30:50 mon03.ceph.dz57927.ams02.cldin.net systemd[1]: Failed to start Ceph cluster manager daemon.
root@mon03:~#
A start doesn't work either:
root@mon03:~# systemctl start ceph-mgr@mon03
Job for ceph-mgr@mon03.service failed because the control process exited with error code. See "systemctl status ceph-mgr@mon03.service" and "journalctl -xe" for details.
root@mon03:~#
Eventually a reset-failed followed by a start is required:
root@mon03:~# systemctl reset-failed ceph-mgr@mon03
root@mon03:~# systemctl start ceph-mgr@mon03
root@mon03:~#
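For anyone who has to repeat this on several monitors, the two commands can be wrapped in a small helper. This is a minimal sketch of the manual workaround only, not a fix: the function name `recover_unit` and the injectable `run` parameter are hypothetical, and the only systemctl subcommands used are the two shown above.

```python
import subprocess


def recover_unit(unit, run=subprocess.run):
    """Clear systemd's failed state for `unit`, then start it again.

    `run` defaults to subprocess.run; it is injectable (an illustrative
    choice) so the sequence can be exercised on a machine without systemd.
    """
    commands = [
        ["systemctl", "reset-failed", unit],
        ["systemctl", "start", unit],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError if systemctl exits non-zero,
        # so a failed reset-failed is not silently followed by a start.
        run(cmd, check=True)
    return commands
```

Calling `recover_unit("ceph-mgr@mon03")` as root should be equivalent to the two commands above. It works around the start rate limit but does nothing about whatever makes the daemon exit with status 255 in the first place.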
I've seen this happen a few times and I have a feeling it is IPv6 related (this cluster runs IPv6).
The manager seems to be spawned before networking is fully up, but I haven't found any real evidence for that yet.
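One way to gather evidence for or against this theory would be to check, early in boot, whether the host's IPv6 address can be bound at all. The probe below is a rough sketch under that assumption: `can_bind_ipv6` is a hypothetical helper, it is not part of Ceph, and whether ceph-mgr actually exits with status 255 because of a bind or connect failure is not confirmed anywhere in this report.

```python
import socket


def can_bind_ipv6(address, port=0):
    """Return True if a TCP socket can currently bind to the IPv6 address.

    Port 0 lets the kernel pick a free port, so the probe does not collide
    with a running daemon.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind((address, port))
        return True
    except OSError:
        # Typically EADDRNOTAVAIL when the address is not configured on any
        # interface yet; also covers hosts without IPv6 support.
        return False


# 2001:db8::/32 is the IPv6 documentation prefix; substitute the monitor's
# real address when running this on an affected host.
print(can_bind_ipv6("2001:db8::1"))
```

Logging the result of such a probe with a timestamp just before the unit starts would show whether the address was usable at 11:30:49, when the three start attempts happened.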
Related issues
History
#1 Updated by Wido den Hollander about 6 years ago
#2 Updated by Sage Weil about 6 years ago
- Status changed from New to Pending Backport
- Backport set to luminous
#3 Updated by Nathan Cutler about 6 years ago
- Copied to Backport #23101: luminous: ceph-mgr fails to start after a system reboot on Ubuntu 16.04 added
#4 Updated by Nathan Cutler almost 6 years ago
- Status changed from Pending Backport to Resolved