Bug #56674
Upgrade to v17.2.2: first step creates mgrs which cannot start
Description
After starting the upgrade to 17.2.2, all mgrs are upgraded first, but they do not seem to be able to start.
When only one mgr is left (still running the previous version), the upgrade stops because no standby mgr is available.
How can we recover from this situation?
Upgrading back to 17.2.1 does not work either:
Error EINVAL: Need at least 2 running mgr daemons for upgrade
The dashboard is also unavailable.
Trying to restart mgrs on the other nodes always fails:
myceph@ceph-d:~$ sudo systemctl restart ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service
Job for ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service failed because the control process exited with error code.
See "systemctl status ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service" and "journalctl -xe" for details.
myceph@ceph-d:~$ sudo systemctl status ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service
● ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service - Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4
Loaded: loaded (/etc/systemd/system/ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2022-07-22 05:50:53 UTC; 8min ago
Process: 1708450 ExecStart=/bin/bash /var/lib/ceph/2af6c250-ccb0-4fe8-80af-f2181098c0e4/mgr.ceph-d.fycgtr/unit.run (code=exited, status=134)
Process: 1708751 ExecStopPost=/bin/bash /var/lib/ceph/2af6c250-ccb0-4fe8-80af-f2181098c0e4/mgr.ceph-d.fycgtr/unit.poststop (code=exited, status=0/SUCCESS)
Main PID: 1708450 (code=exited, status=134)
Jul 22 05:50:53 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
Jul 22 05:54:39 ceph-d systemd[1]: ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service: Start request repeated too quickly.
Jul 22 05:54:39 ceph-d systemd[1]: ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service: Failed with result 'exit-code'.
Jul 22 05:54:39 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
Jul 22 05:57:37 ceph-d systemd[1]: ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service: Start request repeated too quickly.
Jul 22 05:57:37 ceph-d systemd[1]: ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service: Failed with result 'exit-code'.
Jul 22 05:57:37 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
Jul 22 05:59:12 ceph-d systemd[1]: ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service: Start request repeated too quickly.
Jul 22 05:59:12 ceph-d systemd[1]: ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service: Failed with result 'exit-code'.
Jul 22 05:59:12 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
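A note on the `status=134` shown above: shells conventionally report a process that died from a signal as 128 plus the signal number, so 134 most likely means signal 6 (SIGABRT), i.e. the mgr process aborted rather than exiting cleanly. The small sketch below only illustrates that arithmetic; the function name `decode_exit_status` is made up for this example and is not part of Ceph or systemd.

```shell
# Decode a shell-style exit status into a signal name, if it encodes one.
# decode_exit_status is a hypothetical helper name used only for this sketch.
decode_exit_status() (
  status=$1
  if [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
    # Statuses above 128 conventionally mean "128 + signal number".
    sig=$((status - 128))
    echo "signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit code $status (no signal)"
  fi
)

decode_exit_status 134
```

If this reading is right, the journal of the failing unit (`journalctl -u` with the unit name from the output above) would be the place to look for the abort message itself.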
If we catch the status of the mgr while it is still starting, we get:
myceph@ceph-d:~$ sudo systemctl status ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service
● ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service - Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4
Loaded: loaded (/etc/systemd/system/ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-07-22 06:17:32 UTC; 4s ago
Main PID: 1714307 (bash)
Tasks: 17 (limit: 77056)
Memory: 28.4M
CGroup: /system.slice/system-ceph\x2d2af6c250\x2dccb0\x2d4fe8\x2d80af\x2df2181098c0e4.slice/ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@mgr.ceph-d.fycgtr.service
├─1714307 /bin/bash /var/lib/ceph/2af6c250-ccb0-4fe8-80af-f2181098c0e4/mgr.ceph-d.fycgtr/unit.run
└─1714359 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mgr --init --name ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4-mgr-ceph-d-fycgtr -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:1ed3a9d5b1ba45232047ed4b8ce7e265f2413f98eb27ab72c8acab4289>
Jul 22 06:17:35 ceph-d bash[1714359]: debug 2022-07-22T06:17:35.872+0000 7faf32cf0000 1 mgr[py] Loading python module 'test_orchestrator'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.276+0000 7faf32cf0000 -1 mgr[py] Module test_orchestrator has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.276+0000 7faf32cf0000 1 mgr[py] Loading python module 'mirroring'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.424+0000 7faf32cf0000 1 mgr[py] Loading python module 'status'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.560+0000 7faf32cf0000 -1 mgr[py] Module status has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.560+0000 7faf32cf0000 1 mgr[py] Loading python module 'progress'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.696+0000 7faf32cf0000 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.696+0000 7faf32cf0000 1 mgr[py] Loading python module 'osd_support'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.840+0000 7faf32cf0000 -1 mgr[py] Module osd_support has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.840+0000 7faf32cf0000 1 mgr[py] Loading python module 'snap_schedule'
Updated by Benno Lange almost 2 years ago
After several attempts at restarting the mgrs manually, we caught a moment where 2 mgrs were available, so we could issue an upgrade back to 17.2.1, which seems to have worked for now.
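The workaround described here (wait for a moment when two mgrs are up, then immediately start the upgrade back) could be automated roughly as follows. This is only a sketch under assumptions: `count_running_mgrs` and `wait_for_mgrs` are hypothetical helper names, and the `grep` pattern assumes that `ceph orch ps` prints one row per daemon with the daemon name first and a status beginning with "running". It was not part of the original procedure.

```shell
# Hypothetical helper: count mgr daemons that the orchestrator reports as running.
# Assumes `ceph orch ps` prints one row per daemon, name first, status "running ...".
count_running_mgrs() {
  ceph orch ps | grep -c '^mgr\..* running'
}

# Poll until at least $1 mgrs are running, then run the remaining arguments
# as a command. Gives up after $2 attempts, sleeping $3 seconds between them.
wait_for_mgrs() (
  needed=$1 tries=$2 delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    n=$(count_running_mgrs)
    if [ "${n:-0}" -ge "$needed" ]; then
      "$@"   # enough mgrs are up: run the requested command right away
      exit   # propagate that command's exit status
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up: fewer than $needed mgrs running" >&2
  exit 1
)

# Usage (not run here):
#   wait_for_mgrs 2 60 5 ceph orch upgrade start --ceph-version 17.2.1
```

The command to run is passed as arguments so that the check and the upgrade request happen back to back, which matters because the mgrs only stayed up briefly.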