Bug #56674

open

Upgrade to v17.2.2: first step creates mgrs that cannot start

Added by Benno Lange almost 2 years ago. Updated almost 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/client-upgrade
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After starting the upgrade to 17.2.2, all mgrs are upgraded first, but none of the upgraded mgrs seems able to start :-(
When only one mgr is left (still running the previous version), the upgrade stops because no standby mgr is available!
How can we recover from this situation?

An upgrade to 17.2.1 does not work either:
Error EINVAL: Need at least 2 running mgr daemons for upgrade
The dashboard is also unavailable!
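The upgrade refuses to proceed with `Error EINVAL: Need at least 2 running mgr daemons for upgrade`. Below is a minimal Python sketch of that kind of precondition check, e.g. for a script that decides whether it is safe to start the upgrade. It assumes the input is the JSON list printed by `ceph orch ps --format json`, with `daemon_type` and `status_desc` keys per daemon. Those field names are an assumption to verify on your release, and `count_running_mgrs` / `upgrade_precondition_met` are hypothetical helpers, not cephadm's own code.

```python
import json


# Hypothetical helper: counts mgr daemons reported as running.
# ASSUMPTION: each entry has "daemon_type" and "status_desc" keys, as in
# `ceph orch ps --format json`; verify the field names on your release.
def count_running_mgrs(orch_ps_json: str) -> int:
    daemons = json.loads(orch_ps_json)
    return sum(
        1
        for d in daemons
        if d.get("daemon_type") == "mgr" and d.get("status_desc") == "running"
    )


# Mirrors the intent of the "Need at least 2 running mgr daemons" check.
def upgrade_precondition_met(orch_ps_json: str, minimum: int = 2) -> bool:
    return count_running_mgrs(orch_ps_json) >= minimum


# Hypothetical sample: one upgraded mgr that fails, one old mgr still running.
sample = json.dumps([
    {"daemon_type": "mgr", "daemon_id": "ceph-d.fycgtr", "status_desc": "error"},
    {"daemon_type": "mgr", "daemon_id": "example.abcdef", "status_desc": "running"},
])
print(count_running_mgrs(sample))        # 1
print(upgrade_precondition_met(sample))  # False
```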

Trying to restart mgrs on the other nodes always fails:

myceph@ceph-d:~$ sudo systemctl restart
Job for failed because the control process exited with error code.
See "systemctl status " and "journalctl -xe" for details.
myceph@ceph-d:~$ sudo systemctl status
- Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4
Loaded: loaded (/etc/systemd/system/ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2022-07-22 05:50:53 UTC; 8min ago
Process: 1708450 ExecStart=/bin/bash /var/lib/ceph/2af6c250-ccb0-4fe8-80af-f2181098c0e4/mgr.ceph-d.fycgtr/unit.run (code=exited, status=134)
Process: 1708751 ExecStopPost=/bin/bash /var/lib/ceph/2af6c250-ccb0-4fe8-80af-f2181098c0e4/mgr.ceph-d.fycgtr/unit.poststop (code=exited, status=0/SUCCESS)
Main PID: 1708450 (code=exited, status=134)

Jul 22 05:50:53 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
Jul 22 05:54:39 ceph-d systemd[1]: : Start request repeated too quickly.
Jul 22 05:54:39 ceph-d systemd[1]: : Failed with result 'exit-code'.
Jul 22 05:54:39 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
Jul 22 05:57:37 ceph-d systemd[1]: : Start request repeated too quickly.
Jul 22 05:57:37 ceph-d systemd[1]: : Failed with result 'exit-code'.
Jul 22 05:57:37 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
Jul 22 05:59:12 ceph-d systemd[1]: : Start request repeated too quickly.
Jul 22 05:59:12 ceph-d systemd[1]: : Failed with result 'exit-code'.
Jul 22 05:59:12 ceph-d systemd[1]: Failed to start Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4.
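The `status=134` in the `systemctl status` output above is worth decoding. A minimal Python sketch, assuming the common shell/container convention that an exit status above 128 means the process was terminated by signal (status - 128): 134 maps to signal 6, which is SIGABRT on Linux. That would suggest ceph-mgr inside the container is aborting (for example on a failed assertion) rather than exiting on its own; this is an interpretation, not something the logs above confirm. `describe_exit_status` is a hypothetical helper.

```python
import signal


# Hypothetical helper. By shell convention, an exit status above 128 means
# the child was terminated by signal number (status - 128).
def describe_exit_status(status: int) -> str:
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known to this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(134))  # killed by SIGABRT
print(describe_exit_status(0))    # exited with code 0
```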

If we catch the status of the mgr while it is still starting, we get:

myceph@ceph-d:~$ sudo systemctl status
- Ceph mgr.ceph-d.fycgtr for 2af6c250-ccb0-4fe8-80af-f2181098c0e4
Loaded: loaded (/etc/systemd/system/ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4@.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-07-22 06:17:32 UTC; 4s ago
Main PID: 1714307 (bash)
Tasks: 17 (limit: 77056)
Memory: 28.4M
CGroup: /system.slice/system-ceph\x2d2af6c250\x2dccb0\x2d4fe8\x2d80af\
├─1714307 /bin/bash /var/lib/ceph/2af6c250-ccb0-4fe8-80af-f2181098c0e4/mgr.ceph-d.fycgtr/unit.run
└─1714359 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mgr --init --name ceph-2af6c250-ccb0-4fe8-80af-f2181098c0e4-mgr-ceph-d-fycgtr -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:1ed3a9d5b1ba45232047ed4b8ce7e265f2413f98eb27ab72c8acab4289>

Jul 22 06:17:35 ceph-d bash[1714359]: debug 2022-07-22T06:17:35.872+0000 7faf32cf0000 1 mgr[py] Loading python module 'test_orchestrator'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.276+0000 7faf32cf0000 -1 mgr[py] Module test_orchestrator has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.276+0000 7faf32cf0000 1 mgr[py] Loading python module 'mirroring'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.424+0000 7faf32cf0000 1 mgr[py] Loading python module 'status'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.560+0000 7faf32cf0000 -1 mgr[py] Module status has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.560+0000 7faf32cf0000 1 mgr[py] Loading python module 'progress'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.696+0000 7faf32cf0000 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.696+0000 7faf32cf0000 1 mgr[py] Loading python module 'osd_support'
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.840+0000 7faf32cf0000 -1 mgr[py] Module osd_support has missing NOTIFY_TYPES member
Jul 22 06:17:36 ceph-d bash[1714359]: debug 2022-07-22T06:17:36.840+0000 7faf32cf0000 1 mgr[py] Loading python module 'snap_schedule'
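Several modules log `Module ... has missing NOTIFY_TYPES member` at level -1 while loading. In the excerpt above, loading continues with the next module after each of these messages, so on their own they may not explain the abort. A small Python sketch for pulling the affected module names out of a longer journal dump, to compare against the point where the mgr stops; the regex is written against the line format shown above, and `modules_missing_notify_types` is a hypothetical helper.

```python
import re

# Matches mgr log lines such as:
#   ... -1 mgr[py] Module status has missing NOTIFY_TYPES member
MISSING_NOTIFY = re.compile(r"mgr\[py\] Module (\S+) has missing NOTIFY_TYPES member")


def modules_missing_notify_types(lines):
    """Return module names that logged the message, in order of first appearance."""
    seen = []
    for line in lines:
        m = MISSING_NOTIFY.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


log = [
    "debug 2022-07-22T06:17:36.276+0000 7faf32cf0000 -1 mgr[py] Module test_orchestrator has missing NOTIFY_TYPES member",
    "debug 2022-07-22T06:17:36.276+0000 7faf32cf0000 1 mgr[py] Loading python module 'mirroring'",
    "debug 2022-07-22T06:17:36.560+0000 7faf32cf0000 -1 mgr[py] Module status has missing NOTIFY_TYPES member",
]
print(modules_missing_notify_types(log))  # ['test_orchestrator', 'status']
```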

#1

Updated by Benno Lange almost 2 years ago

After several attempts at restarting the mgrs manually, we caught a moment when 2 mgrs were available, so we could issue an upgrade back to 17.2.1, which seems to work for now...
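The workaround above depends on catching a short window in which 2 mgrs are running. A minimal Python sketch of automating that wait: poll a caller-supplied count of running mgrs and fire an action as soon as the threshold is reached. The helper name and parameters are hypothetical. In practice `count_running` could wrap `ceph orch ps --format json` and `action` could run `ceph orch upgrade start --ceph-version 17.2.1`, but neither is wired up here, so the sketch only shows the polling logic.

```python
import time
from typing import Callable


# Hypothetical helper: sample the number of running mgrs and run `action`
# as soon as at least `needed` are up. Nothing here talks to Ceph directly;
# both callables are supplied by the caller.
def run_when_mgrs_available(
    count_running: Callable[[], int],
    action: Callable[[], None],
    needed: int = 2,
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    deadline = clock() + timeout
    while True:
        if count_running() >= needed:
            action()
            return True
        if clock() >= deadline:
            return False  # window never opened before the timeout
        sleep(interval)


# Usage with fakes: the second mgr shows up on the third poll.
fake_counts = iter([1, 1, 2])
calls = []
ok = run_when_mgrs_available(
    lambda: next(fake_counts), lambda: calls.append("upgrade"), sleep=lambda s: None
)
print(ok, calls)  # True ['upgrade']
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.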
