Bug #47702
Updated by Jan Fajerski over 3 years ago
Following https://docs.ceph.com/en/latest/cephadm/upgrade/#using-customized-container-images I attempted to upgrade (or rather _downgrade_) my cluster. The process starts fine, but I end up in a weird state: two mgr daemons are upgraded, the upgrade seemingly succeeded, and the cluster reports HEALTH_WARN.

Starting with a healthy cluster at version 15.2.4-944-g85788353cf (SUSE downstream container) I run @ceph orch upgrade start --image <custom registry url>/containers/ses/7/containers/ses/7/ceph/ceph:15.2.5-220-gb758bfd693@. This starts the process as expected and I can see the progress of the image pull in @ceph -s@. After a while this finishes and leaves the cluster in the following state:

<pre>
master:~ # ceph -s
  cluster:
    id:     4405d2ce-031b-11eb-a7e5-525400088cac
    health: HEALTH_WARN
            1 hosts fail cephadm check

  services:
    mon: 3 daemons, quorum master,node2,node1 (age 113m)
    mgr: node2.toaphn(active, since 17m), standbys: master.pthpjq, node1.tcqcfr
    osd: 20 osds: 20 up (since 112m), 20 in (since 112m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   20 GiB used, 140 GiB / 160 GiB avail
    pgs:     1 active+clean

master:~ # ceph versions
{
    "mon": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 1,
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 2
    },
    "osd": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 20
    },
    "mds": {},
    "overall": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 24,
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 2
    }
}
</pre>

The current active mgr could not be failed:

<pre>
master:~ # ceph mgr fail toaphn
Daemon not found 'toaphn', already failed?
</pre>
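Possibly @ceph mgr fail@ expects the full daemon name as reported by @ceph -s@ (here @node2.toaphn@) rather than just the suffix; a sketch of that invocation, not verified on this cluster:

<pre>
master:~ # ceph mgr fail node2.toaphn
</pre>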
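The orchestrator's own view of the upgrade can also be queried with the commands below (their output was not captured here); they should show whether cephadm still considers the upgrade in progress and which mgr daemons are running the new image:

<pre>
master:~ # ceph orch upgrade status
master:~ # ceph orch ps --daemon-type mgr
</pre>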