Bug #47702
Updated by Jan Fajerski over 3 years ago
Following https://docs.ceph.com/en/latest/cephadm/upgrade/#using-customized-container-images I attempted to upgrade (or rather _downgrade_) my cluster. The process starts fine, but I end up in a weird state: two mgr daemons are upgraded, the upgrade seemingly succeeded, and the cluster reports HEALTH_WARN.

Starting with a healthy cluster at version 15.2.4-944-g85788353cf (SUSE downstream container) I run @ceph orch upgrade start --image <custom registry url>/containers/ses/7/containers/ses/7/ceph/ceph:15.2.5-220-gb758bfd693@. This starts the process as expected and I can see the progress of the image pull in @ceph -s@. After a while this finishes and leaves the cluster in the following state:

<pre>
master:~ # ceph -s
  cluster:
    id:     4405d2ce-031b-11eb-a7e5-525400088cac
    health: HEALTH_WARN
            1 hosts fail cephadm check

  services:
    mon: 3 daemons, quorum master,node2,node1 (age 113m)
    mgr: node2.toaphn(active, since 17m), standbys: master.pthpjq, node1.tcqcfr
    osd: 20 osds: 20 up (since 112m), 20 in (since 112m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   20 GiB used, 140 GiB / 160 GiB avail
    pgs:     1 active+clean

master:~ # ceph versions
{
    "mon": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 1,
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 2
    },
    "osd": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 20
    },
    "mds": {},
    "overall": {
        "ceph version 15.2.4-944-g85788353cf (85788353cfa5b673d4966d4748513c33dbee228e) octopus (stable)": 24,
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 2
    }
}
</pre>

The current active mgr could not be failed:

<pre>
master:~ # ceph mgr fail toaphn
Daemon not found 'toaphn', already failed?
</pre>
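Possibly @ceph mgr fail@ expects the full daemon name as reported by @ceph -s@ (here @node2.toaphn@) rather than just the suffix; a sketch of that invocation, not verified on this cluster:

<pre>
master:~ # ceph mgr fail node2.toaphn
</pre>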
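The orchestrator's own view of the upgrade can also be queried with the commands below (their output was not captured here); they should show whether cephadm still considers the upgrade in progress and which mgr daemons are running the new image:

<pre>
master:~ # ceph orch upgrade status
master:~ # ceph orch ps --daemon-type mgr
</pre>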