Documentation #45936

cephadm: document restart the whole cluster

Added by Sebastian Wagner 9 months ago.



[15:23:28] <dcapone2004> I have a ceph dev cluster of 3 nodes deployed using cephadm with octopus on centos 8
[15:24:26] <dcapone2004> this is going to be a hyperconverged openstack cluster, that i am essentially testing....a key element is that the location that we deploy this cluster at will change in about 18 months, so I have been trying to write up procedures to safely shut down the cluster and power it back  up
[15:24:35] <dcapone2004> this is the latest place where i have run into an issue
[15:25:39] <dcapone2004> I stopped all disk activity on the cluster, set osd noout, then shut down the nodes of the cluster 1 by 1, with the active Manager being LAST
[15:26:24] <dcapone2004> a few hours later, I tried to power the cluster back up starting with the last active manager and going in the reverse order that I shut them down
[15:26:44] <dcapone2004> and now I lost 2 OSD containers and all my manager containers
[15:27:10] <dcapone2004> ceph orch daemon redeploy does nothing, nor does restart
[15:27:43] <dcapone2004> and when simply trying podman start, podman claims to not know about those containers, but the ceph dashboard shows the OSDs are in but down
[15:28:38] <SebastianW> dcapone2004: what does "I lost my manager containers" mean?
[15:28:57] <dcapone2004> meaning ceph -s shows no active manager containers
[15:29:31] <dcapone2004> podman start (my hostname) says the container doesn't exist
[15:31:22] <dcapone2004> originally when I first started it up I only lost 1 of 2 containers, then after trying to use redeploy, the second disappeared....I am unsure whether this is/was related to my attempt to upgrade to 15.2.3, which failed (and I filed a bug report for), and whether the inconsistent version numbers between containers caused this issue
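The shutdown and power-up sequence described in the log, plus the diagnosis steps for daemons that fail to come back, can be sketched as follows. This is a sketch, not the official procedure; the daemon name `mgr.host1` is a placeholder for illustration.

```shell
#!/bin/sh
# Planned whole-cluster shutdown of a cephadm cluster (sketch).

# 1. Stop all client I/O, then keep OSDs from being marked out while down:
ceph osd set noout

# 2. Shut down the hosts one by one, the host running the active
#    manager LAST (run on each host, in order):
#    shutdown -h now

# --- power-up ---
# 3. Boot the hosts in reverse order (last active manager first),
#    then check cluster state:
ceph -s

# 4. If daemons are missing after boot, inspect what cephadm knows
#    on the affected host:
cephadm ls                       # list daemons cephadm manages on this host
systemctl list-units 'ceph-*'    # cephadm daemons run as systemd units,
                                 # not freestanding podman containers

# 5. Redeploy a missing daemon via the orchestrator
#    ("mgr.host1" is a placeholder daemon name):
ceph orch daemon redeploy mgr.host1

# 6. Once all OSDs are back up and recovery looks healthy,
#    clear the flag:
ceph osd unset noout
```

Note that starting containers directly with `podman start` is expected to fail here: cephadm wraps each daemon in a systemd unit (`ceph-<fsid>@<daemon>`) that creates the container on start, so the container may legitimately not exist between runs.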
