Bug #49013

closed

cephadm: Service definition causes some container startups to fail

Added by Gunther Heinrich about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: cephadm
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In Bug #48870 I described the problem of failing containers during startup after I updated the cluster from 15.2.5 to 15.2.8.

After some additional checks I found out that during one of those updates the following lines were added to the [Service] section of the Ceph service definition in /etc/systemd/system (for my cluster it's named ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@.service):

ExecStartPre=-/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStopPost=-/bin/rm -f /%t/%n-pid /%t/%n-cid
Type=forking
PIDFile=/%t/%n-pid

After I commented out these lines, the containers started normally again after a reboot.
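For context: with Type=forking and PIDFile= set, systemd waits for the PID file to appear, so the podman command in unit.run has to detach (-d) and write the conmon pid/cid files, otherwise the unit times out. A minimal sketch (a hypothetical helper, not part of cephadm; the unit.run path is assumed from the standard cephadm layout) for spotting unit.run files that are inconsistent with a forking unit:

    from pathlib import Path

    def unit_run_matches_forking(unit_run_path: str) -> bool:
        """Return True if the podman invocation in unit.run detaches and writes
        the pid/cid files that a Type=forking systemd unit expects."""
        tokens = Path(unit_run_path).read_text().split()
        return "-d" in tokens and "--conmon-pidfile" in tokens and "--cidfile" in tokens

    # Example path from this cluster; adjust fsid and daemon name as needed.
    print(unit_run_matches_forking(
        "/var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/node-exporter.iz-ceph-v1-osd-03/unit.run"))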


Related issues: 1 (0 open, 1 closed)

Related to Orchestrator - Bug #48870: cephadm: Several services in error status after upgrade to 15.2.8: unrecognized arguments: --filter-for-batch (Resolved, Sebastian Wagner)

Actions #1

Updated by Gunther Heinrich about 3 years ago

I finally found out that it's the line "Type=forking" that's causing those issues.

Here are the steps I took to verify it:
  1. Cluster is in version 15.2.5, using Ubuntu 20.04.1 and Podman 2.1.1. ceph orch ps reports all daemons as running. "Type=forking" is not in any service file
  2. Update cluster to version 15.2.8
  3. After the update all daemons are reported as running. This is before a restart of all nodes. "Type=forking" is now present in all service files
  4. Shutdown and restart of all nodes
  5. Many containers don't start anymore during boot (journalctl -b reports timeout errors), ceph orch ps lists these in the error status
  6. Manually edit all service definition files to comment out or remove "Type=forking" (a sketch of this follows below)
  7. Shutdown and restart of all nodes
  8. The cluster starts as expected without any errors, all daemons are reported as running, journalctl -b doesn't display any issues

Can someone reproduce this problem, or does it only affect my cluster? If it is reproducible, what would be a good fix? Having to manually edit the files created by cephadm doesn't seem like a good solution.
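A sketch of how step 6 could be scripted (assuming the cephadm-generated ceph-<fsid>@.service files live under /etc/systemd/system; this is only a workaround, and systemctl daemon-reload is still required afterwards):

    import glob
    import re

    # Comment out Type=forking in every cephadm-generated service file.
    for path in glob.glob("/etc/systemd/system/ceph-*@.service"):
        with open(path) as f:
            text = f.read()
        patched = re.sub(r"(?m)^Type=forking$", "#Type=forking", text)
        if patched != text:
            with open(path, "w") as f:
                f.write(patched)
            print("patched", path)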

Actions #2

Updated by Kai Stian Olstad about 3 years ago

The change to forking is in this commit: https://github.com/ceph/ceph/commit/e6792f306ab4d07251588fdca6ed3876ae3a092a
Your unit.run file should be updated to pass -d to podman so that the forking in the service file works.
But that probably didn't happen for you, and I'm not sure how or whether cephadm makes this change.

Actions #3

Updated by Gunther Heinrich about 3 years ago

Thanks for your reply and your suggestion.

I can report that your suggestion indeed solves the issue. After I re-enabled Type=forking and added the following arguments to the unit.run file for the node exporter on osd-03

-d --conmon-pidfile /run/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service-pid --cidfile /run/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service-cid

the container did start as expected after a reboot. I also modified the files on osd-02 and osd-01 and afterwards everything started as expected (I will edit the other nodes tomorrow).

So cephadm did not, or could not, update the unit.run files during the cluster upgrade. This fits with the last-modified timestamps, which on the other nodes are from June last year.
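For reference, the manual fix above boils down to inserting three arguments into the podman command of each affected unit.run. A rough sketch (hypothetical helper; the systemd unit name is assumed to follow the ceph-<fsid>@<daemon>.service pattern used above):

    def add_forking_args(podman_cmd: str, fsid: str, daemon: str) -> str:
        """Insert -d, --conmon-pidfile and --cidfile into an existing podman run command."""
        unit = "ceph-%s@%s.service" % (fsid, daemon)
        extra = "-d --conmon-pidfile /run/%s-pid --cidfile /run/%s-cid" % (unit, unit)
        if "--conmon-pidfile" in podman_cmd:
            return podman_cmd  # already patched
        return podman_cmd.replace("podman run --rm", "podman run --rm " + extra, 1)

    # Example with the node-exporter values from above:
    print(add_forking_args("/usr/bin/podman run --rm --net=host ...",
                           "68317c90-b44f-11ea-a0c4-d1443a31407c",
                           "node-exporter.iz-ceph-v1-osd-03"))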

Actions #4

Updated by Gunther Heinrich about 3 years ago

Short update:

All monitoring containers (alertmanager, grafana, node-exporter and prometheus) fail to start because none of their unit.run files were updated by cephadm with the additional -d, pid-file and cid-file arguments.

Also, two of the three mgr containers fail to start; the only exception is the one that was the last mgr container to be updated (which puts the cluster close to a full failure):

MON-01: -rw------- 1 root root 1425 Jan 29 11:21 unit.run
MON-02: -rw------- 1 root root 1219 Jan 29 11:19 unit.run
MON-03: -rw------- 1 root root 1219 Jan 29 11:20 unit.run

The timestamps above show that cephadm indeed is modifying the unit.run files.

On the mgrs I also noticed that the order of arguments in the unit.run files differs between the running and the failing mgrs, as if the code creating them differs. For example, the --entrypoint argument appears earlier for the running mgr than for the other two, and the --net and --ipc arguments are reversed. Between the failing unit.run files the positions are consistent.
Here is the unit.run of the running mgr:

/usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph-mgr --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-01.elswai -d --conmon-pidfile /run/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service-pid --cidfile /run/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service-cid -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.8 -e NODE_NAME=iz-ceph-v1-mon-01 -v /var/run/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/run/ceph:z -v /var/log/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/log/ceph:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/mgr.iz-ceph-v1-mon-01.elswai:/var/lib/ceph/mgr/ceph-iz-ceph-v1-mon-01.elswai:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/mgr.iz-ceph-v1-mon-01.elswai/config:/etc/ceph/ceph.conf:z docker.io/ceph/ceph:v15.2.8 -n mgr.iz-ceph-v1-mon-01.elswai -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix="debug " 

First failing mgr unit.run:
/usr/bin/podman run --rm --net=host --ipc=host --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.8 -e NODE_NAME=iz-ceph-v1-mon-02 -v /var/run/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/run/ceph:z -v /var/log/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/log/ceph:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/mgr.iz-ceph-v1-mon-02.foqmfa:/var/lib/ceph/mgr/ceph-iz-ceph-v1-mon-02.foqmfa:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/mgr.iz-ceph-v1-mon-02.foqmfa/config:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph-mgr docker.io/ceph/ceph:v15.2.8 -n mgr.iz-ceph-v1-mon-02.foqmfa -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix="debug " 

Second failing mgr unit.run:
/usr/bin/podman run --rm --net=host --ipc=host --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-03.ncnoal -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.8 -e NODE_NAME=iz-ceph-v1-mon-03 -v /var/run/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/run/ceph:z -v /var/log/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/log/ceph:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/mgr.iz-ceph-v1-mon-03.ncnoal:/var/lib/ceph/mgr/ceph-iz-ceph-v1-mon-03.ncnoal:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/mgr.iz-ceph-v1-mon-03.ncnoal/config:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph-mgr docker.io/ceph/ceph:v15.2.8 -n mgr.iz-ceph-v1-mon-03.ncnoal -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix="debug " 

Actions #5

Updated by Gunther Heinrich about 3 years ago

I have an assumption and hope that someone can check if I'm correct.

For the update to 15.2.8 the service definition was changed to use forking. Thus, all unit.run files need to be updated to include the new arguments; otherwise the containers fail to start.

During a cluster upgrade to 15.2.8 the new service definition gets deployed as expected, so all unit.run files need an update. But the images of several external containers haven't changed in the last few months, so cephadm doesn't redeploy those containers and therefore never updates their unit.run files (or any other configuration file).

Here's a cephadm log entry for prometheus on mon-01 which hints at this underlying issue:

    {
        "style": "cephadm:v1",
        "name": "prometheus.iz-ceph-v1-mon-01",
        "fsid": "68317c90-b44f-11ea-a0c4-d1443a31407c",
        "systemd_unit": "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01",
        "enabled": true,
        "state": "running",
        "container_id": "2acb500ca9f4c29ee878d7ff9c9fc4b0085735d1e649630254edf6f036811c14",
        "container_image_name": "docker.io/prom/prometheus:latest",
        "container_image_id": "396dc3b4e71788e1683c86b085f620203b92d21653bf3b3e74d3f0ae928d3502",
        "version": "2.19.1",
        "started": "2021-02-01T08:28:31.674698",
        "created": "2020-06-22T06:15:26.631727",
        "deployed": "2020-06-22T06:15:25.723729",
        "configured": "2020-07-17T06:55:43.515490" 
    }

Again, none of the unit.run files for the external containers were changed since June of last year (following list from mon-01):
alertmanager:        -rw-------  1 root   root     333     Jun 22  2020     unit.run
grafana:             -rw-------  1 root   root     629     Jun 22  2020     unit.run
node-exporter:       -rw-------  1 root   root     299     Jun 22  2020     unit.run
prometheus:          -rw-------  1 root   root     545     Jun 22  2020     unit.run

Does cephadm check the unit.run files and compare the actual state on the disk with a target state created during runtime?
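To illustrate what such a check could look like, here is a sketch of the idea only (not how cephadm is actually implemented): compare the unit.run on disk with the content the current code would generate and redeploy on mismatch.

    from pathlib import Path

    def needs_redeploy(unit_run_path: str, expected_unit_run: str) -> bool:
        """True if the unit.run on disk differs from what the current version
        would generate, i.e. the daemon should be redeployed."""
        on_disk = Path(unit_run_path).read_text()
        return on_disk.strip() != expected_unit_run.strip()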

Actions #6

Updated by Sebastian Wagner about 3 years ago

  • Assignee set to Adam King
Actions #7

Updated by Sebastian Wagner about 3 years ago

  • Related to Bug #48870: cephadm: Several services in error status after upgrade to 15.2.8: unrecognized arguments: --filter-for-batch added
Actions #8

Updated by Gunther Heinrich about 3 years ago

Based on some code analysis, in "/src/pybind/mgr/cephadm/upgrade.py" I found

for daemon_type in CEPH_UPGRADE_ORDER:
    logger.info('Upgrade: Checking %s daemons...' % daemon_type)

and in "/src/pybind/mgr/cephadm/utils.py" I found

CEPH_UPGRADE_ORDER = ['mgr', 'mon', 'crash', 'osd', 'mds', 'rgw', 'rbd-mirror']

...I now assume that the monitoring containers are not part of a cluster upgrade at all (which is reasonable) and thus their unit.run files of course do not get updated despite the changed systemd settings. Am I correct with this assumption?
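To illustrate the assumption: every monitoring daemon type falls outside CEPH_UPGRADE_ORDER, so the upgrade loop quoted above never visits it (the set of deployed types below is taken from this cluster):

    CEPH_UPGRADE_ORDER = ['mgr', 'mon', 'crash', 'osd', 'mds', 'rgw', 'rbd-mirror']

    # Daemon types deployed in this cluster (see the ceph orch ps output in a later comment).
    deployed_types = {'mgr', 'mon', 'crash', 'osd', 'mds',
                      'alertmanager', 'grafana', 'node-exporter', 'prometheus'}

    print(sorted(deployed_types - set(CEPH_UPGRADE_ORDER)))
    # -> ['alertmanager', 'grafana', 'node-exporter', 'prometheus']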

For testing purposes I forced a repair by redeploying all monitoring containers. For prometheus, alertmanager and grafana the redeployment solved the issue. So one solution might be to trigger a redeployment when important settings get changed.

When I redeployed the mgr daemons/containers, the cluster mgr stopped working for a rather long time, during which I saw constant reloads and redeployments. After several minutes it seemed to resolve itself, so this felt like a close call.

Here's an excerpt from journalctl -b from one node, for the record:

Feb 03 14:29:41 iz-ceph-v1-mon-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-03.ncnoal.service: Main process exited, code=exited, status=137/n/a
Feb 03 14:29:41 iz-ceph-v1-mon-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-03.ncnoal.service: Failed with result 'exit-code'.
Feb 03 14:29:41 iz-ceph-v1-mon-03 systemd[1]: Stopped Ceph mgr.iz-ceph-v1-mon-03.ncnoal for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Feb 03 14:29:41 iz-ceph-v1-mon-03 systemd[1]: Reloading.

Actions #9

Updated by Marvin Boothby about 3 years ago

Hello there,

we are running two identical v15.2.8 clusters that apparently now show the same problem. I also did not notice any problems until the machines were rebooted for CentOS upgrades.

The command ceph orch ps shows alertmanager, grafana, node-exporter and prometheus in error status, and the associated systemd services have failed. On both clusters one of the two running mgr daemons also shows the same error. After finding this issue I checked the mentioned unit.run files, and they are indeed missing the -d --conmon-pidfile arguments. The arguments are also missing in the unit.run file of the mgr that shows the error; the working mgr daemon has them!

So apparently, in addition to alertmanager, grafana, node-exporter and prometheus, there could be something wrong with how the upgrade applies this modification to the mgrs. It is consistent on both clusters. I can provide additional information if needed.

Actions #10

Updated by Marvin Boothby about 3 years ago

Sorry @Gunther, I kind of missed the part where you reported that the mgr daemon shows the same behaviour I observed. Thank you for the in-depth analysis.

So the state of our two clusters seems to be identical to what you describe. They run on the latest CentOS 8 and were installed with cephadm bootstrap. The upgrade to v15.2.8 was done from v15.2.7.

Actions #11

Updated by Gunther Heinrich about 3 years ago

@Marvin
Thanks for confirming that this issue is not specific to my cluster.

I did another upgrade (this time from 15.2.7 to 15.2.8) to check when the important files are changed relative to cephadm's actions. Here's the cephadm log, into which I inserted notes on when each unit.run file was last changed and whether the new arguments are present:

2021-02-05T06:37:45.923363+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 232 : cephadm [INF] Upgrade: First pull of docker.io/ceph/ceph:v15.2.8
2021-02-05T06:37:51.788291+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 236 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:37:51.791298+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 237 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:37:52.643497+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 239 : cephadm [INF] It is presumed safe to stop ['mgr.iz-ceph-v1-mon-01.elswai']
2021-02-05T06:37:52.643550+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 240 : cephadm [INF] Upgrade: It is presumed safe to stop ['mgr.iz-ceph-v1-mon-01.elswai']
2021-02-05T06:37:52.643607+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 241 : cephadm [INF] Upgrade: Redeploying mgr.iz-ceph-v1-mon-01.elswai
2021-02-05T06:37:52.714120+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 242 : cephadm [INF] Deploying daemon mgr.iz-ceph-v1-mon-01.elswai on iz-ceph-v1-mon-01
    --> MGR unit.run file on MON-01 gets updated without new arguments
2021-02-05T06:38:11.065718+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 252 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:38:11.068201+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 253 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:38:11.068533+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 254 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-05T06:38:12.943025+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 256 : cephadm [INF] Upgrade: Pulling docker.io/ceph/ceph:v15.2.8 on iz-ceph-v1-mon-03
2021-02-05T06:39:24.918650+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 293 : cephadm [INF] It is presumed safe to stop ['mgr.iz-ceph-v1-mon-03.ncnoal']
2021-02-05T06:39:24.918704+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 294 : cephadm [INF] Upgrade: It is presumed safe to stop ['mgr.iz-ceph-v1-mon-03.ncnoal']
2021-02-05T06:39:24.918746+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 295 : cephadm [INF] Upgrade: Redeploying mgr.iz-ceph-v1-mon-03.ncnoal
2021-02-05T06:39:24.966148+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 296 : cephadm [INF] Deploying daemon mgr.iz-ceph-v1-mon-03.ncnoal on iz-ceph-v1-mon-03
    --> MGR unit.run file on MON-03 gets updated without new arguments
2021-02-05T06:39:48.743168+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 309 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:39:48.745568+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 310 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:39:48.745948+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 311 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-05T06:39:50.852772+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 313 : cephadm [INF] Upgrade: Pulling docker.io/ceph/ceph:v15.2.8 on iz-ceph-v1-mon-04
2021-02-05T06:41:04.475255+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 351 : cephadm [INF] It is presumed safe to stop ['mgr.iz-ceph-v1-mon-04.hywkah']
2021-02-05T06:41:04.475319+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 352 : cephadm [INF] Upgrade: It is presumed safe to stop ['mgr.iz-ceph-v1-mon-04.hywkah']
2021-02-05T06:41:04.475359+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 353 : cephadm [INF] Upgrade: Redeploying mgr.iz-ceph-v1-mon-04.hywkah
2021-02-05T06:41:04.533657+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 354 : cephadm [INF] Deploying daemon mgr.iz-ceph-v1-mon-04.hywkah on iz-ceph-v1-mon-04
    --> MGR unit.run file on MON-04 gets updated without new arguments
2021-02-05T06:41:22.513102+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 364 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:41:22.515431+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 365 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:41:22.515811+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 366 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-05T06:41:23.758114+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 367 : cephadm [INF] Upgrade: Pulling docker.io/ceph/ceph:v15.2.8 on iz-ceph-v1-mon-05
2021-02-05T06:42:38.534789+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 406 : cephadm [INF] It is presumed safe to stop ['mgr.iz-ceph-v1-mon-05.mttnai']
2021-02-05T06:42:38.534887+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 407 : cephadm [INF] Upgrade: It is presumed safe to stop ['mgr.iz-ceph-v1-mon-05.mttnai']
2021-02-05T06:42:38.534966+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 408 : cephadm [INF] Upgrade: Redeploying mgr.iz-ceph-v1-mon-05.mttnai
2021-02-05T06:42:38.612822+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 409 : cephadm [INF] Deploying daemon mgr.iz-ceph-v1-mon-05.mttnai on iz-ceph-v1-mon-05
    --> MGR unit.run file on MON-05 gets updated without new arguments
2021-02-05T06:42:56.324761+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 419 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:42:56.327238+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 420 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:42:56.327583+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 421 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.iz-ceph-v1-mon-02.foqmfa)
2021-02-05T06:42:56.332113+0000 mgr.iz-ceph-v1-mon-02.foqmfa (mgr.20214122) 422 : cephadm [INF] Upgrade: there are 4 other already-upgraded standby mgrs, failing over
2021-02-05T06:43:34.370615+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 16 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:43:34.373661+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 17 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:43:37.710492+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 20 : cephadm [INF] It is presumed safe to stop ['mgr.iz-ceph-v1-mon-02.foqmfa']
2021-02-05T06:43:37.710568+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 21 : cephadm [INF] Upgrade: It is presumed safe to stop ['mgr.iz-ceph-v1-mon-02.foqmfa']
2021-02-05T06:43:37.710623+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 22 : cephadm [INF] Upgrade: Redeploying mgr.iz-ceph-v1-mon-02.foqmfa
2021-02-05T06:43:37.754525+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 23 : cephadm [INF] Deploying daemon mgr.iz-ceph-v1-mon-02.foqmfa on iz-ceph-v1-mon-02
    --> MGR unit.run file on MON-02 gets updated with new arguments
2021-02-05T06:43:55.650805+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 33 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.8 with id 5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185
2021-02-05T06:43:55.654291+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 34 : cephadm [INF] Upgrade: Checking mgr daemons...
2021-02-05T06:43:55.657735+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 35 : cephadm [WRN] Upgrade: 1 mgr daemon(s) are ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable) != target ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)
2021-02-05T06:43:55.658030+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 36 : cephadm [INF] Upgrade: Setting container_image for all mgr...
2021-02-05T06:43:55.969131+0000 mgr.iz-ceph-v1-mon-05.mttnai (mgr.20217556) 37 : cephadm [INF] Upgrade: All mgr daemons are up to date.

Based on these times and actions, I think the single working mgr is explained by the mgr failing over to a different node.

So, here's yet another assumption regarding the upgrade process:
  1. The cluster is in version 15.2.7; the active mgr is on mon-1, the other two mgrs are on mon-2 and mon-3
  2. The upgrade process starts
  3. The active mgr upgrades the mgr on mon-2
  4. The active mgr upgrades the mgr on mon-3
  5. A fail over is triggered
  6. The now active mgr on mon-2 upgrades the mgr on mon-1

The important point is that the active mgr upgrades the other two mgr daemons without having been upgraded itself. An outdated mgr running 15.2.7 therefore redeploys the other mgr daemons to 15.2.8, which is why the new arguments are missing from those unit.run files. After the failover, a different mgr daemon takes over, this time running 15.2.8; because of that, the unit.run file of the failed-over mgr contains the new arguments.

This would also explain why at least one working mgr is present after all my upgrade attempts: the last one is always redeployed by a mgr daemon already running 15.2.8.

If this assumption is correct, it might be a major issue: any significant configuration change might not get deployed correctly, because the upgrade process by design always has to start from the older codebase.
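A toy simulation of this assumed sequence (hypothetical, just to illustrate which code version ends up writing each mgr's unit.run):

    def simulate_mgr_upgrade(mgrs, active):
        """Return, per mgr, the version of the mgr code that wrote its unit.run."""
        writer = {}
        # The active mgr still runs the old code while it redeploys its peers.
        for m in mgrs:
            if m != active:
                writer[m] = "15.2.7"
        # After the failover an already-upgraded standby redeploys the old active mgr.
        writer[active] = "15.2.8"
        return writer

    print(simulate_mgr_upgrade(["mon-01", "mon-02", "mon-03"], active="mon-01"))
    # -> {'mon-02': '15.2.7', 'mon-03': '15.2.7', 'mon-01': '15.2.8'}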

Actions #12

Updated by Sage Weil about 3 years ago

Okay, summarizing to make sure I understand. We have two problems:

1. During upgrade, the new mgr doesn't redeploy the other mgrs (again) to ensure the unit.run file is in sync with the (possibly new) unit file.
2. The monitoring containers aren't redeployed at all.

This wasn't explicitly stated in the initial ticket, but the only daemons that failed to start were the mgrs (all but one) and the monitoring daemons?

Actions #13

Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to In Progress
Actions #14

Updated by Marvin Boothby about 3 years ago

Sage Weil wrote:

This wasn't explicitly stated in the initial ticket, but the only daemons that failed to start were the mgrs (all but one) and the monitoring daemons?

In my case that is how it was.

In the meantime I redeployed the mgrs on one cluster and now all my mgrs are in a good state. Orchestrator and systemd are happy, and the unit.run files are now consistent.

Just to be clear: in my case the orchestrator and systemd are not happy (systemd sees the services as failed even though they are running), but all daemons and mgrs were running the whole time. It only got "dangerous" during the redeploy. I will now try to redeploy the monitoring daemons as well.

Actions #15

Updated by Gunther Heinrich about 3 years ago

1. During upgrade, the new mgr doesn't redeploy the other mgrs (again) to ensure the unit.run file is in sync with the (possibly new) unit file.

That seems to be the case.

The monitoring containers aren't redeployed at all.

Yes.

To clarify, here's the ceph orch ps output from one of the first upgrade attempts to 15.2.8, showing which daemons failed:
alertmanager.iz-ceph-v1-mon-01       iz-ceph-v1-mon-01  error          4m ago     6M   0.21.0   docker.io/prom/alertmanager:latest   c876f5897d7b  9a771c886b09
crash.iz-ceph-v1-mon-01              iz-ceph-v1-mon-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  93d9a3267a2d
crash.iz-ceph-v1-mon-02              iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  574787725a46
crash.iz-ceph-v1-mon-03              iz-ceph-v1-mon-03  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  7f94569f5195
crash.iz-ceph-v1-osd-01              iz-ceph-v1-osd-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5182be56dc6b
crash.iz-ceph-v1-osd-02              iz-ceph-v1-osd-02  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  9e23c565fc64
crash.iz-ceph-v1-osd-03              iz-ceph-v1-osd-03  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  12ea957024f8
grafana.iz-ceph-v1-mon-01            iz-ceph-v1-mon-01  error          4m ago     6M   6.6.2    docker.io/ceph/ceph-grafana:latest   87a51ecf0b1c  43c6d192a4dd
mds.cephfs.iz-ceph-v1-mon-01.vjehnz  iz-ceph-v1-mon-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5ad9c3fae67d
mds.cephfs.iz-ceph-v1-mon-02.cbdqzm  iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5e732eb43b59
mds.cephfs.iz-ceph-v1-mon-03.zpxfkg  iz-ceph-v1-mon-03  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  71ed07a47be2
mgr.iz-ceph-v1-mon-01.elswai         iz-ceph-v1-mon-01  error          4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  567b649f7825
mgr.iz-ceph-v1-mon-02.foqmfa         iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5b2014ba51da
mgr.iz-ceph-v1-mon-03.ncnoal         iz-ceph-v1-mon-03  error          4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  115effface96
mon.iz-ceph-v1-mon-01                iz-ceph-v1-mon-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  7fc683227cee
mon.iz-ceph-v1-mon-02                iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  01951e4b5a5e
mon.iz-ceph-v1-mon-03                iz-ceph-v1-mon-03  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  ed2e540108f2
node-exporter.iz-ceph-v1-mon-01      iz-ceph-v1-mon-01  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  f8d1e263f5bd
node-exporter.iz-ceph-v1-mon-02      iz-ceph-v1-mon-02  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  deb31c0e96fe
node-exporter.iz-ceph-v1-mon-03      iz-ceph-v1-mon-03  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  803ad1ead652
node-exporter.iz-ceph-v1-osd-01      iz-ceph-v1-osd-01  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  31a86adea0da
node-exporter.iz-ceph-v1-osd-02      iz-ceph-v1-osd-02  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  98aba976cb23
node-exporter.iz-ceph-v1-osd-03      iz-ceph-v1-osd-03  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  121aed124f3b
osd.0                                iz-ceph-v1-osd-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  19b8ae70354c
osd.1                                iz-ceph-v1-osd-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  2569a941b756
osd.2                                iz-ceph-v1-osd-02  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  4e0a96f76663
osd.3                                iz-ceph-v1-osd-02  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  7256225f6fd7
osd.4                                iz-ceph-v1-osd-03  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  f86ecc759cee
osd.5                                iz-ceph-v1-osd-03  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  8da57f4da01f
prometheus.iz-ceph-v1-mon-01         iz-ceph-v1-mon-01  error          4m ago     6M   2.19.1   docker.io/prom/prometheus:latest     396dc3b4e717  2a0a8f4c368c

Although I am no Ceph developer I have two amateurish ideas that might help to resolve these issues:
  1. The upgrade process could include all monitoring daemons so that they at least get redeployed. This would also address the problem that those daemons are never updated in a running cluster over time and might become outdated or incompatible. For example, in the list above all monitoring daemons are deployed with the "latest" image tag; several months ago this was changed to explicit version numbers, yet in my cluster no upgrade ever picked that up.
  2. The upgrade process could be moved into a separate daemon. Instead of "bootstrapping" the upgrade from a running, possibly outdated mgr daemon, a freshly deployed "mgr upgrade daemon" would do the job. That way the latest version is always in charge of the upgrade, and any code-related issues (exceptions, loops, memory leaks etc.) are isolated in that daemon. After the upgrade is finished, the upgrade daemon could be stopped and removed.
Actions #16

Updated by Sebastian Wagner about 3 years ago

  • Pull request ID set to 39385
Actions #17

Updated by Adam King about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID changed from 39385 to 39435
Actions #18

Updated by Adam King about 3 years ago

Note: https://github.com/ceph/ceph/pull/39385 is intended to allow redeploying daemons that end up in the state described in this bug via an orch command, rather than requiring manual fixes to the unit.run files on each host. https://github.com/ceph/ceph/pull/39435 is intended to make sure all daemons are deployed by a mgr running the new image at the end of an upgrade, so the problem is dealt with as part of the upgrade procedure and no additional user action is necessary.
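For what it's worth, the idea behind the second PR could be sketched like this (an assumption about the approach, not the actual PR code): at the end of the upgrade, any daemon whose deployment metadata predates the upgrade would be redeployed so that its unit.run is rewritten by the new mgr.

    from datetime import datetime

    def daemons_to_redeploy(daemons, upgrade_started):
        """daemons: iterable of (name, deployed_at) pairs."""
        return [name for name, deployed_at in daemons if deployed_at < upgrade_started]

    # Example with values seen earlier in this ticket:
    print(daemons_to_redeploy(
        [("prometheus.iz-ceph-v1-mon-01", datetime(2020, 6, 22, 6, 15)),
         ("mgr.iz-ceph-v1-mon-02.foqmfa", datetime(2021, 2, 5, 6, 43))],
        upgrade_started=datetime(2021, 2, 5, 6, 37)))
    # -> ['prometheus.iz-ceph-v1-mon-01']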

Actions #19

Updated by Sebastian Wagner about 3 years ago

  • Priority changed from Normal to Immediate
Actions #20

Updated by Sebastian Wagner about 3 years ago

  • Priority changed from Immediate to Urgent
Actions #21

Updated by Sebastian Wagner about 3 years ago

Workaround

When upgrading from a previous cephadm release, please redeploy iSCSI, NFS and the monitoring stack. Please run

ceph orch redeploy nfs
ceph orch redeploy iscsi
ceph orch redeploy node-exporter
ceph orch redeploy prometheus
ceph orch redeploy grafana
ceph orch redeploy alertmanager

to upgrade the rest of the services.
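If you prefer to script it, the same commands can be run in a loop (assuming the ceph CLI is available on an admin host):

    import subprocess

    for service in ("nfs", "iscsi", "node-exporter", "prometheus", "grafana", "alertmanager"):
        subprocess.run(["ceph", "orch", "redeploy", service], check=True)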

Actions #22

Updated by Sage Weil about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific
Actions #23

Updated by Sage Weil about 3 years ago

  • Status changed from Pending Backport to Resolved
