Bug #51272


upgrade job: mgr.x getting removed by cephadm task: UPGRADE_NO_STANDBY_MGR

Added by Sebastian Wagner almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Immediate
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 41478
Crash signature (v1):
Crash signature (v2):

Description

I think the fix for this bug is not yet merged.

rados/cephadm/upgrade/{1-start-distro/1-start-ubuntu_20.04 2-repo_digest/defaut 3-start-upgrade 4-wait fixed-2}
  roles:
  - - mon.a
    - mon.c
    - mgr.y
    - osd.0
    - osd.1
    - osd.2
    - osd.3
    - client.0
    - node-exporter.a
    - alertmanager.a
  - - mon.b
    - mgr.x
    - osd.4
    - osd.5
    - osd.6
    - osd.7
    - client.1
    - prometheus.a
    - grafana.a
    - node-exporter.b

then:

: audit 2021-06-15T20:14:24.260141+0000 mgr.y (mgr.14138) 64 : audit [DBG] from='client.34106 -' entity='client.admin' cmd=[{"prefix": "orch apply", "service_type": "mgr", "placement": "2;smithi143=x", "target">

Note that the placement contains only 2;smithi143=x: mgr.y on smithi135 is missing from the spec, so cephadm schedules it for removal.
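
For illustration, a minimal Python sketch (hypothetical helper names, not the actual PlacementSpec/scheduler code in cephadm) of how a "<count>;<host>=<name>" placement string is interpreted, and why mgr.y falls outside of it:

  def parse_placement(spec):
      """Split '<count>;<host>=<name>;...' into a count and host/name pairs."""
      parts = spec.split(';')
      count = int(parts[0])
      explicit = dict(p.split('=', 1) for p in parts[1:])  # host -> daemon name
      return count, explicit

  def daemons_to_remove(running, spec):
      """Daemons on hosts absent from the explicit placement get removed."""
      _, explicit = parse_placement(spec)
      return {name: host for name, host in running.items()
              if host not in explicit}

  # With the spec from the audit log above, mgr.y on smithi135 is not listed,
  # so the scheduler marks it for removal:
  running_mgrs = {'x': 'smithi143', 'y': 'smithi135'}
  print(daemons_to_remove(running_mgrs, '2;smithi143=x'))  # {'y': 'smithi135'}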

2021-06-15T20:14:29.203 INFO:journalctl@ceph.mgr.y.smithi135.stdout:Jun 15 20:14:29 smithi135 systemd[1]: Stopping Ceph mgr.y for e2a4517e-ce15-11eb-8c13-001a4aab830c...

resulting in:

cluster 2021-06-15T20:21:09.388112+0000 mgr.x (mgr.34112) 238 : cluster [DBG] pgmap v218: 1 pgs: 1 active+clean; 0 B data, 3.7 MiB used, 707 GiB / 715 GiB avail
: debug 2021-06-15T20:21:11.241+0000 7ffa34117700 -1 log_channel(cephadm) log [ERR] : Upgrade: Paused due to UPGRADE_NO_STANDBY_MGR: Upgrade: Need standby mgr daemon
: audit 2021-06-15T20:21:11.239485+0000 mon.a (mon.0) 433 : audit [INF] from='mgr.34112 ' entity='mgr.x'
: audit 2021-06-15T20:21:11.241293+0000 mon.c (mon.1) 207 : audit [DBG] from='mgr.34112 172.21.15.143:0/2430240313' entity='mgr.x' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
: cephadm 2021-06-15T20:21:11.241839+0000 mgr.x (mgr.34112) 239 : cephadm [INF] Upgrade: Target is quay.ceph.io/ceph-ci/ceph:da5e8184007182fa3cd5c8385fee4e08c5620fe2 with id 219a75e51380d5cdf3af7b1fa194d1bedd11>
: cephadm 2021-06-15T20:21:11.244338+0000 mgr.x (mgr.34112) 240 : cephadm [INF] Upgrade: Checking mgr daemons...
: cephadm 2021-06-15T20:21:11.244711+0000 mgr.x (mgr.34112) 241 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.x)
: cephadm 2021-06-15T20:21:11.247775+0000 mgr.x (mgr.34112) 242 : cephadm [ERR] Upgrade: Paused due to UPGRADE_NO_STANDBY_MGR: Upgrade: Need standby mgr daemon
: audit 2021-06-15T20:21:11.253146+0000 mon.a (mon.0) 434 : audit [INF] from='mgr.34112 ' entity='mgr.x'
: cluster 2021-06-15T20:21:11.255641+0000 mgr.x (mgr.34112) 243 : cluster [DBG] pgmap v219: 1 pgs: 1 active+clean; 0 B data, 3.7 MiB used, 707 GiB / 715 GiB avail
: audit 2021-06-15T20:21:11.259712+0000 mon.a (mon.0) 435 : audit [INF] from='mgr.34112 ' entity='mgr.x'
2021-06-15T20:21:16.892 INFO:teuthology.orchestra.run.smithi135.stdout:NAME             HOST       STATUS          REFRESHED  AGE  VERSION  IMAGE NAME                            IMAGE ID      CONTAINER ID
2021-06-15T20:21:16.892 INFO:teuthology.orchestra.run.smithi135.stdout:alertmanager.a   smithi135  running (117s)  107s ago   2m   0.20.0   docker.io/prom/alertmanager:v0.20.0   0881eb8f169f  d7ab1fc469b4
2021-06-15T20:21:16.892 INFO:teuthology.orchestra.run.smithi135.stdout:grafana.a        smithi143  running (2m)    107s ago   2m   6.6.2    docker.io/ceph/ceph-grafana:6.6.2     a0dce381714a  bdf08596362b
2021-06-15T20:21:16.892 INFO:teuthology.orchestra.run.smithi135.stdout:mgr.x            smithi143  running (6m)    107s ago   6m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  bf659290d1ab
2021-06-15T20:21:16.893 INFO:teuthology.orchestra.run.smithi135.stdout:mon.a            smithi135  running (8m)    107s ago   9m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  a0083afbce6f
2021-06-15T20:21:16.893 INFO:teuthology.orchestra.run.smithi135.stdout:mon.b            smithi143  running (7m)    107s ago   7m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  177430b8b423
2021-06-15T20:21:16.893 INFO:teuthology.orchestra.run.smithi135.stdout:mon.c            smithi135  running (7m)    107s ago   7m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  881e672542be
2021-06-15T20:21:16.893 INFO:teuthology.orchestra.run.smithi135.stdout:node-exporter.a  smithi135  running (2m)    107s ago   2m   0.18.1   docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  acd96e0cc12e
2021-06-15T20:21:16.894 INFO:teuthology.orchestra.run.smithi135.stdout:node-exporter.b  smithi143  running (2m)    107s ago   2m   0.18.1   docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  a3c897228c6d
2021-06-15T20:21:16.894 INFO:teuthology.orchestra.run.smithi135.stdout:osd.0            smithi135  running (5m)    107s ago   5m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  9805ecc9628d
2021-06-15T20:21:16.894 INFO:teuthology.orchestra.run.smithi135.stdout:osd.1            smithi135  running (5m)    107s ago   5m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  29d8fc3fbb7f
2021-06-15T20:21:16.894 INFO:teuthology.orchestra.run.smithi135.stdout:osd.2            smithi135  running (5m)    107s ago   5m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  193e0a2a0487
2021-06-15T20:21:16.895 INFO:teuthology.orchestra.run.smithi135.stdout:osd.3            smithi135  running (4m)    107s ago   4m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  e2dea4bf5490
2021-06-15T20:21:16.895 INFO:teuthology.orchestra.run.smithi135.stdout:osd.4            smithi143  running (4m)    107s ago   4m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  e0e19361a64a
2021-06-15T20:21:16.895 INFO:teuthology.orchestra.run.smithi135.stdout:osd.5            smithi143  running (3m)    107s ago   3m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  71c57f8c0e3d
2021-06-15T20:21:16.895 INFO:teuthology.orchestra.run.smithi135.stdout:osd.6            smithi143  running (3m)    107s ago   3m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  4da5baa064d1
2021-06-15T20:21:16.895 INFO:teuthology.orchestra.run.smithi135.stdout:osd.7            smithi143  running (3m)    107s ago   3m   15.2.9   docker.io/ceph/ceph:v15.2.9           dfc483079636  098193d20e10
2021-06-15T20:21:16.896 INFO:teuthology.orchestra.run.smithi135.stdout:prometheus.a     smithi143  running (110s)  107s ago   2m   2.18.1   docker.io/prom/prometheus:v2.18.1     de242295e225  fb7dd6cd2280
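
The ceph orch ps output confirms that mgr.x is the only mgr left. A minimal Python sketch (hypothetical names, not the actual ceph mgr upgrade code) of the gate that produces the pause: when the active mgr itself needs the upgrade, there must be a standby mgr to fail over to first:

  def check_mgr_upgrade(active_mgr, all_mgrs, needs_upgrade):
      """Return the next upgrade step for the mgr daemons."""
      if needs_upgrade(active_mgr):
          standbys = [m for m in all_mgrs if m != active_mgr]
          if not standbys:
              # Matches the log: "Upgrade: Paused due to UPGRADE_NO_STANDBY_MGR"
              return 'UPGRADE_NO_STANDBY_MGR: Upgrade: Need standby mgr daemon'
          return 'fail over to %s, then upgrade %s' % (standbys[0], active_mgr)
      return 'mgr already upgraded; continue with other daemons'

  # mgr.y was removed by the bad placement spec, so mgr.x is the only mgr left:
  print(check_mgr_upgrade('mgr.x', ['mgr.x'], lambda m: True))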

http://qa-proxy.ceph.com/teuthology/yuriw-2021-06-15_18:44:29-rados-wip-yuri8-testing-2021-06-15-0839-octopus-distro-basic-smithi/6174184/teuthology.log

#1

Updated by Deepika Upadhyay almost 3 years ago

  • Pull request ID set to 41478

Adding to the analysis.
A successful pick from the pacific branch:

2021-06-16T09:17:56.280 INFO:tasks.cephadm:Adding mgr.y on smithi002
2021-06-16T09:17:56.280 INFO:tasks.cephadm:Adding mgr.x on smithi049
2021-06-16T09:17:56.281 DEBUG:teuthology.orchestra.run.smithi049:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:octopus shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 04f764a6-ce83-11eb-8c13-001a4aab830c -- ceph orch apply mgr '2;smithi002=y;smithi049=x'

It seems the failing run does not deploy the first of the two managers:

2021-06-22T12:26:38.231 INFO:tasks.cephadm:Adding mgr.x on smithi195
2021-06-22T12:26:38.232 DEBUG:teuthology.orchestra.run.smithi195:> sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/ceph:v15.2.9 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid b403ebe2-d354-11eb-8c18-001a4aab830c -- ceph orch apply mgr '2;smithi195=x'

Looking into the code: found that c79fa6d780580f99b62117e54326a4ef4b7adfef needs to be backported ("This prevents the first mgr from being shut down due to lack of appropriate placements.").
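
A minimal sketch of the idea behind that commit (hypothetical code, not the actual qa/tasks/cephadm.py): build the 'orch apply mgr' placement from all mgr roles across all remotes, so an already-deployed mgr is never dropped from the spec:

  def mgr_placement(remotes):
      """remotes: hostname -> list of roles, e.g. {'smithi002': ['mgr.y']}"""
      pairs = ['%s=%s' % (host, role.split('.', 1)[1])
               for host, roles in remotes.items()
               for role in roles if role.startswith('mgr.')]
      return '%d;%s' % (len(pairs), ';'.join(pairs))

  # Both mgrs are included, matching the successful pacific run above:
  print(mgr_placement({'smithi002': ['mgr.y'], 'smithi049': ['mgr.x']}))
  # -> 2;smithi002=y;smithi049=x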

https://pulpito.ceph.com/ideepika-2021-06-23_08:45:48-rados:cephadm-wip-deepika2-testing-2021-06-23-0828-octopus-distro-basic-smithi

#2

Updated by Deepika Upadhyay almost 3 years ago

  • Status changed from New to Resolved
