Bug #45791


cephadm: Upgrade is failing on octopus on CentOS 8: %d format: a number is required, not str

Added by David Capone almost 4 years ago. Updated almost 4 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
cephadm
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

New dev cluster running CentOS 8 with the Octopus release, deployed using cephadm on version 15.2.1.

Per the documentation, I ran ceph orch upgrade start --ceph_version 15.2.3. The upgrade starts, and I think it succeeds on the managers but fails on the monitors. The cluster goes into status:

health: HEALTH_ERR
Module 'cephadm' has failed: %d format: a number is required, not str

The only way to clear it is:

ceph orch upgrade stop
ceph mgr module disable cephadm
ceph mgr module enable cephadm

After that failure, I attempted to upgrade to 15.2.2 in case it was an intermediate-upgrade issue, and the same failure occurred.

Running the latest cephadm release provided by dnf update:

Installed Packages

cephadm.x86_64                                                              2:15.2.3-0.el8                                                              @Ceph
Actions #1

Updated by Sebastian Wagner almost 4 years ago

  • Description updated (diff)
  • Category changed from cephadm (binary) to cephadm
  • Status changed from New to Need More Info

Can you please attach the full MGR log file?

Actions #2

Updated by David Capone almost 4 years ago

If more of the log is needed I can add it, but I believe the snippet below is the relevant part pertaining to the upgrade:

Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.892396+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 147 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.3 with id d72755c420bcbdae08d063de6035d060ea0487f8a43f777c75bdbfcd9fd907fa
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.893085+0000 mon.dev-lx-ceph12 (mon.0) 98 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.895291+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 148 : cephadm [INF] Upgrade: Checking mgr daemons...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.896773+0000 mon.dev-lx-ceph12 (mon.0) 99 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "versions"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.899261+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 149 : cephadm [INF] Upgrade: All mgr daemons are up to date.
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.899861+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 150 : cephadm [INF] Upgrade: Checking mon daemons...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.901105+0000 mon.dev-lx-ceph12 (mon.0) 100 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "versions"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.903299+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 151 : cephadm [INF] Upgrade: All mon daemons are up to date.
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.904139+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 152 : cephadm [INF] Upgrade: Checking crash daemons...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.904954+0000 mon.dev-lx-ceph12 (mon.0) 101 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "versions"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.907043+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 153 : cephadm [INF] Upgrade: Setting container_image for all crash...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.907622+0000 mon.dev-lx-ceph12 (mon.0) 102 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "crash"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.909447+0000 mon.dev-lx-ceph12 (mon.0) 103 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph11"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.966447+0000 mon.dev-lx-ceph12 (mon.0) 104 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph11"}]': finished
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.968613+0000 mon.dev-lx-ceph12 (mon.0) 105 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph12"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:54.024460+0000 mon.dev-lx-ceph12 (mon.0) 106 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph12"}]': finished
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:54.026613+0000 mon.dev-lx-ceph12 (mon.0) 107 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph13"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:54.099527+0000 mon.dev-lx-ceph12 (mon.0) 108 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph13"}]': finished
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:54.101505+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 154 : cephadm [INF] Upgrade: All crash daemons are up to date.
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:54.102106+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 155 : cephadm [INF] Upgrade: Checking osd daemons...
Jun 02 06:22:54 dev-lx-ceph12 rsyslogd3507: message too long (10806) with configured size 8096, begin of message is: audit 2020-06-02T10:22:53.813310+0000 mon.dev-lx-ceph12 (mon.0) 94 : audit [INF] [v8.37.0-13.el8 try http://www.rsyslog.com/e/2445 ]
Jun 02 06:22:54 dev-lx-ceph12 rsyslogd3507: message too long (10808) with configured size 8096, begin of message is: audit 2020-06-02T10:22:53.873438+0000 mon.dev-lx-ceph12 (mon.0) 95 : audit [INF] [v8.37.0-13.el8 try http://www.rsyslog.com/e/2445 ]
Jun 02 06:22:55 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:55.308+0000 7f8b16816700 0 log_channel(cluster) log [DBG] : pgmap v61: 464 pgs: 464 active+clean; 11 GiB data, 33 GiB used, 12 TiB / 12 TiB avail
Jun 02 06:22:56 dev-lx-ceph12 bash8188: ::ffff:172.16.231.81 - - [02/Jun/2020:10:22:56] "GET /metrics HTTP/1.1" 200 162838 "" "Prometheus/2.18.1"
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.295+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "osd ok-to-stop", "ids": ["10"]} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.295+0000 7ffb4b8e5700 0 log_channel(audit) log [DBG] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "osd ok-to-stop", "ids": ["10"]}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.295+0000 7f8b17818700 0 log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' cmd=[{"prefix": "osd ok-to-stop", "ids": ["10"]}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.298+0000 7f8b1b7e0700 0 log_channel(cephadm) log [INF] : Upgrade: It is safe to stop osd.10
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.299+0000 7f8b1b7e0700 0 log_channel(cephadm) log [INF] : Upgrade: Redeploying osd.10
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.300+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "osd.10"} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.300+0000 7ffb4b8e5700 0 log_channel(audit) log [INF] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "osd.10"}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.345+0000 7ffb4a0e2700 0 log_channel(audit) log [INF] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "osd.10"}]': finished
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.347+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "auth get", "entity": "osd.10"} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.347+0000 7ffb4b8e5700 0 log_channel(audit) log [INF] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "auth get", "entity": "osd.10"}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.349+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "config generate-minimal-conf"} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.349+0000 7ffb4b8e5700 0 log_channel(audit) log [DBG] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.353+0000 7f8b1b7e0700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'cephadm' while running on mgr.dev-lx-ceph12.maqgoo: %d format: a number is required, not str
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.353+0000 7f8b1b7e0700 -1 cephadm.serve:
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.353+0000 7f8b1b7e0700 -1 Traceback (most recent call last):
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 1174, in serve
Jun 02 06:22:56 dev-lx-ceph12 bash8188: self._do_upgrade()
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 929, in _do_upgrade
Jun 02 06:22:56 dev-lx-ceph12 bash8188: 'redeploy'
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 1971, in _daemon_action
Jun 02 06:22:56 dev-lx-ceph12 bash8188: return self._create_daemon(daemon_type, daemon_id, host)
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 2380, in _create_daemon
Jun 02 06:22:56 dev-lx-ceph12 bash8188: raise OrchestratorError('osd.%d not in osdmap' % daemon_id)
Jun 02 06:22:56 dev-lx-ceph12 bash8188: TypeError: %d format: a number is required, not str
Jun 02 06:22:56 dev-lx-ceph12 bash10661: cluster 2020-06-02T10:22:55.308989+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 156 : cluster [DBG] pgmap v61: 464 pgs: 464 active+clean; 11 GiB data, 33 GiB used, 12 TiB / 12 TiB avail

After this point there are no other messages logged that pertain to upgrading.
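The traceback pins the failure to plain Python string formatting: `'osd.%d not in osdmap' % daemon_id` uses the `%d` conversion, which requires a number, while cephadm evidently passes the daemon ID as a string (e.g. `"10"`). A minimal sketch of the error and the obvious fixes, using an assumed value for `daemon_id`:

```python
# Hypothetical reproduction of the TypeError from the traceback.
# Assumption: cephadm passes the OSD daemon ID as a string like "10".
daemon_id = "10"

try:
    # The failing line from _create_daemon: %d rejects a str operand.
    'osd.%d not in osdmap' % daemon_id
except TypeError as exc:
    # Mirrors the "%d format: a number is required, not str" in the logs.
    print(exc)

# Either fix avoids the crash: use %s, or convert the ID explicitly.
print('osd.%s not in osdmap' % daemon_id)
print('osd.%d not in osdmap' % int(daemon_id))
```

Both fixed variants produce `osd.10 not in osdmap`; `%s` is the safer choice since it accepts either a string or an integer ID.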

Actions #3

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Need More Info to Can't reproduce
  • Target version deleted (v15.2.1)
