Bug #45791


cephadm: Upgrade is failing on octopus on CentOS 8: %d format: a number is required, not str

Added by David Capone almost 4 years ago. Updated almost 4 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
cephadm
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

New dev cluster running CentOS 8 with the Octopus release, deployed using cephadm on version 15.2.1.

Per the documentation, I ran ceph orch upgrade start --ceph_version 15.2.3. The upgrade starts, and I think it succeeds on the managers but fails on the monitors. The cluster goes into status:

health: HEALTH_ERR
Module 'cephadm' has failed: %d format: a number is required, not str

The only way to clear it is:

ceph orch upgrade stop
ceph mgr module disable cephadm
ceph mgr module enable cephadm

After that failure, I attempted to upgrade to 15.2.2 in case it was an intermediate-upgrade issue, and the same failure occurred.

Running the latest cephadm release provided by dnf update:

Installed Packages

cephadm.x86_64                                                              2:15.2.3-0.el8                                                              @Ceph
Actions #1

Updated by Sebastian Wagner almost 4 years ago

  • Description updated (diff)
  • Category changed from cephadm (binary) to cephadm
  • Status changed from New to Need More Info

Can you please attach the full MGR log file?

Actions #2

Updated by David Capone almost 4 years ago

If more of the log is needed I can add it, but I believe the snippet below is the relevant part pertaining to the upgrade:

Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.892396+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 147 : cephadm [INF] Upgrade: Target is docker.io/ceph/ceph:v15.2.3 with id d72755c420bcbdae08d063de6035d060ea0487f8a43f777c75bdbfcd9fd907fa
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.893085+0000 mon.dev-lx-ceph12 (mon.0) 98 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config dump", "format": "json"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.895291+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 148 : cephadm [INF] Upgrade: Checking mgr daemons...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.896773+0000 mon.dev-lx-ceph12 (mon.0) 99 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "versions"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.899261+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 149 : cephadm [INF] Upgrade: All mgr daemons are up to date.
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.899861+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 150 : cephadm [INF] Upgrade: Checking mon daemons...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.901105+0000 mon.dev-lx-ceph12 (mon.0) 100 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "versions"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.903299+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 151 : cephadm [INF] Upgrade: All mon daemons are up to date.
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.904139+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 152 : cephadm [INF] Upgrade: Checking crash daemons...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.904954+0000 mon.dev-lx-ceph12 (mon.0) 101 : audit [DBG] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "versions"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:53.907043+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 153 : cephadm [INF] Upgrade: Setting container_image for all crash...
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.907622+0000 mon.dev-lx-ceph12 (mon.0) 102 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "crash"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.909447+0000 mon.dev-lx-ceph12 (mon.0) 103 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph11"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.966447+0000 mon.dev-lx-ceph12 (mon.0) 104 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph11"}]': finished
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:53.968613+0000 mon.dev-lx-ceph12 (mon.0) 105 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph12"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:54.024460+0000 mon.dev-lx-ceph12 (mon.0) 106 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph12"}]': finished
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:54.026613+0000 mon.dev-lx-ceph12 (mon.0) 107 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph13"}]: dispatch
Jun 02 06:22:54 dev-lx-ceph12 bash10661: audit 2020-06-02T10:22:54.099527+0000 mon.dev-lx-ceph12 (mon.0) 108 : audit [INF] from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config rm", "name": "container_image", "who": "client.crash.dev-lx-ceph13"}]': finished
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:54.101505+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 154 : cephadm [INF] Upgrade: All crash daemons are up to date.
Jun 02 06:22:54 dev-lx-ceph12 bash10661: cephadm 2020-06-02T10:22:54.102106+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 155 : cephadm [INF] Upgrade: Checking osd daemons...
Jun 02 06:22:54 dev-lx-ceph12 rsyslogd3507: message too long (10806) with configured size 8096, begin of message is: audit 2020-06-02T10:22:53.813310+0000 mon.dev-lx-ceph12 (mon.0) 94 : audit [INF] [v8.37.0-13.el8 try http://www.rsyslog.com/e/2445 ]
Jun 02 06:22:54 dev-lx-ceph12 rsyslogd3507: message too long (10808) with configured size 8096, begin of message is: audit 2020-06-02T10:22:53.873438+0000 mon.dev-lx-ceph12 (mon.0) 95 : audit [INF] [v8.37.0-13.el8 try http://www.rsyslog.com/e/2445 ]
Jun 02 06:22:55 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:55.308+0000 7f8b16816700 0 log_channel(cluster) log [DBG] : pgmap v61: 464 pgs: 464 active+clean; 11 GiB data, 33 GiB used, 12 TiB / 12 TiB avail
Jun 02 06:22:56 dev-lx-ceph12 bash8188: ::ffff:172.16.231.81 - - [02/Jun/2020:10:22:56] "GET /metrics HTTP/1.1" 200 162838 "" "Prometheus/2.18.1"
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.295+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "osd ok-to-stop", "ids": ["10"]} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.295+0000 7ffb4b8e5700 0 log_channel(audit) log [DBG] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "osd ok-to-stop", "ids": ["10"]}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.295+0000 7f8b17818700 0 log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' cmd=[{"prefix": "osd ok-to-stop", "ids": ["10"]}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.298+0000 7f8b1b7e0700 0 log_channel(cephadm) log [INF] : Upgrade: It is safe to stop osd.10
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.299+0000 7f8b1b7e0700 0 log_channel(cephadm) log [INF] : Upgrade: Redeploying osd.10
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.300+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "osd.10"} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.300+0000 7ffb4b8e5700 0 log_channel(audit) log [INF] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "osd.10"}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.345+0000 7ffb4a0e2700 0 log_channel(audit) log [INF] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd='[{"prefix": "config set", "name": "container_image", "value": "docker.io/ceph/ceph:v15.2.3", "who": "osd.10"}]': finished
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.347+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "auth get", "entity": "osd.10"} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.347+0000 7ffb4b8e5700 0 log_channel(audit) log [INF] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "auth get", "entity": "osd.10"}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.349+0000 7ffb4b8e5700 0 mon.dev-lx-ceph12@0(leader) e3 handle_command mon_command({"prefix": "config generate-minimal-conf"} v 0) v1
Jun 02 06:22:56 dev-lx-ceph12 bash10661: debug 2020-06-02T10:22:56.349+0000 7ffb4b8e5700 0 log_channel(audit) log [DBG] : from='mgr.434264 172.16.231.81:0/2107193256' entity='mgr.dev-lx-ceph12.maqgoo' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.353+0000 7f8b1b7e0700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'cephadm' while running on mgr.dev-lx-ceph12.maqgoo: %d format: a number is required, not str
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.353+0000 7f8b1b7e0700 -1 cephadm.serve:
Jun 02 06:22:56 dev-lx-ceph12 bash8188: debug 2020-06-02T10:22:56.353+0000 7f8b1b7e0700 -1 Traceback (most recent call last):
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 1174, in serve
Jun 02 06:22:56 dev-lx-ceph12 bash8188: self._do_upgrade()
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 929, in _do_upgrade
Jun 02 06:22:56 dev-lx-ceph12 bash8188: 'redeploy'
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 1971, in _daemon_action
Jun 02 06:22:56 dev-lx-ceph12 bash8188: return self._create_daemon(daemon_type, daemon_id, host)
Jun 02 06:22:56 dev-lx-ceph12 bash8188: File "/usr/share/ceph/mgr/cephadm/module.py", line 2380, in _create_daemon
Jun 02 06:22:56 dev-lx-ceph12 bash8188: raise OrchestratorError('osd.%d not in osdmap' % daemon_id)
Jun 02 06:22:56 dev-lx-ceph12 bash8188: TypeError: %d format: a number is required, not str
Jun 02 06:22:56 dev-lx-ceph12 bash10661: cluster 2020-06-02T10:22:55.308989+0000 mgr.dev-lx-ceph12.maqgoo (mgr.434264) 156 : cluster [DBG] pgmap v61: 464 pgs: 464 active+clean; 11 GiB data, 33 GiB used, 12 TiB / 12 TiB avail

After this point there are no other messages logged that pertain to upgrading.
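The traceback pins the failure to plain Python string formatting: `'osd.%d not in osdmap' % daemon_id` uses the `%d` conversion, which requires a number, while cephadm evidently passes the daemon ID as a string (e.g. `"10"`). A minimal sketch of the error and the obvious fixes, using an assumed value for `daemon_id`:

```python
# Hypothetical reproduction of the TypeError from the traceback.
# Assumption: cephadm passes the OSD daemon ID as a string like "10".
daemon_id = "10"

try:
    # The failing line from _create_daemon: %d rejects a str operand.
    'osd.%d not in osdmap' % daemon_id
except TypeError as exc:
    # Mirrors the "%d format: a number is required, not str" in the logs.
    print(exc)

# Either fix avoids the crash: use %s, or convert the ID explicitly.
print('osd.%s not in osdmap' % daemon_id)
print('osd.%d not in osdmap' % int(daemon_id))
```

Both fixed variants produce `osd.10 not in osdmap`; `%s` is the safer choice since it accepts either a string or an integer ID.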

Actions #3

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Need More Info to Can't reproduce
  • Target version deleted (v15.2.1)
