Bug #53939

ceph-nfs-upgrade, pacific: Upgrade Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi103 failed

Added by Sebastian Wagner 11 months ago. Updated 3 months ago.

Status: New
Priority: Immediate
Assignee:
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

mon[102341]:  : cluster [WRN] Health check failed: Upgrading daemon osd.0 on host smithi103 failed. (UPGRADE_REDEPLOY_DAEMON)
mon[66897]: cephadm 2022-01-18T16:27:48.439275+0000 mgr.smithi103.wyeocw (mgr.14712) 129 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
mon[66897]: Non-zero exit code 1 from systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0
mon[66897]: systemctl: stderr Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: systemctl: stderr See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8615, in <module>
mon[66897]:     main()
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8603, in main
mon[66897]:     r = ctx.func(ctx)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1790, in _default_image
mon[66897]:     return func(ctx)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 4603, in command_deploy
mon[66897]:     ports=daemon_ports)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2715, in deploy_daemon
mon[66897]:     c, osd_fsid=osd_fsid, ports=ports)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2960, in deploy_daemon_units
mon[66897]:     call_throws(ctx, ['systemctl', 'start', unit_name])
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1469, in call_throws
mon[66897]:     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
mon[66897]: RuntimeError: Failed command: systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0: Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1402, in _remote_connection
mon[66897]:     yield (conn, connr)
mon[66897]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1295, in _run_cephadm
mon[66897]:     code, '\n'.join(err)))
mon[66897]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
mon[66897]: Non-zero exit code 1 from systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0
mon[66897]: systemctl: stderr Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: systemctl: stderr See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
mon[66897]: Traceback (most recent call last):
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8615, in <module>
mon[66897]:     main()
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 8603, in main
mon[66897]:     r = ctx.func(ctx)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1790, in _default_image
mon[66897]:     return func(ctx)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 4603, in command_deploy
mon[66897]:     ports=daemon_ports)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2715, in deploy_daemon
mon[66897]:     c, osd_fsid=osd_fsid, ports=ports)
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 2960, in deploy_daemon_units
mon[66897]:     call_throws(ctx, ['systemctl', 'start', unit_name])
mon[66897]:   File "/var/lib/ceph/e287ac0e-7879-11ec-8c34-001a4aab830c/cephadm.c659ab77cc705b8440c5bb10bf729dd981addbc618204d30ac82f427ecc4779d", line 1469, in call_throws
mon[66897]:     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
mon[66897]: RuntimeError: Failed command: systemctl start ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0: Job for ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service failed because a timeout was exceeded.
mon[66897]: See "systemctl status ceph-e287ac0e-7879-11ec-8c34-001a4aab830c@osd.0.service" and "journalctl -xe" for details.

...

cephadm 2022-01-18T16:27:48.439412+0000 mgr.smithi103.wyeocw (mgr.14712) 130 : cephadm [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi103 failed.

https://pulpito.ceph.com/swagner-2022-01-18_15:34:53-rados:cephadm-wip-swagner2-testing-2022-01-18-1242-pacific-distro-default-smithi/6624255
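The traceback shows cephadm's `call_throws` running `systemctl start <unit>` from `deploy_daemon_units` and turning the non-zero exit into the `RuntimeError` that the mgr then wraps in an `OrchestratorError`. A minimal sketch of that pattern, assuming a plain `subprocess.run` wrapper; the real cephadm helper also takes a `ctx` argument (as the traceback shows) and differs in detail, so this is an illustration, not its implementation:

```python
import subprocess
from typing import List, Tuple


def call(command: List[str]) -> Tuple[str, str, int]:
    """Run `command` and return (stdout, stderr, returncode) without raising."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.stdout, proc.stderr, proc.returncode


def call_throws(command: List[str]) -> Tuple[str, str, int]:
    """Like call(), but raise RuntimeError on a non-zero exit code.

    The message mirrors the one in the traceback:
    'Failed command: <command joined by spaces>: <stderr>'
    """
    out, err, code = call(command)
    if code != 0:
        raise RuntimeError(f'Failed command: {" ".join(command)}: {err}')
    return out, err, code


# Usage mirroring deploy_daemon_units (the unit name is a placeholder):
# call_throws(['systemctl', 'start', 'ceph-<fsid>@osd.0'])
```

With this shape, a `systemctl start` that hits the unit's start timeout surfaces only as systemd's generic "failed because a timeout was exceeded" text, which is why the log points at `systemctl status` and `journalctl -xe` for the actual cause.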


Related issues

Related to Orchestrator - Bug #46204: cephadm upgrade test: fail if upgrade status is set to error (Resolved)

History

#1 Updated by Sebastian Wagner 11 months ago

  • Description updated

#2 Updated by Adam King 10 months ago

Also appears in the mds-upgrade-sequence test: http://pulpito.front.sepia.ceph.com/adking-2022-02-15_22:33:11-orch:cephadm-wip-adk2-testing-2022-02-15-1304-pacific-distro-basic-smithi/6685826

2022-02-15T23:05:56.261 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: Non-zero exit code 1 from systemctl start ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4
2022-02-15T23:05:56.261 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: systemctl: stderr Job for ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service failed because a timeout was exceeded.
2022-02-15T23:05:56.261 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: systemctl: stderr See "systemctl status ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service" and "journalctl -xe" for details.
2022-02-15T23:05:56.262 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: Traceback (most recent call last):
2022-02-15T23:05:56.262 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 8824, in <module>
2022-02-15T23:05:56.262 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     main()
2022-02-15T23:05:56.263 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 8812, in main
2022-02-15T23:05:56.263 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     r = ctx.func(ctx)
2022-02-15T23:05:56.263 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 1940, in _default_image
2022-02-15T23:05:56.263 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     return func(ctx)
2022-02-15T23:05:56.264 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 4794, in command_deploy
2022-02-15T23:05:56.264 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     ports=daemon_ports)
2022-02-15T23:05:56.264 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 2877, in deploy_daemon
2022-02-15T23:05:56.264 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     c, osd_fsid=osd_fsid, ports=ports)
2022-02-15T23:05:56.264 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 3122, in deploy_daemon_units
2022-02-15T23:05:56.265 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     call_throws(ctx, ['systemctl', 'start', unit_name])
2022-02-15T23:05:56.265 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 1617, in call_throws
2022-02-15T23:05:56.265 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
2022-02-15T23:05:56.265 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: RuntimeError: Failed command: systemctl start ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4: Job for ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service failed because a timeout was exceeded.
2022-02-15T23:05:56.266 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: See "systemctl status ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service" and "journalctl -xe" for details.
2022-02-15T23:05:56.266 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: Traceback (most recent call last):
2022-02-15T23:05:56.266 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1419, in _remote_connection
2022-02-15T23:05:56.266 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     yield (conn, connr)
2022-02-15T23:05:56.267 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1316, in _run_cephadm
2022-02-15T23:05:56.267 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     code, '\n'.join(err)))
2022-02-15T23:05:56.267 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.4 ...
2022-02-15T23:05:56.267 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: Non-zero exit code 1 from systemctl start ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4
2022-02-15T23:05:56.268 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: systemctl: stderr Job for ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service failed because a timeout was exceeded.
2022-02-15T23:05:56.268 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: systemctl: stderr See "systemctl status ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service" and "journalctl -xe" for details.
2022-02-15T23:05:56.268 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: Traceback (most recent call last):
2022-02-15T23:05:56.268 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 8824, in <module>
2022-02-15T23:05:56.269 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     main()
2022-02-15T23:05:56.269 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 8812, in main
2022-02-15T23:05:56.269 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     r = ctx.func(ctx)
2022-02-15T23:05:56.270 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 1940, in _default_image
2022-02-15T23:05:56.270 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     return func(ctx)
2022-02-15T23:05:56.270 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 4794, in command_deploy
2022-02-15T23:05:56.270 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     ports=daemon_ports)
2022-02-15T23:05:56.271 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 2877, in deploy_daemon
2022-02-15T23:05:56.271 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     c, osd_fsid=osd_fsid, ports=ports)
2022-02-15T23:05:56.271 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 3122, in deploy_daemon_units
2022-02-15T23:05:56.271 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     call_throws(ctx, ['systemctl', 'start', unit_name])
2022-02-15T23:05:56.272 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:   File "/var/lib/ceph/f1423eb4-8eb1-11ec-8c35-001a4aab830c/cephadm.19c30fdae773446a5aedb07d2e1282485a9413711bc49504cc554a3588e49e90", line 1617, in call_throws
2022-02-15T23:05:56.272 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
2022-02-15T23:05:56.272 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: RuntimeError: Failed command: systemctl start ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4: Job for ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service failed because a timeout was exceeded.
2022-02-15T23:05:56.272 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: See "systemctl status ceph-f1423eb4-8eb1-11ec-8c35-001a4aab830c@osd.4.service" and "journalctl -xe" for details.
2022-02-15T23:05:56.273 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]:
2022-02-15T23:05:56.273 INFO:journalctl@ceph.mon.smithi135.smithi135.stdout:Feb 15 23:05:55 smithi135 conmon[55656]: cephadm 2022-02-15T23:05:55.397008+0000 mgr.smithi049.fpucuw (mgr.14658) 233 : cephadm [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.4 on host smithi135 failed.

#3 Updated by Laura Flores 9 months ago

Several dead jobs of this kind here:
/a/yuriw-2022-02-24_22:04:22-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi

#4 Updated by Laura Flores 9 months ago

  • Backport set to quincy, pacific

#5 Updated by Laura Flores 9 months ago

/a/yuriw-2022-03-01_17:45:51-rados-wip-yuri3-testing-2022-02-28-0757-pacific-distro-default-smithi/6714704

#6 Updated by Kamoltat (Junior) Sirivadhna 9 months ago

/a/yuriw-2022-03-17_14:54:32-rados-wip-yuri10-testing-2022-03-16-1432-pacific-distro-default-smithi/6742179

#7 Updated by Aishwarya Mathuria 9 months ago

/a/yuriw-2022-03-23_14:51:02-rados-wip-yuri4-testing-2022-03-21-1648-pacific-distro-default-smithi/6756012

#8 Updated by Laura Flores 8 months ago

/a/yuriw-2022-03-25_18:42:52-rados-wip-yuri7-testing-2022-03-24-1341-pacific-distro-default-smithi/6761209

#9 Updated by Kamoltat (Junior) Sirivadhna 8 months ago

/a/yuriw-2022-03-26_19:49:47-rados-wip-yuri10-testing-2022-03-22-1809-pacific-distro-default-smithi/6762725
/a/yuriw-2022-03-26_19:49:47-rados-wip-yuri10-testing-2022-03-22-1809-pacific-distro-default-smithi/6762733
/a/yuriw-2022-03-26_19:49:47-rados-wip-yuri10-testing-2022-03-22-1809-pacific-distro-default-smithi/6762740
/a/yuriw-2022-03-26_19:49:47-rados-wip-yuri10-testing-2022-03-22-1809-pacific-distro-default-smithi/6762753

#10 Updated by Laura Flores 8 months ago

/a/yuriw-2022-04-01_01:23:52-rados-wip-yuri2-testing-2022-03-31-1523-pacific-distro-default-smithi/6770861

#11 Updated by Laura Flores 8 months ago

/a/lflores-2022-04-22_20:48:19-rados-wip-55324-pacific-backport-distro-default-smithi/6801316

Description: rados/cephadm/mgr-nfs-upgrade/{0-distro/centos_8.stream_container_tools 1-bootstrap/octopus 1-start 2-nfs 3-upgrade-with-workload 4-final}

#12 Updated by Redouane Kachach Elhichou 7 months ago

  • Status changed from New to Resolved
  • Assignee set to Adam King
  • Pull request ID set to 45920

#13 Updated by Redouane Kachach Elhichou 7 months ago

  • Related to Bug #46204: cephadm upgrade test: fail if upgrade status is set to error added

#14 Updated by Adam King 7 months ago

  • Status changed from Resolved to New
  • Pull request ID deleted (45920)

The PR didn't actually fix the issue; it just made the job fail properly rather than timing out after 6 hours.
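The goal of that PR was to have the test stop as soon as the upgrade errors out instead of waiting for the job timeout. A minimal sketch of the idea, assuming the test can read `ceph health detail --format json` and that its output has a top-level `checks` map keyed by health-check name, each entry carrying a `summary.message`. This checks the `UPGRADE_REDEPLOY_DAEMON` health warning rather than the upgrade status that Bug #46204 refers to, it is not the change from PR 45920, and the helper names are hypothetical:

```python
import json
from typing import Optional


def upgrade_error(health_json: str, check: str = "UPGRADE_REDEPLOY_DAEMON") -> Optional[str]:
    """Return the summary message for `check` if it is raised, else None.

    Assumed JSON shape (from `ceph health detail --format json`):
    {"status": ..., "checks": {NAME: {"severity": ..., "summary": {"message": ...}}}}
    """
    health = json.loads(health_json)
    entry = health.get("checks", {}).get(check)
    if entry is None:
        return None
    # Fall back to the check name if no summary message is present.
    return entry.get("summary", {}).get("message", check)


def assert_upgrade_healthy(health_json: str) -> None:
    """Fail immediately when the upgrade is paused on a redeploy error."""
    msg = upgrade_error(health_json)
    if msg is not None:
        raise AssertionError(f"upgrade paused: {msg}")
```

A polling loop would call `assert_upgrade_healthy` on each iteration, so a paused upgrade turns into an immediate, descriptive failure instead of a 6-hour timeout.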

#15 Updated by Laura Flores 6 months ago

/a/yuriw-2022-06-02_00:50:42-rados-wip-yuri4-testing-2022-06-01-1350-pacific-distro-default-smithi/6859636

#16 Updated by Adam King 6 months ago

Saw the same issue presenting itself slightly differently:

https://pulpito.ceph.com/adking-2022-06-22_00:33:18-rados:cephadm-wip-adk2-testing-2022-06-21-1756-pacific-distro-default-smithi/6891065
https://pulpito.ceph.com/adking-2022-06-22_00:33:18-rados:cephadm-wip-adk2-testing-2022-06-21-1756-pacific-distro-default-smithi/6891125
https://pulpito.ceph.com/adking-2022-06-22_00:30:38-orch:cephadm-wip-adk2-testing-2022-06-21-1756-pacific-distro-default-smithi/6890973

These jobs instead fail with:

Command failed on smithi002 with status 1: 'sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/ceph:v16.2.4 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 81a02aae-f1de-11ec-8428-001a4aab830c -e sha1=e3cb1e0d16e2b374aa4360db82f2f495a775d0f1 -- bash -c \'ceph versions | jq -e \'"\'"\'.overall | length == 1\'"\'"\'\''

This is just the final check that makes sure all the daemons were actually upgraded. Further investigation showed that this was still the same issue with redeploying osd.0:

2022-06-22T04:02:55.123 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: Traceback (most recent call last):
2022-06-22T04:02:55.124 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 9168, in <module>
2022-06-22T04:02:55.124 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     main()
2022-06-22T04:02:55.124 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 9156, in main
2022-06-22T04:02:55.124 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     r = ctx.func(ctx)
2022-06-22T04:02:55.125 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 1969, in _default_image
2022-06-22T04:02:55.125 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     return func(ctx)
2022-06-22T04:02:55.125 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 5014, in command_deploy
2022-06-22T04:02:55.126 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     ports=daemon_ports)
2022-06-22T04:02:55.126 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 2928, in deploy_daemon
2022-06-22T04:02:55.126 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     c, osd_fsid=osd_fsid, ports=ports)
2022-06-22T04:02:55.126 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 3173, in deploy_daemon_units
2022-06-22T04:02:55.127 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     call_throws(ctx, ['systemctl', 'start', unit_name])
2022-06-22T04:02:55.127 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 1636, in call_throws
2022-06-22T04:02:55.127 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
2022-06-22T04:02:55.127 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: RuntimeError: Failed command: systemctl start ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0: Job for ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0.service failed because a timeout was exceeded.
2022-06-22T04:02:55.128 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: See "systemctl status ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
2022-06-22T04:02:55.128 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: Traceback (most recent call last):
2022-06-22T04:02:55.128 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1454, in _remote_connection
2022-06-22T04:02:55.128 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     yield (conn, connr)
2022-06-22T04:02:55.129 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1351, in _run_cephadm
2022-06-22T04:02:55.129 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     code, '\n'.join(err)))
2022-06-22T04:02:55.129 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.0 ...
2022-06-22T04:02:55.129 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: Non-zero exit code 1 from systemctl start ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0
2022-06-22T04:02:55.130 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: systemctl: stderr Job for ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0.service failed because a timeout was exceeded.
2022-06-22T04:02:55.130 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: systemctl: stderr See "systemctl status ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
2022-06-22T04:02:55.130 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: Traceback (most recent call last):
2022-06-22T04:02:55.131 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 9168, in <module>
2022-06-22T04:02:55.131 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     main()
2022-06-22T04:02:55.131 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 9156, in main
2022-06-22T04:02:55.131 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     r = ctx.func(ctx)
2022-06-22T04:02:55.132 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 1969, in _default_image
2022-06-22T04:02:55.132 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     return func(ctx)
2022-06-22T04:02:55.132 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 5014, in command_deploy
2022-06-22T04:02:55.132 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     ports=daemon_ports)
2022-06-22T04:02:55.133 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 2928, in deploy_daemon
2022-06-22T04:02:55.133 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     c, osd_fsid=osd_fsid, ports=ports)
2022-06-22T04:02:55.133 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 3173, in deploy_daemon_units
2022-06-22T04:02:55.133 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     call_throws(ctx, ['systemctl', 'start', unit_name])
2022-06-22T04:02:55.134 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:   File "/var/lib/ceph/81a02aae-f1de-11ec-8428-001a4aab830c/cephadm.df7ea842f88b9a01f2ed5fe6b5a9c73bd1646a47a0f991b7d1c71179d480ae3a", line 1636, in call_throws
2022-06-22T04:02:55.134 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
2022-06-22T04:02:55.134 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: RuntimeError: Failed command: systemctl start ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0: Job for ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0.service failed because a timeout was exceeded.
2022-06-22T04:02:55.135 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: See "systemctl status ceph-81a02aae-f1de-11ec-8428-001a4aab830c@osd.0.service" and "journalctl -xe" for details.
2022-06-22T04:02:55.135 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]:
2022-06-22T04:02:55.135 INFO:journalctl@ceph.mon.smithi002.smithi002.stdout:Jun 22 04:02:54 smithi002 ceph-81a02aae-f1de-11ec-8428-001a4aab830c-mon-smithi002[79297]: cephadm 2022-06-22T04:02:53.015280+0000 mgr.smithi002.sovhqg (mgr.14742) 139 : cephadm [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi002 failed.

Not sure why the error is presenting itself differently here, but it's definitely the same issue.
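The failing command's final step, `ceph versions | jq -e '.overall | length == 1'`, succeeds only when every daemon reports the same version, so a daemon that was never redeployed (here presumably osd.0) makes it fail. A Python sketch of the same check, assuming the usual `ceph versions` JSON shape (one map per daemon type plus an `overall` map, each from version string to daemon count); `stragglers` is a hypothetical helper that names the daemon types still running mixed versions:

```python
import json
from typing import Dict


def all_daemons_on_one_version(versions_json: str) -> bool:
    """Equivalent of `jq -e '.overall | length == 1'` on `ceph versions` output.

    Exactly one key under "overall" means every daemon runs the same version.
    A missing "overall" counts as zero versions, as jq's `null | length` does.
    """
    versions: Dict[str, Dict[str, int]] = json.loads(versions_json)
    return len(versions.get("overall", {})) == 1


def stragglers(versions_json: str) -> Dict[str, Dict[str, int]]:
    """Return the daemon types that still report more than one version."""
    versions: Dict[str, Dict[str, int]] = json.loads(versions_json)
    return {
        daemon_type: per_version
        for daemon_type, per_version in versions.items()
        if daemon_type != "overall" and len(per_version) > 1
    }
```

Reporting the stragglers alongside the pass/fail result would point directly at the OSD redeploy problem instead of leaving only the generic "Command failed ... status 1" message.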

#17 Updated by Laura Flores 5 months ago

/a/yuriw-2022-07-19_23:25:12-rados-wip-yuri2-testing-2022-07-15-0755-pacific-distro-default-smithi/6939118

#18 Updated by Kamoltat (Junior) Sirivadhna 4 months ago

/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958090

#19 Updated by Kamoltat (Junior) Sirivadhna 4 months ago

/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958233
/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958301

#20 Updated by Kamoltat (Junior) Sirivadhna 4 months ago

/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958404

#21 Updated by Kamoltat (Junior) Sirivadhna 4 months ago

/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958440

#22 Updated by Adam King 4 months ago

Had a pacific orch/cephadm run here, https://pulpito.ceph.com/adking-2022-08-17_23:17:58-orch:cephadm-wip-adk2-testing-2022-08-17-1543-pacific-distro-default-smithi/, where all 3 instances of mgr-nfs-upgrade passed. Did another 15 runs of that test (https://pulpito.ceph.com/adking-2022-08-18_12:25:13-orch:cephadm-wip-adk2-testing-2022-08-17-1543-pacific-distro-default-smithi/): 14/15 passed, and the one failure seemed to be a failure to pull a package:

Failed to connect to https://copr.fedorainfracloud.org/coprs/ceph/python3-asyncssh/repo/epel-8/dnf.repo?arch=x86_64:

That failure is unrelated to the actual test. I think it's possible this issue is fixed by https://github.com/ceph/ceph/pull/47535

#23 Updated by Matan Breizman 4 months ago

/a/yuriw-2022-08-22_21:19:34-rados-wip-yuri4-testing-2022-08-18-1020-pacific-distro-default-smithi/6986467

/a/yuriw-2022-08-22_21:19:34-rados-wip-yuri4-testing-2022-08-18-1020-pacific-distro-default-smithi/6986489

/a/yuriw-2022-08-22_21:19:34-rados-wip-yuri4-testing-2022-08-18-1020-pacific-distro-default-smithi/6986502

#24 Updated by Laura Flores 3 months ago

  • Tags set to test-failure

#25 Updated by Laura Flores 3 months ago

/a/yuriw-2022-08-24_16:39:47-rados-wip-yuri4-testing-2022-08-24-0707-pacific-distro-default-smithi/6990258

#26 Updated by Adam King 3 months ago

https://github.com/ceph/ceph/pull/47535 was merged on Sep 6th. I'm interested to see whether this failure still shows up with builds created from Sep 7th onward.
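Checking whether the failure persists after the merge means sorting reported runs by date. A sketch, assuming the run-name convention used throughout this ticket (`<user>-YYYY-MM-DD_HH:MM:SS-<suite>-<branch>...`) and treating the scheduling timestamp as only a rough proxy for the build date: a run scheduled after Sep 6th may still test an older branch, so its sha1 would still need checking. The helper names and the year 2022 for the merge date are assumptions:

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern: the first "-YYYY-MM-DD_HH:MM:SS-" in a run path or
# pulpito URL is taken as the date the run was scheduled.
RUN_DATE = re.compile(r"-(\d{4})-(\d{2})-(\d{2})_\d{2}:\d{2}:\d{2}-")

# Merge date of PR 47535 per this ticket (year assumed to be 2022).
FIX_MERGED = date(2022, 9, 6)


def run_date(run: str) -> Optional[date]:
    """Extract the scheduling date from a run path, or None if absent."""
    m = RUN_DATE.search(run)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)


def is_after_fix(run: str) -> Optional[bool]:
    """True if scheduled on Sep 7th or later, False if earlier, None if unknown."""
    d = run_date(run)
    return None if d is None else d > FIX_MERGED
```

Branch names such as `wip-yuri4-testing-2022-08-24-0707` carry their own date but no underscore-separated time, so the pattern skips them and picks up the scheduling timestamp.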
