Bug #57891
[Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception (closed)
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
- The upgrade paused because one host in the cluster was unreachable.
- The upgrade was resumed with ceph orch upgrade resume.
- The upgrade then continued, but ceph -s still reports the HEALTH_ERR "Upgrade: failed due to an unexpected exception".
[root@gibba001 ~]# ceph -s
  cluster:
    id:     f9d4cf6a-edcf-11ec-a96a-3cecef3d8fb8
    health: HEALTH_ERR
            1 failed cephadm daemon(s)
            Slow OSD heartbeats on back (longest 2552.612ms)
            Slow OSD heartbeats on front (longest 2555.707ms)
            Upgrade: failed due to an unexpected exception

  services:
    mon: 4 daemons, quorum gibba001,gibba002,gibba003,gibba005 (age 2h)
    mgr: gibba008.tfggyq(active, since 3h), standbys: gibba006.enemnj
    osd: 925 osds: 925 up (since 17s), 925 in (since 3w)

  data:
    pools:   2 pools, 8193 pgs
    objects: 123.12M objects, 470 GiB
    usage:   1.8 TiB used, 11 TiB / 12 TiB avail
    pgs:     8193 active+clean

  progress:
    Upgrade to 17.2.5 (2h)
      [===.........................] (remaining: 22h)

[root@gibba001 ~]# ceph orch upgrade status
{
    "target_image": "quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "mgr",
        "crash",
        "mon"
    ],
    "progress": "116/1008 daemons upgraded",
    "message": "Currently upgrading osd daemons",
    "is_paused": false
}
[root@gibba001 ~]#
[root@gibba001 ~]# ceph health detail
HEALTH_ERR 1 failed cephadm daemon(s); Slow OSD heartbeats on back (longest 1321.448ms); Slow OSD heartbeats on front (longest 1318.494ms); Upgrade: failed due to an unexpected exception
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon prometheus.gibba001 on gibba001 is in error state
[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1321.448ms)
    Slow OSD heartbeats on back from osd.643 [] to osd.90 [] 1321.448 msec possibly improving
    Slow OSD heartbeats on back from osd.80 [] to osd.279 [] 1020.241 msec
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 1318.494ms)
    Slow OSD heartbeats on front from osd.643 [] to osd.90 [] 1318.494 msec
[ERR] UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception
    Unexpected exception occurred during upgrade process: Unable to reach remote host gibba002.
[root@gibba001 ~]# ceph versions
{
    "mon": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4
    },
    "mgr": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 75
    },
    "mds": {},
    "overall": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 81
    }
}
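The outputs above show a contradiction. ceph orch upgrade status reports "in_progress": true and "is_paused": false, and ceph versions shows daemons moving to 17.2.5, yet the UPGRADE_EXCEPTION check is still raised. The Python sketch below is a hypothetical triage helper that detects this condition. It is not part of Ceph or cephadm, and every function name in it is illustrative. It assumes that the text layout of ceph health detail matches this transcript, which is not a documented format. Its sample inputs are trimmed copies of the output above. For real scripting, ceph health detail --format json would avoid text parsing altogether.

```python
import re
from typing import Dict, Tuple

# Assumed layout of `ceph health detail` text, inferred from this transcript:
# "[WRN] CODE: summary" headers, each followed by indented detail lines.
HEADER_RE = re.compile(r"^\[(WRN|ERR)\] ([A-Z0-9_]+): (.*)$")
# Captures "17.2.5" from "ceph version 17.2.5 (<sha1>) quincy (stable)".
VERSION_RE = re.compile(r"^ceph version (\S+)")


def parse_health_detail(text: str) -> Dict[str, dict]:
    """Group health detail text into {code: {"severity", "summary", "details"}}."""
    checks: Dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            severity, code, summary = match.groups()
            current = {"severity": severity, "summary": summary, "details": []}
            checks[code] = current
        elif current is not None and line.startswith(" "):
            # Indented lines belong to the most recent check header.
            current["details"].append(line.strip())
    return checks


def upgrade_progress(versions: dict, target: str) -> Dict[str, Tuple[int, int]]:
    """Return {daemon_type: (daemons_on_target, total_daemons)} from `ceph versions` data."""
    progress: Dict[str, Tuple[int, int]] = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall" or not counts:
            continue  # skip the aggregate and empty types such as "mds": {}
        on_target = 0
        for banner, n in counts.items():
            match = VERSION_RE.match(banner)
            if match and match.group(1) == target:
                on_target += n
        progress[daemon_type] = (on_target, sum(counts.values()))
    return progress


def stale_upgrade_error(status: dict, checks: Dict[str, dict]) -> bool:
    """True when the upgrade is running and unpaused but UPGRADE_EXCEPTION is still raised."""
    return (bool(status.get("in_progress")) and not status.get("is_paused")
            and "UPGRADE_EXCEPTION" in checks)


# Trimmed sample inputs copied from the outputs in this report.
HEALTH_DETAIL = """\
HEALTH_ERR 1 failed cephadm daemon(s); Upgrade: failed due to an unexpected exception
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon prometheus.gibba001 on gibba001 is in error state
[ERR] UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception
    Unexpected exception occurred during upgrade process: Unable to reach remote host gibba002.
"""
STATUS = {"in_progress": True, "is_paused": False, "progress": "116/1008 daemons upgraded"}
V4 = "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)"
V5 = "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)"
VERSIONS = {
    "mon": {V5: 4},
    "mgr": {V5: 2},
    "osd": {V4: 850, V5: 75},
    "mds": {},
    "overall": {V4: 850, V5: 81},
}

checks = parse_health_detail(HEALTH_DETAIL)
print(upgrade_progress(VERSIONS, "17.2.5"))  # {'mon': (4, 4), 'mgr': (2, 2), 'osd': (75, 925)}
print(stale_upgrade_error(STATUS, checks))   # True
```

The two progress figures in this report need not agree. ceph versions counts only daemons that report a Ceph version (mon, mgr and osd here). The orchestrator's "116/1008 daemons upgraded" also covers cephadm services such as crash, which appears in "services_complete".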
Updated by Vikhyat Umrao over 1 year ago
- Subject changed from HEALTH_ERR: Upgrade: failed due to an unexpected exception to [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception
Updated by Adam King over 1 year ago
- Status changed from New to Pending Backport
- Backport set to quincy
Updated by Backport Bot over 1 year ago
- Copied to Backport #58447: quincy: [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception added
Updated by Adam King about 1 year ago
- Status changed from Pending Backport to Resolved