Bug #57891
[Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception
% Done: 0%
Tags: backport_processed
Backport: quincy
Regression: No
Severity: 3 - minor
Description
- The upgrade paused because one host in the cluster was unreachable.
- The upgrade was resumed with the resume command (ceph orch upgrade resume).
- The upgrade then continued, but ceph -s still reports the HEALTH_ERR "Upgrade: failed due to an unexpected exception".
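The condition described above, where the upgrade is running again but the UPGRADE_EXCEPTION error is still raised, can be detected mechanically. The following is a minimal Python sketch, not part of cephadm: it reads the JSON printed by ceph orch upgrade status (trimmed from this report) together with a hypothetical, trimmed health payload whose shape (a top-level "checks" object keyed by health-check code) is assumed to mirror ceph health detail --format json. The function name stale_upgrade_exception is made up for illustration.

```python
import json

# Excerpt of `ceph orch upgrade status` from this report (target_image omitted).
UPGRADE_STATUS = """
{
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "progress": "116/1008 daemons upgraded",
    "message": "Currently upgrading osd daemons",
    "is_paused": false
}
"""

# Hypothetical, trimmed `ceph health detail --format json` payload.
# ASSUMPTION: health checks live under "checks", keyed by their code.
HEALTH = """
{
    "status": "HEALTH_ERR",
    "checks": {
        "UPGRADE_EXCEPTION": {
            "severity": "HEALTH_ERR",
            "summary": {"message": "Upgrade: failed due to an unexpected exception"}
        }
    }
}
"""


def stale_upgrade_exception(upgrade_status: dict, health: dict) -> bool:
    """True when the upgrade is running (not paused) yet UPGRADE_EXCEPTION is still raised."""
    running = bool(upgrade_status.get("in_progress")) and not upgrade_status.get("is_paused", False)
    raised = "UPGRADE_EXCEPTION" in health.get("checks", {})
    return running and raised


if __name__ == "__main__":
    status = json.loads(UPGRADE_STATUS)
    health = json.loads(HEALTH)
    print(stale_upgrade_exception(status, health))  # True for the state in this report
```

The check only compares the two reports; it says nothing about why the error was left behind.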
[root@gibba001 ~]# ceph -s
  cluster:
    id:     f9d4cf6a-edcf-11ec-a96a-3cecef3d8fb8
    health: HEALTH_ERR
            1 failed cephadm daemon(s)
            Slow OSD heartbeats on back (longest 2552.612ms)
            Slow OSD heartbeats on front (longest 2555.707ms)
            Upgrade: failed due to an unexpected exception

  services:
    mon: 4 daemons, quorum gibba001,gibba002,gibba003,gibba005 (age 2h)
    mgr: gibba008.tfggyq(active, since 3h), standbys: gibba006.enemnj
    osd: 925 osds: 925 up (since 17s), 925 in (since 3w)

  data:
    pools:   2 pools, 8193 pgs
    objects: 123.12M objects, 470 GiB
    usage:   1.8 TiB used, 11 TiB / 12 TiB avail
    pgs:     8193 active+clean

  progress:
    Upgrade to 17.2.5 (2h)
      [===.........................] (remaining: 22h)
[root@gibba001 ~]# ceph orch upgrade status
{
    "target_image": "quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "mgr",
        "crash",
        "mon"
    ],
    "progress": "116/1008 daemons upgraded",
    "message": "Currently upgrading osd daemons",
    "is_paused": false
}
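The progress field in ceph orch upgrade status is a human-readable string rather than separate numeric fields. A minimal Python sketch can turn it into numbers, assuming the "<done>/<total> daemons upgraded" wording shown above stays stable (it is not a documented interface); parse_progress is a hypothetical helper name.

```python
import json
import re
from typing import Tuple

# Excerpt of the `ceph orch upgrade status` output above (target_image omitted).
STATUS = """{
    "in_progress": true,
    "services_complete": ["mgr", "crash", "mon"],
    "progress": "116/1008 daemons upgraded",
    "message": "Currently upgrading osd daemons",
    "is_paused": false
}"""

# ASSUMPTION: "progress" keeps the "<done>/<total> daemons upgraded" form seen here.
PROGRESS_RE = re.compile(r"^(\d+)/(\d+) daemons upgraded$")


def parse_progress(status: dict) -> Tuple[int, int]:
    """Extract (upgraded, total) from the human-readable progress field."""
    match = PROGRESS_RE.match(status.get("progress", ""))
    if match is None:
        # Fail loudly instead of guessing if the wording ever changes.
        raise ValueError(f"unrecognised progress string: {status.get('progress')!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    done, total = parse_progress(json.loads(STATUS))
    print(f"{done}/{total} daemons upgraded ({100 * done / total:.1f}%)")
```

For this report it prints "116/1008 daemons upgraded (11.5%)". The strict regex with an exception is deliberate: a silent fallback would hide a change in the message format.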
[root@gibba001 ~]#
[root@gibba001 ~]# ceph health detail
HEALTH_ERR 1 failed cephadm daemon(s); Slow OSD heartbeats on back (longest 1321.448ms); Slow OSD heartbeats on front (longest 1318.494ms); Upgrade: failed due to an unexpected exception
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon prometheus.gibba001 on gibba001 is in error state
[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1321.448ms)
    Slow OSD heartbeats on back from osd.643 [] to osd.90 [] 1321.448 msec possibly improving
    Slow OSD heartbeats on back from osd.80 [] to osd.279 [] 1020.241 msec
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 1318.494ms)
    Slow OSD heartbeats on front from osd.643 [] to osd.90 [] 1318.494 msec
[ERR] UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception
    Unexpected exception occurred during upgrade process: Unable to reach remote host gibba002.
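The OSD_SLOW_PING_TIME_BACK and OSD_SLOW_PING_TIME_FRONT details list one line per slow OSD pair. A short Python sketch can pull the pairs and latencies out of that text to find the worst path; the regex assumes the line layout seen in this report, and SlowPing and parse_slow_pings are hypothetical names.

```python
import re
from typing import List, NamedTuple

# OSD_SLOW_PING_TIME_BACK / _FRONT detail lines copied from the health output above.
DETAIL_LINES = [
    "Slow OSD heartbeats on back from osd.643 [] to osd.90 [] 1321.448 msec possibly improving",
    "Slow OSD heartbeats on back from osd.80 [] to osd.279 [] 1020.241 msec",
    "Slow OSD heartbeats on front from osd.643 [] to osd.90 [] 1318.494 msec",
]

# ASSUMPTION: lines follow "on <iface> from osd.N [...] to osd.M [...] <ms> msec".
LINE_RE = re.compile(
    r"on (?P<iface>back|front) from osd\.(?P<src>\d+) \[[^\]]*\] "
    r"to osd\.(?P<dst>\d+) \[[^\]]*\] (?P<ms>[\d.]+) msec"
)


class SlowPing(NamedTuple):
    iface: str  # "back" (cluster network) or "front" (public network)
    src: int
    dst: int
    ms: float


def parse_slow_pings(lines: List[str]) -> List[SlowPing]:
    """Parse slow-heartbeat detail lines, skipping any that do not match."""
    out = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            out.append(SlowPing(m["iface"], int(m["src"]), int(m["dst"]), float(m["ms"])))
    return out


if __name__ == "__main__":
    pings = parse_slow_pings(DETAIL_LINES)
    worst = max(pings, key=lambda p: p.ms)
    print(worst)  # SlowPing(iface='back', src=643, dst=90, ms=1321.448)
```

Applied to the lines in this report, the slowest pair on both the back and the front network is osd.643 to osd.90.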
[root@gibba001 ~]# ceph versions
{
    "mon": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4
    },
    "mgr": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 75
    },
    "mds": {},
    "overall": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 81
    }
}
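ceph versions counts daemons per type, keyed by the full version banner. A minimal Python sketch (standard library only) can reduce those banners to short version numbers and cross-check the per-type counts against the "overall" section; the banner regex assumes the "ceph version X (sha) codename (stable)" form seen here, and short and tally are hypothetical helper names.

```python
import json
import re
from collections import Counter
from typing import Dict

# `ceph versions` output from this report.
VERSIONS = json.loads("""{
  "mon": {"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4},
  "mgr": {"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2},
  "osd": {
    "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
    "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 75
  },
  "mds": {},
  "overall": {
    "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
    "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 81
  }
}""")

# ASSUMPTION: banners start with "ceph version <number> ".
VERSION_RE = re.compile(r"ceph version (\S+) ")


def short(version_string: str) -> str:
    """Reduce 'ceph version 17.2.5 (sha) quincy (stable)' to '17.2.5'."""
    m = VERSION_RE.match(version_string)
    return m.group(1) if m else version_string


def tally(versions: dict) -> Dict[str, Counter]:
    """Per daemon type (skipping 'overall'), count daemons by short version."""
    result = {}
    for daemon_type, by_version in versions.items():
        if daemon_type == "overall":
            continue
        counts = Counter()
        for banner, n in by_version.items():
            counts[short(banner)] += n
        result[daemon_type] = counts
    return result


if __name__ == "__main__":
    per_type = tally(VERSIONS)
    total = sum(per_type.values(), Counter())
    # Cross-check the per-type sums against the "overall" section.
    assert total == Counter({short(v): n for v, n in VERSIONS["overall"].items()})
    print(sorted(total.items()))  # [('17.2.4', 850), ('17.2.5', 81)]
```

Here 75 of 925 OSDs (about 8.1%) are on 17.2.5. ceph versions only lists the daemon types shown (mon, mgr, osd, mds), so its totals need not match the 116/1008 figure from ceph orch upgrade status.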
History
#1 Updated by Vikhyat Umrao 10 months ago
- Subject changed from HEALTH_ERR: Upgrade: failed due to an unexpected exception to [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception
#5 Updated by Backport Bot 7 months ago
- Copied to Backport #58447: quincy: [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception added
#6 Updated by Backport Bot 7 months ago
- Tags set to backport_processed