Bug #57891

[Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception

Added by Vikhyat Umrao 4 months ago. Updated 18 days ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Category:
cephadm
Target version:
-
% Done:
0%

Source:
Tags:
backport_processed
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

- The upgrade paused because one host in the cluster was unreachable.
- We resumed the upgrade with the resume command.
- The upgrade then started working again, but ceph -s still reports this HEALTH_ERR message:


[root@gibba001 ~]# ceph -s
  cluster:
    id:     f9d4cf6a-edcf-11ec-a96a-3cecef3d8fb8
    health: HEALTH_ERR
            1 failed cephadm daemon(s)
            Slow OSD heartbeats on back (longest 2552.612ms)
            Slow OSD heartbeats on front (longest 2555.707ms)
            Upgrade: failed due to an unexpected exception

  services:
    mon: 4 daemons, quorum gibba001,gibba002,gibba003,gibba005 (age 2h)
    mgr: gibba008.tfggyq(active, since 3h), standbys: gibba006.enemnj
    osd: 925 osds: 925 up (since 17s), 925 in (since 3w)

  data:
    pools:   2 pools, 8193 pgs
    objects: 123.12M objects, 470 GiB
    usage:   1.8 TiB used, 11 TiB / 12 TiB avail
    pgs:     8193 active+clean

  progress:
    Upgrade to 17.2.5 (2h)
      [===.........................] (remaining: 22h)

[root@gibba001 ~]# ceph orch upgrade status
{
    "target_image": "quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "mgr",
        "crash",
        "mon" 
    ],
    "progress": "116/1008 daemons upgraded",
    "message": "Currently upgrading osd daemons",
    "is_paused": false
}
[root@gibba001 ~]# 

[root@gibba001 ~]# ceph health detail
HEALTH_ERR 1 failed cephadm daemon(s); Slow OSD heartbeats on back (longest 1321.448ms); Slow OSD heartbeats on front (longest 1318.494ms); Upgrade: failed due to an unexpected exception
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon prometheus.gibba001 on gibba001 is in error state
[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1321.448ms)
    Slow OSD heartbeats on back from osd.643 [] to osd.90 [] 1321.448 msec possibly improving
    Slow OSD heartbeats on back from osd.80 [] to osd.279 [] 1020.241 msec
[WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 1318.494ms)
    Slow OSD heartbeats on front from osd.643 [] to osd.90 [] 1318.494 msec
[ERR] UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception
    Unexpected exception occurred during upgrade process: Unable to reach remote host gibba002. 
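
The inconsistency reported here (the upgrade is running again, yet UPGRADE_EXCEPTION is still raised) can be checked mechanically. The following is a minimal sketch, assuming the JSON printed by ceph orch upgrade status and the plain-text output of ceph health detail have already been captured as strings. The function names are hypothetical and are not part of cephadm.

```python
import json
import re

# Matches health-check header lines as they appear in the transcript, e.g.
# "[ERR] UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception".
# Indented detail lines and the leading summary line do not match.
CHECK_LINE = re.compile(r"^\[(?P<severity>[A-Z]+)\]\s+(?P<code>[A-Z0-9_]+):")


def health_check_codes(health_detail_text):
    """Return the set of health-check codes found in `ceph health detail` text."""
    codes = set()
    for line in health_detail_text.splitlines():
        match = CHECK_LINE.match(line)
        if match:
            codes.add(match.group("code"))
    return codes


def has_stale_upgrade_error(upgrade_status_json, health_detail_text):
    """True if the upgrade is running (in progress, not paused) while
    UPGRADE_EXCEPTION is still reported as a health check."""
    status = json.loads(upgrade_status_json)
    running = bool(status.get("in_progress")) and not status.get("is_paused")
    return running and "UPGRADE_EXCEPTION" in health_check_codes(health_detail_text)


# Values abbreviated from the transcript in this report.
SAMPLE_STATUS = '{"in_progress": true, "is_paused": false}'
SAMPLE_HEALTH = (
    "HEALTH_ERR 1 failed cephadm daemon(s); Upgrade: failed due to an unexpected exception\n"
    "[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)\n"
    "    daemon prometheus.gibba001 on gibba001 is in error state\n"
    "[ERR] UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception\n"
    "    Unexpected exception occurred during upgrade process: Unable to reach remote host gibba002.\n"
)

if __name__ == "__main__":
    print(has_stale_upgrade_error(SAMPLE_STATUS, SAMPLE_HEALTH))
```

For the outputs above, the check flags the state described in this report: in_progress is true, is_paused is false, and UPGRADE_EXCEPTION is still listed.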

[root@gibba001 ~]# ceph versions
{
    "mon": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4
    },
    "mgr": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 75
    },
    "mds": {},
    "overall": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 81
    }
}
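
To confirm that the upgrade is actually making progress despite the health error, the ceph versions output can be summarized per daemon type. This is a rough sketch, assuming the JSON above has been captured as a string. The helper name upgraded_counts is hypothetical, and the version match relies only on the "ceph version X.Y.Z (...)" key format visible in the output.

```python
import json


def upgraded_counts(versions_json, target):
    """Map each daemon type to (daemons on target version, total daemons).

    The "overall" summary and daemon types with no entries are skipped.
    """
    versions = json.loads(versions_json)
    result = {}
    for daemon_type, by_version in versions.items():
        if daemon_type == "overall" or not by_version:
            continue
        total = sum(by_version.values())
        done = sum(
            count
            for version, count in by_version.items()
            if version.startswith(f"ceph version {target} ")
        )
        result[daemon_type] = (done, total)
    return result


# Copied from the `ceph versions` output in this report.
SAMPLE_VERSIONS = json.dumps({
    "mon": {"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4},
    "mgr": {"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 2},
    "osd": {
        "ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy (stable)": 850,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 75,
    },
    "mds": {},
})

if __name__ == "__main__":
    for daemon_type, (done, total) in upgraded_counts(SAMPLE_VERSIONS, "17.2.5").items():
        print(f"{daemon_type}: {done}/{total} on 17.2.5")
```

These totals will not match the "116/1008 daemons upgraded" figure from ceph orch upgrade status, presumably because that figure also counts daemons that do not appear in ceph versions (for example crash or prometheus).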


Related issues

Copied to Orchestrator - Backport #58447: quincy: [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception (In Progress)

History

#1 Updated by Vikhyat Umrao 4 months ago

  • Subject changed from HEALTH_ERR: Upgrade: failed due to an unexpected exception to [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception

#2 Updated by Adam King 4 months ago

  • Assignee set to Adam King

#3 Updated by Adam King 3 months ago

  • Pull request ID set to 48592

#4 Updated by Adam King 18 days ago

  • Status changed from New to Pending Backport
  • Backport set to quincy

#5 Updated by Backport Bot 18 days ago

  • Copied to Backport #58447: quincy: [Gibba Cluster] HEALTH_ERR: Upgrade: failed due to an unexpected exception added

#6 Updated by Backport Bot 18 days ago

  • Tags set to backport_processed
