Bug #53693

ceph orch upgrade start is getting stuck in gibba cluster

Added by Vikhyat Umrao over 2 years ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

- The current Ceph version:

# ceph versions
{
    "mon": {
        "ceph version 17.0.0-9475-g8ea352e9 (8ea352e994feffca1bfd357a20c491df01db91a9) quincy (dev)": 5
    },
    "mgr": {
        "ceph version 17.0.0-9475-g8ea352e9 (8ea352e994feffca1bfd357a20c491df01db91a9) quincy (dev)": 2
    },
    "osd": {
        "ceph version 17.0.0-9475-g8ea352e9 (8ea352e994feffca1bfd357a20c491df01db91a9) quincy (dev)": 970
    },
    "mds": {
        "ceph version 17.0.0-9475-g8ea352e9 (8ea352e994feffca1bfd357a20c491df01db91a9) quincy (dev)": 2
    },
    "overall": {
        "ceph version 17.0.0-9475-g8ea352e9 (8ea352e994feffca1bfd357a20c491df01db91a9) quincy (dev)": 979
    }
}
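
- For reference, the image and version each daemon is actually running can also be listed per daemon with ceph orch ps (output omitted here):

# per-daemon view, including the running container image and version
ceph orch ps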

- The version we were trying to upgrade to:

{
    "needs_update": {
        "crash.gibba001": {
            "current_id": "f79fcb826d512859ef4914712095ea7ee02622fc213f5c39ab7b2ec468965efd",
            "current_name": "quay.ceph.io/ceph-ci/ceph@sha256:14b1ea54031bea23a37c589a02be794dca9c5a0807116ffef655bea631f9a62e",
            "current_version": "17.0.0-9475-g8ea352e9" 
        },
        "crash.gibba002": {
            "current_id": "f79fcb826d512859ef4914712095ea7ee02622fc213f5c39ab7b2ec468965efd",
            "current_name": "quay.ceph.io/ceph-ci/ceph@sha256:14b1ea54031bea23a37c589a02be794dca9c5a0807116ffef655bea631f9a62e",
            "current_version": "17.0.0-9475-g8ea352e9" 
        },

........
........

    },
    "target_digest": "quay.ceph.io/ceph-ci/ceph@sha256:465e18548d5a9e1155bd093dfaa894e3cbc8f5b2e5a3d22b22c73a7979664155",
    "target_id": "9081735aa97cbfd10601ab1fc5fcaed6c8b41c2b22517b73c297aab304e5ffdd",
    "target_name": "quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa",
    "target_version": "ceph version 17.0.0-9718-g4ff72306 (4ff723061fc15c803dcf6556d02f56bdf56de5fa) quincy (dev)",
    "up_to_date": []
}
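
- The listing above matches the output format of ceph orch upgrade check, so it was presumably gathered with something like the following (image tag taken from the target above):

# compare every running daemon against the target image
ceph orch upgrade check --image quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa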

- Upgrade start and status, with debug logging enabled:

[root@gibba001 ~]# ceph config set mgr mgr/cephadm/log_level debug

[root@gibba001 ~]# ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa
Initiating upgrade to quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa

[root@gibba001 ~]# ceph orch upgrade status
{
    "target_image": "quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa",
    "in_progress": true,
    "services_complete": [],
    "progress": "",
}
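
- With mgr/cephadm/log_level set to debug as above, the cephadm log channel can also be followed directly from the CLI rather than from the mgr log file; for example:

# follow the cephadm cluster log channel, including debug messages
ceph -W cephadm --watch-debug

# or dump the most recent cephadm log entries
ceph log last cephadm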

- Ceph MGR Logs:

2021-12-21T21:34:47.490+0000 7fd28c7cb700  0 log_channel(audit) log [DBG] : from='client.17948814 -' entity='client.admin' cmd=[{"prefix": "orch upgrade start", "image": "quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa", "target": ["mon-mgr", ""]}]: dispatch

2021-12-21T21:34:47.492+0000 7fd28cfcc700  0 [cephadm INFO root] Upgrade: Started with target quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa

2021-12-21T21:34:47.492+0000 7fd28cfcc700  0 log_channel(cephadm) log [INF] : Upgrade: Started with target quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa

2021-12-21T21:34:47.492+0000 7fd28cfcc700  0 [progress INFO root] update: starting ev 668dc33f-3fca-4bd9-9ca7-0b926137fd71 (Upgrade to quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa)

- The debug logs contain only the following entries; nothing appears to be related to the upgrade:

2021-12-21T21:34:48.142+0000 7fd286f00700  0 [progress INFO root] Processing OSDMap change 44389..44389
2021-12-21T21:34:50.813+0000 7fd25c16f700  0 [cephadm DEBUG root] Refreshed host gibba026 daemons (28)
2021-12-21T21:34:50.819+0000 7fd25d171700  0 [cephadm DEBUG root] Refreshed host gibba027 daemons (28)
2021-12-21T21:34:50.847+0000 7fd28b7c9700  0 log_channel(cluster) log [DBG] : pgmap v98: 65553 pgs: 1 active+clean+scrubbing+deep, 65552 active+clean; 992 GiB data, 4.1 TiB used, 8.8 TiB / 13 TiB avail
2021-12-21T21:34:51.075+0000 7fd25d171700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba027.
2021-12-21T21:34:51.077+0000 7fd25c16f700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba026.
2021-12-21T21:34:51.083+0000 7fd25a96c700  0 [cephadm DEBUG root] Refreshed host gibba023 daemons (28)
2021-12-21T21:34:51.237+0000 7fd25a96c700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba023.
2021-12-21T21:34:51.483+0000 7fd25996a700  0 [cephadm DEBUG root] Refreshed host gibba030 daemons (28)
2021-12-21T21:34:51.585+0000 7fd25a16b700  0 [cephadm DEBUG root] Refreshed host gibba029 daemons (28)
2021-12-21T21:34:51.604+0000 7fd25996a700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba030.
2021-12-21T21:34:51.696+0000 7fd25a16b700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba029.
2021-12-21T21:34:51.841+0000 7fd25d972700  0 [cephadm DEBUG root] Refreshed host gibba031 daemons (28)
2021-12-21T21:34:51.954+0000 7fd259169700  0 [cephadm DEBUG root] Refreshed host gibba032 daemons (28)
2021-12-21T21:34:51.996+0000 7fd25d972700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba031.
2021-12-21T21:34:52.087+0000 7fd259169700  0 [cephadm DEBUG root] Received up-to-date metadata from agent on host gibba032.
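
- Since the debug log shows only host-refresh and agent activity, the upgrade task in the cephadm serve loop appears to never run. One common way to kick a stuck cephadm operation, offered here only as a possible next step, is to fail over the active mgr (name taken from the ceph -s output below); the upgrade state is persisted and should be picked up by the new active mgr:

# fail over to the standby mgr and let it resume the persisted upgrade
ceph mgr fail gibba001.zptzqf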

- Ceph status:

# ceph -s
  cluster:
    id:     182eef00-53b5-11ec-84d3-3cecef3d8fb8
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba004,gibba005,gibba006 (age 3h)
    mgr: gibba001.zptzqf(active, since 7m), standbys: gibba002.veobjs
    mds: 1/1 daemons up, 1 standby
    osd: 1073 osds: 970 up (since 14m), 970 in (since 23h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 65553 pgs
    objects: 230.34M objects, 992 GiB
    usage:   4.1 TiB used, 8.8 TiB / 13 TiB avail
    pgs:     65553 active+clean

  progress:
    Upgrade to quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa (0s)
      [............................] 

- The upgrade progress bar in ceph status is permanently stuck as shown below. We have given the upgrade more than 15 hours to move forward, but no luck:

 progress:
    Upgrade to quay.ceph.io/ceph-ci/ceph:4ff723061fc15c803dcf6556d02f56bdf56de5fa (0s)
      [............................] 
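
- For completeness, the stuck progress event and the upgrade itself can be inspected and controlled with the commands below (listed as possible next steps, not something that was tried in this report):

# inspect the raw progress events; the stuck upgrade event appears here
ceph progress json

# pause, resume, or abort the upgrade entirely
ceph orch upgrade pause
ceph orch upgrade resume
ceph orch upgrade stop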