Bug #47694
downgrading via ceph orch upgrade start results in partial application and mixed state

Added by Jan Fajerski over 3 years ago. Updated about 3 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Following https://docs.ceph.com/en/latest/cephadm/upgrade/#using-customized-container-images I attempted to downgrade my cluster.

The process starts fine, but I end up in an inconsistent state: two mgr daemons are downgraded, the upgrade appears to have succeeded, and the cluster reports HEALTH_WARN.

Starting with a healthy cluster at version 15.2.5-220-gb758bfd693 (SUSE downstream container), I run ceph orch upgrade start --image <custom registry url>/containers/ses/7/containers/ses/7/ceph/ceph:15.2.0.108. This starts the process, and I can see the progress of the image pull in ceph -s.

After a while this finishes and leaves the cluster in the following state:

master:~ # ceph versions
{
    "mon": {
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.0-108-g8cf4f02b08 (8cf4f02b0814fc5dc803ae5923cb310bb08de967) octopus (stable)": 2,
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 1
    },
    "osd": {
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 20
    },
    "mds": {
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 2
    },
    "overall": {
        "ceph version 15.2.0-108-g8cf4f02b08 (8cf4f02b0814fc5dc803ae5923cb310bb08de967) octopus (stable)": 2,
        "ceph version 15.2.5-220-gb758bfd693 (b758bfd69359a0ffa10bd5426d64e7636bb0a6c6) octopus (stable)": 26
    }
}
master:~ # ceph -s
  cluster:
    id:     2f578f24-02e5-11eb-92b7-52540064363c
    health: HEALTH_WARN
            4 hosts fail cephadm check
            failed to probe daemons or devices
            28 stray daemons(s) not managed by cephadm

  services:
    mon: 3 daemons, quorum master,node1,node2 (age 2h)
    mgr: node2.ibtqev(active, since 4m), standbys: master.wdjpkv, node1.cgixgj
    mds: sesdev_fs:1 {0=sesdev_fs.node3.jwnsyq=up:active} 1 up:standby
    osd: 20 osds: 20 up (since 2h), 20 in (since 2h)

  task status:
    scrub status:
        mds.sesdev_fs.node3.jwnsyq: idle

  data:
    pools:   3 pools, 65 pgs
    objects: 22 objects, 2.8 KiB
    usage:   20 GiB used, 140 GiB / 160 GiB avail
    pgs:     65 active+clean

The current active mgr could not be failed:

master:~ # ceph mgr fail ibtqev
Daemon not found 'ibtqev', already failed
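The mixed state is visible directly in the `ceph versions` JSON above. As an illustration (this helper is hypothetical, not a cephadm tool, and the version strings below are abbreviated from the real output), a few lines of Python can flag which daemon types are split across versions:

```python
import json

def mixed_versions(ceph_versions_json: str) -> dict:
    """Return the daemon types running more than one Ceph version,
    given the JSON printed by `ceph versions`."""
    data = json.loads(ceph_versions_json)
    return {
        daemon: versions
        for daemon, versions in data.items()
        # "overall" aggregates all daemons, so it is always mixed here
        if daemon != "overall" and len(versions) > 1
    }

# Abbreviated sample mirroring the report above (version strings shortened):
sample = json.dumps({
    "mon": {"15.2.5-220": 3},
    "mgr": {"15.2.0-108": 2, "15.2.5-220": 1},
    "osd": {"15.2.5-220": 20},
    "mds": {"15.2.5-220": 2},
    "overall": {"15.2.0-108": 2, "15.2.5-220": 26},
})
print(sorted(mixed_versions(sample)))  # → ['mgr']
```

This matches the report: only the mgr daemons are split, with two already on 15.2.0 and one still on 15.2.5.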

I'm aware that the upgrade command probably should not be expected to handle a downgrade. Still, some validation should probably be done to avoid this situation, if only to protect users who mistype the image version.
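As a sketch of the kind of guard suggested here (hypothetical helper names and error text; this is not cephadm's actual code or CLI), the orchestrator could compare the target image's version against the running version before starting, and refuse a silent downgrade:

```python
import re

def parse_version(tag: str) -> tuple:
    """Extract (major, minor, patch) from a version string such as
    '15.2.5-220-gb758bfd693' or an image tag suffix like '15.2.0.108'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", tag)
    if not m:
        raise ValueError(f"unrecognized version: {tag}")
    return tuple(int(x) for x in m.groups())

def check_not_downgrade(current: str, target: str) -> None:
    """Refuse to start an 'upgrade' whose target is older than the
    running version; a real implementation would offer an explicit
    override flag for intentional downgrades."""
    if parse_version(target) < parse_version(current):
        raise RuntimeError(
            f"target {target} is older than running {current}; "
            "refusing implicit downgrade"
        )
```

With the versions from this report, `check_not_downgrade("15.2.5-220-gb758bfd693", "15.2.0.108")` would raise, forcing the user to confirm the downgrade was intended rather than a typo.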


Related issues: 1 (0 open, 1 closed)

Copied to Orchestrator - Bug #47702: upgrading via ceph orch upgrade start results in partial application and mixed state (Can't reproduce)

Actions #1

Updated by Jan Fajerski over 3 years ago

  • Copied to Bug #47702: upgrading via ceph orch upgrade start results in partial application and mixed state added
Actions #2

Updated by Jan Fajerski over 3 years ago

After a while the status and versions are as expected. We should probably still put a validation against downgrades in place.

Actions #3

Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to Won't Fix

We have to support downgrades to some degree. Closing, as it worked eventually.
