Bug #48567

closed

octopus: "cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum c,a" in upgrade:nautilus-x-octopus

Added by Yuri Weinstein over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/nautilus-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is for the 15.2.8 release.

Runs:
https://pulpito.ceph.com/teuthology-2020-12-10_16:13:48-upgrade:nautilus-x-octopus-distro-basic-smithi/
https://pulpito.ceph.com/yuriw-2020-12-09_23:07:56-upgrade:nautilus-x-octopus-distro-basic-smithi/
Jobs:
['5696843', '5696844']
['5698719', '5698727']

Logs:
/a/teuthology-2020-12-10_16:13:48-upgrade:nautilus-x-octopus-distro-basic-smithi/5698719/teuthology.log
/a/yuriw-2020-12-09_23:07:56-upgrade:nautilus-x-octopus-distro-basic-smithi/5696843/teuthology.log

failure_reason: '"2020-12-10T21:56:51.664776+0000 mon.c (mon.1) 115 : cluster [WRN]
  Health detail: HEALTH_WARN 1/3 mons down, quorum c,a" in cluster log'

Seems to be introduced after 11/27/20
https://pulpito.ceph.com/?suite=upgrade%3Anautilus-x&branch=octopus

Actions #1

Updated by Yuri Weinstein over 3 years ago

  • Description updated (diff)
Actions #2

Updated by Yuri Weinstein over 3 years ago

  • Description updated (diff)
Actions #3

Updated by Neha Ojha over 3 years ago

  • Subject changed from "cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum c,a" in upgrade:nautilus-x-octopus to octopus: "cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum c,a" in upgrade:nautilus-x-octopus
Actions #4

Updated by Neha Ojha over 3 years ago

  • Status changed from New to Triaged
  • Priority changed from Urgent to Normal

I think the problem is that https://github.com/ceph/ceph/pull/38118 has merged in nautilus, while https://github.com/ceph/ceph/pull/38345 has not yet merged in octopus. In an N->O upgrade test, the nautilus daemons log the extra health detail to the cluster log (clog), but the test does not set "mon health detail to clog = false", because the test suite in use comes from octopus, which does not have that change yet.
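The mismatch described above can be sketched as follows. This is a minimal illustration, not teuthology's actual implementation: it assumes a run fails on any cluster-log entry at [WRN], [ERR], or [SEC] level that matches none of the suite's ignore patterns. The patterns in `IGNORELIST` and the helper `first_unignored` are hypothetical; the two log entries are the ones quoted in this ticket.

```python
import re

# Hypothetical ignore patterns; the real suite's list may differ.
IGNORELIST = [
    r"overall HEALTH_",
    r"\(MON_DOWN\)",
]

# Levels assumed to fail a run when they appear in the cluster log.
BAD_LEVELS = re.compile(r"\[(WRN|ERR|SEC)\]")


def first_unignored(lines, ignorelist=IGNORELIST):
    """Return the first bad-level line that no ignore pattern matches, else None."""
    ignores = [re.compile(p) for p in ignorelist]
    for line in lines:
        if not BAD_LEVELS.search(line):
            continue  # e.g. [INF] entries never fail the run
        if any(rx.search(line) for rx in ignores):
            continue  # covered by the ignorelist
        return line
    return None


cluster_log = [
    # [INF] entry quoted in the mon log excerpt in this ticket.
    "2020-12-10 21:56:51.749969 mon.b (mon.0) 8 : cluster [INF] "
    "Health check cleared: MON_DOWN (was: 1/3 mons down, quorum c,a)",
    # [WRN] detail entry quoted in the failure_reason.
    "2020-12-10T21:56:51.664776+0000 mon.c (mon.1) 115 : cluster [WRN] "
    "Health detail: HEALTH_WARN 1/3 mons down, quorum c,a",
]

offending = first_unignored(cluster_log)
print(offending)
```

Under these assumptions `offending` is the "Health detail" entry: it is at [WRN] level and no pattern covers it, which is the situation that arises when the nautilus mons emit the extra detail and the octopus suite neither turns it off nor ignores it.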

The fact that this warning only appears while the mons are still running nautilus supports the above theory, as the log excerpt shows:

2020-12-10 21:56:51.662 7f4ec0c3e700  0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN 1/3 mons down, quorum c,a
.
.
2020-12-10 21:56:51.806 7f4ec2441700  7 mon.c@1(peon).log v68 update_from_paxos applying incremental log 68 2020-12-10 21:56:51.749969 mon.b (mon.0) 8 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum c,a)
.
.
2020-12-10T22:24:33.202+0000 7f776b6b2540  0 ceph version 15.2.7-678-gd0234e1e (d0234e1ede269851a029cd700e16721cae756cc6) octopus (stable), process ceph-mon, pid 16286
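The ordering in this excerpt can be checked mechanically. The sketch below only compares the timestamps quoted above. The helper name `parse_ts` is made up for illustration, and the sketch assumes that both timestamp styles are in the same timezone (+0000) and that the lines come from the same monitor's log.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading timestamp of a log line in either style seen in the excerpt."""
    if "T" in line[:11]:
        # ISO-style, e.g. 2020-12-10T22:24:33.202+0000
        stamp = line.split(" ", 1)[0]
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z").replace(tzinfo=None)
    # Space-separated style, e.g. 2020-12-10 21:56:51.662
    stamp = " ".join(line.split(" ", 2)[:2])
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


warning = (
    "2020-12-10 21:56:51.662 7f4ec0c3e700  0 log_channel(cluster) log [WRN] : "
    "Health detail: HEALTH_WARN 1/3 mons down, quorum c,a"
)
octopus_start = (
    "2020-12-10T22:24:33.202+0000 7f776b6b2540  0 ceph version 15.2.7-678-gd0234e1e "
    "(d0234e1ede269851a029cd700e16721cae756cc6) octopus (stable), process ceph-mon, pid 16286"
)

# Time between the warning and the first octopus ceph-mon start line.
gap = parse_ts(octopus_start) - parse_ts(warning)
print(gap)
```

The warning is logged roughly 28 minutes before the ceph-mon process reports itself as octopus, which is consistent with the warning having been emitted while the mon was still on nautilus.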

https://github.com/ceph/ceph/pull/38118 merged 11 days ago, which aligns with "Seems to be introduced after 11/27/20 https://pulpito.ceph.com/?suite=upgrade%3Anautilus-x&branch=octopus"

Actions #6

Updated by Nathan Cutler over 3 years ago

  • Status changed from Triaged to Resolved
  • Pull request ID set to 38345
Actions #7

Updated by Neha Ojha over 3 years ago

Nathan Cutler wrote:

@Neha, so this is fixed by https://github.com/ceph/ceph/pull/38345?

Yes
