Bug #64864

cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log

Added by Sridhar Seshasayee 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: orchestrator
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following tests in the cephadm suite failed with the warning named in the title:

/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587779
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587855
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587949

All of the tests above already add "MON_DOWN" to the ignore list, since a down monitor is
expected. In addition to the health warning itself, however, each test also logs the
health detail shown below:

"cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c" in cluster log

All three tests failed because this health detail line does not contain "MON_DOWN" and
is therefore not matched by any entry in the ignore list.

This tracker can be used to track adding the "mons down" warning to the ignore list
for these tests as well.

Logs from 7587779 are shown below as an example:

2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c

...

2024-03-10T02:10:47.804 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[ERR\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.859 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[WRN\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.915 INFO:teuthology.orchestra.run.smithi033.stdout:2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
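
The scan in the log above can be reproduced offline to see why only line 275 gets through. The following is a minimal Python sketch, not teuthology's actual implementation: the function name `first_unignored` is hypothetical, the ignore patterns are copied from the `egrep -v` stages above, and the sample lines for entries 274 and 276 assume that ceph.log stores them in the same format as the line printed for entry 275.

```python
import re

# Ignore patterns copied from the `egrep -v` stages in the log above.
IGNORE_PATTERNS = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)", r"MON_DOWN"]

# Sample cluster log lines (assumed ceph.log format, taken from the journalctl output).
CLUSTER_LOG = [
    "2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] "
    "Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)",
    "2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] "
    "Health detail: HEALTH_WARN 1/3 mons down, quorum a,c",
    "2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] "
    "[WRN] MON_DOWN: 1/3 mons down, quorum a,c",
]


def first_unignored(lines, severity, ignore_patterns):
    """Return the first line tagged [severity] that matches no ignore pattern.

    Hypothetical helper mimicking `egrep '\\[WRN\\]' | egrep -v ... | head -n 1`.
    Returns None when every matching line is ignored.
    """
    tag = re.compile(re.escape(f"[{severity}]"))
    ignores = [re.compile(p) for p in ignore_patterns]
    for line in lines:
        if tag.search(line) and not any(p.search(line) for p in ignores):
            return line
    return None


# With the current ignore list, the "Health detail" line is reported.
hit = first_unignored(CLUSTER_LOG, "WRN", IGNORE_PATTERNS)
print(hit)

# Adding a "mons down" pattern, as this tracker proposes, filters all three lines.
fixed = first_unignored(CLUSTER_LOG, "WRN", IGNORE_PATTERNS + [r"mons down"])
print(fixed)
```

Entries 274 and 276 both contain the literal string `MON_DOWN` and are filtered out, while entry 275 carries only the summary text "1/3 mons down", which none of the existing patterns match.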