Bug #64864

closed

cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log

Added by Sridhar Seshasayee about 2 months ago. Updated 26 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: orchestrator
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following tests in the cephadm suite failed with the warning:

/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587779
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587855
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587949

All of the tests above already add "MON_DOWN" to the ignore list, since a monitor going down is expected.
In addition to the health warning itself, every one of these tests also logs the health detail, shown below:

"cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c" in cluster log

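All of the tests failed because the above warning is not covered by the ignorelist: the "Health detail" line does not contain the string "MON_DOWN", so the existing entry does not match it.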

Therefore, this tracker can be used to track adding the "mons down" warning
to the ignore list for these tests as well.

Logs from 7587779 are shown below as an example:

2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c
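
To illustrate why only the "Health detail" line trips the check, here is a minimal Python sketch that runs the three cluster log lines above through an ignorelist. The function name `first_unignored` is hypothetical, and the matching is a simplified stand-in for the `egrep` / `egrep -v` scan that teuthology runs in this job, not teuthology's own code. The extra "mons down" entry is the one proposed in this tracker; the pattern used by the actual fix may differ.

```python
import re

# The three cluster log messages from job 7587779 (journalctl prefix stripped).
LOG_LINES = [
    "2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] "
    "Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)",
    "2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] "
    "Health detail: HEALTH_WARN 1/3 mons down, quorum a,c",
    "2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] "
    "[WRN] MON_DOWN: 1/3 mons down, quorum a,c",
]

# Ignorelist entries as they appear in this job's log scan.
CURRENT_IGNORELIST = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)", "MON_DOWN"]


def first_unignored(lines, level_pattern, ignorelist):
    """Return the first line that matches level_pattern and no ignorelist entry.

    Hypothetical helper: mimics `egrep LEVEL | egrep -v ... | head -n 1`.
    """
    for line in lines:
        if not re.search(level_pattern, line):
            continue  # not a line of the requested severity
        if any(re.search(pattern, line) for pattern in ignorelist):
            continue  # matched an ignorelist entry, so it is filtered out
        return line
    return None


# With the current ignorelist, the "Health detail" line slips through.
offending = first_unignored(LOG_LINES, r"\[WRN\]", CURRENT_IGNORELIST)
print(offending)

# With "mons down" added (the proposal in this tracker), nothing is left.
proposed = CURRENT_IGNORELIST + ["mons down"]
print(first_unignored(LOG_LINES, r"\[WRN\]", proposed))
```

Lines 274 and 276 both contain the literal string "MON_DOWN" and are filtered, while line 275 only contains the phrase "mons down", which is why a second entry is needed.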

...

2024-03-10T02:10:47.804 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[ERR\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.859 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[WRN\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.915 INFO:teuthology.orchestra.run.smithi033.stdout:2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
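
The scan commands above chain one `egrep -v` stage per ignorelist entry and stop at the first surviving line. As a rough sketch of that structure, the hypothetical helper below (`build_scan_command` is not a teuthology function) rebuilds the logged `[WRN]` command from a list of patterns, and shows that an additional entry simply becomes one more `egrep -v` stage.

```python
from shlex import quote


def build_scan_command(level_pattern, log_path, ignorelist):
    """Assemble an `egrep | egrep -v ... | head -n 1` pipeline as a string.

    Hypothetical helper for illustration only; it builds the command text
    and does not execute anything.
    """
    stages = ["sudo egrep {} {}".format(quote(level_pattern), quote(log_path))]
    # One exclusion stage per ignorelist entry.
    stages += ["egrep -v {}".format(quote(pattern)) for pattern in ignorelist]
    # Only the first remaining line is reported.
    stages.append("head -n 1")
    return " | ".join(stages)


LOG_PATH = "/var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log"
IGNORELIST = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)", "MON_DOWN"]

current_cmd = build_scan_command(r"\[WRN\]", LOG_PATH, IGNORELIST)
print(current_cmd)

# Adding the entry proposed in this tracker appends one more exclusion stage.
proposed_cmd = build_scan_command(r"\[WRN\]", LOG_PATH, IGNORELIST + ["mons down"])
print(proposed_cmd)
```

Because each entry is an independent `egrep -v` filter, the existing MON_DOWN entry is unaffected by the addition.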
Actions #1

Updated by Sridhar Seshasayee about 2 months ago

  • Description updated (diff)
Actions #2

Updated by Sridhar Seshasayee about 2 months ago

  • Category changed from cephadm to orchestrator
Actions #3

Updated by Sridhar Seshasayee about 2 months ago

  • Tag list set to test-failure
  • Tags deleted (test-failure)
Actions #4

Updated by Aishwarya Mathuria about 1 month ago

/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609867
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609907

Actions #5

Updated by Nitzan Mordechai about 1 month ago

/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620793
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620804
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620848
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620903
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620939
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620978
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621027
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621054

Actions #6

Updated by Laura Flores about 1 month ago

/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623410
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623421
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623458
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623475
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623536
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623550
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623575
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623597
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623612
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623620
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623624

Actions #7

Updated by Laura Flores about 1 month ago

  • Assignee set to Laura Flores
Actions #8

Updated by Laura Flores about 1 month ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 56619
Actions #9

Updated by Laura Flores 26 days ago

  • Status changed from Fix Under Review to Resolved