Bug #64864
closed
cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log
Description
The following tests in the cephadm suite failed with the warning:
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587779
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587855
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587949
All of the tests above add "MON_DOWN" to the ignorelist, because a monitor going down is expected.
However, in addition to the MON_DOWN health warning, each test also logs the health detail:
"cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c" in cluster log
This line does not contain "MON_DOWN", so the ignorelist does not match it, and all of the tests failed on it.
This tracker can therefore be used to track adding the "mons down" warning
to the ignorelist for these tests as well.
Logs from 7587779 are shown below as an example:
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c
...
2024-03-10T02:10:47.804 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[ERR\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.859 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[WRN\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.915 INFO:teuthology.orchestra.run.smithi033.stdout:2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
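
To illustrate why only the "Health detail" line gets through the check, here is a minimal Python sketch that imitates the egrep pipeline from the log above. The function name first_unignored and the list names are hypothetical, it is not teuthology code, and the "mons down" pattern is only the addition suggested in this tracker, not a confirmed fix. The three log lines are copied from the 7587779 excerpt with the journalctl prefix removed.

```python
import re

# Cluster log lines from the 7587779 excerpt (journalctl prefix dropped).
CLUSTER_LOG = [
    "2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] "
    "Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)",
    "2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] "
    "Health detail: HEALTH_WARN 1/3 mons down, quorum a,c",
    "2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] "
    "[WRN] MON_DOWN: 1/3 mons down, quorum a,c",
]


def first_unignored(lines, level, ignorelist):
    """Hypothetical stand-in for: egrep LEVEL | egrep -v PAT ... | head -n 1."""
    level_re = re.compile(level)
    ignore_res = [re.compile(pattern) for pattern in ignorelist]
    for line in lines:
        if not level_re.search(line):
            continue  # not at the severity being checked
        if any(r.search(line) for r in ignore_res):
            continue  # matched an ignorelist pattern
        return line  # first surviving line; the test fails on this
    return None


# Patterns taken from the egrep -v arguments in the log above.
current_ignorelist = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)", "MON_DOWN"]
# Assumption: the extra pattern proposed in this tracker.
proposed_ignorelist = current_ignorelist + ["mons down"]

print(first_unignored(CLUSTER_LOG, r"\[WRN\]", current_ignorelist))
print(first_unignored(CLUSTER_LOG, r"\[WRN\]", proposed_ignorelist))
```

With the current patterns, entries 274 and 276 are filtered out because they contain "MON_DOWN", while entry 275 survives. With "mons down" added, none of the three lines survives.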