Bug #64864

Updated by Sridhar Seshasayee 2 months ago

The following tests in the cephadm suite failed because of a "mons down" health detail warning in the cluster log:

 /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587779 
 /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587855 
 /a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587949 

 All the tests above add "MON_DOWN" to the ignore list because a monitor going down is expected. In addition to the
 health warning, all of these tests also log the health detail, as shown below:

 <pre> 
 "cluster cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c" in cluster log 
 </pre> 

 All the tests failed because the above warning is not present in the ignore list.

 Therefore, this tracker may be used to track adding the "mons down" warning
 to the ignore list for these tests as well.

 Logs from 7587779 are shown below as an example: 

 <pre> 
 2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN) 
 2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c 
 2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c 

 ... 

 2024-03-10T02:10:47.804 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[ERR\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1 
 2024-03-10T02:10:47.859 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[WRN\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1 
 2024-03-10T02:10:47.915 INFO:teuthology.orchestra.run.smithi033.stdout:2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c 
 </pre>
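
The log scan above can be reproduced with a minimal Python sketch. It is only an illustration of the `egrep` / `egrep -v` pipeline shown in the log, not teuthology's actual implementation. The function name `first_unignored` and the list names are hypothetical. The three log lines are taken from job 7587779 with the journalctl prefix trimmed, and the `mons down` pattern is the addition this tracker proposes.

```python
import re

# Cluster log lines from job 7587779 (journalctl prefix and timestamps trimmed).
CLUSTER_LOG = [
    "mon.a (mon.0) 274 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)",
    "mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c",
    "mon.a (mon.0) 276 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c",
]


def first_unignored(lines, level, ignorelist):
    """Hypothetical helper mirroring `egrep '\\[LEVEL\\]' | egrep -v PATTERN ... | head -n 1`.

    Returns the first line tagged with the given level that matches none of
    the ignore patterns, or None if every such line is ignored.
    """
    marker = re.compile(r"\[%s\]" % re.escape(level))
    ignores = [re.compile(pattern) for pattern in ignorelist]
    for line in lines:
        if marker.search(line) and not any(r.search(line) for r in ignores):
            return line
    return None


# Ignore patterns visible in the egrep pipeline from the log above.
current_ignorelist = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)", "MON_DOWN"]

# Assumed fix: also ignore the health detail text, as this tracker proposes.
proposed_ignorelist = current_ignorelist + ["mons down"]

if __name__ == "__main__":
    print(first_unignored(CLUSTER_LOG, "WRN", current_ignorelist))
    print(first_unignored(CLUSTER_LOG, "WRN", proposed_ignorelist))
```

With the current patterns, entries 274 and 276 are filtered out because they contain `MON_DOWN`, but entry 275 (the health detail line) does not contain that string and is returned as the failure. With `mons down` added, no warning line remains.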
