Bug #65265
openqa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs
Description
Link to the job - https://pulpito.ceph.com/rishabh-2024-03-27_05:27:11-fs-wip-rishabh-testing-20240326.131558-testing-default-smithi/7625569/
The tests (qa/tasks/cephfs/test_nfs.py) ran successfully, but the job failed due to unexpected health warnings -
2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
This health warning occurred 4 times in total: twice before test_nfs.py started running, twice after test_nfs.py finished running, and never while test_nfs.py was running.
Warning 1, line 11268 - 2024-03-27T06:07:34.833 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:34 smithi184 bash[21504]: cluster 2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
Warning 2, line 23642 - 2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.832342+0000 mon.a (mon.0) 342 : cluster [INF] Health check cleared: MGR_DOWN (was: no active mgr)
Then the cluster becomes healthy, from line 23643 -
2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.832393+0000 mon.a (mon.0) 343 : cluster [INF] Cluster is now healthy
2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.836380+0000 mon.a (mon.0) 344 : cluster [DBG] mgrmap e20: x(active, star
Tests start running, line 42136 -
2024-03-27T06:07:52.025 INFO:tasks.cephfs_test_runner:Starting test: test_cephfs_export_update_at_non_dir_path (tasks.cephfs.test_nfs.TestNFS)
The cluster is healthy again when the tests near the end of test_nfs.py, from line 231178 -
2024-03-27T06:32:18.776 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:32:18 smithi184 bash[21504]: cluster 2024-03-27T06:32:17.531023+0000 mon.a (mon.0) 3332 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
2024-03-27T06:32:18.776 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:32:18 smithi184 bash[21504]: cluster 2024-03-27T06:32:17.531067+0000 mon.a (mon.0) 3333 : cluster [INF] Cluster is now healthy
Tests finish running, line 247158 -
2024-03-27T06:37:26.372 INFO:tasks.cephfs_test_runner:Ran 30 tests in 1796.992s
Warning 3, line 247231 - 2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
Warning 4, line 247236 - 2024-03-27T06:38:24.673 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
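The occurrences above can be pulled out of the teuthology log with a simple grep. A minimal self-contained sketch; the inline sample below stands in for the real teuthology.log, so point the same commands at the actual log to reproduce the line numbers and counts cited in this report:

```shell
# Count and locate MGR_DOWN health-check failures in a teuthology log.
# /tmp/sample-teuthology.log is a tiny stand-in for the real log file.
cat > /tmp/sample-teuthology.log <<'EOF'
2024-03-27T06:07:34.833 INFO:journalctl@ceph.mon.a.smithi184.stdout: cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout: cluster [INF] Health check cleared: MGR_DOWN (was: no active mgr)
2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout: cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
EOF
# List matching lines with their line numbers...
grep -n 'Health check failed: no active mgr' /tmp/sample-teuthology.log
# ...and count them (prints 2 for this sample).
grep -c 'Health check failed: no active mgr' /tmp/sample-teuthology.log
```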
From /a/rishabh-2024-03-27_05:27:11-fs-wip-rishabh-testing-20240326.131558-testing-default-smithi/7625569/remote/smithi184/log/2d1fee3e-ebff-11ee-95d0-87774f69a715/ceph-mgr.x.log.gz -
2024-03-27T06:00:40.207+0000 7f426b494200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:00:55.715+0000 7f303e506200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:01:29.896+0000 7f8248a0c200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:07:39.269+0000 7fbe2299a200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:07:56.965+0000 7f09f5cd8200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:18:30.951+0000 7f61b33d3200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:18:40.251+0000 7f39b1c35200 0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
None of the PRs in the testing batch looks related to this. In fact, this doesn't look related to CephFS at all; Venky confirmed the same.
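The ceph-mgr log above shows the mgr process starting seven times, including twice around 06:07 when the first pair of warnings fired, which suggests the warnings come from routine mgr redeploys during cluster bring-up rather than a real failure. If that is confirmed, one possible remedy is to ignorelist the check for this suite via a teuthology override; the sketch below is an assumption about placement (the exact yaml file under qa/suites would need to be chosen), not a proposed final fix:

```yaml
# Hypothetical override for the affected fs suite yaml: tells the
# teuthology ceph task not to fail the job on this health warning.
overrides:
  ceph:
    log-ignorelist:
      - \(MGR_DOWN\)
```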