Bug #65265

qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs

Added by Rishabh Dave about 1 month ago. Updated 1 day ago.

Status: Pending Backport
Priority: Urgent
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy, reef, squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/nfs
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Link to the job - https://pulpito.ceph.com/rishabh-2024-03-27_05:27:11-fs-wip-rishabh-testing-20240326.131558-testing-default-smithi/7625569/

The tests (qa/tasks/cephfs/test_nfs.py) ran successfully, but the job failed due to unexpected health warnings -

2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)

This health warning occurred 4 times in total: 2 times before test_nfs.py started running, 2 times after test_nfs.py finished running, and never while test_nfs.py was running.

Warning 1, line 11268 - 2024-03-27T06:07:34.833 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:34 smithi184 bash[21504]: cluster 2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
Warning 2, line 23642 - 2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.832342+0000 mon.a (mon.0) 342 : cluster [INF] Health check cleared: MGR_DOWN (was: no active mgr)
Then the cluster becomes healthy, from line 23643 -

2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.832393+0000 mon.a (mon.0) 343 : cluster [INF] Cluster is now healthy
2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.836380+0000 mon.a (mon.0) 344 : cluster [DBG] mgrmap e20: x(active, star

Tests start running, line 42136 - 2024-03-27T06:07:52.025 INFO:tasks.cephfs_test_runner:Starting test: test_cephfs_export_update_at_non_dir_path (tasks.cephfs.test_nfs.TestNFS)
The cluster is healthy again when the tests are near the end of test_nfs.py, from line 231178 -
2024-03-27T06:32:18.776 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:32:18 smithi184 bash[21504]: cluster 2024-03-27T06:32:17.531023+0000 mon.a (mon.0) 3332 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
2024-03-27T06:32:18.776 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:32:18 smithi184 bash[21504]: cluster 2024-03-27T06:32:17.531067+0000 mon.a (mon.0) 3333 : cluster [INF] Cluster is now healthy

Tests finish running, line 247158 - 2024-03-27T06:37:26.372 INFO:tasks.cephfs_test_runner:Ran 30 tests in 1796.992s

Warning 3, line 247231 - 2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
Warning 4, line 247236 - 2024-03-27T06:38:24.673 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)

From /a/rishabh-2024-03-27_05:27:11-fs-wip-rishabh-testing-20240326.131558-testing-default-smithi/7625569/remote/smithi184/log/2d1fee3e-ebff-11ee-95d0-87774f69a715/ceph-mgr.x.log.gz -

2024-03-27T06:00:40.207+0000 7f426b494200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:00:55.715+0000 7f303e506200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:01:29.896+0000 7f8248a0c200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:07:39.269+0000 7fbe2299a200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:07:56.965+0000 7f09f5cd8200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:18:30.951+0000 7f61b33d3200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:18:40.251+0000 7f39b1c35200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7

None of the PRs in the testing batch looks related to this. In fact, this doesn't look related to CephFS. Venky confirmed the same.
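
Since the warnings appear only during cluster setup and teardown (the mgr log above shows the mgr being restarted several times in that window), one possible QA-side mitigation is to ignore MGR_DOWN in this suite's cluster-log scraping. A minimal sketch, assuming the fs/nfs suite uses the standard teuthology log-ignorelist override; the file name below is hypothetical and the exact location/entries would need to match whatever the suite already ignores:

    # Hypothetical override fragment, e.g. qa/suites/fs/nfs/overrides/ignorelist_health.yaml
    # Entries are regexes matched against cluster log lines; a match prevents
    # teuthology from failing the job on that health warning.
    overrides:
      ceph:
        log-ignorelist:
          - \(MGR_DOWN\)
          - no active mgr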


Related issues 4 (3 open, 1 closed)

Related to CephFS - Bug #65021: qa/suites/fs/nfs: cluster [WRN] Health check failed: 1 stray daemon(s) not managed by cephadm (CEPHADM_STRAY_DAEMON)" in cluster log - Duplicate - Dhairya Parmar

Copied to CephFS - Backport #66060: squid: qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs - New - Dhairya Parmar
Copied to CephFS - Backport #66061: reef: qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs - New - Dhairya Parmar
Copied to CephFS - Backport #66062: quincy: qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs - New - Dhairya Parmar