Bug #65265

qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs

Added by Rishabh Dave about 1 month ago. Updated 1 day ago.

Status: Pending Backport
Priority: Urgent
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy, reef, squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/nfs
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Link to the job - https://pulpito.ceph.com/rishabh-2024-03-27_05:27:11-fs-wip-rishabh-testing-20240326.131558-testing-default-smithi/7625569/

The tests (qa/tasks/cephfs/test_nfs.py) ran successfully, but the job failed due to unexpected health warnings -

2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)

This health warning occurred 4 times in total: 2 times before test_nfs.py started running, 2 times after test_nfs.py finished running, and never while test_nfs.py was running.

Warning 1, line 11268 - 2024-03-27T06:07:34.833 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:34 smithi184 bash[21504]: cluster 2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
Warning 2, line 23642 - 2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.832342+0000 mon.a (mon.0) 342 : cluster [INF] Health check cleared: MGR_DOWN (was: no active mgr)
Then the cluster becomes healthy, from line 23643 -

2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.832393+0000 mon.a (mon.0) 343 : cluster [INF] Cluster is now healthy
2024-03-27T06:07:49.277 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:07:48 smithi184 bash[21504]: cluster 2024-03-27T06:07:47.836380+0000 mon.a (mon.0) 344 : cluster [DBG] mgrmap e20: x(active, star

Tests start running, line 42136 - 2024-03-27T06:07:52.025 INFO:tasks.cephfs_test_runner:Starting test: test_cephfs_export_update_at_non_dir_path (tasks.cephfs.test_nfs.TestNFS)
The cluster is healthy again when the tests are near the end of test_nfs.py, from line 231178 -
2024-03-27T06:32:18.776 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:32:18 smithi184 bash[21504]: cluster 2024-03-27T06:32:17.531023+0000 mon.a (mon.0) 3332 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
2024-03-27T06:32:18.776 INFO:journalctl@ceph.mon.a.smithi184.stdout:Mar 27 06:32:18 smithi184 bash[21504]: cluster 2024-03-27T06:32:17.531067+0000 mon.a (mon.0) 3333 : cluster [INF] Cluster is now healthy

Tests finish running, line 247158 - 2024-03-27T06:37:26.372 INFO:tasks.cephfs_test_runner:Ran 30 tests in 1796.992s

Warning 3, line 247231 - 2024-03-27T06:38:24.458 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
Warning 4, line 247236 - 2024-03-27T06:38:24.673 INFO:teuthology.orchestra.run.smithi184.stdout:2024-03-27T06:07:34.219228+0000 mon.a (mon.0) 323 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)

From /a/rishabh-2024-03-27_05:27:11-fs-wip-rishabh-testing-20240326.131558-testing-default-smithi/7625569/remote/smithi184/log/2d1fee3e-ebff-11ee-95d0-87774f69a715/ceph-mgr.x.log.gz -

2024-03-27T06:00:40.207+0000 7f426b494200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:00:55.715+0000 7f303e506200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:01:29.896+0000 7f8248a0c200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:07:39.269+0000 7fbe2299a200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:07:56.965+0000 7f09f5cd8200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:18:30.951+0000 7f61b33d3200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7
2024-03-27T06:18:40.251+0000 7f39b1c35200  0 ceph version 19.0.0-2478-g155268c4 (155268c4e432a12433aa833f174f9fe3b1016ae0) squid (dev), process ceph-mgr, pid 7

None of the PRs in the testing batch looks related to this. In fact, this doesn't look related to CephFS. Venky confirmed the same.
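
Since the warnings appear only during cluster setup and teardown (the mgr log above shows the mgr being restarted several times in that window), one possible QA-side mitigation is to ignore MGR_DOWN in this suite's cluster-log scraping. A minimal sketch, assuming the fs/nfs suite uses the standard teuthology log-ignorelist override; the file name below is hypothetical and the exact location/entries would need to match whatever the suite already ignores:

    # Hypothetical override fragment, e.g. qa/suites/fs/nfs/overrides/ignorelist_health.yaml
    # Entries are regexes matched against cluster log lines; a match prevents
    # teuthology from failing the job on that health warning.
    overrides:
      ceph:
        log-ignorelist:
          - \(MGR_DOWN\)
          - no active mgr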


Related issues 4 (3 open, 1 closed)

Related to CephFS - Bug #65021: qa/suites/fs/nfs: cluster [WRN] Health check failed: 1 stray daemon(s) not managed by cephadm (CEPHADM_STRAY_DAEMON)" in cluster log - Duplicate - Dhairya Parmar

Copied to CephFS - Backport #66060: squid: qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs - New - Dhairya Parmar
Copied to CephFS - Backport #66061: reef: qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs - New - Dhairya Parmar
Copied to CephFS - Backport #66062: quincy: qa: health warning "no active mgr (MGR_DOWN)" occurs before and after test_nfs runs - New - Dhairya Parmar