Actions
Bug #51092
closedmds: Timed out waiting for MDS daemons to become healthy
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
fs
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-06-03T11:59:43.755 INFO:tasks.cephfs_test_runner:test_full_fsync (tasks.cephfs.test_full.TestClusterFull) ... ERROR 2021-06-03T11:59:43.756 INFO:tasks.cephfs_test_runner: 2021-06-03T11:59:43.756 INFO:tasks.cephfs_test_runner:====================================================================== 2021-06-03T11:59:43.757 INFO:tasks.cephfs_test_runner:ERROR: test_full_fsync (tasks.cephfs.test_full.TestClusterFull) 2021-06-03T11:59:43.757 INFO:tasks.cephfs_test_runner:---------------------------------------------------------------------- 2021-06-03T11:59:43.757 INFO:tasks.cephfs_test_runner:Traceback (most recent call last): 2021-06-03T11:59:43.758 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_986f0ca109bf8b3eec6027f40cf5b33b8b7bbaf4/qa/tasks/cephfs/test_full.py", line 395, in setUp 2021-06-03T11:59:43.758 INFO:tasks.cephfs_test_runner: super(TestClusterFull, self).setUp() 2021-06-03T11:59:43.758 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_986f0ca109bf8b3eec6027f40cf5b33b8b7bbaf4/qa/tasks/cephfs/test_full.py", line 32, in setUp 2021-06-03T11:59:43.758 INFO:tasks.cephfs_test_runner: CephFSTestCase.setUp(self) 2021-06-03T11:59:43.759 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_986f0ca109bf8b3eec6027f40cf5b33b8b7bbaf4/qa/tasks/cephfs/cephfs_test_case.py", line 169, in setUp 2021-06-03T11:59:43.759 INFO:tasks.cephfs_test_runner: self.fs.wait_for_daemons() 2021-06-03T11:59:43.759 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_986f0ca109bf8b3eec6027f40cf5b33b8b7bbaf4/qa/tasks/cephfs/filesystem.py", line 1075, in wait_for_daemons 2021-06-03T11:59:43.760 INFO:tasks.cephfs_test_runner: raise RuntimeError("Timed out waiting for MDS daemons to become healthy") 2021-06-03T11:59:43.760 INFO:tasks.cephfs_test_runner:RuntimeError: Timed out waiting for MDS daemons to become healthy 2021-06-03T11:59:43.760 INFO:tasks.cephfs_test_runner: 2021-06-03T11:59:43.760 INFO:tasks.cephfs_test_runner:---------------------------------------------------------------------- 2021-06-03T11:59:43.761 INFO:tasks.cephfs_test_runner:Ran 4 tests in 416.483s 2021-06-03T11:59:43.761 INFO:tasks.cephfs_test_runner: 2021-06-03T11:59:43.761 INFO:tasks.cephfs_test_runner:FAILED (errors=1) 2021-06-03T11:59:43.762 INFO:tasks.cephfs_test_runner:
Checked the `mds.c` daemon log, it seems since the osd is full and the mds.c kept up:creating state:
2021-06-03T11:55:58.894+0000 7f01a0e67700 7 mds.0.server operator(): full = 1 epoch = 92 2021-06-03T11:55:58.894+0000 7f01a0e67700 4 mds.0.39 handle_osd_map epoch 92, 0 new blocklist entries 2021-06-03T11:55:58.894+0000 7f01a0e67700 10 mds.0.server apply_blocklist: killed 0 ... 2021-06-03T11:56:01.110+0000 7f01a0e67700 10 mds.c my gid is 6078 2021-06-03T11:56:01.110+0000 7f01a0e67700 10 mds.c map says I am mds.0.39 state up:creating 2021-06-03T11:56:01.110+0000 7f01a0e67700 10 mds.c msgr says I am [v2:172.21.15.25:6834/3032317271,v1:172.21.15.25:6835/3032317271] 2021-06-03T11:56:01.110+0000 7f01a0e67700 10 mds.c handle_mds_map: handling map as rank 0 2021-06-03T11:56:01.110+0000 7f01a0e67700 10 notify_mdsmap: mds.metrics 2021-06-03T11:56:01.110+0000 7f01a0e67700 10 notify_mdsmap: mds.metrics: rank0 is unavailable
From osd.4 log, the osd.4 daemon was in failsafe state:
2021-06-03T11:56:10.693+0000 7f814d502700 20 osd.4 93 check_full_status cur ratio 9.22337e+10, physical ratio 9.22337e+10, new state failsafe
Actions