Bug #47612
Status: Open
ERROR: setUpClass (tasks.mgr.dashboard.test_health.HealthTest)
Description
2020-09-23 10:28:57,731.731 INFO:__main__:----------------------------------------------------------------------
2020-09-23 10:28:57,731.731 INFO:__main__:Traceback (most recent call last):
2020-09-23 10:28:57,732.732 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 117, in setUpClass
2020-09-23 10:28:57,732.732 INFO:__main__:    cls._load_module("dashboard")
2020-09-23 10:28:57,732.732 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/mgr_test_case.py", line 144, in _load_module
2020-09-23 10:28:57,732.732 INFO:__main__:    cls.wait_until_true(has_restarted, timeout=30)
2020-09-23 10:28:57,733.733 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/ceph_test_case.py", line 197, in wait_until_true
2020-09-23 10:28:57,733.733 INFO:__main__:    raise TestTimeoutError("Timed out after {0}s".format(elapsed))
2020-09-23 10:28:57,733.733 INFO:__main__:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s
2020-09-23 10:28:57,734.734 INFO:__main__:----------------------------------------------------------------------
2020-09-23 10:28:57,734.734 INFO:__main__:Ran 49 tests in 559.575s
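For context, `wait_until_true` is the test harness's generic polling helper: it repeatedly evaluates a condition and raises once the deadline passes. A minimal sketch of that pattern (the names `wait_until_true` and `TestTimeoutError` come from the traceback above; the body here is an illustrative reconstruction, not the actual qa/tasks code):

```python
import time


class TestTimeoutError(Exception):
    pass


def wait_until_true(condition, timeout, period=1):
    """Poll `condition` until it returns True, raising TestTimeoutError
    once `timeout` seconds have elapsed without success."""
    elapsed = 0
    while not condition():
        if elapsed >= timeout:
            # Mirrors the message format seen in the log above
            raise TestTimeoutError("Timed out after {0}s".format(elapsed))
        time.sleep(period)
        elapsed += period
```

With a 30s timeout, a dashboard module that takes longer than that to restart produces exactly the `TestTimeoutError: Timed out after 30s` seen here, regardless of whether the module would eventually have come up.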
Updated by Ernesto Puerta about 3 years ago
- Project changed from mgr to Dashboard
- Category changed from API Tests to Testing - API
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
Updated by Casey Bodley 3 months ago
- Assignee deleted (Laura Paduano)
Updated by Pedro González Gómez 3 months ago
- Assignee set to Pedro González Gómez
Updated by Casey Bodley about 1 month ago
https://jenkins.ceph.com/job/ceph-api/71512/
2024-03-29 08:38:15,187.187 INFO:__main__:----------------------------------------------------------------------
2024-03-29 08:38:15,187.187 INFO:__main__:Traceback (most recent call last):
2024-03-29 08:38:15,187.187 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 222, in setUpClass
2024-03-29 08:38:15,187.187 INFO:__main__:    cls.mds_cluster.mds_stop()
2024-03-29 08:38:15,187.187 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 364, in mds_stop
2024-03-29 08:38:15,187.187 INFO:__main__:    self._one_or_all(mds_id, lambda id_: self.mds_daemons[id_].stop())
2024-03-29 08:38:15,188.188 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 338, in _one_or_all
2024-03-29 08:38:15,188.188 INFO:__main__:    with parallel() as p:
2024-03-29 08:38:15,188.188 INFO:__main__:  File "/tmp/tmp.SK1qhyV79P/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 84, in __exit__
2024-03-29 08:38:15,188.188 INFO:__main__:    for result in self:
2024-03-29 08:38:15,188.188 INFO:__main__:  File "/tmp/tmp.SK1qhyV79P/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 98, in __next__
2024-03-29 08:38:15,188.188 INFO:__main__:    resurrect_traceback(result)
2024-03-29 08:38:15,188.188 INFO:__main__:  File "/tmp/tmp.SK1qhyV79P/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 30, in resurrect_traceback
2024-03-29 08:38:15,188.188 INFO:__main__:    raise exc.exc_info[1]
2024-03-29 08:38:15,188.188 INFO:__main__:  File "/tmp/tmp.SK1qhyV79P/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 23, in capture_traceback
2024-03-29 08:38:15,188.188 INFO:__main__:    return func(*args, **kwargs)
2024-03-29 08:38:15,189.189 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 364, in <lambda>
2024-03-29 08:38:15,189.189 INFO:__main__:    self._one_or_all(mds_id, lambda id_: self.mds_daemons[id_].stop())
2024-03-29 08:38:15,189.189 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 563, in stop
2024-03-29 08:38:15,189.189 INFO:__main__:    raise MaxWhileTries(
2024-03-29 08:38:15,189.189 INFO:__main__:teuthology.exceptions.MaxWhileTries: Timed out waiting for daemon mds.a
Updated by Casey Bodley about 1 month ago
Again on squid: https://jenkins.ceph.com/job/ceph-api/71686/
Updated by Casey Bodley about 1 month ago
Updated by Casey Bodley 21 days ago
https://jenkins.ceph.com/job/ceph-api/72561/
2024-04-17 17:40:43,405.405 INFO:__main__:----------------------------------------------------------------------
2024-04-17 17:40:43,405.405 INFO:__main__:Traceback (most recent call last):
2024-04-17 17:40:43,405.405 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 222, in setUpClass
2024-04-17 17:40:43,405.405 INFO:__main__:    cls.mds_cluster.mds_stop()
2024-04-17 17:40:43,405.405 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 364, in mds_stop
2024-04-17 17:40:43,405.405 INFO:__main__:    self._one_or_all(mds_id, lambda id_: self.mds_daemons[id_].stop())
2024-04-17 17:40:43,405.405 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 338, in _one_or_all
2024-04-17 17:40:43,405.405 INFO:__main__:    with parallel() as p:
2024-04-17 17:40:43,405.405 INFO:__main__:  File "/tmp/tmp.E54XocQKFg/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 84, in __exit__
2024-04-17 17:40:43,405.405 INFO:__main__:    for result in self:
2024-04-17 17:40:43,406.406 INFO:__main__:  File "/tmp/tmp.E54XocQKFg/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 98, in __next__
2024-04-17 17:40:43,406.406 INFO:__main__:    resurrect_traceback(result)
2024-04-17 17:40:43,406.406 INFO:__main__:  File "/tmp/tmp.E54XocQKFg/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 30, in resurrect_traceback
2024-04-17 17:40:43,406.406 INFO:__main__:    raise exc.exc_info[1]
2024-04-17 17:40:43,406.406 INFO:__main__:  File "/tmp/tmp.E54XocQKFg/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 23, in capture_traceback
2024-04-17 17:40:43,406.406 INFO:__main__:    return func(*args, **kwargs)
2024-04-17 17:40:43,406.406 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 364, in <lambda>
2024-04-17 17:40:43,406.406 INFO:__main__:    self._one_or_all(mds_id, lambda id_: self.mds_daemons[id_].stop())
2024-04-17 17:40:43,406.406 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 563, in stop
2024-04-17 17:40:43,406.406 INFO:__main__:    raise MaxWhileTries(
2024-04-17 17:40:43,407.407 INFO:__main__:teuthology.exceptions.MaxWhileTries: Timed out waiting for daemon mds.a
Updated by Casey Bodley 8 days ago
https://jenkins.ceph.com/job/ceph-api/73295/
2024-05-01 14:33:49,801.801 INFO:__main__:----------------------------------------------------------------------
2024-05-01 14:33:49,801.801 INFO:__main__:Traceback (most recent call last):
2024-05-01 14:33:49,801.801 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 222, in setUpClass
2024-05-01 14:33:49,801.801 INFO:__main__:    cls.mds_cluster.mds_stop()
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 370, in mds_stop
2024-05-01 14:33:49,802.802 INFO:__main__:    self._one_or_all(mds_id, lambda id_: self.mds_daemons[id_].stop())
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 344, in _one_or_all
2024-05-01 14:33:49,802.802 INFO:__main__:    with parallel() as p:
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/tmp/tmp.cbe5g4hISM/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 84, in __exit__
2024-05-01 14:33:49,802.802 INFO:__main__:    for result in self:
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/tmp/tmp.cbe5g4hISM/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 98, in __next__
2024-05-01 14:33:49,802.802 INFO:__main__:    resurrect_traceback(result)
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/tmp/tmp.cbe5g4hISM/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 30, in resurrect_traceback
2024-05-01 14:33:49,802.802 INFO:__main__:    raise exc.exc_info[1]
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/tmp/tmp.cbe5g4hISM/venv/lib/python3.10/site-packages/teuthology/parallel.py", line 23, in capture_traceback
2024-05-01 14:33:49,802.802 INFO:__main__:    return func(*args, **kwargs)
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/cephfs/filesystem.py", line 370, in <lambda>
2024-05-01 14:33:49,802.802 INFO:__main__:    self._one_or_all(mds_id, lambda id_: self.mds_daemons[id_].stop())
2024-05-01 14:33:49,802.802 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 573, in stop
2024-05-01 14:33:49,802.802 INFO:__main__:    raise MaxWhileTries(
2024-05-01 14:33:49,802.802 INFO:__main__:teuthology.exceptions.MaxWhileTries: Timed out waiting for daemon mds.a
Hi @Pedro González Gómez, have you made any progress with this?
Updated by Leonid Usov 8 days ago
I was about to open an issue and found this one through a search. I suspect this should be handled by the CephFS team, but before we can proceed we should raise the MDS logging level to at least 10/10. Currently, the only log I see for failed runs like this one (https://jenkins.ceph.com/job/ceph-api/73292/) is:
2024-05-01T13:42:20.605+0000 7f0883e6fbc0 0 ceph version Development (no_version) squid (dev), process ceph-mds, pid 3146654
2024-05-01T13:42:20.605+0000 7f0883e6fbc0 1 main not setting numa affinity
2024-05-01T13:42:20.625+0000 7f087f632640 1 mds.a Updating MDS map to version 1 from mon.0
2024-05-01T13:42:20.629+0000 7f087f632640 1 mds.a Updating MDS map to version 2 from mon.0
2024-05-01T13:42:20.629+0000 7f087f632640 1 mds.a Monitors have assigned me to become a standby
2024-05-01T13:48:53.961+0000 7f16b6bf4bc0 -1 WARNING: all dangerous and experimental features are enabled.
2024-05-01T13:48:53.977+0000 7f16b6bf4bc0 -1 WARNING: all dangerous and experimental features are enabled.
2024-05-01T13:48:53.977+0000 7f16b6bf4bc0 0 ceph version Development (no_version) squid (dev), process ceph-mds, pid 3165902
2024-05-01T13:48:53.977+0000 7f16b6bf4bc0 1 main not setting numa affinity
2024-05-01T13:48:53.981+0000 7f16b6bf4bc0 -1 WARNING: all dangerous and experimental features are enabled.
2024-05-01T13:48:54.017+0000 7f16b23b7640 1 mds.a Updating MDS map to version 7 from mon.1
2024-05-01T13:48:54.317+0000 7f16b23b7640 1 mds.a Updating MDS map to version 8 from mon.1
2024-05-01T13:48:54.317+0000 7f16b23b7640 1 mds.a Monitors have assigned me to become a standby
2024-05-01T13:48:54.373+0000 7f16b23b7640 1 mds.a Updating MDS map to version 9 from mon.1
2024-05-01T13:48:54.377+0000 7f16b23b7640 1 mds.0.9 handle_mds_map I am now mds.0.9
2024-05-01T13:48:54.377+0000 7f16b23b7640 1 mds.0.9 handle_mds_map state change up:standby --> up:creating
2024-05-01T13:48:54.377+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x1
2024-05-01T13:48:54.377+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x100
2024-05-01T13:48:54.377+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x600
2024-05-01T13:48:54.377+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x601
2024-05-01T13:48:54.377+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x602
2024-05-01T13:48:54.377+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x603
2024-05-01T13:48:54.381+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x604
2024-05-01T13:48:54.381+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x605
2024-05-01T13:48:54.381+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x606
2024-05-01T13:48:54.381+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x607
2024-05-01T13:48:54.381+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x608
2024-05-01T13:48:54.381+0000 7f16b23b7640 0 mds.0.cache creating system inode with ino:0x609
2024-05-01T13:48:54.437+0000 7f16abbaa640 1 mds.0.9 creating_done
2024-05-01T13:48:55.373+0000 7f16b23b7640 1 mds.a Updating MDS map to version 10 from mon.1
2024-05-01T13:48:55.373+0000 7f16b23b7640 1 mds.0.9 handle_mds_map I am now mds.0.9
2024-05-01T13:48:55.373+0000 7f16b23b7640 1 mds.0.9 handle_mds_map state change up:creating --> up:active
2024-05-01T13:48:55.373+0000 7f16b23b7640 1 mds.0.9 recovery_done -- successful recovery!
2024-05-01T13:48:55.373+0000 7f16b23b7640 1 mds.0.9 active_start
2024-05-01T13:48:55.937+0000 7f16b53bd640 1 mds.a asok_command: status {prefix=status} (starting...)
2024-05-01T13:48:59.390+0000 7f16afbb2640 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2024-05-01T13:49:08.470+0000 7f16b53bd640 1 mds.a asok_command: session ls {format=json,prefix=session ls} (starting...)
2024-05-01T13:49:10.970+0000 7f16b53bd640 1 mds.a asok_command: session ls {format=json,prefix=session ls} (starting...)
2024-05-01T13:49:32.330+0000 7f16b53bd640 1 mds.a asok_command: session ls {format=json,prefix=session ls} (starting...)
2024-05-01T13:50:11.027+0000 7f16b23b7640 1 mds.a Updating MDS map to version 11 from mon.1
2024-05-01T13:50:15.179+0000 7f16b23b7640 1 mds.a Updating MDS map to version 12 from mon.1
2024-05-01T13:50:30.572+0000 7f16b23b7640 1 mds.a Updating MDS map to version 13 from mon.1
2024-05-01T13:50:34.648+0000 7f16b23b7640 1 mds.a Updating MDS map to version 14 from mon.1
2024-05-01T13:57:33.073+0000 7f16b3bba640 -1 received signal: Terminated from python ../qa/tasks/vstart_runner.py --ignore-missing-binaries --no-verbose tasks.mgr.test_dashboard tasks.mgr.dashboard.test_api tasks.mgr.dashboard.test_auth tasks.mgr.dashboard.test_cephfs tasks.mgr.dashboard.test_cluster_configuration tasks.mgr.dashboard.test_cluster tasks.mgr.dashboard.test_crush_rule tasks.mgr.dashboard.test_erasure_code_profile tasks.mgr.dashboard.test_feedback tasks.mgr.dashboard.test_health tasks.mgr.dashboard.test_host tasks.mgr.dashboard.test_logs tasks.mgr.dashboard.test_mgr_module tasks.mgr.dashboard.test_monitor tasks.mgr.dashboard.test_motd tasks.mgr.dashboard.test_orchestrator tasks.mgr.dashboard.test_osd tasks.mgr.dashboard.test_perf_counters tasks.mgr.dashboard.test_pool tasks.mgr.dashboard.test_rbd_mirroring tasks.mgr.dashboard.test_rbd tasks.mgr.dashboard.test_requests tasks.mgr.dashboard.test_rgw tasks.mgr.dashboard.test_role tasks.mgr.dashboard.test_settings tasks.mgr.dashboard.test_summary tasks.mgr.dashboard.test_telemetry tasks.mgr.dashboard.test_user tasks.mgr.test_module_selftest (PID: 3154024) UID: 1112
This is not sufficient to understand why that MDS caused the timeout.
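For reference, raising the MDS logging to the 10/10 suggested above could be done with the standard Ceph debug settings, e.g. `ceph config set mds debug_mds 10` on a running cluster, or in the configuration before the daemons start (a sketch using standard options; how vstart_runner injects these for this job is an assumption to verify):

```ini
[mds]
    debug mds = 10
    debug ms = 1
```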
However, we should also review whether this is a valid approach in DashboardTestCase::setUpClass:
# To avoid any issues with e.g. unlink bugs, we destroy and recreate
# the filesystem rather than just doing a rm -rf of files
cls.mds_cluster.mds_stop()
cls.mds_cluster.mds_fail()
cls.mds_cluster.delete_all_filesystems()
We may end up with two separate issues to deal with; in that case, we'd open a separate ticket for the stop timeout and dedicate this ticket to a more stable approach to re-creating the filesystem.
Updated by Patrick Donnelly 7 days ago
Suggest updating your tearDown/setUp procedures to mirror what CephFSTestCase is doing.
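Concretely, one shape such a teardown could take is bounding each daemon stop with an explicit deadline and escalating if the daemon won't exit. A generic sketch (the `MaxWhileTries` name comes from the tracebacks above; `stop_with_deadline` and the escalate-to-kill policy are hypothetical illustrations, not CephFSTestCase's actual code):

```python
import subprocess
import time


class MaxWhileTries(Exception):
    pass


def stop_with_deadline(proc, name, timeout=30, period=0.1):
    """Ask `proc` to terminate, then poll until it exits.

    If the process is still alive at the deadline, kill it and raise,
    mirroring the 'Timed out waiting for daemon ...' failure mode
    seen in the vstart_runner tracebacks.
    """
    proc.terminate()  # polite SIGTERM first
    deadline = time.monotonic() + timeout
    while proc.poll() is None:
        if time.monotonic() >= deadline:
            proc.kill()  # escalate rather than leave the daemon around
            raise MaxWhileTries(
                "Timed out waiting for daemon {0}".format(name))
        time.sleep(period)


# Demo with a stand-in process instead of a real ceph-mds daemon:
p = subprocess.Popen(["sleep", "60"])
stop_with_deadline(p, "mds.a")
```

The point of the escalation step is that a hung stop then fails one test class with a clear error instead of wedging every subsequent `setUpClass` that tries to reuse the daemon.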