Bug #55723
octopus: dashboard failures due to timed-out or failed connections
Description
Octopus runs in the teuthology rados suite are consistently failing with errors like the following:
/a/yuriw-2022-05-19_14:09:24-rados-wip-yuri6-testing-2022-05-17-1603-octopus-distro-default-smithi/6841353
2022-05-19T14:30:03.022 INFO:tasks.cephfs_test_runner:======================================================================
2022-05-19T14:30:03.022 INFO:tasks.cephfs_test_runner:ERROR: test_standby (tasks.mgr.test_dashboard.TestDashboard)
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_9dfe5561e7f8bbf1095613ed99b58dd72943d57a/qa/tasks/mgr/test_dashboard.py", line 62, in test_standby
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner: self.wait_until_webserver_available(original_uri)
2022-05-19T14:30:03.024 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_9dfe5561e7f8bbf1095613ed99b58dd72943d57a/qa/tasks/mgr/test_dashboard.py", line 39, in wait_until_webserver_available
2022-05-19T14:30:03.024 INFO:tasks.cephfs_test_runner: self.wait_until_true(_check_connection, timeout=30)
2022-05-19T14:30:03.024 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_9dfe5561e7f8bbf1095613ed99b58dd72943d57a/qa/tasks/ceph_test_case.py", line 196, in wait_until_true
2022-05-19T14:30:03.025 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s".format(elapsed))
2022-05-19T14:30:03.025 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s
2022-05-19T14:30:03.025 INFO:tasks.cephfs_test_runner:
2022-05-19T14:30:03.026 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
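The traceback shows test_standby polling the dashboard URI through wait_until_true(_check_connection, timeout=30) and giving up with TestTimeoutError after 30 seconds. The sketch below illustrates that poll-until-true-or-timeout pattern. It is not the code in qa/tasks/ceph_test_case.py. The helper names, the polling period and the TCP-based check are assumptions chosen for illustration.

```python
import socket
import time


class TestTimeoutError(RuntimeError):
    """Hypothetical stand-in for tasks.ceph_test_case.TestTimeoutError."""


def wait_until_true(condition, timeout, period=0.5):
    """Call `condition` every `period` seconds until it returns True.

    Raises TestTimeoutError once `timeout` seconds have elapsed without success.
    (Sketch only; the real helper's period and logging may differ.)
    """
    start = time.monotonic()
    while True:
        if condition():
            return
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise TestTimeoutError("Timed out after {0}s".format(int(elapsed)))
        time.sleep(period)


def tcp_port_open(host, port, connect_timeout=1.0):
    """Hypothetical connection check: True if a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Refused, unreachable or timed-out connections all count as "not up yet".
        return False
```

For example, `wait_until_true(lambda: tcp_port_open(host, port), timeout=30)` would reproduce the failure mode above whenever the standby-to-active handover takes longer than 30 seconds to bring the web server up. Here `host` and `port` are placeholders, not values taken from the test.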
According to a comment by Laura, this has not reproduced in the latest QA runs, so it could just be a flaky test. Decreasing priority (I'll keep it open for a month and close it if it does not happen again).
Related issues
History
#1 Updated by Ernesto Puerta over 1 year ago
- Category set to Testing & QA
- Assignee set to Avan Thakkar
#2 Updated by Ernesto Puerta over 1 year ago
- Status changed from New to Triaged
#3 Updated by Laura Flores over 1 year ago
Regarding the first failure here (test_standby for the Dashboard), I looked into the Octopus git history. The most recent commit to qa/tasks/mgr/test_dashboard.py is this one, which directly modifies test_standby: https://github.com/ceph/ceph/commit/a1c9e6de01da2daa76ec2f323065d38be80317c6.
However, the most recent Octopus QA run that did not contain these failures was http://pulpito.front.sepia.ceph.com/yuriw-2022-04-26_20:58:55-rados-wip-yuri2-testing-2022-04-26-1132-octopus-distro-default-smithi/. I checked the branch associated with that run (ci/wip-yuri2-testing-2022-04-26-1132-octopus), and it does contain the commit linked above, yet the tests that are now failing passed in that run. So the failures appear to be a recent development, not linked to the introduction of that commit.
The other failures (test_standby for Prometheus and test_selftest_command_spam) look different. Possibly a problem with Python 3.6?
#4 Updated by Ernesto Puerta over 1 year ago
- Copied to Bug #55774: octopus: prometheus, and selftest failures due to timed-out or failed connections added
#5 Updated by Ernesto Puerta over 1 year ago
- Subject changed from octopus: dashboard, prometheus, and selftest failures due to timed-out or failed connections to octopus: dashboard failures due to timed-out or failed connections
- Description updated (diff)
- Priority changed from Immediate to Normal