Bug #55723

open

octopus: dashboard failures due to timed-out or failed connections

Added by Laura Flores almost 2 years ago. Updated almost 2 years ago.

Status: Triaged
Priority: Normal
Assignee:
Category: Testing & QA
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Octopus runs in the teuthology rados suite are experiencing many consistent failures of this kind:

/a/yuriw-2022-05-19_14:09:24-rados-wip-yuri6-testing-2022-05-17-1603-octopus-distro-default-smithi/6841353

2022-05-19T14:30:03.022 INFO:tasks.cephfs_test_runner:======================================================================
2022-05-19T14:30:03.022 INFO:tasks.cephfs_test_runner:ERROR: test_standby (tasks.mgr.test_dashboard.TestDashboard)
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_9dfe5561e7f8bbf1095613ed99b58dd72943d57a/qa/tasks/mgr/test_dashboard.py", line 62, in test_standby
2022-05-19T14:30:03.023 INFO:tasks.cephfs_test_runner:    self.wait_until_webserver_available(original_uri)
2022-05-19T14:30:03.024 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_9dfe5561e7f8bbf1095613ed99b58dd72943d57a/qa/tasks/mgr/test_dashboard.py", line 39, in wait_until_webserver_available
2022-05-19T14:30:03.024 INFO:tasks.cephfs_test_runner:    self.wait_until_true(_check_connection, timeout=30)
2022-05-19T14:30:03.024 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_9dfe5561e7f8bbf1095613ed99b58dd72943d57a/qa/tasks/ceph_test_case.py", line 196, in wait_until_true
2022-05-19T14:30:03.025 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0}s".format(elapsed))
2022-05-19T14:30:03.025 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s
2022-05-19T14:30:03.025 INFO:tasks.cephfs_test_runner:
2022-05-19T14:30:03.026 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------

According to a comment by Laura, this has not reproduced in the latest QA runs, so it may just be a flaky test. Decreasing priority (I'll keep it open for a month and close it if it does not happen again).
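
For readers unfamiliar with the failure mode in the traceback above: the test polls a connection check until it succeeds or 30 seconds elapse, then raises a timeout error. The following is only a minimal sketch of that polling pattern, not the actual code in qa/tasks/ceph_test_case.py or qa/tasks/mgr/test_dashboard.py. The names wait_until_true, _check_connection, and TestTimeoutError are taken from the traceback; the period, sleep, clock, and opener parameters, the make_connection_check helper, and the 5-second per-request timeout are assumptions added for illustration.

```python
import time
import urllib.error
import urllib.request


class TestTimeoutError(RuntimeError):
    """Stand-in for tasks.ceph_test_case.TestTimeoutError (illustrative only)."""


def wait_until_true(condition, timeout, period=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns the elapsed time on success. `period`, `sleep`, and `clock`
    are hypothetical parameters, added so the loop can be exercised
    without real waiting.
    """
    start = clock()
    while True:
        if condition():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TestTimeoutError("Timed out after {0}s".format(int(elapsed)))
        sleep(period)


def make_connection_check(uri, opener=urllib.request.urlopen):
    """Build a _check_connection callable for `uri` (hypothetical helper)."""
    def _check_connection():
        try:
            # The 5-second per-request timeout is an assumed value.
            with opener(uri, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Refused, reset, or timed-out connections count as "not ready yet".
            return False
    return _check_connection
```

With a pattern like this, a web server that takes longer than the timeout to come back after a failover looks the same as one that never comes back, which is one way such a test can be flaky without any change to the test itself.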


Related issues 1 (0 open, 1 closed)

Copied to mgr - Bug #55774: octopus: prometheus, and selftest failures due to timed-out or failed connections (Can't reproduce)

Actions #1

Updated by Ernesto Puerta almost 2 years ago

  • Category set to Testing & QA
  • Assignee set to Avan Thakkar
Actions #2

Updated by Ernesto Puerta almost 2 years ago

  • Status changed from New to Triaged
Actions #3

Updated by Laura Flores almost 2 years ago

Regarding the first failure listed here, test_standby for Dashboard: I looked into the Octopus git history, and the most recent commit in qa/tasks/mgr/test_dashboard.py is this one, which directly modifies test_standby: https://github.com/ceph/ceph/commit/a1c9e6de01da2daa76ec2f323065d38be80317c6.

However, the most recent Octopus QA run that did not contain these failures was http://pulpito.front.sepia.ceph.com/yuriw-2022-04-26_20:58:55-rados-wip-yuri2-testing-2022-04-26-1132-octopus-distro-default-smithi/. I checked the branch that this run is associated with (ci/wip-yuri2-testing-2022-04-26-1132-octopus), and it does contain the commit I linked above, yet the tests that are now failing were passing in that run. So this seems like a recent development that is not linked to the introduction of that commit.

As for the other failures, test_standby/Prometheus and test_selftest_command_spam, those look different. Maybe a problem with python3.6?
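
The branch check described above (does a given test branch already contain a given commit?) can be scripted. This is a sketch of one way to do it, not necessarily how the check was done here. It relies on `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second and 1 when it is not. The function name branch_contains_commit and its repo_dir and run parameters are hypothetical, and the ref name to pass for the branch depends on how the local clone's remotes are set up.

```python
import subprocess


def branch_contains_commit(commit, branch, repo_dir=".", run=subprocess.run):
    """Return True if `commit` is an ancestor of `branch` in the repo at `repo_dir`.

    Hypothetical helper: `run` is injectable only so the logic can be
    exercised without a real git checkout.
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", commit, branch],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return True   # commit is reachable from branch
    if result.returncode == 1:
        return False  # commit is not an ancestor of branch
    # Any other exit status means git itself failed (unknown ref, not a repo, ...).
    raise RuntimeError(
        "git merge-base failed with exit status {0}".format(result.returncode)
    )
```

If the function returns True for a branch whose run passed, as was the case here, the commit alone is unlikely to explain failures that only appear in later runs.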

Actions #4

Updated by Ernesto Puerta almost 2 years ago

  • Copied to Bug #55774: octopus: prometheus, and selftest failures due to timed-out or failed connections added
Actions #5

Updated by Ernesto Puerta almost 2 years ago

  • Subject changed from octopus: dashboard, prometheus, and selftest failures due to timed-out or failed connections to octopus: dashboard failures due to timed-out or failed connections
  • Description updated (diff)
  • Priority changed from Immediate to Normal