Fix #8146

test_mon_down sometimes fails when running tests against ExternalCephController

Added by Christina Meno over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: Backend (services)
% Done: 0%
Source: other

Description

____________________________________________________________________ TestMonitoring.test_mon_down ____________________________________________________________________

self = <tests.test_monitoring.TestMonitoring testMethod=test_mon_down>

    def test_mon_down(self):
        """ 
            Check Calamari's reaction to loss of contact with
            individual mon servers in a Ceph cluster.

            - The cluster state should continue to be updated
              as long as there is a mon quorum and
              one mon is available to calamari.
            """ 
        cluster_id = self._wait_for_cluster()
        mon_fqdns = self.ceph_ctl.get_service_fqdns(cluster_id, 'mon')

        def update_time():
            return self.api.get("cluster/%s" % cluster_id).json()['update_time']

        # I don't know which if any of the mons the calamari server
        # might be preferentially accepting data from, but I want
        # to ensure that it can survive any of them going away.
        for mon_fqdn in mon_fqdns:
            self.ceph_ctl.go_dark(cluster_id, minion_id=mon_fqdn)
            last_update_time = update_time()

            # This will give a timeout exception if calamari did not
            # re establish monitoring after the mon server went offline.
            try:
                wait_until_true(lambda: last_update_time != update_time(), timeout=NEW_FAVORITE_TIMEOUT)
            except WaitTimeout:
                self.fail("Failed to recover from killing %s in %s seconds" % (
>                   mon_fqdn, NEW_FAVORITE_TIMEOUT))
E               AssertionError: Failed to recover from killing mira108.front.sepia.ceph.com in 80 seconds

tests/test_monitoring.py:165: AssertionError
-------------------------------------------------------------------------- Captured stdout ---------------------------------------------------------------------------
503: {"detail": "Cluster configuration unavailable"}
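
The failing assertion depends on a polling helper, `wait_until_true`, whose implementation is not shown in this report. As a rough illustration of the pattern only, here is a minimal sketch of such a helper. The signature, the `period` parameter, and the body are assumptions; only the names `wait_until_true` and `WaitTimeout` and the `timeout` keyword come from the traceback above.

```python
import time


class WaitTimeout(Exception):
    """Raised when the polled condition never becomes true (assumed shape)."""


def wait_until_true(condition, timeout=10.0, period=1.0):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Hypothetical sketch: the real helper in the test suite may differ.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first, so a condition that is already true returns at once.
        if condition():
            return
        # Give up once the deadline has passed.
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(period)
```

With a helper of this shape, the test passes only if `update_time()` returns a value different from `last_update_time` at some poll within the timeout. If the API keeps answering with the 503 shown in the captured stdout, the condition can never be evaluated successfully.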

History

#1 Updated by Christina Meno over 7 years ago

  • Tracker changed from Bug to Fix
  • Assignee set to Christina Meno
  • Target version changed from v1.2 Backlog to v1.2-dev8

Initial task for dev8 is to demonstrate that this issue is a flaw in production code.

#2 Updated by Christina Meno over 7 years ago

  • Status changed from New to In Progress

#3 Updated by Christina Meno over 7 years ago

  • Status changed from In Progress to Fix Under Review

#4 Updated by Christina Meno over 7 years ago

  • Status changed from Fix Under Review to Resolved
