Bug #18920

ceph.restart.wait-for-healthy races with MONs

Added by Nathan Cutler 6 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 02/13/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

In a test that upgrades first the MONs and then the OSDs from hammer to jewel, the MONs are expected to start complaining like this once the last OSD restarts:

HEALTH_WARN all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

However, in some runs the cluster still reports HEALTH_OK right after the last OSD restarts, presumably because it takes the MONs some time to realize that this OSD is now running jewel. The wait-for-healthy check succeeds immediately and the test passes (false positive).

Example run with correct result: http://pulpito.ceph.com/teuthology-2017-02-13_18:15:01-upgrade:hammer-x-jewel-distro-basic-vps/810293/

Example run with false positive: http://pulpito.ceph.com/teuthology-2017-02-07_18:15:23-upgrade:hammer-x-jewel-distro-basic-vps/796165/

Note that the test YAML itself is fixed by https://github.com/ceph/ceph/pull/13404. This issue is for fixing ceph.restart itself, probably by adding a sleep/delay that gives the MONs time to update before it starts checking for HEALTH_OK.
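
A rough sketch of the sleep/delay idea, for discussion only. This is not the actual teuthology code: wait_for_healthy, get_health, settle_delay and the default values are all hypothetical names and numbers chosen for illustration. get_health is assumed to be a callable that returns the health summary as a string. The sleep and clock parameters exist only so the behaviour can be exercised without real waiting.

```python
import time


def wait_for_healthy(get_health, settle_delay=30.0, timeout=300.0,
                     poll_interval=5.0, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll for HEALTH_OK, but only after an initial settle delay.

    get_health is a hypothetical callable returning the cluster health
    summary as a string, e.g. "HEALTH_OK" or "HEALTH_WARN ...".
    """
    # Assumed fix: pause first so the MONs have time to notice the
    # restarted daemon before the first health check is made.
    sleep(settle_delay)

    deadline = clock() + timeout
    while True:
        status = get_health()
        if status.startswith("HEALTH_OK"):
            return status
        if clock() >= deadline:
            # Give up and surface the last status seen.
            raise TimeoutError("cluster not healthy: " + status)
        sleep(poll_interval)
```

With settle_delay=0 this degenerates to the racy behaviour described above: a stale HEALTH_OK on the first poll is accepted at once. A fixed delay only narrows the race window, so the right value would depend on how long the MONs actually take to update.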
