Bug #20952

closed

Glitchy monitor quorum causes spurious test failure

Added by David Zafman over 6 years ago. Updated over 2 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

qa/standalone/mon/misc.sh failed in TEST_mon_features()

http://qa-proxy.ceph.com/teuthology/dzafman-2017-08-08_14:23:00-rados-wip-zafman-testing2-distro-basic-smithi/1498396/teuthology.log

wait_for_quorum() returned with all 3 mons (a, b, c) in quorum at around 2017-08-09 02:56:49.835175 (epoch 6). After that, it looks like mon.b called for a new election and only 2 mons formed quorum in epoch 8.
As a result, "ceph mon_status" saw only 2 monitors in quorum and the test failed.

    # start third monitor
    run_mon $dir c --public-addr $MONC || return 1

    wait_for_quorum 300 3 || return 1

    timeout 300 ceph -s > /dev/null || return 1

    jqinput="$(ceph mon_status --format=json 2>/dev/null)" 
    # expect quorum to have all three monitors
    jqfilter='.quorum | length == 3'
    jq_success "$jqinput" "$jqfilter" || return 1
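
One way to make a check like the one above tolerate a late re-election is to require the quorum size to hold steady across several polls instead of sampling it once. The following is only a sketch: quorum_size and wait_for_stable_quorum are hypothetical helpers that do not exist in the Ceph test helpers, and the poll counts in the usage line are arbitrary assumptions.

```shell
# Hypothetical helpers (assumptions for illustration, not part of the Ceph QA scripts).

# Print the current quorum size as reported by "ceph mon_status".
quorum_size() {
    ceph mon_status --format=json 2>/dev/null | jq '.quorum | length'
}

# Succeed once quorum_size has reported $1 monitors on $2 consecutive polls.
# Give up after $3 polls. $4 is the delay between polls in seconds (default 1).
wait_for_stable_quorum() {
    local want="$1"
    local need="$2"
    local max="$3"
    local delay="${4:-1}"
    local streak=0
    local i=0
    local size
    while [ "$i" -lt "$max" ]; do
        size=$(quorum_size) || size=""
        if [ "$size" = "$want" ]; then
            streak=$((streak + 1))
            if [ "$streak" -ge "$need" ]; then
                return 0
            fi
        else
            # Any other answer (or no answer) resets the streak.
            streak=0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

In the test this could be called as "wait_for_stable_quorum 3 5 60 || return 1" after wait_for_quorum. Note that this would only narrow the window for the failure described here; it would not explain why the second election happened.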

Could mon.b's call for a new monitor election have raced with the formation of the new quorum and caused another election? Look at the mon.b message timestamped 2017-08-09 02:56:44.568140: according to mon.a, it arrived at 2017-08-09 02:56:49.835852, after the new quorum had already formed.

2017-08-09 02:56:49.835175 7f77d8493700 10 mon.a@0(leader).log v44  logging 2017-08-09 02:56:49.658707 mon.a mon.0 127.0.0.1:7127/0 56 : cluster [INF] mon.a@0 won leader election with quorum 0,1,2
2017-08-09 02:56:49.835852 7f77d8493700 10 mon.a@0(leader).log v44  logging 2017-08-09 02:56:44.568140 mon.b mon.1 127.0.0.1:7128/0 37 : cluster [INF] mon.b calling new monitor election
2017-08-09 02:56:55.245367 7f77d8493700 10 mon.a@0(leader).log v44  logging 2017-08-09 02:56:55.092872 mon.a mon.0 127.0.0.1:7127/0 64 : cluster [INF] mon.a@0 won leader election with quorum 0,1
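
To make the ordering in the log lines above easier to see, the lag between each event's own timestamp and the time mon.a logged it can be computed mechanically. This is a minimal sketch: log_lag is a hypothetical helper, it assumes the field layout shown in the three lines above, and it ignores the date, so it is only valid for entries within the same day.

```shell
# Hypothetical helper: for each "logging" line read from stdin, print the
# originating monitor and the lag in seconds between the timestamp carried by
# the event and the time the leader wrote the line to its log.
log_lag() {
    awk '
    # Convert HH:MM:SS.ffffff to seconds since midnight.
    function secs(t,   p) {
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "logging") {
                # $2 is the time the line was written; $(i + 2) is the time
                # stamped on the event; $(i + 3) is the monitor that sent it.
                printf "%s %.6f\n", $(i + 3), secs($2) - secs($(i + 2))
            }
        }
    }'
}
```

Fed the three lines above, this reports a lag of roughly 5.27 seconds for the mon.b election call, against under 0.2 seconds for each of the two mon.a messages.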

#2

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New

#3

Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to RADOS

#4

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Can't reproduce