Bug #20952
Glitchy monitor quorum causes spurious test failure
Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Description
qa/standalone/mon/misc.sh failed in TEST_mon_features()
After wait_for_quorum() saw all 3 mons (a, b, c) at around 2017-08-09 02:56:49.835175 (epoch 6), it looks like mon.b called for a new election and only 2 mons formed quorum in epoch 8.
The subsequent "ceph mon_status" therefore saw only 2 monitors in quorum, and the test failed.
    # start third monitor
    run_mon $dir c --public-addr $MONC || return 1
    wait_for_quorum 300 3 || return 1
    timeout 300 ceph -s > /dev/null || return 1
    jqinput="$(ceph mon_status --format=json 2>/dev/null)"
    # expect quorum to have all three monitors
    jqfilter='.quorum | length == 3'
    jq_success "$jqinput" "$jqfilter" || return 1
Could mon.b's call for a new monitor election have raced with the newly formed quorum, triggering another election? Note the message from mon.b timestamped 2017-08-09 02:56:44.568140, which mon.a processed only after the new quorum formed at 2017-08-09 02:56:49.835852:
2017-08-09 02:56:49.835175 7f77d8493700 10 mon.a@0(leader).log v44 logging 2017-08-09 02:56:49.658707 mon.a mon.0 127.0.0.1:7127/0 56 : cluster [INF] mon.a@0 won leader election with quorum 0,1,2
2017-08-09 02:56:49.835852 7f77d8493700 10 mon.a@0(leader).log v44 logging 2017-08-09 02:56:44.568140 mon.b mon.1 127.0.0.1:7128/0 37 : cluster [INF] mon.b calling new monitor election
2017-08-09 02:56:55.245367 7f77d8493700 10 mon.a@0(leader).log v44 logging 2017-08-09 02:56:55.092872 mon.a mon.0 127.0.0.1:7127/0 64 : cluster [INF] mon.a@0 won leader election with quorum 0,1
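If the delayed election message can transiently shrink the quorum, a test that samples mon_status once is inherently racy. A minimal sketch of a more tolerant check, polling until the full quorum forms (quorum_size and wait_for_full_quorum are hypothetical helper names, not part of the qa suite; the JSON strings stand in for real `ceph mon_status --format=json` output):

```python
import json
import time

def quorum_size(mon_status_json):
    # Count mons in quorum, mirroring the jq filter '.quorum | length'.
    return len(json.loads(mon_status_json)["quorum"])

def wait_for_full_quorum(get_status, want, timeout=300, interval=1):
    # Poll until the quorum contains `want` mons, tolerating a transient
    # election that briefly drops a monitor out of quorum.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if quorum_size(get_status()) == want:
            return True
        time.sleep(interval)
    return False

# Simulated sequence: a transient 2-mon quorum, then the full 3-mon quorum.
statuses = iter(['{"quorum": [0, 1]}', '{"quorum": [0, 1, 2]}'])
print(wait_for_full_quorum(lambda: next(statuses), want=3, interval=0))
# -> True
```

With a single-shot check, the first sampled status would have failed the test; polling rides out the re-election window.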
Updated by David Zafman over 4 years ago
Seen in the final point release for Luminous.
Updated by Neha Ojha over 2 years ago
- Status changed from New to Can't reproduce