Bug #22846
"Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log with msgr-failures/fastclose.yaml
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/kchai-2018-01-31_01:48:16-rados-wip-kefu-testing-2018-01-31-0034-distro-basic-mira/2130028
I am not sure this is caused by the fastclose.yaml setting. mon.b failed to respond to mon.a's paxos(begin) message in a timely manner, and it was also unable to rejoin the quorum within 15 seconds. It kept trying to start an election and did not respond to mon.a's probe message. mon.a was the leader before the election started.
2018-01-31 10:11:24.574 7f894510d700 10 mon.a@0(leader).paxos(paxos updating c 1..164) sending begin to mon.1
2018-01-31 10:11:24.574 7f894510d700 10 mon.a@0(leader).paxos(paxos updating c 1..164) sending begin to mon.2
...
2018-01-31 10:11:24.578 7f8942908700 1 -- 172.21.6.138:6789/0 <== mon.2 172.21.6.138:6790/0 489 ==== paxos(accept lc 164 fc 0 pn 300 opn 0) v4 ==== 84+0+0 (4169294484 0 0) 0x556113b51c00 con 0x556113faf500
...
2018-01-31 10:11:33.826 7f8942908700 1 -- 172.21.6.138:6789/0 <== mon.1 172.21.7.104:6789/0 648 ==== paxos(accept lc 164 fc 0 pn 300 opn 0) v4 ==== 84+0+0 (3515085151 0 0) 0x55611429f900 con 0x556113faee00
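To put a number on "in a timely manner", here is a minimal Python sketch that computes how long each peer took to answer mon.a's begin. The timestamps are copied from the log excerpt above. The helper name `parse_ts` and the dictionary layout are illustrative only and not part of Ceph. Reading mon.1 as mon.b is an assumption, based on the reported quorum being a,c.

```python
from datetime import datetime

# Timestamp layout used in the excerpt: date, time, millisecond fraction.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS.mmm' timestamp (hypothetical helper)."""
    return datetime.strptime(text, TS_FORMAT)


# When mon.a (leader) sent paxos begin to both peers.
begin_sent = parse_ts("2018-01-31 10:11:24.574")

# When mon.a received the matching paxos accept from each peer.
accept_received = {
    "mon.2": parse_ts("2018-01-31 10:11:24.578"),
    "mon.1": parse_ts("2018-01-31 10:11:33.826"),  # assumed to be mon.b
}

# Delay between begin and accept, in seconds, per peer.
delays = {
    peer: (received - begin_sent).total_seconds()
    for peer, received in accept_received.items()
}

for peer, delay in delays.items():
    print(f"{peer}: accept arrived {delay:.3f} s after begin")
```

Under these assumptions, mon.2 answered within a few milliseconds while the accept from mon.1 arrived more than nine seconds later, which is consistent with mon.b dropping out of the quorum rather than with a general slowdown on mon.a.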