Bug #43584
MON_DOWN during mon_join process
Status:
closed
% Done:
100%
Source:
Development
Tags:
backport_processed
Backport:
pacific, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660691
2020-01-12T22:21:05.302+0000 7f1744ff9700 1 -- v1:172.21.15.168:6789/0 send_to--> mon v1:172.21.15.173:6789/0 -- mon_join(b v1:172.21.15.168:6789/0) v2 -- ?+0 0x557b302ac240
the leader bootstraps,
2020-01-12T22:21:05.309+0000 7f1744ff9700 1 -- v1:172.21.15.168:6789/0 <== mon.0 v1:172.21.15.173:6789/0 6 ==== election(84f14cf6-3589-11ea-99da-001a4aab830c propose rel 15 e11) v8 ==== 353+0+0 (unknown 1008848899 0 0) 0x557b30148c00 con 0x557b2f1f7180
then the joiner bootstraps,
2020-01-12T22:21:07.302+0000 7f17477fe700 10 mon.b@-1(probing) e2 bootstrap
and gets the new monmap and bootstraps again,
2020-01-12T22:21:07.302+0000 7f1744ff9700 10 mon.b@-1(probing) e2 handle_probe mon_probe(reply 84f14cf6-3589-11ea-99da-001a4aab830c name c paxos( fc 1 lc 114 ) mon_release octopus) v7
2020-01-12T22:21:07.302+0000 7f1744ff9700 10 mon.b@-1(probing) e2 handle_probe_reply mon.1 v1:172.21.15.173:6790/0 mon_probe(reply 84f14cf6-3589-11ea-99da-001a4aab830c name c paxos( fc 1 lc 114 ) mon_release octopus) v7
2020-01-12T22:21:07.302+0000 7f1744ff9700 10 mon.b@-1(probing) e2 monmap is e2: 2 mons at {a=v1:172.21.15.173:6789/0,c=v1:172.21.15.173:6790/0}
2020-01-12T22:21:07.302+0000 7f1744ff9700 10 mon.b@-1(probing) e2 got newer/committed monmap epoch 3, mine was 2
2020-01-12T22:21:07.302+0000 7f1744ff9700 10 mon.b@-1(probing) e3 bootstrap
but it misses the first election cycle, so the leader reports MON_DOWN.
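To make the timing easier to see, here is a small sketch that lines up the timestamps from the excerpts above. The event labels are my reading of the log lines, and the helper names (parse_ts, offsets) are made up for illustration; nothing here is Ceph code.

```python
from datetime import datetime

# Timestamps copied from the log excerpts in this report.
# The labels are an interpretation of those lines, not Ceph output.
EVENTS = [
    ("2020-01-12T22:21:05.302+0000", "mon.b sends mon_join"),
    ("2020-01-12T22:21:05.309+0000", "mon.b receives election propose from mon.0"),
    ("2020-01-12T22:21:07.302+0000", "mon.b bootstraps, sees monmap e3, bootstraps again"),
]


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp in the format used by the log excerpts."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def offsets(events):
    """Return (label, seconds since the first event) for each event."""
    start = parse_ts(events[0][0])
    return [(label, (parse_ts(ts) - start).total_seconds()) for ts, label in events]


for label, seconds in offsets(EVENTS):
    print(f"+{seconds:.3f}s  {label}")
```

By these timestamps, the election proposal reaches mon.b about 7 ms after it sends mon_join, while mon.b does not bootstrap on the new monmap until roughly 2 seconds later, which matches the description of it missing the first election cycle.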