Bug #18859
closed
kraken monitor fails to bootstrap off jewel monitors if it has booted before
Description
To reproduce: bootstrap a quorum off of jewel. Stop one of the monitors, remove its filesystem contents, re-create the initial monmap, inject the keyring and run mkfs with kraken, then attempt to start it with kraken, and it'll bootstrap forever.
This is relatively close to what our current install/deployment pipeline does: wipe and reinstall the machine (which has served us really well until now, going firefly -> hammer -> jewel).
What we seem to be running into is that, after the initial bootstrapping, the encoded bufferlists for jewel and kraken differ. They're at the same epoch, but being a "half new" monitor we have has_ever_joined=0, so we end up doing bootstrap() forever.
...
if (!mybl.contents_equal(m->monmap_bl)) { // binary representation of kraken vs. jewel monmap is different
MonMap *newmap = new MonMap;
newmap->decode(m->monmap_bl);
if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() || // First pass, epoch = -1, bootstrap() again
// Second pass, newmap->get_epoch() == monmap->get_epoch()
!has_ever_joined)) { // however, we used to be joined before our state vanished, so, we'll
// go ahead and bootstrap again, and again, and again.
...
bootstrap();
...
2017-02-07 22:15:35.150925 7f3f4fb03840 10 mon.c@-1(probing) e0 has_ever_joined = 0
2017-02-07 22:15:35.150927 7f3f4fb03840 1 mon.c@-1(probing) e0 initial_members a,b,d, filtering seed monmap
2017-02-07 22:15:35.150946 7f3f4fb03840 10 mon.c@-1(probing) e0 monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:15:35.150950 7f3f4fb03840 10 mon.c@-1(probing) e0 extra probe peers 127.0.0.1:6791/0
...
2017-02-07 22:15:35.152688 7f3f483e5700 10 mon.c@-1(probing) e0 monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:15:35.152701 7f3f483e5700 10 mon.c@-1(probing) e0 got newer/committed monmap epoch 3, mine was 0
2017-02-07 22:15:35.152711 7f3f483e5700 10 mon.c@-1(probing) e3 bootstrap
...
2017-02-07 22:15:35.153653 7f3f483e5700 10 mon.c@2(probing) e3 monmap is e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2017-02-07 22:15:35.153663 7f3f483e5700 10 mon.c@2(probing) e3 got newer/committed monmap epoch 3, mine was 3
2017-02-07 22:15:35.153667 7f3f483e5700 10 mon.c@2(probing) e3 bootstrap
...
2017-02-07 22:15:35.154031 7f3f483e5700 10 mon.c@2(probing) e3 monmap is e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2017-02-07 22:15:35.154045 7f3f483e5700 10 mon.c@2(probing) e3 got newer/committed monmap epoch 3, mine was 3
2017-02-07 22:15:35.154063 7f3f483e5700 10 mon.c@2(probing) e3 bootstrap
... repeat.... ...
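The loop in the log above can be modelled outside of Ceph. This is only a minimal sketch, not the real monitor code: `ProbeState`, `would_rebootstrap` and `count_bootstraps` are hypothetical names, the struct fields stand in for the values tested in the excerpt from handle_probe_reply(), and the epochs 0 and 3 are taken from the log. The `encodings_match_at_same_epoch` flag is an assumption used to switch between "jewel and kraken bytes differ" and "bytes compare equal".

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the values consulted in
// Monitor::handle_probe_reply(); these are not the real Ceph types.
struct ProbeState {
    bool bytes_equal;           // mybl.contents_equal(m->monmap_bl)
    bool peer_has_ever_joined;  // m->has_ever_joined
    bool self_has_ever_joined;  // has_ever_joined
    uint32_t peer_epoch;        // newmap->get_epoch()
    uint32_t my_epoch;          // monmap->get_epoch()
};

// True when the monitor would take the peer's map and call bootstrap() again.
bool would_rebootstrap(const ProbeState& s) {
    if (s.bytes_equal) {
        return false;  // maps are byte-identical, nothing to adopt
    }
    return s.peer_has_ever_joined &&
           (s.peer_epoch > s.my_epoch || !s.self_has_ever_joined);
}

// Replays probe replies for a wiped monitor (has_ever_joined = 0) against a
// peer at epoch 3 and counts how often bootstrap() would be called, giving
// up after max_passes.
int count_bootstraps(bool encodings_match_at_same_epoch, int max_passes) {
    uint32_t my_epoch = 0;
    const uint32_t peer_epoch = 3;
    int bootstraps = 0;
    for (int pass = 0; pass < max_passes; ++pass) {
        // Bytes can only compare equal once we hold the same map AND both
        // sides encode it identically.
        const bool bytes_equal =
            (my_epoch == peer_epoch) && encodings_match_at_same_epoch;
        const ProbeState s{bytes_equal, true, false, peer_epoch, my_epoch};
        if (!would_rebootstrap(s)) {
            break;
        }
        my_epoch = peer_epoch;  // adopt the peer's monmap, then bootstrap()
        ++bootstraps;
    }
    return bootstraps;
}
```

With `encodings_match_at_same_epoch` set to false, `count_bootstraps` only stops because of `max_passes`, which mirrors the repeated "got newer/committed monmap epoch 3, mine was 3" lines. With it set to true, the model bootstraps once and then moves on.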
Patched to essentially decode the received monmap and re-encode it with kraken, giving the binary/bufferlist comparison a chance to succeed:
2017-02-07 22:21:36.684955 7fec6f077840 10 mon.c@-1(probing) e0 has_ever_joined = 0
2017-02-07 22:21:36.684957 7fec6f077840 1 mon.c@-1(probing) e0 initial_members a,b,d, filtering seed monmap
2017-02-07 22:21:36.684974 7fec6f077840 10 mon.c@-1(probing) e0 monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:21:36.684977 7fec6f077840 10 mon.c@-1(probing) e0 extra probe peers 127.0.0.1:6791/0
...
2017-02-07 22:21:36.689239 7fec67959700 10 mon.c@-1(probing) e0 monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:21:36.689259 7fec67959700 10 mon.c@-1(probing) e0 got newer/committed monmap epoch 3, mine was 0
2017-02-07 22:21:36.689263 7fec67959700 10 mon.c@-1(probing) e3 bootstrap
...
2017-02-07 22:21:36.690486 7fec67959700 10 mon.c@2(probing) e3 monmap is e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2017-02-07 22:21:36.690507 7fec67959700 10 mon.c@2(probing) e3 peer name is a
2017-02-07 22:21:36.690517 7fec67959700 10 mon.c@2(probing) e3 peer paxos last version 61 vs my version 0 (too far ahead)
... transition to synchronizing ...
I don't know if you depend on the binary comparison for reasons other than "yep, it's got the same monitors/features and election epoch".
diff --git a/src/mon/Monitor.cc b/src/mon/Monitor.cc
index 4b76b9e..a3502a8 100644
--- a/src/mon/Monitor.cc
+++ b/src/mon/Monitor.cc
@@ -1842,12 +1842,14 @@ void Monitor::handle_probe_reply(MonOpRequestRef op)
// newer map, or they've joined a quorum and we haven't?
bufferlist mybl;
+ MonMap *newmap = new MonMap;
+ bufferlist otherbl;
monmap->encode(mybl, m->get_connection()->get_features());
+ newmap->decode(m->monmap_bl);
+ newmap->encode(otherbl, m->get_connection()->get_features());
// make sure it's actually different; the checks below err toward
// taking the other guy's map, which could cause us to loop.
- if (!mybl.contents_equal(m->monmap_bl)) {
- MonMap *newmap = new MonMap;
- newmap->decode(m->monmap_bl);
+ if (!mybl.contents_equal(otherbl)) {
if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() ||
!has_ever_joined)) {
dout(10) << " got newer/committed monmap epoch " << newmap->get_epoch()
@@ -1858,8 +1860,8 @@ void Monitor::handle_probe_reply(MonOpRequestRef op)
bootstrap();
return;
}
- delete newmap;
}
+ delete newmap;
// rename peer?
string peer_name = monmap->get_name(m->get_source_addr());
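The idea behind the diff above, shown in isolation: compare after decoding the peer's bytes and re-encoding them locally, rather than comparing the peer's raw bytes. This is a toy sketch under stated assumptions, not Ceph code. `ToyMonMap`, `encode`, `decode` and `same_after_reencode` are hypothetical, and the two text layouts "v1" and "v2" are invented purely to get two encodings of the same logical map; the report does not say what actually differs between the jewel and kraken encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Toy stand-in for a monmap: an epoch plus monitor name -> address.
struct ToyMonMap {
    uint32_t epoch = 0;
    std::map<std::string, std::string> mons;
    bool operator==(const ToyMonMap& o) const {
        return epoch == o.epoch && mons == o.mons;
    }
};

// Two invented layouts carrying the same information:
//   version 1: "v1 <epoch> <count> <name> <addr> ..."
//   version 2: "v2 <count> <name> <addr> ... <epoch>"
std::string encode(const ToyMonMap& m, int version) {
    std::ostringstream out;
    if (version == 1) {
        out << "v1 " << m.epoch << ' ' << m.mons.size();
    } else {
        out << "v2 " << m.mons.size();
    }
    for (const auto& kv : m.mons) {
        out << ' ' << kv.first << ' ' << kv.second;
    }
    if (version != 1) {
        out << ' ' << m.epoch;
    }
    return out.str();
}

// Decodes either layout back into the same logical map.
ToyMonMap decode(const std::string& bytes) {
    std::istringstream in(bytes);
    std::string tag;
    std::size_t count = 0;
    ToyMonMap m;
    in >> tag;
    if (tag == "v1") {
        in >> m.epoch;
    }
    in >> count;
    for (std::size_t i = 0; i < count; ++i) {
        std::string name, addr;
        in >> name >> addr;
        m.mons[name] = addr;
    }
    if (tag == "v2") {
        in >> m.epoch;
    }
    return m;
}

// Normalised comparison: decode the peer's bytes, re-encode them with the
// local layout, and only then compare against our own encoding.
bool same_after_reencode(const std::string& mine, const std::string& theirs,
                         int local_version) {
    return mine == encode(decode(theirs), local_version);
}
```

In this toy model the raw byte comparison reports a difference for identical maps, while the normalised comparison reports a difference only when the epoch or the monitor set really differs.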
Updated by Joao Eduardo Luis about 7 years ago
- Assignee set to Joao Eduardo Luis
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category changed from Monitor to Administration/Usability
- Component(RADOS) Monitor added
Updated by Kjetil Joergensen over 6 years ago
This is also the case when going from jewel to luminous.
Our question: is this something you're planning to fix, or is it a won't fix? If you're planning to fix it, is there a general timeline for when? This is mostly so we can plan accordingly for upgrading from jewel to luminous.
The patch in the comment is a gigantic ugly hack; is this something that would be accepted?
Updated by Joao Eduardo Luis over 6 years ago
Yikes. A 9 month old ticket. I'm sorry. This must have fallen through all the cracks.
Let me take a look this week, time permitting given I'm travelling, and I'll get back to you next week at the latest.