Bug #18859

kraken monitor fails to bootstrap off jewel monitors if it has booted before

Added by Kjetil Joergensen over 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Category: Administration/Usability
Target version: -
Start date: 02/08/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:

Description

To reproduce: bootstrap a quorum off of jewel. Stop one of the monitors, remove its filesystem contents, re-create the initial monmap, inject the keyring, and run mkfs with kraken. Then attempt to start it with kraken, and it will call bootstrap() forever.

This is relatively close to what our current install/deployment pipeline does: wipe and reinstall the machine (which has served us really well until now, going firefly -> hammer -> jewel).

What we seem to be running into is that after initial bootstrapping, the encoded bufferlists for the jewel and kraken monmaps differ even though they are at the same epoch. Being a "half new" monitor, we have has_ever_joined=0, so we end up doing bootstrap() forever.

  ...
  if (!mybl.contents_equal(m->monmap_bl)) { // binary representation of kraken vs. jewel monmap is different
    MonMap *newmap = new MonMap;
    newmap->decode(m->monmap_bl);
    if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() || // First pass, epoch = -1, bootstrap() again
                                                                            // Second pass, newmap->get_epoch() == monmap->get_epoch()
                               !has_ever_joined)) { // however, we used to be joined before our state vanished, so, we'll
                                                    // go ahead and bootstrap again, and again, and again.
      ...
      bootstrap();
      ...
2017-02-07 22:15:35.150925 7f3f4fb03840 10 mon.c@-1(probing) e0 has_ever_joined = 0
2017-02-07 22:15:35.150927 7f3f4fb03840  1 mon.c@-1(probing) e0  initial_members a,b,d, filtering seed monmap
2017-02-07 22:15:35.150946 7f3f4fb03840 10 mon.c@-1(probing) e0  monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:15:35.150950 7f3f4fb03840 10 mon.c@-1(probing) e0  extra probe peers 127.0.0.1:6791/0
...
2017-02-07 22:15:35.152688 7f3f483e5700 10 mon.c@-1(probing) e0  monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:15:35.152701 7f3f483e5700 10 mon.c@-1(probing) e0  got newer/committed monmap epoch 3, mine was 0
2017-02-07 22:15:35.152711 7f3f483e5700 10 mon.c@-1(probing) e3 bootstrap
...
2017-02-07 22:15:35.153653 7f3f483e5700 10 mon.c@2(probing) e3  monmap is e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2017-02-07 22:15:35.153663 7f3f483e5700 10 mon.c@2(probing) e3  got newer/committed monmap epoch 3, mine was 3
2017-02-07 22:15:35.153667 7f3f483e5700 10 mon.c@2(probing) e3 bootstrap
...
2017-02-07 22:15:35.154031 7f3f483e5700 10 mon.c@2(probing) e3  monmap is e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2017-02-07 22:15:35.154045 7f3f483e5700 10 mon.c@2(probing) e3  got newer/committed monmap epoch 3, mine was 3
2017-02-07 22:15:35.154063 7f3f483e5700 10 mon.c@2(probing) e3 bootstrap
...
repeat....
...
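
The loop in the log above can be modeled with a small toy program. This is only a sketch of the condition quoted from handle_probe_reply(): the types ProbeReply and LocalMon, the functions would_rebootstrap() and count_bootstraps(), and the string "encodings" are all hypothetical stand-ins, not Ceph code. It assumes the local has_ever_joined flag never flips, because the monitor never gets far enough to join a quorum.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; these are not Ceph types.
struct ProbeReply {
  std::string monmap_bytes;  // peer's encoded monmap, as received
  int epoch;                 // epoch decoded from monmap_bytes
  bool has_ever_joined;      // peer has been part of a quorum
};

struct LocalMon {
  std::string monmap_bytes;  // our monmap, encoded by our own release
  int epoch;
  bool has_ever_joined;      // false after the store was wiped
};

// Mirrors the quoted condition: true when we would adopt the peer's
// map and call bootstrap() again.
bool would_rebootstrap(const LocalMon& me, const ProbeReply& peer) {
  if (me.monmap_bytes == peer.monmap_bytes)
    return false;  // byte-identical maps: nothing to adopt
  return peer.has_ever_joined &&
         (peer.epoch > me.epoch || !me.has_ever_joined);
}

// Counts consecutive bootstrap passes, capped at max_passes.
// my_encoding_of_peer_map is what our release produces when it
// re-encodes the map it just adopted from the peer.
int count_bootstraps(LocalMon me, const ProbeReply& peer,
                     const std::string& my_encoding_of_peer_map,
                     int max_passes) {
  assert(max_passes >= 0);
  int passes = 0;
  while (passes < max_passes && would_rebootstrap(me, peer)) {
    me.monmap_bytes = my_encoding_of_peer_map;
    me.epoch = peer.epoch;
    ++passes;
  }
  return passes;
}
```

With a peer reply of {"jewel:e3", 3, true} and a local state of {"kraken:e0", 0, false}, the model hits the cap when the re-encoded map is "kraken:e3" (bytes never match), and stops after one pass when the re-encoded map is byte-identical to the peer's.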

I patched handle_probe_reply() to decode the received monmap and re-encode it with kraken, which gives the binary/bufferlist comparison a chance to succeed.

2017-02-07 22:21:36.684955 7fec6f077840 10 mon.c@-1(probing) e0 has_ever_joined = 0
2017-02-07 22:21:36.684957 7fec6f077840  1 mon.c@-1(probing) e0  initial_members a,b,d, filtering seed monmap
2017-02-07 22:21:36.684974 7fec6f077840 10 mon.c@-1(probing) e0  monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:21:36.684977 7fec6f077840 10 mon.c@-1(probing) e0  extra probe peers 127.0.0.1:6791/0
...
2017-02-07 22:21:36.689239 7fec67959700 10 mon.c@-1(probing) e0  monmap is e0: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,d=0.0.0.0:0/1}
2017-02-07 22:21:36.689259 7fec67959700 10 mon.c@-1(probing) e0  got newer/committed monmap epoch 3, mine was 0
2017-02-07 22:21:36.689263 7fec67959700 10 mon.c@-1(probing) e3 bootstrap
...
2017-02-07 22:21:36.690486 7fec67959700 10 mon.c@2(probing) e3  monmap is e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2017-02-07 22:21:36.690507 7fec67959700 10 mon.c@2(probing) e3  peer name is a
2017-02-07 22:21:36.690517 7fec67959700 10 mon.c@2(probing) e3  peer paxos last version 61 vs my version 0 (too far ahead)
...
transition to synchronizing
...

I don't know if you depend on the binary comparison for reasons other than "yep, it's got the same monitors/features and election epoch".

diff --git a/src/mon/Monitor.cc b/src/mon/Monitor.cc
index 4b76b9e..a3502a8 100644
--- a/src/mon/Monitor.cc
+++ b/src/mon/Monitor.cc
@@ -1842,12 +1842,14 @@ void Monitor::handle_probe_reply(MonOpRequestRef op)

   // newer map, or they've joined a quorum and we haven't?
   bufferlist mybl;
+  MonMap *newmap = new MonMap;
+  bufferlist otherbl;
   monmap->encode(mybl, m->get_connection()->get_features());
+  newmap->decode(m->monmap_bl);
+  newmap->encode(otherbl, m->get_connection()->get_features());
   // make sure it's actually different; the checks below err toward
   // taking the other guy's map, which could cause us to loop.
-  if (!mybl.contents_equal(m->monmap_bl)) {
-    MonMap *newmap = new MonMap;
-    newmap->decode(m->monmap_bl);
+  if (!mybl.contents_equal(otherbl)) {
     if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() ||
                               !has_ever_joined)) {
       dout(10) << " got newer/committed monmap epoch " << newmap->get_epoch()
@@ -1858,8 +1860,8 @@ void Monitor::handle_probe_reply(MonOpRequestRef op)
       bootstrap();
       return;
     }
-    delete newmap;
   }
+  delete newmap;

   // rename peer?
   string peer_name = monmap->get_name(m->get_source_addr());
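
The idea behind the patch, comparing only after both maps have gone through the same encoder, can be illustrated with a toy example. Everything here is an assumption made for illustration: ToyMonMap, the two made-up wire formats "v1" and "v2", and same_after_reencode() are hypothetical and do not reflect Ceph's real MonMap encoding or the feature bits the actual patch passes to encode().

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical toy map; not Ceph's MonMap.
struct ToyMonMap {
  int epoch = 0;
  std::map<std::string, std::string> mons;  // name -> address
};

// Two made-up wire formats standing in for an older and a newer release.
// Format 2 carries one extra placeholder field per monitor entry.
std::string encode(const ToyMonMap& m, int version) {
  std::ostringstream out;
  out << "v" << version << ' ' << m.epoch << ' ' << m.mons.size();
  for (const auto& kv : m.mons) {
    out << ' ' << kv.first << ' ' << kv.second;
    if (version >= 2) out << " -";
  }
  return out.str();
}

// Decodes either toy format back into the same in-memory map.
ToyMonMap decode(const std::string& bytes) {
  std::istringstream in(bytes);
  std::string tag;
  std::size_t count = 0;
  ToyMonMap m;
  in >> tag >> m.epoch >> count;
  const bool has_extra = (tag == "v2");
  for (std::size_t i = 0; i < count; ++i) {
    std::string name, addr, extra;
    in >> name >> addr;
    if (has_extra) in >> extra;  // read and discard the placeholder
    m.mons[name] = addr;
  }
  return m;
}

// Decode the peer's bytes, re-encode them with our own version,
// and only then compare against our own encoding.
bool same_after_reencode(const ToyMonMap& mine,
                         const std::string& peer_bytes,
                         int my_version) {
  return encode(mine, my_version) == encode(decode(peer_bytes), my_version);
}
```

In this toy, a map encoded as "v1" by the peer never matches the local "v2" bytes directly, but same_after_reencode() reports equality when the contents agree and still reports a difference when the epoch or the monitor set differs.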

History

#1 Updated by Joao Eduardo Luis over 2 years ago

  • Assignee set to Joao Eduardo Luis

#2 Updated by Greg Farnum over 2 years ago

  • Project changed from Ceph to RADOS
  • Category changed from Monitor to Administration/Usability
  • Component(RADOS) Monitor added

#3 Updated by Kjetil Joergensen almost 2 years ago

This is also the case when going from jewel to luminous.

Our question: is this something you're planning to fix, or is it a "won't fix"? If you're planning to fix it, is there a general timeline for when? This is mostly so we can plan accordingly for upgrading from jewel to luminous.

The patch above is a gigantic ugly hack; is this something that would be accepted?

#4 Updated by Joao Eduardo Luis almost 2 years ago

Yikes. A 9-month-old ticket. I'm sorry. This must have fallen through all the cracks.

Let me take a look this week, time permitting given I'm travelling, and I'll get back to you next week at the latest.
