Bug #52488 (open): Pacific mon won't join Octopus mons

Added by Chris Dunlop over 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm in the midst of an upgrade from Octopus to Pacific. Due to issues with the version of docker available on Debian Buster, the "ceph orch upgrade" failed at the point of upgrading the mons, and the situation degraded into leaving me with a single Octopus mon that wasn't able to form a quorum by itself. I manually brought up 2 more Octopus mons to get back a quorum. At that point I was wary of simply trying to upgrade the existing mons, and thought to maintain full quorum by first bringing up Pacific mons to join the existing quorum, then removing the Octopus mons.

I have 3 mons still on Octopus and in quorum. When I try to bring up a new Pacific mon it stays permanently in the "probing" state, with the Pacific ceph-mon chewing up >300% CPU on its host and causing the Octopus ceph-mons on the other hosts to chew up >100% CPU. I've left it in that state for over 10 minutes and it never progresses past "probing".

Per Gregory Farnum on ceph-devel, "I would expect and want this to work so you should file a bug": https://www.spinics.net/lists/ceph-devel/msg52192.html

The Pacific mon, "b5", is v16.2.5 running under podman:

docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb

The Octopus mons are all v15.2.14. The lead Octopus mon ("k2") is running under podman:

quay.io/ceph/ceph:v15

The other 2 Octopus mons ("b2" and "b4") are standalone 15.2.14-1~bpo10+1. These were started manually because the cephadm upgrade failed at the point of upgrading the mons, leaving me with only one cephadm mon running.

On startup of the Pacific mon with debug_mon=20, over a 30-second period I see 765 "handle_probe_reply" sequences on the Pacific mon, with the lead mon replying 200 times and the other two mons replying 279 times each.

In the very first "handle_probe_reply" sequence we see an expected (I assume) "bootstrap", as the newly started mon doesn't yet have a monmap ("got newer/committed monmap epoch 35, mine was 0"):

Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0  monmap is e0: 3 mons at {noname-a=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],noname-b=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],noname-c=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e0  got newer/committed monmap epoch 35, mine was 0
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 bootstrap
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5646293a0a20
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 _reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5646293a0a20 after 10 seconds
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 _ms_dispatch new session 0x5646293aafc0 MonSession(mon.1 [v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0] is open , features 0x3f01cfb8ffedffff (luminous)) features 0x3f01cfb8ffedffff
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700  5 mon.b5@-1(probing) e35 _ms_dispatch setting monitor caps on this connection
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 mon.b5@-1(probing) e35  entity_name  global_id 0 (none) caps allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 is_capable service=mon command= read addr v2:10.200.63.130:3300/0 on cap allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20  allow so far , doing grant allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20  allow all

However, every single "handle_probe_reply" sequence after that also does a "bootstrap", even though the mon at this point has a monmap with the same epoch ("got newer/committed monmap epoch 35, mine was 35"), e.g.:

Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 handle_probe mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 handle_probe_reply mon.2 v2:10.200.63.132:3300/0 mon_probe(reply c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 name b4 quorum 0,1,2 leader 0 paxos( fc 365132587 lc 365133304 ) mon_release octopus) v7
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35  monmap is e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35  got newer/committed monmap epoch 35, mine was 35
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 bootstrap
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 sync_reset_requester
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 unregister_cluster_logger - not registered
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout 0x5646293a0a20
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 monmap e35: 3 mons at {b2=[v2:10.200.63.130:3300/0,v1:10.200.63.130:6789/0],b4=[v2:10.200.63.132:3300/0,v1:10.200.63.132:6789/0],k2=[v2:192.168.254.251:3300/0,v1:192.168.254.251:6789/0]}
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 _reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing).auth v0 _set_mon_num_rank num 0 rank 0
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 timecheck_finish
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_tick_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 15 mon.b5@-1(probing) e35 health_interval_stop
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_event_cancel
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 scrub_reset
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 cancel_probe_timeout (none scheduled)
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 reset_probe_timeout 0x5646293a0a20 after 10 seconds
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 10 mon.b5@-1(probing) e35 probing other monitors
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 mon.b5@-1(probing) e35 _ms_dispatch existing session 0x5646293aafc0 for mon.1
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 mon.b5@-1(probing) e35  entity_name  global_id 0 (none) caps allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20 is_capable service=mon command= read addr v2:10.200.63.130:3300/0 on cap allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20  allow so far , doing grant allow *
Aug 31 07:05:01 b5 conmon[3496814]: debug 2021-08-30T21:05:01.377+0000 7f43ead11700 20  allow all

That same sequence keeps repeating, just with the "name" changing between the Octopus mons, i.e. "name b4", "name b2" and "name k2".

Referencing Monitor::handle_probe_reply() in src/mon/Monitor.cc (quoted below), if we ever got past the bootstrap() we should see a message mentioning "peer", but we never see one.

That implies "mybl.contents_equal(m->monmap_bl)" is never true, and "has_ever_joined" is never true: since the peer's epoch equals ours (35 == 35), the only way into the bootstrap() branch is for the two encoded monmaps to differ byte-for-byte while this mon has never joined a quorum.

void Monitor::handle_probe_reply(MonOpRequestRef op)
{
...
   // newer map, or they've joined a quorum and we haven't?
   bufferlist mybl;
   monmap->encode(mybl, m->get_connection()->get_features());
   // make sure it's actually different; the checks below err toward
   // taking the other guy's map, which could cause us to loop.
   if (!mybl.contents_equal(m->monmap_bl)) {
     MonMap *newmap = new MonMap;
     newmap->decode(m->monmap_bl);
     if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() ||
                                !has_ever_joined)) {
       dout(10) << " got newer/committed monmap epoch " << newmap->get_epoch()
                << ", mine was " << monmap->get_epoch() << dendl;
       delete newmap;
       monmap->decode(m->monmap_bl);
       notify_new_monmap(false);

       bootstrap();
       return;
     }
     delete newmap;
   }

   // rename peer?
   string peer_name = monmap->get_name(m->get_source_addr());
   if (monmap->get_epoch() == 0 && peer_name.compare(0, 7, "noname-") == 0) {
     dout(10) << " renaming peer " << m->get_source_addr() << " " 
              << peer_name << " -> " << m->name << " in my monmap" 
              << dendl;
     monmap->rename(peer_name, m->name);

     if (is_electing()) {
       bootstrap();
       return;
     }
   } else if (peer_name.size()) {
     dout(10) << " peer name is " << peer_name << dendl;
   } else {
     dout(10) << " peer " << m->get_source_addr() << " not in map" << dendl;
   }
...
}
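To make that condition concrete, here is a minimal standalone sketch of the predicate guarding the bootstrap() call above. The names are hypothetical and bufferlists are reduced to plain strings; it is a simplified model of the quoted logic, not Ceph code. With equal epochs, as in the logs, it still returns true whenever the two encodings differ and the local mon has never joined:

#include <iostream>
#include <string>

// Simplified model of the decision in Monitor::handle_probe_reply().
struct ProbeState {
  std::string my_encoded_monmap;    // stands in for mybl
  std::string peer_encoded_monmap;  // stands in for m->monmap_bl
  unsigned my_epoch;                // monmap->get_epoch()
  unsigned peer_epoch;              // newmap->get_epoch()
  bool peer_has_ever_joined;        // m->has_ever_joined
  bool has_ever_joined;             // our own flag; false while probing
};

bool takes_bootstrap_path(const ProbeState& s) {
  // mybl.contents_equal(m->monmap_bl): identical encodings skip bootstrap
  if (s.my_encoded_monmap == s.peer_encoded_monmap)
    return false;
  // the condition guarding bootstrap() in the snippet above
  return s.peer_has_ever_joined &&
         (s.peer_epoch > s.my_epoch || !s.has_ever_joined);
}

int main() {
  // Matches the logs: both epochs are 35, yet bootstrap() is taken,
  // implying the encodings differ and local has_ever_joined is false.
  ProbeState s{"pacific-encoding", "octopus-encoding", 35, 35,
               /*peer_has_ever_joined=*/true, /*has_ever_joined=*/false};
  std::cout << std::boolalpha << takes_bootstrap_path(s) << std::endl;  // true
}

One plausible reading (an assumption on my part, not confirmed here) is that the Pacific mon re-encodes the monmap differently from the Octopus peers, so the byte-wise comparison never matches even though the epochs agree.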

Cheers,

Chris


Related issues (1 open, 0 closed)

Related to RADOS - Bug #52489: Adding a Pacific MON to an Octopus cluster: All PGs inactive (New)

#1

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #52489: Adding a Pacific MON to an Octopus cluster: All PGs inactive added
#2

Updated by Sebastian Wagner over 2 years ago

  • Project changed from Orchestrator to RADOS
#3

Updated by Michael Uleysky over 2 years ago

The same problem occurs when migrating from Nautilus to Pacific.

#4

Updated by Josh Beaman about 2 years ago

Also confirming the same problem: attempting to add either a new Octopus or a new Pacific monitor to an existing Nautilus cluster fails. This is a significant pain point for us, as we are stuck on Ubuntu 16.04/xenial and Nautilus. We are trying to upgrade to 20.04/focal and Pacific, but there are no Nautilus packages for 20.04, and no Octopus or Pacific packages for 16.04.

#5

Updated by Radoslaw Zarzynski about 2 years ago

Hello!

Could you please provide logs with extra debugging (debug_mon = 20 and debug_ms = 1) from: 1) the new Pacific monitor, and 2) the Octopus mons (ideally all of them, at minimum the leader)?
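For reference, a minimal sketch of one way to enable those debug levels, assuming a standard ceph.conf on each mon host (they can also be set at runtime, e.g. ceph tell mon.<id> injectargs '--debug_mon 20 --debug_ms 1'):

[mon]
    debug_mon = 20
    debug_ms = 1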
