Bug #23403

Mon cannot join quorum

Added by Gauvain Pocentek 9 months ago. Updated 8 months ago.

Status: Closed
Priority: Low
Assignee:
Category: -
Target version:
Start date: 03/19/2018
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

Hi all,

On a 3-mon cluster running infernalis, one of the mons left the quorum and we are unable to bring it back (it is still in the monmap). The quorum keeps looping into 'electing' mode. The cluster is running OK with 2 mons for now, but that is obviously not a good situation to stay in.

We have checked and tried several things:

  • there is no clock skew
  • network connections are OK (tested connections between all the nodes, both ways)
  • we completely removed the mon from the monmap, and re-created it (ceph-mon mkfs), on the same server
  • we also tried to add an additional mon, it failed to join the quorum as well
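The connectivity check in the list above can be sketched as follows. This is only an illustration, not the procedure that was actually used: the helper names `can_connect` and `check_peers` are made up for this example, and port 6789 is taken from the monitor addresses shown in the logs of this report. A successful connect only shows that the port is reachable at that moment; it says nothing about packet loss or retransmissions on the link.

```python
import socket

MON_PORT = 6789  # monitor port as it appears in the logs of this report


def can_connect(host, port=MON_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(peers, port=MON_PORT, timeout=3.0):
    """Map each peer address to whether it is reachable from this node."""
    return {host: can_connect(host, port, timeout) for host in peers}


# Example, to be run on each mon node in turn with the other mons listed:
#   print(check_peers(["172.18.8.5", "172.18.8.7"]))
```

Running it from every node against every other node covers both directions of each pair.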

We upgraded to jewel (the upgrade was scheduled) to see if it would help: it didn't.

We've enabled debug logs for mon and paxos (files attached, they are probably too large but we're not sure what is relevant and what isn't). controller02 is the mon that can't join the quorum: logs start after a complete reset (ceph-mon mkfs) and process startup, and end after removal of this mon from the monmap.

At this point we don't really know what could be the problem, or if this is a bug. Let us know if you need more information.

Thanks for any help/advice.

ceph-mon.controller02.log.bz2 (13.6 KB) Gauvain Pocentek, 03/19/2018 10:24 AM

ceph-mon.controller03.log.bz2 (36.5 KB) Gauvain Pocentek, 03/19/2018 10:25 AM

ceph-mon.controller01.log.bz2 (113 KB) Gauvain Pocentek, 03/19/2018 10:25 AM

controller03-quorum_status.log (924 Bytes) Julien Lavesque, 03/29/2018 09:14 AM

controller03-mon_status.log (959 Bytes) Julien Lavesque, 03/29/2018 09:14 AM

controller01-mon_status.log (959 Bytes) Julien Lavesque, 03/29/2018 09:14 AM

controller01-quorum_status.log (834 Bytes) Julien Lavesque, 03/29/2018 09:14 AM

controller02-quorum_status.log (834 Bytes) Julien Lavesque, 03/29/2018 09:14 AM

controller02-mon_status.log (925 Bytes) Julien Lavesque, 03/29/2018 09:14 AM

History

#1 Updated by John Spray 9 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)

#2 Updated by Julien Lavesque 9 months ago

Hi all,

As asked on the ceph-users mailing list, here are the results of the following commands on the 3 monitors:

$ ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
$ ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status

There seem to be some inconsistencies in quorum_status on the 2 working controllers (1 & 3).
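For comparing the outputs across monitors, a small script along these lines may help. It is a sketch under assumptions: it only reads fields that appear in the mon_status output attached to this report (`name`, `rank`, `state`, `election_epoch`, `quorum`, `monmap.epoch`, `monmap.mons`), and the function names `summarize` and `differing_fields` are hypothetical.

```python
import json


def summarize(mon_status_json):
    """Extract the fields of one mon_status output that are worth comparing."""
    status = json.loads(mon_status_json)
    return {
        "name": status["name"],
        "rank": status["rank"],
        "state": status["state"],
        "election_epoch": status["election_epoch"],
        "quorum": status["quorum"],
        "monmap_epoch": status["monmap"]["epoch"],
        "monmap_names": [m["name"] for m in status["monmap"]["mons"]],
    }


def differing_fields(summaries, fields=("monmap_epoch", "monmap_names", "quorum")):
    """Return the fields whose values are not identical on every monitor."""
    diffs = {}
    for field in fields:
        values = {s["name"]: s[field] for s in summaries}
        # Serialize so that list values can be compared as set members.
        if len({json.dumps(v) for v in values.values()}) > 1:
            diffs[field] = values
    return diffs
```

Feeding it the saved output of the mon_status command from each monitor gives a per-field view of where the monitors disagree.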

#3 Updated by Brad Hubbard 8 months ago

  • Status changed from New to Triaged
  • Assignee set to Brad Hubbard
  • Priority changed from Normal to Low
  • Source set to Community (user)

2018-03-19 11:03:50.819493 7f842ed47640  0 mon.controller02 does not exist in monmap, will attempt to join an existing cluster
2018-03-19 11:03:50.820323 7f842ed47640  0 starting mon.controller02 rank -1 at 172.18.8.6:6789/0 mon_data /var/lib/ceph/mon/ceph-controller02 fsid f37f31b1-92c5-47c8-9834-1757a677d020

We are called 'mon.controller02' and we can not find our name in the local copy of the monmap.

2018-03-19 11:03:52.346318 7f842735d700 10 mon.controller02@-1(probing) e68  ready to join, but i'm not in the monmap or my addr is blank, trying to join

Our name is not in the copy of the monmap we got from peer controller01 either.

$ cat ../controller02-mon_status.log                                                        
[root@controller02 ~]# ceph --admin-daemon /var/run/ceph/ceph-mon.controller02.asok mon_status                      
{                                                        
    "name": "controller02",                              
    "rank": 1,                                            
    "state": "electing",                                  
    "election_epoch": 32749,                              
    "quorum": [],                                        
    "outside_quorum": [],                                
    "extra_probe_peers": [],                              
    "sync_provider": [],                                  
    "monmap": {                                          
        "epoch": 71,                                      
        "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",  
        "modified": "2018-03-29 10:48:06.371157",        
        "created": "0.000000",                            
        "mons": [                                        
            {                                            
                "rank": 0,                                
                "name": "controller01",                  
                "addr": "172.18.8.5:6789\/0"              
            },                                            
            {                                            
                "rank": 1,                                
                "name": "controller02",                  
                "addr": "172.18.8.6:6789\/0"              
            },                                            
            {                                            
                "rank": 2,                                
                "name": "controller03",                  
                "addr": "172.18.8.7:6789\/0"              
            }                                            
        ]                                                
    }                                                    
}

In the monmaps we are called 'controller02', not 'mon.controller02'. These names need to be identical.

#4 Updated by Brad Hubbard 8 months ago

My apologies. It appears my previous analysis was incorrect.

I've pored over the logs and it appears the issue is something to do with the network between the mons.

$ grep "fault, initiating reconnect" ceph-mon.controller0?.log
ceph-mon.controller01.log:2018-03-19 11:03:52.349494 7f691e4cc700 0 -- 172.18.8.5:6789/0 >> 172.18.8.7:6789/0 pipe(0x7f695ca55400 sd=323 :55894 s=2 pgs=67368 cs=1 l=0 c=0x7f695c154880).fault, initiating reconnect
ceph-mon.controller01.log:2018-03-19 11:03:54.347077 7f6918169700 0 -- 172.18.8.5:6789/0 >> 172.18.8.6:6789/0 pipe(0x7f695ea2e800 sd=366 :6789 s=2 pgs=296 cs=1 l=0 c=0x7f695c154400).fault, initiating reconnect
ceph-mon.controller03.log:2018-03-19 11:03:54.346721 7f1b6b971700 0 -- 172.18.8.7:6789/0 >> 172.18.8.6:6789/0 pipe(0x7f1b8a2bf400 sd=24 :55468 s=2 pgs=299 cs=1 l=0 c=0x7f1b87953780).fault, initiating reconnect

$ grep RESETSESSION ceph-mon.controller0?.log
ceph-mon.controller01.log:2018-03-19 11:03:54.348257 7f6936130700 0 -- 172.18.8.5:6789/0 >> 172.18.8.6:6789/0 pipe(0x7f695ea2e800 sd=366 :48456 s=4 pgs=296 cs=0 l=0 c=0x7f695c154400).connect got RESETSESSION but no longer connecting
ceph-mon.controller02.log:2018-03-19 11:03:54.347650 7f842222a700 0 -- 172.18.8.6:6789/0 >> 172.18.8.5:6789/0 pipe(0x7f843a658000 sd=8 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f843a4b9a80).accept we reset (peer sent cseq 2, 0x7f843a659400.cseq = 0), sending RESETSESSION
ceph-mon.controller02.log:2018-03-19 11:03:54.347854 7f8422129700 0 -- 172.18.8.6:6789/0 >> 172.18.8.7:6789/0 pipe(0x7f843aae8000 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f843a4b9900).accept we reset (peer sent cseq 2, 0x7f843ae4e000.cseq = 0), sending RESETSESSION
ceph-mon.controller03.log:2018-03-19 11:03:54.348255 7f1b6934b700 0 -- 172.18.8.7:6789/0 >> 172.18.8.6:6789/0 pipe(0x7f1b8a2bf400 sd=8 :55844 s=4 pgs=299 cs=0 l=0 c=0x7f1b87953780).connect got RESETSESSION but no longer connecting

These seem to be causing the election timer to expire and therefore controller02 does not get to join the quorum.

Can you double check your network please?
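The two grep searches above can be combined into one tally per connection, which makes it easier to see which mon pair is affected most. This is a rough sketch: the regular expression is written against the log lines quoted in this comment, and the function name `count_connection_events` is made up for the example.

```python
import re
from collections import Counter

# Matches the "-- <local addr> >> <peer addr> pipe(" part of a messenger log line.
CONN_RE = re.compile(r"-- (\S+) >> (\S+) pipe\(")
MARKERS = ("fault, initiating reconnect", "RESETSESSION")


def count_connection_events(lines):
    """Count marker occurrences keyed by (local address, peer address, marker)."""
    counts = Counter()
    for line in lines:
        match = CONN_RE.search(line)
        if not match:
            continue
        for marker in MARKERS:
            if marker in line:
                counts[(match.group(1), match.group(2), marker)] += 1
    return counts


# Example:
#   with open("ceph-mon.controller01.log") as f:
#       for key, n in count_connection_events(f).most_common():
#           print(n, key)
```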

#5 Updated by Brad Hubbard 8 months ago

  • Status changed from Triaged to Need More Info

#6 Updated by Gauvain Pocentek 8 months ago

Thanks for the investigation Brad.

The "fault, initiating reconnect" and "RESETSESSION" messages only appear when we start or stop the "controller02" mon. But the mon clearly starts communicating with the other ones (verified with a tcpdump/wireshark trace).

We have some networking issues though (too many TCP retransmissions) so we are still investigating that part as well.
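One way to put a number on the retransmissions is to read the kernel's TCP counters. The sketch below assumes a Linux host and parses the text of /proc/net/snmp, using its `OutSegs` and `RetransSegs` counters; the function name `tcp_retransmit_ratio` is hypothetical. The counters are cumulative since boot, so comparing two readings taken a few minutes apart is more meaningful than a single value.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs from the contents of /proc/net/snmp."""
    # The file holds two "Tcp:" lines: field names first, then the values.
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value pair found")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    stats = dict(zip(names, (int(v) for v in values)))
    if stats["OutSegs"] == 0:
        return 0.0
    return stats["RetransSegs"] / stats["OutSegs"]


# Example:
#   with open("/proc/net/snmp") as f:
#       print(tcp_retransmit_ratio(f.read()))
```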

#7 Updated by Gauvain Pocentek 8 months ago

After more investigation we discovered that one of the bonds on the machine was not behaving properly. We removed the broken interface and the mon joined the quorum.
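For anyone hitting something similar, the state of the bond members can be inspected with a script like the one below. It is a sketch under assumptions: it expects the Linux bonding driver's status file (for example /proc/net/bonding/bond0, where `bond0` is a placeholder since the bond name is not given here), and the function names are made up. A member can also misbehave without showing a down link or a failure count, so a clean result does not rule the bond out.

```python
def parse_bond_slaves(text):
    """Parse the per-interface sections of a /proc/net/bonding/<bond> file."""
    slaves = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # A new member section starts here.
            current = {"name": value, "mii_status": None, "link_failures": None}
            slaves.append(current)
        elif current is not None:
            if key == "MII Status":
                current["mii_status"] = value
            elif key == "Link Failure Count":
                current["link_failures"] = int(value)
    return slaves


def suspicious_slaves(slaves):
    """Names of members that are not 'up' or have recorded link failures."""
    return [s["name"] for s in slaves
            if s["mii_status"] != "up" or (s["link_failures"] or 0) > 0]


# Example ("bond0" is a placeholder name):
#   with open("/proc/net/bonding/bond0") as f:
#       print(suspicious_slaves(parse_bond_slaves(f.read())))
```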

Thanks a lot for the help on this problem! This issue can be closed (not sure about the protocol).

#8 Updated by Brad Hubbard 8 months ago

  • Status changed from Need More Info to Closed

Thanks for letting us know.
