Bug #12569

closed

"ceph mon add <mon-id>" takes forever in a one-monitor cluster

Added by Kefu Chai over 8 years ago. Updated over 8 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a single-monitor cluster, suppose a user tries to add another monitor as follows (taking a vstart cluster as an example):

$ CEPH_NUM_MDS=0 CEPH_NUM_OSD=0 CEPH_NUM_MON=1 ./vstart.sh -n -l -x
$ ./ceph auth get mon. -o /tmp/keyring
$ ./ceph mon getmap -o /tmp/monmap
$ ./ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
$ ./ceph mon add b 127.0.0.1:6790

The last command never returns. After receiving the command, the existing monitor tries to form a quorum, but it keeps sending probe messages to the other monitors in vain. It can make progress only once more than n/2 of the monitors have replied, but it never gets a reply from the new monitor, which has not been started yet. So it is stuck in an endless loop until the new monitor is up and running and is found by the existing monitor. The two then finish the election and accept the proposal.
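The majority rule behind this can be sketched with a little shell arithmetic. This is only an illustration of the "more than n/2" condition described above, not Ceph code; the helper names `quorum_size` and `has_quorum` are made up for this example.

```shell
#!/bin/sh
# Illustrative only: quorum_size and has_quorum are hypothetical helpers,
# not part of Ceph.

# Smallest number of monitors that is more than half of a monmap with $1 members.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Exit status 0 when $1 reachable monitors are a majority of a monmap with $2 members.
has_quorum() {
    [ "$1" -ge "$(quorum_size "$2")" ]
}

# Before "ceph mon add b": monmap = {a}, reachable = {a}.
has_quorum 1 1 && echo "1 of 1 reachable: quorum"

# After "ceph mon add b" with b not started: monmap = {a, b}, reachable = {a}.
has_quorum 1 2 || echo "1 of 2 reachable: no quorum"

# Once b is started and found by a: monmap = {a, b}, reachable = {a, b}.
has_quorum 2 2 && echo "2 of 2 reachable: quorum"
```

With two monitors in the monmap the majority is two, so the existing monitor alone can never satisfy it. This is why the command hangs until the new monitor is started.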


Related issues 1 (0 open, 1 closed)

Copied to Ceph - Documentation #12620: note the behaviour that "ceph mon add <mon-id>" takes forever in a one-monitor cluster (Resolved, Kefu Chai, 08/03/2015)

Actions #1

Updated by Kefu Chai over 8 years ago

We can probably live with this. Instead, we should add a note to the documentation that the "ceph mon add" command will not return until the monitors form a quorum.

The documentation should also be updated to swap the order of

 ceph mon add <mon-id> <ip>[:<port>]

and
 ceph-mon -i {mon-id} --public-addr {ip:port}

see http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-monitors

Actions #2

Updated by Joao Eduardo Luis over 8 years ago

  • Category set to Monitor
  • Status changed from 12 to Won't Fix

This is expected behavior. There is little we can do without resorting to hackish behavior on update_from_paxos() and bootstrap, and I think the benefits of said hack do not outweigh the ugliness of the solution.

I do think this needs to be documented, but a separate ticket should be filed for that.
