Bug #12569
closed
"ceph mon add <mon-id>" takes forever in a one-monitor cluster
Description
In a single-monitor cluster, suppose a user tries to add another monitor as follows (taking a vstart cluster as an example):
$ CEPH_NUM_MDS=0 CEPH_NUM_OSD=0 CEPH_NUM_MON=1 ./vstart.sh -n -l -x
$ ./ceph auth get mon. -o /tmp/keyring
$ ./ceph mon getmap -o /tmp/monmap
$ ./ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
$ ./ceph mon add b 127.0.0.1:6790
The last command never returns. After receiving the command, the existing monitor tries to form a quorum and keeps sending probe messages to the other monitors in vain. It can proceed only once more than n/2 of its peers have replied, but it never gets a reply from the new monitor, which has not been started yet. So it is stuck in a loop until the new monitor is up and running and has been found by the existing monitor. Only then do they finish the election and accept the proposal.
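To make the n/2 condition concrete, here is a minimal sketch of the majority arithmetic. It is a simplified model, not Ceph code: the function names quorum_needed and can_form_quorum are made up for illustration, and it assumes a quorum means a strict majority of the monitors listed in the monmap.

```shell
# quorum_needed N: smallest strict majority among N monitors in the monmap
# (hypothetical helper, not part of Ceph)
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# can_form_quorum N UP: succeeds if UP reachable monitors form a majority of N
# (hypothetical helper, not part of Ceph)
can_form_quorum() {
  [ "$2" -ge "$(quorum_needed "$1")" ]
}

quorum_needed 1   # single-monitor cluster before the command
quorum_needed 2   # monmap after "ceph mon add b"
can_form_quorum 2 1 || echo "no quorum: only 1 of 2 monitors is up"
```

Under this model, going from one monitor to two raises the required majority from 1 to 2, so the existing monitor alone cannot form a quorum until monitor b is running.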
Updated by Kefu Chai over 8 years ago
We can probably live with this, but we should add a note to the documentation that the "ceph mon add" command will not return until the monitors form a quorum.
We should also update the documentation to swap the order of
ceph mon add <mon-id> <ip>[:<port>]
and
ceph-mon -i {mon-id} --public-addr {ip:port}
See http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-monitors
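The swapped order might look like this for the vstart cluster from the description. This is only a sketch of the ordering proposed in this comment, not a verified procedure. DRY_RUN and the run and add_monitor helpers are hypothetical and not part of Ceph. It assumes the --mkfs step has already been done, and that ceph-mon returns after starting the daemon in the background (otherwise the second command is never reached). By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: start the new monitor before registering it, so the existing
# monitor can reach it as soon as the monmap changes.
# DRY_RUN and run() are hypothetical helpers, not part of Ceph.
DRY_RUN=${DRY_RUN:-1}   # default: print the commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# add_monitor MON_ID ADDR: assumes the vstart build directory is the current
# directory and that "ceph-mon -i MON_ID --mkfs ..." was run beforehand.
add_monitor() {
  run ./ceph-mon -i "$1" --public-addr "$2"   # 1. start the new monitor first
  run ./ceph mon add "$1" "$2"                # 2. then add it to the monmap
}

add_monitor b 127.0.0.1:6790
```

Set DRY_RUN=0 to actually execute the two commands. Whether the second step is still required once the daemon is already running may depend on the Ceph version. That is not checked here.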
Updated by Joao Eduardo Luis over 8 years ago
- Category set to Monitor
- Status changed from 12 to Won't Fix
This is expected behavior. There is little we can do without resorting to hackish behavior in update_from_paxos() and bootstrap, and I think the benefits of such a hack do not outweigh the ugliness of the solution.
I do think this needs to be documented, but a separate ticket should be filed for that.