Subtask #2674
Feature #2611: mon: Single-Paxos
mon: Single-Paxos: mon commits suicide after remove&add
Description
Pre-conditions:
3 mons: a=127.0.0.1:6789 ; b=127.0.0.1:6790 ; c=127.0.0.1:6791
- remove 'c' with ./ceph mon remove c
- change ceph.conf, make c=127.0.0.1:1024
- add 'c' with ./ceph mon add c 127.0.0.1:1024
- start 'c' with ./ceph-mon -d -i c -c ceph.conf
on mon.c:
2012-06-29 03:02:44.890294 7f46367fc700 10 mon.c@2(probing) e1 monmap is e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2012-06-29 03:02:44.890351 7f46367fc700 10 mon.c@2(probing) e1 got newer/committed monmap epoch 3, mine was 1
2012-06-29 03:02:44.890371 7f46367fc700 10 mon.c@2(probing) e3 bootstrap
2012-06-29 03:02:44.890375 7f46367fc700 10 mon.c@2(probing) e3 unregister_cluster_logger - not registered
2012-06-29 03:02:44.890378 7f46367fc700 10 mon.c@2(probing) e3 cancel_probe_timeout 0xf16160
2012-06-29 03:02:44.890387 7f46367fc700 0 mon.c@2(probing) e3 removed from monmap, suicide.
./ceph -s:
ubuntu@plana41:~/ceph/src$ ./ceph -s
health HEALTH_WARN 24 pgs degraded; 24 pgs stuck unclean; recovery 53/106 degraded (50.000%); 1 mons down, quorum 1,2 a,b
monmap e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:1024/0}
History
#1 Updated by Greg Farnum about 11 years ago
I believe this is intended behavior, note the last line:
2012-06-29 03:02:44.890387 7f46367fc700 0 mon.c@2(probing) e3 removed from monmap, suicide.
The monitor looks at its on-disk state and finds a cookie there that does not match what's in the MonMap, which means that (while it's in the map) it has no idea how out-of-date its store is, if there's another monitor that expects to be able to bind to this ip/port combo, etc.
My inclination is to mark this as "Rejected"; am I missing something?
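One way to picture the mismatch described in this comment is the minimal Python sketch below. It is not Ceph code: it assumes the monitor identifies itself by the address recorded in its old map (e1) and looks for that address in the newly received map (e3). The helper names parse_monmap and should_exit are hypothetical, and the actual check inside the monitor may differ.

```python
import re


def parse_monmap(text):
    """Extract {name: addr} from a 'N mons at {...}' fragment as printed in the logs."""
    body = re.search(r"mons at \{([^}]*)\}", text).group(1)
    return dict(entry.split("=", 1) for entry in body.split(","))


def should_exit(my_addr, new_monmap):
    """Hypothetical check: True when no entry in the new map carries my address."""
    return my_addr not in new_monmap.values()


# Monmap e3 as reported by ./ceph -s after the remove and re-add.
e3 = parse_monmap(
    "monmap e3: 3 mons at "
    "{a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:1024/0}"
)

# Assumption: mon.c still identifies itself by the address from its stored e1 map.
print(should_exit("127.0.0.1:6791/0", e3))  # True
# A monitor that identified itself by the new address would find itself in the map.
print(should_exit("127.0.0.1:1024/0", e3))  # False
```

Under that assumption, the name 'c' is present in e3 but the old address is not, which matches the "removed from monmap, suicide" line.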
#2 Updated by Sage Weil about 11 years ago
Yeah.. basically we're changing the mon's ip by removing and re-adding it, and the mon isn't smart enough to realize that it is being reborn with a new addr. I think that's okay.
If anything, this is a feature request to support online mon ip changes.
Does that make sense, Joao?
#3 Updated by Joao Eduardo Luis about 11 years ago
- Status changed from New to Rejected
Yep. Makes sense. I was afraid this was caused by my changes.
Rejecting it then.
#4 Updated by Joao Eduardo Luis about 11 years ago
- Status changed from Rejected to 4
Tried this on master. Although at first I triggered something else, the bottom line is that this works: the monitor binds to port 1024, calls for a new election and (as expected) becomes the leader.
ubuntu@plana41:~/ceph/src$ ./ceph -s
health HEALTH_ERR 24 pgs stuck inactive; 24 pgs stuck unclean; no osds
monmap e7: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:1024/0}, election epoch 18, quorum 0,1,2 a,b,c
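Why mon.c "as expected" becomes the leader can be illustrated with a small Python sketch. It rests on two assumptions that are not stated in this ticket: that monitor ranks are assigned in ascending address order, and that the lowest rank in the quorum wins the election. The helpers addr_key and ranks are hypothetical names used only for illustration.

```python
import ipaddress


def addr_key(addr):
    """Turn 'ip:port/nonce' into a tuple that sorts numerically."""
    host, rest = addr.rsplit(":", 1)
    port, nonce = rest.split("/")
    return (ipaddress.ip_address(host), int(port), int(nonce))


def ranks(monmap):
    """Assumed rule: rank = position of the monitor when sorted by address."""
    ordered = sorted(monmap, key=lambda name: addr_key(monmap[name]))
    return {name: rank for rank, name in enumerate(ordered)}


# Monmap e7 from the ./ceph -s output above.
e7 = {
    "a": "127.0.0.1:6789/0",
    "b": "127.0.0.1:6790/0",
    "c": "127.0.0.1:1024/0",
}

r = ranks(e7)
print(r)                  # {'c': 0, 'a': 1, 'b': 2}
print(min(r, key=r.get))  # c
```

With port 1024 sorting below 6789 and 6790, mon.c gets rank 0 under these assumptions. This is also consistent with the earlier output in the description, where the quorum without mon.c is listed as "1,2 a,b".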
#5 Updated by Joao Eduardo Luis almost 10 years ago
- Status changed from 4 to Rejected
Works just fine on the current next branch.