Subtask #2674

closed

Feature #2611: mon: Single-Paxos

mon: Single-Paxos: mon commits suicide after remove&add

Added by Joao Eduardo Luis almost 12 years ago. Updated over 10 years ago.

Status: Rejected
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Pre-conditions:

3 mons: a=127.0.0.1:6789 ; b=127.0.0.1:6790 ; c=127.0.0.1:6791

  • remove 'c' with ./ceph mon remove c
  • change ceph.conf, make c=127.0.0.1:1024
  • add 'c' with ./ceph mon add c 127.0.0.1:1024
  • start 'c' with ./ceph-mon -d -i c -c ceph.conf

on mon.c:

2012-06-29 03:02:44.890294 7f46367fc700 10 mon.c@2(probing) e1  monmap is e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2012-06-29 03:02:44.890351 7f46367fc700 10 mon.c@2(probing) e1  got newer/committed monmap epoch 3, mine was 1
2012-06-29 03:02:44.890371 7f46367fc700 10 mon.c@2(probing) e3 bootstrap
2012-06-29 03:02:44.890375 7f46367fc700 10 mon.c@2(probing) e3 unregister_cluster_logger - not registered
2012-06-29 03:02:44.890378 7f46367fc700 10 mon.c@2(probing) e3 cancel_probe_timeout 0xf16160
2012-06-29 03:02:44.890387 7f46367fc700  0 mon.c@2(probing) e3  removed from monmap, suicide.

./ceph -s:

ubuntu@plana41:~/ceph/src$ ./ceph -s
   health HEALTH_WARN 24 pgs degraded; 24 pgs stuck unclean; recovery 53/106 degraded (50.000%); 1 mons down, quorum 1,2 a,b
   monmap e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:1024/0}
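
To make the address change concrete, here is a small Python sketch that parses the two monmap summaries quoted above (e1 from the mon.c log, e3 from ./ceph -s) and reports which monitors changed address between them. The helper name parse_monmap is illustrative and not part of Ceph, and the parsing assumes the "{name=addr,...}" layout shown in this report.

```python
import re


def parse_monmap(text):
    """Return {name: addr} from a '{a=ip:port/nonce,...}' monmap summary.

    Hypothetical helper for illustration only; not a Ceph API.
    """
    body = re.search(r"\{([^}]*)\}", text).group(1)
    mons = {}
    for entry in body.split(","):
        name, addr = entry.split("=", 1)
        mons[name.strip()] = addr.strip()
    return mons


# Monmap summaries copied from the report above.
E1 = "monmap is e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}"
E3 = "monmap e3: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:1024/0}"

old, new = parse_monmap(E1), parse_monmap(E3)

# Monitors present in both epochs whose address differs.
changed = {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}
print(changed)
```

Only 'c' shows up in the result: it is still in the map by name, but at 127.0.0.1:1024/0 rather than the 127.0.0.1:6791/0 recorded in the e1 map that mon.c had on disk.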

Actions #1

Updated by Greg Farnum almost 12 years ago

I believe this is intended behavior; note the last line:

2012-06-29 03:02:44.890387 7f46367fc700  0 mon.c@2(probing) e3  removed from monmap, suicide.

The monitor looks at its on-disk state and finds a cookie there that does not match what's in the MonMap. That means that, even though it is in the map, it has no idea how out-of-date its store is, whether another monitor expects to be able to bind to this ip/port combo, etc.

My inclination is to mark this as "Rejected"; am I missing something?
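
As a rough illustration of the reasoning above, the following Python toy model shuts a monitor down when what its store remembers no longer matches its entry in the current map. This is not Ceph's actual code: the function name should_shut_down is made up, and the monitor's address is used as a stand-in for the on-disk cookie, which is an assumption made purely for the example.

```python
def should_shut_down(stored_addr, monmap, name):
    """Toy check, not Ceph code.

    True when the address remembered in the monitor's store (a stand-in
    for the on-disk cookie) does not match the monitor's monmap entry,
    or when the monitor is absent from the map altogether.
    """
    return monmap.get(name) != stored_addr


# Monmap e3 as reported by ./ceph -s in the description.
monmap_e3 = {
    "a": "127.0.0.1:6789/0",
    "b": "127.0.0.1:6790/0",
    "c": "127.0.0.1:1024/0",
}

# Store still remembers the pre-removal address of mon.c.
stale = should_shut_down("127.0.0.1:6791/0", monmap_e3, "c")
# Store agrees with the current map.
fresh = should_shut_down("127.0.0.1:1024/0", monmap_e3, "c")
print(stale, fresh)
```

In this model the stale store triggers a shutdown while a store that agrees with the map does not, which mirrors the "removed from monmap, suicide." line in the log.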

Actions #2

Updated by Sage Weil almost 12 years ago

Yeah.. basically we're changing the mon's ip by removing and re-adding it, and the mon isn't smart enough to realize that it is being reborn with a new addr. I think that's okay.

If anything, this is a feature request to support online mon ip changes.

Does that make sense, Joao?

Actions #3

Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from New to Rejected

Yep. Makes sense. I was afraid this was caused by my changes.

Rejecting it then.

Actions #4

Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from Rejected to 4

Tried this on master. Although at first I triggered something else, the bottom line is that this works: the monitor binds to port 1024, calls for a new election and (as expected) becomes the leader.

ubuntu@plana41:~/ceph/src$ ./ceph -s
   health HEALTH_ERR 24 pgs stuck inactive; 24 pgs stuck unclean; no osds
   monmap e7: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:1024/0}, election epoch 18, quorum 0,1,2 a,b,c

Actions #5

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from 4 to Rejected
  • Remaining hours set to 0.0

Works just fine on the current next branch.
