Bug #3550

mon: Ceph fails to work when IP address is changed on the host

Added by Anonymous over 11 years ago. Updated about 11 years ago.

Status: Won't Fix
Priority: Normal
Assignee: Joao Eduardo Luis
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor

Description

I had an incident where my DHCP server changed the IP addresses of all the nodes in my Ceph cluster.

I looked up the new IP addresses that were assigned and changed the /etc/ceph/ceph.conf file to reflect the new information. I then stopped and restarted Ceph on each of the 3 nodes.

I have found that the monmap has the IP addresses hard-coded in it, and you have to remove and re-add each monitor to the monmap by hand.
Every OSD also got thrown out of the cluster. I had to manually add them back into the cluster and reweight each one in the crush map.

We need a clean way for a node to change IP addresses. I would expect that, if the conf file has changed, the maps all read the conf file on startup and update their data. Why is this not happening?

This happened on two different VM clusters in the Sunnyvale office (network outage).
These steps are hazardous for a user to have to do when moving a cluster to another data center or in the case of a network failure like we had here.

Attachment: ceph.conf - configuration file from damaged cell (1.33 KB), Anonymous, 11/28/2012 02:33 PM

History

#1 Updated by Joao Eduardo Luis over 11 years ago

  • Category set to Monitor
  • Status changed from New to 12
  • Assignee set to Joao Eduardo Luis

This has been a recurrent issue among users on IRC, although it's often prompted by a misunderstanding of how the monitor cluster works and its tight relationship with the monmap and its ip:port.

Granted, this tight relationship may prove problematic when the necessity to move the monitors to different IPs arises, especially if it isn't done the right way. However, the strict enforcement of ip:port on the monmap is also crucial to avoid other mistakes such as bringing up multiple monitors with the same id on different machines, which could prove to be problematic due to clashes.

I agree that something should be done to facilitate recovery in such circumstances as the ones described in the issue, or even plain migration, but I believe it should be done in a much more explicit way than just updating ceph.conf.

Input and ideas are welcome.
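
For readers following along, the ip:port enforcement described in this comment can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Ceph code: the `MonMap` class, its `accepts` method, and the addresses are all hypothetical, and the real monitor logic is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class MonMap:
    """Toy stand-in for a monmap: monitor name -> "ip:port" (hypothetical)."""
    epoch: int = 1
    mons: dict = field(default_factory=dict)

    def accepts(self, name: str, addr: str) -> bool:
        # A monitor is recognised only if BOTH its name and its address
        # match what the map already records.
        return self.mons.get(name) == addr


monmap = MonMap(mons={
    "a": "192.168.0.10:6789",
    "b": "192.168.0.11:6789",
    "c": "192.168.0.12:6789",
})

# Same name, same address: recognised.
print(monmap.accepts("a", "192.168.0.10:6789"))
# Same name after DHCP handed out a new address: not recognised.
print(monmap.accepts("a", "192.168.0.57:6789"))
# A second machine claiming an existing id from a different address.
print(monmap.accepts("b", "192.168.0.99:6789"))
```

In this model, editing ceph.conf alone changes nothing, because the map (the cluster state) is what decides who counts as monitor "a".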

#2 Updated by Joao Eduardo Luis over 11 years ago

  • Subject changed from Ceph fails to work when IP address is changed on the host to mon: Ceph fails to work when IP address is changed on the host

#3 Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Won't Fix

Right. There is a manual process for adjusting monitor ips, but it is not very friendly.

We don't want to blindly slurp up new IPs from the config because the IPs are critical to identifying monitors and doing so would very easily break the strict quorum requirements for changing things; this would make it trivial for a config typo to break the monitor cluster. Mon IPs need to be published/approved via the existing quorum in an incremental fashion, or done offline by manually injecting a new monmap via ceph-mon --inject-monmap <path> (or whatever it is).

Having IPs change out from under you isn't handled, but that's ok; no real cluster can survive that because the monitors are how clients find/join the system. If they change, everything is down/broken. These are meant to be (mostly) fixed, probably published via DNS or something for convenience.
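
As a rough illustration of the offline route mentioned in this comment, the Python sketch below only builds and prints the command sequence; it does not run anything. The helper name `offline_monmap_plan`, the monitor names, the addresses, and the `/tmp/monmap` path are assumptions made for the example. The `ceph-mon` and `monmaptool` options shown should be checked against the installed version before use, and the injection step would need to be repeated on every monitor while its daemon is stopped.

```python
import shlex


def offline_monmap_plan(mon_id, new_addrs, monmap_path="/tmp/monmap"):
    """Return shell commands for rewriting a monmap offline (hypothetical helper).

    mon_id    -- id of the stopped monitor to extract from and inject into
    new_addrs -- dict of monitor name -> new "ip:port"
    """
    cmds = [
        # Pull the current map out of the monitor's store and inspect it.
        ["ceph-mon", "-i", mon_id, "--extract-monmap", monmap_path],
        ["monmaptool", "--print", monmap_path],
    ]
    for name, addr in new_addrs.items():
        # Drop the stale entry, then re-add the same name at its new address.
        cmds.append(["monmaptool", "--rm", name, monmap_path])
        cmds.append(["monmaptool", "--add", name, addr, monmap_path])
    # Write the edited map back into the monitor's store.
    cmds.append(["ceph-mon", "-i", mon_id, "--inject-monmap", monmap_path])
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in cmds]


for line in offline_monmap_plan("a", {"a": "10.0.0.1:6789", "b": "10.0.0.2:6789"}):
    print(line)
```

Printing the plan instead of executing it leaves room to review each line, which matters here because a typo in the map is exactly the failure mode being guarded against.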

#4 Updated by Anonymous over 11 years ago

Sage,

With all due respect, I disagree.

I can see how you do not want a single typo in a config file to mess up the entire cluster, but now you are asking someone that has the possibility of making that typo, to re-make that typo, multiple times, across each node, over and over in how many nodes?

I had just 3 nodes, but one was down for hours, because the changes I thought I made did not get picked up.... If I had 100 nodes, and had to migrate them, the cluster would be down for weeks.

If you do not think we should have the maps slurp from the config, then what is the config for? It should be the master source for all configuration.

Second, if you are not willing to implement this, then we should add a command where you can type in the proper new settings, and it will go and update the config file and any maps or other locations that need to change. Currently, it is so convoluted to find what needs changing and why, that a customer would become very frustrated and walk away. At this time, too much is left to chance in knowing what needs to be done, and there are too many manual changes that can go very wrong if not done properly. Did you realize that all OSDs disappear when the IPs go away? Did you know the only instructions are how to make new ones, not recover old ones? These are the mistakes I would like us to not have the customer make....

#5 Updated by Sage Weil over 11 years ago

Deb Barba wrote:

> Sage,
>
> With all due respect, I disagree.
>
> I can see how you do not want a single typo in a config file to mess up the entire cluster, but now you are asking someone that has the possibility of making that typo, to re-make that typo, multiple times, across each node, over and over in how many nodes?
>
> I had just 3 nodes, but one was down for hours, because the changes I thought I made did not get picked up.... If I had 100 nodes, and had to migrate them, the cluster would be down for weeks.

This would never happen on a 100 node cluster. You wouldn't run the monitors on machines picking up dynamic IPs via DHCP in anything in production.

In any case, though, this is a manual process for each monitor (i.e., 3-5 nodes).

> If you do not think we should have the maps slurp from the config, then what is the config for? It should be the master source for all configuration.

It's for feeding config to start daemons. There is a separation between 'configuration' (conf file, daemon behavior) and 'cluster state'. Cluster state is zealously protected by paxos on the monitors because if it is disrupted then all consistency bets are off and everything breaks spectacularly (as opposed to daemons just being down).

> Second, if you are not willing to implement this, then we should add a command where you can type in the proper new settings, and it will go and update the config file and any maps or other locations that need to change. Currently, it is so convoluted to find what needs changing and why, that a customer would become very frustrated and walk away. At this time, too much is left to chance in knowing what needs to be done, and there are too many manual changes that can go very wrong if not done properly. Did you realize that all OSDs disappear when the IPs go away? Did you know the only instructions are how to make new ones, not recover old ones? These are the mistakes I would like us to not have the customer make....

Agreed. The customer error is in using a dynamic IP for the monitors, though. I think that is the first/most important thing to help them avoid.

Once they are operating vaguely within the range of what will actually work/can be supported/makes sense, we want to build tools and docs to help them with common procedures/problems. If they are outside of what makes sense or we would vaguely want to support (dynamic IPs for monitors), then it's not a good use of our time...

#6 Updated by Anonymous over 11 years ago

Real-world example of needing to change IP addresses....

john roman via hq.newdream.net
11:25 AM (7 minutes ago)

to all
Heads up! Destro is being physically moved to garland at 11:30 this thursday as part of our 2012 lax migration project.

during this migration, the machine and its services will receive new ips :)

#7 Updated by Joao Eduardo Luis over 11 years ago

That's a once-or-twice-in-a-lifetime thing. It shouldn't happen often. As Sage pointed out before, there are ways to deal with this kind of thing. They may be complex, but then again, they're not something you should be doing every other week anyway.

In any case, I intend to update the docs with a section explaining how this whole thing should go down if one really needs to change the monitors' IP addresses. For further reference, the proper way would be to incrementally add new monitors with new IPs before taking down/removing the ones you want to get rid of (with the obvious benefit of maintaining availability); the other solution would be to create a new monmap and inject it into the monitors.
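
The incremental approach described above can be sketched in Python as an ordering problem: add one new monitor, then retire one old monitor, and repeat. This is only a planning sketch under assumptions. The function names, monitor names, and addresses are hypothetical; the commands are printed and never executed; and the steps for creating and starting each new monitor daemon are left out.

```python
def majority(n):
    """Smallest number of monitors that forms a quorum in a map of size n."""
    return n // 2 + 1


def incremental_plan(old_mons, new_mons):
    """Interleave 'add new' and 'remove old' steps, one monitor at a time.

    old_mons / new_mons -- dicts of monitor name -> "ip:port".
    Returns a list of (command, monmap size after the command).
    """
    if set(old_mons) & set(new_mons):
        raise ValueError("new monitors need names not already in the map")
    members = dict(old_mons)
    steps = []
    for (new_name, new_addr), old_name in zip(new_mons.items(), list(old_mons)):
        # Grow the map first so the cluster never shrinks below its old size.
        members[new_name] = new_addr
        steps.append((f"ceph mon add {new_name} {new_addr}", len(members)))
        # Only then retire one monitor at an old address.
        del members[old_name]
        steps.append((f"ceph mon remove {old_name}", len(members)))
    return steps


old = {"a": "10.0.0.1:6789", "b": "10.0.0.2:6789", "c": "10.0.0.3:6789"}
new = {"d": "10.1.0.1:6789", "e": "10.1.0.2:6789", "f": "10.1.0.3:6789"}
for command, size in incremental_plan(old, new):
    print(f"{command:<32} map size {size}, quorum needs {majority(size)}")
```

With three monitors the map alternates between four and three members, so the monitors already running are always enough for a majority. Each change goes through the existing quorum, which is the property the offline injection route gives up.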

#8 Updated by Anonymous about 11 years ago

Joao,
thanks for the update.
Since mine came about due to a testing environment built on DHCP, I did not have the luxury of slowly injecting the new IP addresses.

I did do the more extreme suggestion of injecting a new monmap with new IP addresses into the cluster, which allowed me to bring up the monitors again.

But the OSDs never recovered. I suspected that the IP addresses were somehow linked inside an osdmap or something, but could never find it.

Do the OSDs have knowledge of the IP addresses? And if so, can you include the details on how to fix those as well?

Thanks
deb
