Bug #893
closed
no filesystem created if all mdses are configured for standby-replay
Added by Alexandre Oliva about 13 years ago.
Updated about 13 years ago.
Description
If the [mds] section contains:
mds standby replay = true
then, once the nodes are started after mkcephfs, all of the MDSes will log that they failed to create '/' because it already exists, and will go into standby, never moving to "creating". Dropping this setting from a single MDS gets the filesystem created properly, and the setting can then be re-enabled.
Ideally, filesystem creation should not fail just because all MDSes are configured for standby-replay; failing that, this behavior should be documented and the failure mode made less confusing. It looked like a major regression to me, and I almost went back to 0.24.3 to re-create the filesystem.
Hmm. Is http://ceph.newdream.net/wiki/Standby-replay_modes not clear enough?
mds standby replay
If this is set, then on startup the MDS will ask the monitor to make it a standby-replay for an active MDS. You can set this flag independently of specifying an MDS to follow; if you do so, the monitor will try to assign it to follow an MDS which has no standby-replay followers. If the monitor can't find an MDS without a follower, an MDS in this mode will remain in standby mode until the monitor finds one.
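As a concrete sketch of the two cases described above, a minimal ceph.conf fragment might look like this (the daemon names `a` and `b` are hypothetical, and `mds standby for name` is assumed to be the option for pinning a follow target):

```ini
[mds]
        ; every MDS asks the monitor to make it a standby-replay follower
        mds standby replay = true

[mds.b]
        ; optionally pin this daemon to follow a specific MDS;
        ; if omitted, the monitor tries to assign it to an MDS
        ; that has no standby-replay follower yet
        mds standby for name = a
```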
That is quite clear. What gave me incorrect expectations is that, IIRC, it started the cluster successfully from a full stop, picking one of the standby-replay nodes to become active, except when the filesystem hadn't been created yet. Whether or not it's a bug, IMHO it would be desirable for a standby-replay node to take over creation of the filesystem if no other MDSes are available.
It sounds like the monitor needs to mark the MDS as up:creating or up:starting (or up:replay) if the cluster isn't yet complete or has failed. Only if the cluster is complete should it leave the MDS in standby...
Well, the intention was that if you specified standby-replay that meant you didn't want it going active unless the MDS it was following died. This way you can set your most powerful machine as the main MDS, etc.
If you have multiple MDSes and want one to be active and the others to be in standby/standby-replay, you can pick one of them to be the startup node and set it as "mds standby for rank = 0" (without an "mds standby replay" setting): it will start up the cluster, and the other MDSes will be standby or standby-replay. If your initial MDS then crashes, on restart it will go into standby-replay for the new leader.
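A sketch of that suggested configuration, assuming a hypothetical daemon named `a` as the designated startup node (note that a per-daemon section must override the global standby-replay flag, since the text says the startup node should run without it):

```ini
[mds]
        ; by default, every MDS becomes standby/standby-replay
        mds standby replay = true

[mds.a]
        ; the designated startup node: follow rank 0, but override
        ; standby-replay so it can go active and create the filesystem
        mds standby for rank = 0
        mds standby replay = false
```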
We could set up some kind of timeout-based thing for making a standby-replay MDS an active MDS, I suppose, but I don't want to just take the first MDS to report to the monitor as an active MDS because then the assignment of roles depends on boot order and overrides the config file...
- Assignee set to Greg Farnum
- Status changed from New to Resolved
'fraid this didn't quite work. Just tried creating a new filesystem with 0.26. An MDS marked as standby-replay does indeed get promoted to "creating", but it crashes before completing the job, and the other MDSes don't get another shot at creating the filesystem.
Can you post the backtrace? We fixed a few bugs with standby-replay in the master branch already.