Feature #20609 (closed)

Parent: Feature #20606: mds: improve usability of cluster rank manipulation and setting cluster up/down

MDSMonitor: add new command `ceph fs set <fs_name> down` to bring the cluster down

Added by Patrick Donnelly almost 7 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Development
Tags: -
Backport: -
Reviewed: -
Affected Versions: -
Component(FS): MDSMonitor
Labels (FS): multimds
Pull request ID: -

Description

This will cause the MDSMonitor to start stopping ranks (what deactivate does) beginning at the highest rank. Only one rank is stopped at a time. This implicitly means the cluster is not joinable.

Note: this process is currently enforced manually (see 2c08f58ee8353322a342ce043150aafc8dd9c381). Let's simplify things by executing the process in the monitor.
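For illustration, usage might look like this sketch (command name as proposed above; the exact final syntax may differ):

ceph fs set <fs_name> down # monitor begins stopping ranks, highest rank first
ceph fs status <fs_name> # watch ranks pass through the stopping state one at a time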

#1

Updated by Patrick Donnelly almost 7 years ago

  • Tracker changed from Bug to Feature
#2

Updated by Douglas Fuller almost 7 years ago

So really this would just set max_mds to 0. I do think this should trigger HEALTH_ERR unless and until the user deletes the filesystem or activates at least 1 MDS.

I'd like to work "mds" into this command somehow to make it clear to the user that they are shutting down the MDSs and not doing something to clients or data.
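For illustration, the max_mds suggestion above amounts to something like:

ceph fs set <fs_name> max_mds 0 # no active ranks; proposed to raise HEALTH_ERR until an MDS is activated again or the filesystem is deleted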

#3

Updated by Patrick Donnelly over 6 years ago

Douglas Fuller wrote:

> So really this would just set max_mds to 0. I do think this should trigger HEALTH_ERR unless and until the user deletes the filesystem or activates at least 1 MDS.

Right.

> I'd like to work "mds" into this command somehow to make it clear to the user that they are shutting down the MDSs and not doing something to clients or data.

We aren't really shutting down the MDSs (they will come back as standbys). The key thing is we're shutting down ranks. Feel free to think of a better wording for this and `ceph mds deactivate` (#20607).

#4

Updated by Douglas Fuller over 6 years ago

Patrick Donnelly wrote:

> > I'd like to work "mds" into this command somehow to make it clear to the user that they are shutting down the MDSs and not doing something to clients or data.

> We aren't really shutting down the MDSs (they will come back as standbys). The key thing is we're shutting down ranks. Feel free to think of a better wording for this and `ceph mds deactivate` (#20607).

Sure, but I don't want the user to have to understand that distinction (at least, not strictly). I'll kick around the wording, but I think it should be 'mds' even if we add a further subcommand. Maybe something that mirrors the 'up/down' and 'in/out' used by the OSDs?

#5

Updated by Patrick Donnelly over 6 years ago

Douglas Fuller wrote:

> Patrick Donnelly wrote:

> > > I'd like to work "mds" into this command somehow to make it clear to the user that they are shutting down the MDSs and not doing something to clients or data.

> > We aren't really shutting down the MDSs (they will come back as standbys). The key thing is we're shutting down ranks. Feel free to think of a better wording for this and `ceph mds deactivate` (#20607).

> Sure, but I don't want the user to have to understand that distinction (at least, not strictly). I'll kick around the wording, but I think it should be 'mds' even if we add a further subcommand. Maybe something that mirrors the 'up/down' and 'in/out' used by the OSDs?

OSDs join the entire RADOS cluster while MDSs join particular file systems as ranks.

We already use "rank" throughout our documentation. I feel we have an opportunity to reduce confusion by using the term where appropriate in the administrative commands.

#6

Updated by Douglas Fuller over 6 years ago

Patrick Donnelly wrote:

> Douglas Fuller wrote:

> > Sure, but I don't want the user to have to understand that distinction (at least, not strictly). I'll kick around the wording, but I think it should be 'mds' even if we add a further subcommand. Maybe something that mirrors the 'up/down' and 'in/out' used by the OSDs?

> OSDs join the entire RADOS cluster while MDSs join particular file systems as ranks.

Exactly, that's the analogy I was going for.

> We already use "rank" throughout our documentation. I feel we have an opportunity to reduce confusion by using the term where appropriate in the administrative commands.

The entire concept is confusing, whether it's documented or not. The user doesn't really care about ranks; they really only care about the state (active, standby, or standby_replay).

Maybe we only need one command here:

ceph fs set <fs> max_mds <N> # resize the mds cluster, activating new or deactivating old ranks as necessary. If changed, HEALTH_INFO

(Perhaps we could rename max_mds to max_active_mds to clarify that other daemons will be standbys.)

No more activate/deactivate. I think we should still keep 'fail,' though.

If the number of active MDS ranks is less than max{_active}_mds, HEALTH_WARN
If the number of active MDS ranks is 0, HEALTH_ERR

(edited to add: this may be more automagic than we want since it wouldn't allow fine-grained control of which MDS gets activated on what filesystem. Maybe we could add some more advanced command for that. Feedback welcome.)
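For illustration, the single-command workflow sketched above might look like this (max_active_mds is only a proposed rename, not an existing setting):

ceph fs set <fs> max_mds 3 # grow: activate standbys until 3 ranks are active
ceph fs set <fs> max_mds 1 # shrink: deactivate ranks, highest first
ceph fs set <fs> max_mds 0 # zero active ranks; HEALTH_ERR under the rules above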

#7

Updated by Patrick Donnelly over 6 years ago

Douglas Fuller wrote:

> Patrick Donnelly wrote:

> > Douglas Fuller wrote:

> > > Sure, but I don't want the user to have to understand that distinction (at least, not strictly). I'll kick around the wording, but I think it should be 'mds' even if we add a further subcommand. Maybe something that mirrors the 'up/down' and 'in/out' used by the OSDs?

> > OSDs join the entire RADOS cluster while MDSs join particular file systems as ranks.

> Exactly, that's the analogy I was going for.

> > We already use "rank" throughout our documentation. I feel we have an opportunity to reduce confusion by using the term where appropriate in the administrative commands.

> The entire concept is confusing, whether it's documented or not. The user doesn't really care about ranks; they really only care about the state (active, standby, or standby_replay).

> Maybe we only need one command here:

> ceph fs set <fs> max_mds <N> # resize the mds cluster, activating new or deactivating old ranks as necessary. If changed, HEALTH_INFO

> (Perhaps we could rename max_mds to max_active_mds to clarify that other daemons will be standbys.)

> No more activate/deactivate. I think we should still keep 'fail,' though.

Ah, now we're getting to a healthy amount of minimalism :)

> If the number of active MDS ranks is less than max{_active}_mds, HEALTH_WARN

Conceivably we could have max_mds set to a "large number" so that any extra standbys join the file system immediately. I would consider this an error, though, because then there can be no standbys and thus no failover. The new standby warning should catch this and inform the admin that their cluster is in a dangerous state.

> If the number of active MDS ranks is 0, HEALTH_ERR

> (edited to add: this may be more automagic than we want since it wouldn't allow fine-grained control of which MDS gets activated on what filesystem.

That's currently achieved with standby_for_fscid.

I like the idea of getting rid of deactivate. Users can't even choose to deactivate arbitrary ranks anymore (it must be max_mds-1).
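For reference, the deactivate restriction mentioned above means only the highest rank can be stopped, e.g. (role syntax illustrative):

ceph mds deactivate <fs_name>:1 # allowed only when rank 1 is the highest rank, i.e. max_mds-1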

#8

Updated by Douglas Fuller over 6 years ago

Patrick Donnelly wrote:

> Douglas Fuller wrote:

> > (edited to add: this may be more automagic than we want since it wouldn't allow fine-grained control of which MDS gets activated on what filesystem.

> That's currently achieved with standby_for_fscid.

Yeah, I suppose that's probably good enough. I was thinking of a way to make it work more directly with a command, but a better way isn't really coming to me.

#9

Updated by Patrick Donnelly about 6 years ago

  • Target version set to v13.0.0
#10

Updated by Douglas Fuller about 6 years ago

  • Status changed from New to In Progress

https://github.com/ceph/ceph/pull/16608 overhauls this behavior and re-implements the cluster_down flag for this functionality.
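For illustration, the flag-based interface from that PR looks roughly like this (see the PR for the authoritative syntax):

ceph fs set <fs_name> down true # take the cluster down: ranks are stopped one at a time
ceph fs set <fs_name> down false # allow the cluster to come back up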

#11

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from In Progress to Resolved
#12

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (90)
  • Labels (FS) multimds added