Bug #54432

It is unclear how to disable the mon_osd_down_out_subtree_limit function

Added by Yuma Ogami 11 months ago. Updated 7 months ago.

Status: Fix Under Review
Priority: Normal
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'd like any DOWN OSD to be marked OUT regardless of the status of other OSDs. In other words, I want to disable the effect of the mon_osd_down_out_subtree_limit parameter.

I found that setting that parameter to either of the following two values seems to work.
A. Specify "root". This means OSDs are prevented from being marked OUT only if all OSDs go down, so in practice it accomplishes my purpose.
B. Specify "osd" (*1). However, according to the official documentation (*2), it looks like no OSD should be marked OUT in this case.

Please let me know the proper way to achieve my goal.

(*1) https://github.com/ceph/ceph/blob/5cdf8929e9f857a53820c4690ccfe30288b6ca91/src/mon/OSDMonitor.cc#L5189
(*2) https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_subtree_limit

History

#1 Updated by Yuma Ogami 11 months ago

Can anyone reply to this?

#2 Updated by Dan van der Ster 11 months ago

"root" should achieve what you want. But keep in mind this other relevant param:

        Option("mon_osd_min_up_ratio", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED)
        .set_default(.3)
        .add_service("mon")
        .set_description("do not automatically mark OSDs 'out' if fewer than this many OSDs are 'up'")

(the mon will not mark more than 70% of the osds out by default).

#3 Updated by Yuma Ogami 11 months ago

Thank you for your reply. I understand.

Isn't the relevant parameter "mon_osd_min_in_ratio" instead of "mon_osd_min_up_ratio"? "mon_osd_min_up_ratio" is used in `can_mark_down()`.(*1) And "mon_osd_min_in_ratio" is used in `can_mark_out()`.(*2)

(*1) https://github.com/ceph/ceph/blob/v16.2.7/src/mon/OSDMonitor.cc#L3119
(*2) https://github.com/ceph/ceph/blob/v16.2.7/src/mon/OSDMonitor.cc#L3158

#4 Updated by Dan van der Ster 11 months ago

Yuma Ogami wrote:

Thank you for your reply. I understand.

Isn't the relevant parameter "mon_osd_min_in_ratio" instead of "mon_osd_min_up_ratio"? "mon_osd_min_up_ratio" is used in `can_mark_down()`.(*1) And "mon_osd_min_in_ratio" is used in `can_mark_out()`.(*2)

(*1) https://github.com/ceph/ceph/blob/v16.2.7/src/mon/OSDMonitor.cc#L3119
(*2) https://github.com/ceph/ceph/blob/v16.2.7/src/mon/OSDMonitor.cc#L3158

Correct, in fact both of those are relevant!
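
To make the distinction between the two ratios concrete, here is a minimal sketch of how such checks might behave. It is a simplified model, not the OSDMonitor code: the function names are hypothetical, the "count minus one" arithmetic is an assumption about what the linked `can_mark_down()` and `can_mark_out()` do, and cluster flags such as nodown/noout are ignored. The 0.75 used for the in ratio is only an illustrative value; the .3 comes from the option quoted earlier.

```cpp
#include <cassert>

// Hypothetical model: allow marking one more OSD down only if the up ratio
// afterwards would still be at least min_up_ratio.
bool can_mark_down_model(int num_osds, int num_up, double min_up_ratio) {
  if (num_osds == 0) return false;
  double up_ratio = (num_up - 1) / static_cast<double>(num_osds);
  return up_ratio >= min_up_ratio;
}

// Hypothetical model: allow marking one more OSD out only if the in ratio
// afterwards would still be at least min_in_ratio.
bool can_mark_out_model(int num_osds, int num_in, double min_in_ratio) {
  if (num_osds == 0) return false;
  double in_ratio = (num_in - 1) / static_cast<double>(num_osds);
  return in_ratio >= min_in_ratio;
}

// Small self-check with illustrative numbers for a 10-OSD cluster.
void check_ratio_examples() {
  assert(can_mark_out_model(10, 9, 0.75));    // 8/10 stays above 0.75
  assert(!can_mark_out_model(10, 8, 0.75));   // 7/10 would fall below 0.75
  assert(can_mark_down_model(10, 5, 0.3));    // 4/10 stays above 0.3
  assert(!can_mark_down_model(10, 3, 0.3));   // 2/10 would fall below 0.3
}
```

In this model the up ratio limits how many OSDs can be marked down, and the in ratio separately limits how many can be marked out, which is why both settings matter when many OSDs fail at once.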

#5 Updated by Yuma Ogami 11 months ago

By the way, isn't the following behavior a bug?

B. Specify "osd" (*1). However, according to the official documentation (*2), it looks like no OSD should be marked OUT in this case.

#6 Updated by Dan van der Ster 11 months ago

Yuma Ogami wrote:

By the way, isn't the following behavior a bug?

B. Specify "osd" (*1). However, according to the official documentation (*2), it looks like no OSD should be marked OUT in this case.

Let's see the code. Here is the implementation, and keep in mind that crush type for "osd" is 0:

        string down_out_subtree_limit = g_conf().get_val<string>("mon_osd_down_out_subtree_limit");
...
        // is this an entire large subtree down?
        if (down_out_subtree_limit.length()) {
          int type = osdmap.crush->get_type_id(down_out_subtree_limit);
          if (type > 0) {
            if (osdmap.containing_subtree_is_down(cct, o, type, &down_cache)) {
              dout(10) << "tick entire containing " << down_out_subtree_limit
                       << " subtree for osd." << o
                       << " is down; resetting timer" << dendl;
              // reset timer, too.
              down_pending_out[o] = now;
              continue;
            }
          }
        }

So, what I said earlier was wrong. It seems the correct way to disable the subtree check is to set mon_osd_down_out_subtree_limit = "".
Setting it to "osd" (type 0, which fails the `type > 0` test), or to any invalid string, should also effectively disable it. (But I think ="osd" is confusing, so I would rather propose that the documentation be fixed to say that the valid settings are "" or CRUSH types 1 and higher, aka "CRUSH buckets".)

BTW, setting mon_osd_down_out_subtree_limit = "root" will disable the feature in all cases except one: when, in a single OSD tick, all OSDs would go from down to out.
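
The guard in the excerpt above can be modelled in isolation to see which settings turn the subtree check on. This is only a sketch: the type table is a hypothetical stand-in for the cluster's CRUSH map (only "osd" being type 0 is stated in this thread), and the assumption that an unknown name resolves to a negative id is mine, not confirmed here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the CRUSH type lookup. Assumption: a name that
// is not in the table yields a negative id.
int lookup_type_id(const std::map<std::string, int>& types,
                   const std::string& name) {
  auto it = types.find(name);
  return it == types.end() ? -1 : it->second;
}

// Mirrors the two conditions in the excerpt: the limit must be non-empty
// and must resolve to a type id greater than 0.
bool subtree_check_enabled(const std::map<std::string, int>& types,
                           const std::string& limit) {
  if (limit.empty()) return false;
  return lookup_type_id(types, limit) > 0;
}

// Small self-check using an illustrative type table.
void check_limit_examples() {
  const std::map<std::string, int> types{
      {"osd", 0}, {"host", 1}, {"rack", 3}, {"root", 10}};
  assert(!subtree_check_enabled(types, ""));          // empty: check skipped
  assert(!subtree_check_enabled(types, "osd"));       // type 0: check skipped
  assert(!subtree_check_enabled(types, "no-such"));   // unknown: check skipped
  assert(subtree_check_enabled(types, "rack"));       // bucket type: check runs
  assert(subtree_check_enabled(types, "root"));       // bucket type: check runs
}
```

Under these assumptions, "", "osd", and unrecognized names all skip the check, while any bucket type keeps it active for subtrees of that type.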

#7 Updated by Dan van der Ster 11 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan van der Ster
  • Pull request ID set to 45475

#8 Updated by Ilya Dryomov 7 months ago

  • Target version deleted (v16.2.8)
