Bug #37940

closed

upmap balancer won't refill underfull osds if zero overfull found

Added by Dan van der Ster over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: balancer module
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: mimic, luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following was seen on v12.2.10.

One pool has been upmap balanced for a while, so there are now zero overfull osds. But there is a new osd (557) which stays severely underfull forever.
Here is the relevant log in calc_pg_upmaps:

2019-01-16 14:17:49.103173 7f203d36d700 20  osd.553     pgs 85  target 85.5565  deviation -0.556458
2019-01-16 14:17:49.103176 7f203d36d700 20  osd.554     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103178 7f203d36d700 20  osd.555     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103180 7f203d36d700 20  osd.556     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103183 7f203d36d700 20  osd.557     pgs 24  target 57.0376  deviation -33.0376
2019-01-16 14:17:49.103185 7f203d36d700 20  osd.558     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103188 7f203d36d700 20  osd.559     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103190 7f203d36d700 20  osd.560     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103192 7f203d36d700 20  osd.561     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103194 7f203d36d700 20  osd.562     pgs 58  target 57.0376  deviation 0.962406
2019-01-16 14:17:49.103197 7f203d36d700 20  osd.563     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103199 7f203d36d700 20  osd.564     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103201 7f203d36d700 20  osd.565     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103203 7f203d36d700 20  osd.566     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103206 7f203d36d700 20  osd.567     pgs 56  target 57.0376  deviation -1.03759
2019-01-16 14:17:49.103208 7f203d36d700 20  osd.568     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103217 7f203d36d700 20  osd.569     pgs 57  target 57.0376  deviation -0.0375938
2019-01-16 14:17:49.103223 7f203d36d700 10  total_deviation 144.336 overfull  underfull [557,466,469,471,478,483,485,542,567]
2019-01-16 14:17:49.104091 7f203d36d700 10  start deviation 144.336
2019-01-16 14:17:49.104096 7f203d36d700 10  end deviation 144.336

In this case, we have no overfull osds and one underfull osd, 557, with a large negative deviation. (The handful of other underfull osds have deviations just below -1.)

But because of this break, that single underfull osd is never re-filled:

@@ -4088,8 +4088,8 @@ int OSDMap::calc_pg_upmaps(
    if (overfull.empty() || underfull.empty())
      break;
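To make the effect of that break concrete, here is a minimal standalone sketch that replays the osd.553 to osd.569 rows from the log above. It is not the Ceph code: `OsdLoad`, `Classification`, `classify`, `balancer_gives_up` and `sample_from_log` are hypothetical names, and the underfull cutoff of -1.0 is an assumption inferred from the log (osd.567 at -1.03759 is listed as underfull, osd.553 at -0.556458 is not). Only the `deviation >= 1.0` overfull test and the empty-set check come from the snippets in this report.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the per-OSD numbers printed in the log
// (not Ceph's real data structures).
struct OsdLoad {
  int pgs;        // PGs currently mapped to the OSD
  double target;  // PGs the OSD should hold
};

struct Classification {
  std::set<int> overfull;
  std::vector<int> underfull;
};

// deviation = pgs - target, matching the "deviation" column in the log.
inline double deviation(const OsdLoad& o) { return o.pgs - o.target; }

Classification classify(const std::map<int, OsdLoad>& osds) {
  Classification c;
  for (const auto& kv : osds) {
    assert(kv.second.target >= 0.0);
    const double d = deviation(kv.second);
    if (d >= 1.0) {
      c.overfull.insert(kv.first);      // threshold shown in the report
    } else if (d <= -1.0) {
      c.underfull.push_back(kv.first);  // ASSUMED cutoff, inferred from the log
    }
  }
  return c;
}

// Mirrors the quoted check: with either set empty, the loop breaks.
bool balancer_gives_up(const Classification& c) {
  return c.overfull.empty() || c.underfull.empty();
}

// pgs/target pairs for osd.553 through osd.569, copied from the log.
std::map<int, OsdLoad> sample_from_log() {
  std::map<int, OsdLoad> m;
  m[553] = {85, 85.5565};
  const int pgs[] = {58, 57, 58, 24, 58, 58, 58, 58, 58,
                     57, 57, 57, 57, 56, 57, 57};
  for (int i = 0; i < 16; ++i) {
    m[554 + i] = {pgs[i], 57.0376};
  }
  return m;
}
```

Calling `classify(sample_from_log())` yields an empty overfull set and an underfull list containing 557 and 567, so `balancer_gives_up` returns true even though osd.557 is about 33 PGs short.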

One way to fix this would be to populate overfull more aggressively:

diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 2bb8beb94e..51bc4e7bdf 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -4067,7 +4067,7 @@ int OSDMap::calc_pg_upmaps(
                     << dendl;
       osd_deviation[i.first] = deviation;
       deviation_osd.insert(make_pair(deviation, i.first));
-      if (deviation >= 1.0)
+      if (deviation >= 0.5)  // magic number, maybe 0.1 is better, maybe a configurable
        overfull.insert(i.first);
       total_deviation += abs(deviation);
     }

This way, the balancing would continue as long as there are underfull osds.
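As a rough check of how the proposed threshold changes the donor set, the following sketch counts overfull candidates among the osd.553 to osd.569 deviations from the log (rounded to three decimals). `sample_deviations` and `count_overfull` are hypothetical helpers written for this illustration, not functions from `OSDMap.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Deviations of osd.553 through osd.569, rounded from the log above.
std::vector<double> sample_deviations() {
  return {-0.556, 0.962,  -0.038, 0.962,  -33.038, 0.962,
          0.962,  0.962,  0.962,  0.962,  -0.038,  -0.038,
          -0.038, -0.038, -1.038, -0.038, -0.038};
}

// Counts OSDs that would be inserted into `overfull` for a given
// threshold, using the same `deviation >= threshold` test as the diff.
std::size_t count_overfull(const std::vector<double>& deviations,
                           double threshold) {
  assert(threshold > 0.0);
  std::size_t n = 0;
  for (double d : deviations) {
    if (d >= threshold) {
      ++n;
    }
  }
  return n;
}
```

On this sample, a threshold of 1.0 finds no overfull osds, while 0.5 finds seven (the osds holding 58 PGs against a target of 57.0376). A threshold of 0.1 gives the same seven here, so this log alone does not distinguish between the two values floated in the diff comment.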

I can imagine a similar scenario with a few outlier overfull osds and zero underfull osds, but I haven't seen that in the wild yet.

Thoughts?


Related issues 2 (0 open, 2 closed)

Copied to mgr - Backport #38036: mimic: upmap balancer won't refill underfull osds if zero overfull found (Resolved, xie xingguo)
Copied to mgr - Backport #38037: luminous: upmap balancer won't refill underfull osds if zero overfull found (Resolved, xie xingguo)