Bug #42971: mgr hangs with upmap balancer

Added by Bryan Stillwell over 4 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Manager (RADOS bits)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On multiple clusters we are seeing the mgr hang frequently when the balancer is enabled. It seems that the balancer is getting caught in some kind of infinite loop that chews up all of the mgr's CPU, which causes problems for other modules like prometheus (we don't have the devicehealth module enabled yet).

I've also been able to reproduce the issue by doing an offline balance with osdmaptool:

osdmaptool --debug-osd 10 osd.map --upmap balance-upmaps.sh --upmap-pool default.rgw.buckets.data --upmap-max 100

It seems to loop over the same group of ~7,000 PGs over and over again like this, without finding any new upmaps that can be added:

2019-11-19 16:39:11.131518 7f85a156f300 10 trying 24.d91
2019-11-19 16:39:11.138035 7f85a156f300 10 trying 24.2e3c
2019-11-19 16:39:11.144162 7f85a156f300 10 trying 24.176b
2019-11-19 16:39:11.149671 7f85a156f300 10 trying 24.ac6
2019-11-19 16:39:11.155115 7f85a156f300 10 trying 24.2cb2
2019-11-19 16:39:11.160508 7f85a156f300 10 trying 24.129c
2019-11-19 16:39:11.166287 7f85a156f300 10 trying 24.181f
2019-11-19 16:39:11.171737 7f85a156f300 10 trying 24.3cb1
2019-11-19 16:39:11.177260 7f85a156f300 10 24.2177 already has pg_upmap_items [368,271]
2019-11-19 16:39:11.177268 7f85a156f300 10 trying 24.2177
2019-11-19 16:39:11.182590 7f85a156f300 10 trying 24.a4
2019-11-19 16:39:11.188053 7f85a156f300 10 trying 24.2583
2019-11-19 16:39:11.193545 7f85a156f300 10 24.93e already has pg_upmap_items [80,27]
2019-11-19 16:39:11.193553 7f85a156f300 10 trying 24.93e
2019-11-19 16:39:11.198858 7f85a156f300 10 trying 24.e67
2019-11-19 16:39:11.204224 7f85a156f300 10 trying 24.16d9
2019-11-19 16:39:11.209844 7f85a156f300 10 trying 24.11dc
2019-11-19 16:39:11.215303 7f85a156f300 10 trying 24.1f3d
2019-11-19 16:39:11.221074 7f85a156f300 10 trying 24.2a57

While this cluster is running Luminous (12.2.12), I've reproduced the loop using the same osdmap on Nautilus (14.2.4).


Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #42718: Improve OSDMap::calc_pg_upmaps() efficiency (Resolved, David Zafman)

Actions #1

Updated by Josh Durgin over 4 years ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) Manager (RADOS bits) added

Hey Bryan, David's been fixing a couple issues in the balancer that sound like what you're running into:

1) https://tracker.ceph.com/issues/42432
2) https://tracker.ceph.com/issues/42718

For the 2nd one, we found the major inefficiency came from crush maps with device classes: the balancer was not considering the right crush root and failed to find anything to change, so it kept trying when it should have finished in a few seconds. Does your crushmap use device classes or separate roots?
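
A quick way to check both is from the crush dump. Rough sketch only (not from this cluster; it assumes the ceph CLI is on PATH and a keyring that can run "ceph osd crush dump" is available):

#!/usr/bin/env python3
# Rough sketch: list device classes and CRUSH roots from the crush dump.
import json
import subprocess

# "ceph osd crush dump" prints the crush map as JSON.
crush = json.loads(subprocess.check_output(
    ["ceph", "osd", "crush", "dump", "--format", "json"]))

# Each OSD ("device") carries a device class, if one has been set.
classes = sorted({d["class"] for d in crush.get("devices", []) if d.get("class")})

# A root is a bucket that no other bucket lists as a child.
child_ids = {i["id"] for b in crush.get("buckets", []) for i in b.get("items", [])}
roots = [b["name"] for b in crush.get("buckets", []) if b["id"] not in child_ids]

print("device classes:", ", ".join(classes) or "none")
print("crush roots:", ", ".join(roots))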

Actions #2

Updated by Bryan Stillwell over 4 years ago

We are using device classes.

Actions #3

Updated by Josh Durgin over 4 years ago

  • Related to Bug #42718: Improve OSDMap::calc_pg_upmaps() efficiency added

Actions #4

Updated by Bryan Stillwell over 4 years ago

So I wrote my own upmap balancer this weekend, and after running it for a bit I found the same problem. It appears that this cluster only has three racks, and one of them is 20 TB smaller than the other two, so there is literally no possible way to achieve a balanced cluster where all OSDs are within 1% of each other.

Is there a way to detect this scenario in the balancer module so that it stops attempting to balance when it isn't possible?
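
Roughly, the math here: if rack is the failure domain, each rack has to hold the same share of every PG, so a rack's fill ratio is fixed by its capacity and no amount of upmapping can move data between racks. A rough sketch of the kind of pre-check I have in mind (assuming that layout; the capacities below are made up apart from the ~20 TB gap):

# Not the balancer module's code, just a sketch of the idea. Assumes rack
# is the failure domain, so every rack stores an equal share of the data
# and the fullest rack bounds how well OSDs can be balanced.
def min_achievable_spread(rack_capacity_tb, per_rack_data_tb):
    # Best possible (max fill - min fill) across racks.
    fills = [per_rack_data_tb / cap for cap in rack_capacity_tb.values()]
    return max(fills) - min(fills)

def worth_balancing(rack_capacity_tb, per_rack_data_tb, target=0.01):
    # Give up if the spread implied by rack capacities already exceeds
    # the target deviation (the 1% goal mentioned above).
    return min_achievable_spread(rack_capacity_tb, per_rack_data_tb) <= target

racks_tb = {"rack1": 520.0, "rack2": 520.0, "rack3": 500.0}  # hypothetical
print(min_achievable_spread(racks_tb, per_rack_data_tb=300.0))  # ~0.023 (2.3%)
print(worth_balancing(racks_tb, per_rack_data_tb=300.0))        # False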

Actions #5

Updated by David Zafman over 4 years ago

  • Assignee set to David Zafman