Bug #25183: The ceph-mgr balancer hangs when attempting to balance cluster - mgr - Ceph

closed

Added by Bryan Stillwell almost 6 years ago. Updated almost 3 years ago.

Status: Can't reproduce
Priority: Normal
Category: balancer module
Severity: 3 - minor
Affected Versions: Ceph - v13.2.1

Description

Problem:
When using the ceph-mgr balancer in 13.2.1 (and 12.2.5 previously), trying to create an optimized plan results in the ceph-mgr hanging.

I'm using the upmap mode for the balancer:

ceph balancer status
{
    "active": false,
    "plans": [],
    "mode": "upmap"
}

Log messages look like this:
2018-07-30 15:45:14.979 7fe096cca700 1 mgr[balancer] Handling command: '{'prefix': 'balancer optimize', 'plan': 'run20180730', 'target': ['mgr', '']}'
2018-07-30 15:45:15.063 7fe096cca700 4 mgr[balancer] Optimize plan run20180730
2018-07-30 15:45:15.063 7fe096cca700 4 mgr get_config get_config key: mgr/balancer/mode
2018-07-30 15:45:15.063 7fe096cca700 4 mgr get_config get_config key: mgr/balancer/max_misplaced
2018-07-30 15:45:15.063 7fe096cca700 4 mgr[balancer] Mode upmap, max misplaced 0.010000
2018-07-30 15:45:15.063 7fe096cca700 4 mgr[balancer] do_upmap
2018-07-30 15:45:15.063 7fe096cca700 4 mgr get_config get_config key: mgr/balancer/upmap_max_iterations
2018-07-30 15:45:15.063 7fe096cca700 4 ceph_config_get upmap_max_iterations not found
2018-07-30 15:45:15.067 7fe096cca700 4 mgr get_config get_config key: mgr/balancer/upmap_max_deviation
2018-07-30 15:45:15.067 7fe096cca700 4 ceph_config_get upmap_max_deviation not found
2018-07-30 15:45:15.067 7fe096cca700 4 mgr[balancer] pools ['rbd', 'cephfs_data_ec42', 'cephfs_data', 'cephfs_metadata']

Nothing else related to balancing is seen after that.
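
To pin down where the balancer goes quiet, the mgr log can be scanned for the last line tagged mgr[balancer]. The following is a minimal sketch, assuming log lines have the same "date time thread level message" shape as the excerpt above; the function name last_balancer_entry and the regular expression are illustrative and not part of Ceph.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt in this report:
# "<date> <time> <thread id> <debug level> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+) +(?P<level>-?\d+) (?P<msg>.*)$"
)


def last_balancer_entry(lines):
    """Return (timestamp, message) of the last mgr[balancer] line, or None."""
    last = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "mgr[balancer]" in m.group("msg"):
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            last = (ts, m.group("msg"))
    return last


# Three lines copied from the log excerpt in this report.
sample = [
    "2018-07-30 15:45:15.063 7fe096cca700 4 mgr[balancer] do_upmap",
    "2018-07-30 15:45:15.067 7fe096cca700 4 ceph_config_get upmap_max_deviation not found",
    "2018-07-30 15:45:15.067 7fe096cca700 4 mgr[balancer] pools "
    "['rbd', 'cephfs_data_ec42', 'cephfs_data', 'cephfs_metadata']",
]
entry = last_balancer_entry(sample)
```

For the excerpt above, the last balancer entry is the "pools [...]" line, which suggests the hang occurs after the pool list is gathered and before any further balancer output is logged.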

Expected result:
Another pass is done by the balancer to bring the cluster a step closer to being balanced.

Additional notes:
Trying to manually optimize the cluster results in a segfault:

ceph osd getmap -o osdmap-20180730.bin
got osdmap epoch 101015
osdmaptool osdmap-20180730.bin --upmap upmaps-20180730.txt
osdmaptool: osdmap file 'osdmap-20180730.bin'
writing upmap command output to: upmaps-20180730.txt
checking for upmap cleanups
upmap, max-count 100, max deviation 0.01
*** Caught signal (Segmentation fault) **
 in thread 7fe69b6da8c0 thread_name:osdmaptool
Segmentation fault (core dumped)
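
When reproducing the manual steps above, it helps to tell a hang apart from a crash automatically. The following is a rough sketch, assuming a POSIX system where a process killed by a signal has a negative return code; classify_run is a hypothetical helper, not a Ceph tool, and the timeout value is arbitrary.

```python
import signal
import subprocess


def classify_run(cmd, timeout_s):
    """Run cmd (an argv list) and return 'ok', 'exit:<code>',
    'signal:<NAME>', or 'timeout'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "timeout"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return "signal:" + signal.Signals(-proc.returncode).name
    return "exit:%d" % proc.returncode
```

Called as classify_run(["osdmaptool", "osdmap-20180730.bin", "--upmap", "upmaps-20180730.txt"], timeout_s=600) on a host with osdmaptool installed and the saved map present, a crash like the one shown above should be reported as "signal:SIGSEGV", while a hang would be reported as "timeout".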

Updated by Bryan Stillwell almost 6 years ago

Updated by John Spray almost 6 years ago

Updated by Bryan Stillwell almost 6 years ago

Updated by Sebastian Wagner about 5 years ago

Updated by Bryan Stillwell about 5 years ago

Updated by Konstantin Shalygin almost 3 years ago