[rbd-mirror] simple image map policy doesn't always level-load instances
One observation, though (I think it is not related to this PR but rather to how the policy currently works): if I remove images from the pool that are being replayed by the same instance, there is no shuffling, and you can end up with one completely idle instance. When adding new images, the policy will distribute them evenly across the instances, so the previously idle instance remains underloaded. The same thing appears to happen when stopping an instance: the policy distributes its images evenly among the other instances without taking into account how loaded each instance currently is. So you can end up with an image distribution like this:

```
% for i in 0 1 2 3; do ceph --admin-daemon /tmp/tmp.rbd_mirror/rbd-mirror.cluster1-client.mirror.$i.asok help |grep -c 'status mirror/'; done
7
3
10
2
```

It looks like the only way to reshuffle them evenly is to restart the instances. I suppose this is not what users will expect from the "simple" policy -- I think they will want an even distribution that does not depend on history.
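The skew described above can be illustrated with a minimal sketch (hypothetical Python, not rbd-mirror's actual C++ implementation): redistributing a stopped instance's images evenly across the survivors, versus assigning each image to whichever instance currently replays the fewest. The instance names and image counts are made up for the example.

```python
from collections import Counter

def redistribute_round_robin(load, images):
    """Spread `images` evenly across instances, ignoring current load."""
    load = Counter(load)
    instances = sorted(load)
    for i, _image in enumerate(images):
        load[instances[i % len(instances)]] += 1
    return dict(load)

def redistribute_least_loaded(load, images):
    """Assign each image to the instance currently replaying the fewest."""
    load = Counter(load)
    for _image in images:
        target = min(load, key=lambda inst: (load[inst], inst))
        load[target] += 1
    return dict(load)

# Hypothetical scenario: instance "d" stops while replaying 4 images;
# "a" is already completely idle, "b" and "c" each replay 4.
load = {"a": 0, "b": 4, "c": 4}
orphaned = ["image1", "image2", "image3", "image4"]

print(redistribute_round_robin(load, orphaned))   # → {'a': 2, 'b': 5, 'c': 5}
print(redistribute_least_loaded(load, orphaned))  # → {'a': 4, 'b': 4, 'c': 4}
```

With even (round-robin) redistribution, the idle instance `a` stays underloaded forever; a load-aware assignment levels the instances out, which is presumably what users expect from the "simple" policy.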
#4 Updated by Venky Shankar about 1 year ago
Adding images after a bunch of image removals should pick an instance which is least loaded -- I think the reason this was not observed in Mykola's test setup was due to the fact that the on-disk (and in-memory) image map is not purged when removing images. This can be confirmed by checking the number of image map keys (`image_map_*`) in the `rbd_mirror_leader` object after removing some images.
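A hypothetical model (again Python, not the actual rbd-mirror code) of why unpurged map entries defeat least-loaded selection: if the image map still carries keys for removed images, the instance that lost its images still looks loaded, so new images are steered elsewhere. The image and instance names below are made up.

```python
def least_loaded(image_map, instances):
    """Pick the instance with the fewest images in the (possibly stale) map."""
    load = {inst: 0 for inst in instances}
    for instance in image_map.values():
        load[instance] += 1
    return min(load, key=lambda inst: (load[inst], inst))

# instance-0's four images were removed from the pool, but their map
# entries were never purged; instance-1 still replays two live images.
stale_map = {
    "img1": "instance-0", "img2": "instance-0",
    "img3": "instance-0", "img4": "instance-0",  # removed, keys remain
    "img5": "instance-1", "img6": "instance-1",  # still live
}
live_map = {k: v for k, v in stale_map.items() if k in ("img5", "img6")}

print(least_loaded(stale_map, ["instance-0", "instance-1"]))  # → instance-1 (skewed by stale keys)
print(least_loaded(live_map, ["instance-0", "instance-1"]))   # → instance-0 (actually idle)
```

With the stale entries purged, the least-loaded pick lands on the genuinely idle instance, consistent with the behavior the policy is supposed to have.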