Bug #7328
osd: reweight-by-utilization ended up with stuck remapped pgs
Status: Closed
% Done: 0%
Source: Support
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Running ceph osd reweight-by-utilization resulted in stuck pgs.
health HEALTH_WARN 204 pgs stuck unclean; recovery 4136/163619494 objects degraded (0.003%)
monmap e1: 1 mons at {cs-compute03=192.168.181.13:6789/0}, election epoch 1, quorum 0 cs-compute03
osdmap e2996: 120 osds: 120 up, 120 in
pgmap v665045: 6128 pgs, 4 pools, 164 TB data, 79892 kobjects
      329 TB used, 106 TB / 435 TB avail
      4136/163619494 objects degraded (0.003%)
      5917 active+clean
      204 active+remapped
I believe the cause is related to the crush configuration: 2 copies split across 2 rooms. Weighting the OSDs back up to 1 resolves the remapped pgs.
root root2 {
        id -7           # do not change unnecessarily
        # weight 435.600
        alg straw
        hash 0          # rjenkins1
        item room1 weight 217.800
        item room2 weight 217.800
}

rule testrule {
        ruleset 3
        type replicated
        min_size 2
        max_size 4
        step take root2
        step chooseleaf firstn 0 type room
        step emit
}
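For what it's worth, here is a simplified toy model of why this rule has no slack (hypothetical Python, not Ceph code; ROOMS, MAX_TRIES and place_pg are invented for illustration, and OSDs are picked uniformly rather than by crush weight): with only two rooms for two copies, once reweight-by-utilization pushes overrides in one room low enough, a bounded retry budget can run out and a PG never gets a complete new mapping, so it stays remapped.

import random

# Toy model, NOT actual CRUSH code: each replica must land in a distinct room.
# An OSD is picked uniformly within the room and then accepted with probability
# equal to its reweight override (what reweight-by-utilization lowers). A
# bounded retry budget stands in for the choose_total_tries tunable; if every
# retry is rejected, the placement fails and the PG keeps its old mapping.
ROOMS = {
    "room1": [f"osd.{i}" for i in range(60)],
    "room2": [f"osd.{i}" for i in range(60, 120)],
}
MAX_TRIES = 50  # stand-in for choose_total_tries

def place_pg(reweights, rng):
    """Pick one OSD per room, honouring per-OSD reweight overrides."""
    acting = []
    for room, osds in ROOMS.items():
        for _ in range(MAX_TRIES):
            osd = rng.choice(osds)
            if rng.random() < reweights.get(osd, 1.0):
                acting.append(osd)
                break
        else:
            return None  # retries exhausted: no complete mapping for this PG
    return acting

rng = random.Random(7328)
# Pretend reweight-by-utilization pushed most of room1's OSDs down very hard.
reweights = {f"osd.{i}": 0.05 for i in range(55)}
failures = sum(place_pg(reweights, rng) is None for _ in range(6128))
print(f"{failures} of 6128 simulated PGs could not be fully mapped")

In this toy model the failures disappear as soon as the overrides are raised back toward 1.0, which matches weighting the OSDs back up resolving the remapped pgs.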
If this is just caused by the crush configuration, reweight-by-utilization should be smarter about weighting down OSDs.
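One way it could be "smarter" is to bound how far it pushes any override below 1.0 in a single pass. The sketch below is purely hypothetical (the constants MIN_REWEIGHT and MAX_STEP and the helper proposed_reweights are not part of Ceph); it only illustrates the idea of a clamped adjustment:

# Hypothetical policy sketch, not the actual Ceph implementation.
MIN_REWEIGHT = 0.80   # never push an override below this in one pass
MAX_STEP = 0.05       # largest single-pass reduction for any one OSD

def proposed_reweights(utilizations, current, threshold=1.20):
    """Suggest new overrides only for OSDs above threshold * average util."""
    avg = sum(utilizations.values()) / len(utilizations)
    new = {}
    for osd, util in utilizations.items():
        if util > threshold * avg:
            # Scale the current override down toward the overload threshold,
            # but clamp both the step size and the absolute floor.
            cur = current.get(osd, 1.0)
            target = cur * (threshold * avg / util)
            new[osd] = round(max(target, cur - MAX_STEP, MIN_REWEIGHT), 4)
    return new

# Example: osd.12 is roughly 29% above the cluster average utilization,
# so it is nudged down by at most MAX_STEP rather than all the way.
print(proposed_reweights({"osd.12": 0.90, "osd.13": 0.50}, {"osd.12": 1.0}))
# -> {'osd.12': 0.95}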
More cluster information is available in ZD Ticket #928.