Bug #2214

crush: pgs only mapped to 2 devices with replication level 3

Added by Josh Durgin about 12 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is from #2173. Note that all 3 osds are up.

./osdmaptool --print osdmap
./osdmaptool: osdmap file 'osdmap'
epoch 3212
fsid a743a194-fa91-48fb-8778-e294483273d9
created 2012-03-01 02:06:10.677024
modified 2012-03-15 17:31:02.260488
flags 

pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 3172 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 3162 owner 0
pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 lpg_num 2 lpgp_num 2 last_change 3160 owner 0

max_osd 3
osd.0 up   in  weight 1 up_from 3037 up_thru 3211 down_at 3035 last_clean_interval [2865,3034) 192.168.10.205:6800/6301 192.168.10.205:6801/6301 192.168.10.205:6802/6301 exists,up
osd.1 up   in  weight 1 up_from 3055 up_thru 3211 down_at 3054 last_clean_interval [3013,3053) lost_at 358 192.168.10.201:6800/20518 192.168.10.201:6801/20518 192.168.10.201:6802/20518 exists,up
osd.2 up   in  weight 1 up_from 3211 up_thru 3211 down_at 3209 last_clean_interval [3207,3208) 192.168.10.201:6803/26378 192.168.10.201:6806/26378 192.168.10.201:6807/26378 exists,up

pg_temp 0.7 [0,2,1]
pg_temp 1.6 [0,2,1]
pg_temp 1.1c [0,2,1]

$ ./osdmaptool --test-map-pg 0.6 osdmap
./osdmaptool: osdmap file 'osdmap'
 parsed '0.6' -> 0.6
0.6 raw [2,1] up [2,1] acting [2,1]

$ ./osdmaptool --test-map-pg 2.4 osdmap
./osdmaptool: osdmap file 'osdmap'
 parsed '2.4' -> 2.4
2.4 raw [2,1] up [2,1] acting [2,1]

$ ./osdmaptool --test-map-pg 2.6 osdmap
./osdmaptool: osdmap file 'osdmap'
 parsed '2.6' -> 2.6
2.6 raw [0,2,1] up [0,2,1] acting [0,2,1]
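
For reference, the crushmap below can be extracted from the osdmap and decompiled with the standard tools; a minimal sketch, with 'crushmap' and 'crushmap.txt' as placeholder file names:

$ ./osdmaptool osdmap --export-crush crushmap
$ ./crushtool -d crushmap -o crushmap.txt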

The crushmap in the osdmap is:

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 rack
type 3 pool

# buckets
host server01 {
    id -4        # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item osd.1 weight 1.000
    item osd.2 weight 1.000
}
host server02 {
    id -2        # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
}
rack unknownrack {
    id -3        # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item server01 weight 1.000
    item server02 weight 1.000
}
pool default {
    id -1        # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item unknownrack weight 1.000
}

# rules
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type osd
    step emit
}
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type osd
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type osd
    step emit
}
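
All three rules are identical apart from the ruleset number, so the shortfall should show up for any of them when the rule is exercised over many inputs. A hedged sketch with crushtool, using the exported binary crushmap from above (--show-bad-mappings is only present in newer crushtool builds):

$ ./crushtool -i crushmap --test --rule 0 --num-rep 3 --min-x 0 --max-x 1023 --show-bad-mappings

Each line it prints would be an input whose mapping came back with fewer than 3 OSDs.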

Is this due to the local retry behavior?
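
If so, later crushtool releases expose the retry limits as CRUSH tunables, so the same map could be retested with local retries disabled and a larger total-retry budget; a sketch assuming one of those newer builds (the --set-choose-* flags are not in the version this report was filed against):

$ ./crushtool -i crushmap --test --rule 0 --num-rep 3 \
      --set-choose-local-tries 0 --set-choose-total-tries 50 \
      --show-bad-mappings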


Files

osdmap (2.28 KB), Josh Durgin, 03/26/2012 09:58 AM

Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #2047: crush: with a rack->host->device hierarchy, several down devices are likely to cause bad mappings (Resolved, 02/08/2012)

#1 Updated by Sage Weil almost 12 years ago

  • Status changed from New to Resolved
#2 Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (10)