Bug #24005: pg-upmap can break the crush rule in some cases
Status: Closed
% Done: 0%
Regression: No
Severity: 3 - minor
Description
The test script is:

./bin/init-ceph stop
killall ceph-mon ceph-osd
killall ceph-mon ceph-osd
OSD=9 MON=1 MGR=0 MDS=0 ../src/vstart.sh -X -n
./bin/ceph osd crush add-bucket test root
./bin/ceph osd crush add-bucket huangjun-1 host
./bin/ceph osd crush add-bucket huangjun-2 host
./bin/ceph osd crush add-bucket huangjun-3 host
./bin/ceph osd crush move huangjun-1 root=test
./bin/ceph osd crush move huangjun-2 root=test
./bin/ceph osd crush move huangjun-3 root=test
./bin/ceph osd crush add osd.0 1.0 host=huangjun-1
./bin/ceph osd crush add osd.1 1.0 host=huangjun-1
./bin/ceph osd crush add osd.2 1.0 host=huangjun-1
./bin/ceph osd crush add osd.3 1.0 host=huangjun-2
./bin/ceph osd crush add osd.4 1.0 host=huangjun-2
./bin/ceph osd crush add osd.5 1.0 host=huangjun-2
./bin/ceph osd crush add osd.7 1.0 host=huangjun-3
./bin/ceph osd crush add osd.6 1.0 host=huangjun-3
./bin/ceph osd crush add osd.8 1.0 host=huangjun-3
# unlink all osds from default crush bucket
for i in `seq 0 8`; do
  ./bin/ceph osd crush unlink osd.$i huangjun
done
./bin/ceph osd erasure-code-profile set test k=4 m=2 crush-failure-domain=osd
./bin/ceph osd getcrushmap -o crush
./bin/crushtool -d crush -o crush.txt
echo "
rule test {
  id 1
  type erasure
  min_size 1
  max_size 10
  step take huangjun-1
  step chooseleaf indep 2 type osd
  step emit
  step take huangjun-2
  step chooseleaf indep 2 type osd
  step emit
  step take huangjun-3
  step chooseleaf indep 2 type osd
  step emit
}
" >> crush.txt
./bin/crushtool -c crush.txt -o crush.new
./bin/ceph osd setcrushmap -i crush.new
./bin/ceph osd pool create test 256 256 erasure test test
./bin/ceph osd set-require-min-compat-client luminous
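Before the pool is created, the intended placement of this rule can be sanity-checked offline with crushtool's standard --test options (the rule id 1 and the 6 replicas match the rule and k+m above):

# Replay rule id 1 against the new map with 6 placements per pg;
# every emitted mapping should contain 2 OSDs from each of the 3 hosts.
./bin/crushtool -i crush.new --test --rule 1 --num-rep 6 --show-mappings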
The cluster topology (output of ./bin/ceph osd tree) is:

ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-5       9.00000 root test
-7       3.00000     host huangjun-1
 0   hdd 1.00000         osd.0           up  1.00000 1.00000
 1   hdd 1.00000         osd.1           up  1.00000 1.00000
 2   hdd 1.00000         osd.2           up  1.00000 1.00000
-8       3.00000     host huangjun-2
 3   hdd 1.00000         osd.3           up  1.00000 1.00000
 4   hdd 1.00000         osd.4           up  1.00000 1.00000
 5   hdd 1.00000         osd.5           up  1.00000 1.00000
-9       3.00000     host huangjun-3
 6   hdd 1.00000         osd.6           up  1.00000 1.00000
 7   hdd 1.00000         osd.7           up  1.00000 1.00000
 8   hdd 1.00000         osd.8           up  1.00000 1.00000
-1             0 root default
-2             0     host huangjun
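The injected rule can also be read back from the cluster to confirm it compiled as intended ('test' is the rule name from the script above):

# Dump the compiled 'test' rule; its steps should list the three
# take/chooseleaf/emit sequences, one per host bucket.
./bin/ceph osd crush rule dump test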
I chose pg 1.1 for the test; the original mapping is:
osdmap e50 pg 1.1 (1.1) -> up [2,0,5,4,7,8] acting [2,0,5,4,7,8]
Then I used 'ceph osd pg-upmap-items' to set an upmap, remapping osd.2 -> osd.1 and osd.4 -> osd.2 for pg 1.1:
./bin/ceph osd pg-upmap-items 1.1 2 1 4 2
and checked the pg 1.1 mapping again:
[root@huangjun /usr/src/ceph-int/build]$ ./bin/ceph pg map 1.1
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osdmap e52 pg 1.1 (1.1) -> up [1,0,5,2,7,8] acting [1,0,5,2,7,8]
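For anyone reproducing this: the exception is recorded in the osdmap and can be reverted with the standard upmap commands (shown for convenience, using the pg id 1.1 from above):

# Show the recorded upmap exception for pg 1.1
./bin/ceph osd dump | grep pg_upmap_items
# Remove it; pg 1.1 should fall back to its CRUSH-computed mapping
./bin/ceph osd rm-pg-upmap-items 1.1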
My question is:
1. The crush rule says to take 2 OSDs from each host, but after this upmap pg 1.1 has 3 OSDs in host huangjun-1 and only 1 OSD in host huangjun-2, which I think breaks the crush rule. If host huangjun-1 halts, 3 OSDs go down at once; since k=4, m=2 tolerates only 2 failures, pg 1.1 would end up in the down state.
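The per-host imbalance is easy to confirm by looking up each OSD of the new up set; 'ceph osd find' reports the crush location (the OSD ids below are taken from the mapping above):

# Print the host of every OSD in pg 1.1's new up set [1,0,5,2,7,8];
# huangjun-1 appears three times, huangjun-2 only once.
for osd in 1 0 5 2 7 8; do
  ./bin/ceph osd find $osd | grep '"host"'
done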
Updated by huang jun almost 6 years ago
@xie xingguo
This PR, https://github.com/ceph/ceph/pull/21815, didn't address this issue.
Updated by xie xingguo almost 6 years ago
I've just noticed that the failure-domain of your example is set to OSD level. Thus the result is expected, right?
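For comparison: with crush-failure-domain=host, the profile's autogenerated rule separates chunks by host rather than by OSD. A minimal sketch, using hypothetical names test-host and test-host-pool:

# Hypothetical alternative profile whose default rule separates by host
./bin/ceph osd erasure-code-profile set test-host k=4 m=2 crush-failure-domain=host
./bin/ceph osd pool create test-host-pool 256 256 erasure test-host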