Bug #11119
data placement is a function of OSD id
Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor
Description
While looking closely at straw vs. straw2 buckets I realized that one property of CRUSH/straw that I thought was true is in fact not true. What I expected was, given the following:
- an OSD with id x
- OSD x fails and is replaced
- the replacement OSD gets a new id y
- OSD x is removed from CRUSH
- OSD y is added to CRUSH at the same location and with the same weight that x had
then:
- OSD y should get the same PGs that x had
- there should be no data movement on other OSDs in the cluster
But this turns out not to be true. And since our operations procedures rely on this assumption, our disk replacements move a lot more data than they should.
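(For context, the replacement procedure in question is roughly the following. This is a sketch assuming the standard ceph CLI workflow; the ids and weight are illustrative, not taken from a real cluster.)

ceph osd out osd.0
ceph osd crush remove osd.0    # remove the failed OSD from the CRUSH map
ceph auth del osd.0
ceph osd rm osd.0
# the replacement disk is provisioned and comes up with a new id, e.g. osd.4
ceph osd crush add osd.4 1.0 host=host0    # same location and weight osd.0 had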
Here is my example.
We start with crush.txt.orig:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 device
type 1 host
type 2 default

# buckets
host host0 {
    id -1    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
}
host host1 {
    id -2    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}
default default {
    id -3    # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0    # rjenkins1
    item host0 weight 2.000
    item host1 weight 2.000
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
Then after replacing osd.0 with osd.4 (to make crush.txt.new):
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 device0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4

# types
type 0 device
type 1 host
type 2 default

# buckets
host host0 {
    id -1    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item osd.4 weight 1.000
    item osd.1 weight 1.000
}
host host1 {
    id -2    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}
default default {
    id -3    # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0    # rjenkins1
    item host0 weight 2.000
    item host1 weight 2.000
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
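As a sanity check that the only intended change between the two maps is the device list entry and the single item line in host0, a plain diff of the two text files (assuming they are saved under the names above) should show nothing else:

diff -u crush.txt.orig crush.txt.new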
Then we test the new map against the expected mappings:
crushtool -c crush.txt.orig -o cm.orig
crushtool -c crush.txt.new -o cm.new
crushtool -i cm.orig --num-rep 2 --test --show-mappings > orig.mappings 2>&1
cat orig.mappings | sed -e 's/\[0/\[4/' | sed -e 's/0\]/4\]/' > expected.mappings
crushtool -i cm.new --num-rep 2 --test --show-mappings > actual.mappings 2>&1
wc -l orig.mappings
diff -u expected.mappings actual.mappings | grep -c ^+
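One small caveat with the last command: in diff -u output the +++ header line also starts with a +, so grep -c ^+ counts one line that is not a changed mapping. A variant that counts only the changed mapping lines (same files as above):

diff -u expected.mappings actual.mappings | grep -c '^+[^+]'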
I get 344/1024 PGs that move. Comments?