Bug #15653
Updated by Sage Weil almost 8 years ago
CRUSH chooses items with probabilities proportional to their relative weights for each independent choice. However, when choosing multiple replicas, each choice is *not* independent, since it has to be unique. The result is that low-weighted devices get too many items.
Simple example:
<pre>
maetl:src (master) 03:20 PM $ cat cm.txt
# begin crush map
# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3
device 4 device4
# types
type 0 osd
type 1 domain
type 2 pool
# buckets
domain root {
id -1 # do not change unnecessarily
# weight 5.000
alg straw2
hash 0 # rjenkins1
item device0 weight 10.00
item device1 weight 10.0
item device2 weight 10.0
item device3 weight 10.0
item device4 weight 1.000
}
# rules
rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take root
step choose firstn 0 type osd
step emit
}
# end crush map
maetl:src (master) 03:20 PM $ ./crushtool -c cm.txt -o cm
maetl:src (master) 03:20 PM $ ./crushtool -i cm --test --show-utilization --num-rep 1 --min-x 1 --max-x 1000000 --num-rep 1
rule 0 (data), x = 1..1000000, numrep = 1..1
rule 0 (data) num_rep 1 result size == 1: 1000000/1000000
device 0: stored : 243456 expected : 200000
device 1: stored : 243624 expected : 200000
device 2: stored : 244486 expected : 200000
device 3: stored : 243881 expected : 200000
device 4: stored : 24553 expected : 200000
maetl:src (master) 03:20 PM $ ./crushtool -i cm --test --show-utilization --num-rep 1 --min-x 1 --max-x 1000000 --num-rep 3
rule 0 (data), x = 1..1000000, numrep = 3..3
rule 0 (data) num_rep 3 result size == 3: 1000000/1000000
device 0: stored : 723984 expected : 600000
device 1: stored : 722923 expected : 600000
device 2: stored : 723153 expected : 600000
device 3: stored : 723394 expected : 600000
device 4: stored : 106546 expected : 600000
</pre>
Note that in the 1x case, device 4 gets 1/10th as many items as each of the other devices, as expected from its weight. For 3x, that grows to about 1/7th. For lower weights the amplification is more pronounced.
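
The effect does not depend on CRUSH internals and can be reproduced with a toy model. The sketch below is an illustration only, not CRUSH code: it assumes each replica is drawn with probability proportional to weight and redrawn on a collision, and it uses Python's random module rather than straw2 or the rjenkins hash. The names place and utilization are hypothetical.

```python
import random


def place(weights, num_rep, rng):
    """Pick num_rep distinct devices; redraw whenever a duplicate comes up.

    Simplified stand-in for replica selection (assumption: plain
    weight-proportional draws, not the straw2 bucket algorithm).
    """
    devices = list(range(len(weights)))
    chosen = []
    while len(chosen) < num_rep:
        d = rng.choices(devices, weights=weights, k=1)[0]
        if d not in chosen:  # replicas must be unique
            chosen.append(d)
    return chosen


def utilization(weights, num_rep, trials, seed=0):
    """Count how many times each device is selected over `trials` inputs."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(trials):
        for d in place(weights, num_rep, rng):
            counts[d] += 1
    return counts


if __name__ == "__main__":
    weights = [10, 10, 10, 10, 1]  # same weights as the crush map above
    for num_rep in (1, 3):
        counts = utilization(weights, num_rep, trials=200000)
        ratio = counts[4] / counts[0]
        print(f"num_rep={num_rep} counts={counts} dev4/dev0={ratio:.3f}")
```

Under this model the weight-1 device is selected with probability 1/41 on the first draw, but once heavier devices have been removed from the candidate set its chance rises to 1/31 and then 1/21. Across three replicas that adds up to about a 10% chance of landing in the result set, against about 72% for each weight-10 device. That is a ratio of roughly 1/7 rather than 1/10, in line with the crushtool numbers above.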