Project

General

Profile

Bug #15653

Updated by Loïc Dachary about 7 years ago

h3. discussion

* "crush multipick anomaly":http://marc.info/?l=ceph-devel&m=148539995928656&w=2

h3.
description, with example

CRUSH will correctly choose items with relative weights with the right probabilities for each independent choice. However, when choosing multiple replicas, each choice is *not* indepent, since it
has to be unique. The result is that low-weighted devices get too many items.

Simple example:
<pre>

maetl:src (master) 03:20 PM $ cat cm.txt
# begin crush map

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3
device 4 device4

# types
type 0 osd
type 1 domain
type 2 pool

# buckets
domain root {
id -1 # do not change unnecessarily
# weight 5.000
alg straw2
hash 0 # rjenkins1
item device0 weight 10.00
item device1 weight 10.0
item device2 weight 10.0
item device3 weight 10.0
item device4 weight 1.000
}

# rules
rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take root
step choose firstn 0 type osd
step emit
}

# end crush map
maetl:src (master) 03:20 PM $ ./crushtool -c cm.txt -o cm
maetl:src (master) 03:20 PM $ ./crushtool -i cm --test --show-utilization --num-rep 1 --min-x 1 --max-x 1000000 --num-rep 1
rule 0 (data), x = 1..1000000, numrep = 1..1
rule 0 (data) num_rep 1 result size == 1: 1000000/1000000
device 0: stored : 243456 expected : 200000
device 1: stored : 243624 expected : 200000
device 2: stored : 244486 expected : 200000
device 3: stored : 243881 expected : 200000
device 4: stored : 24553 expected : 200000
maetl:src (master) 03:20 PM $ ./crushtool -i cm --test --show-utilization --num-rep 1 --min-x 1 --max-x 1000000 --num-rep 3
rule 0 (data), x = 1..1000000, numrep = 3..3
rule 0 (data) num_rep 3 result size == 3: 1000000/1000000
device 0: stored : 723984 expected : 600000
device 1: stored : 722923 expected : 600000
device 2: stored : 723153 expected : 600000
device 3: stored : 723394 expected : 600000
device 4: stored : 106546 expected : 600000
</pre>

Note that in the 1x case, we get 1/10th the items on device 4, as expected. For 3x, it grows to 1/7th. For lower weights the amplification is more pronounced.

h3. detailed explanation

The chances of getting a particular device during the first draw is the weight of the device divided by the sum of the weight of all devices. For example let say there are 5 devices in a bucket, with the following weights a = 10, b = 10, c = 10, d = 10, e = 1. The chances of getting e is 1/41 and the chances of getting a is 10/41.

Things get more complicated for the second draw because we have to account for a first draw that does not include a given device: it is the sum of the weight of all devices except the one we're interested in, divided by the weight of all devices. So, if we want to know the chances of e showing up in the second draw, the first draw must not include it and this has a 40/41 chance of happening. Also, during the second draw, the chance of getting e is increased because there is one less device to chose from (the one that was picked during the first draw): 1/31 (i.e. 41 - the weight of the device that was chosen). Because the second draw depends on the first draw, the probability must be multiplied: 40/41 * 1/31.

Since the chance of getting the device e in a first draw or getting the device e in a second draw are independent, the chances of getting the device e in both situations is the sum of their probability: 1/41 (first draw) + (40/41 * 1/31) (second draw).

This is a special case because all devices have the same weight except for e. If we are to calculate the probability of a being selected in the second draw, we have to sum the case where e is selected and the case where b, c or d is selected during the first draw, because they do not have the same weight. If e is selected during the first draw, a will be selected during the second draw with a probability of (1/41 * 10/40). If b, c, or d is selected during the first draw, a will be selected during the second draw with a probability of (30/41 * 10/31).

The chances of getting the device a in the first draw and the second draw is therefore: 10/41+(30/41 * 10/31)+(1/41 * 10/40)

To summarize:

* probability of getting e : 1/41 * (40/41 * 1/31) = .05586
* probability of getting a : 10/41+(30/41 * 10/31)+(1/41 * 10/40) = .48603

We are therefore 8.7 ( 0.48603/0.05586 ) more likely to get e than to get a.

From the point of view of the users, this is counter intuitive because they expect that the weight reflects the probability, which is only true for a single draw. With just one draw a is (10/41)/(1/41) = 10 times more likely to be selected than e. With two draws, a is only 8.7 times more likely to be selected than e, as shown above.

Back