Bug #57348
CRUSH mapping fails: (1) choose and chooseleaf for type OSD are not identical and (2) the returned mapping contains a down+out OSD
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We observe two issues with CRUSH. Related ceph-users thread (split into two for some reason): https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LCHPEKXS6OXKMPDR3BDMJM6MQZX3F3WL and https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AF3HPNYV7RTDZCDB4A5DVFNVUKVM537E
1. Steps choose and chooseleaf for type OSD produce different mappings but should produce identical ones:
osdmaptool: osdmap file 'osdmap-orig.bin'
 parsed '4.1c' -> 4.1c
4.1c raw ([6,1,4,5,3,2147483647], p6) up ([6,1,4,5,3,2147483647], p6) acting ([6,1,4,5,3,1], p6)
osdmaptool: osdmap file 'osdmap-chooseleaf.bin'
 parsed '4.1c' -> 4.1c
4.1c raw ([6,1,4,5,3,8], p6) up ([6,1,4,5,3,8], p6) acting ([6,1,4,5,3,1], p6)
Expected behaviour: Both rules always return the same mapping.
Related to issue: https://tracker.ceph.com/issues/55169
2. CRUSH can emit a mapping containing a down+out OSD that should always be rejected:
osdmaptool: osdmap file 'osdmap-more-tries.bin'
 parsed '4.1c' -> 4.1c
4.1c raw ([6,1,4,5,3,7], p6) up ([6,1,4,5,3,2147483647], p6) acting ([6,1,4,5,3,1], p6)
Here raw contains osd.7, which is marked destroyed in the OSD tree below.
Expected behaviour: Either return a valid mapping in which all OSDs are up+in, or return raw=[6,1,4,5,3,2147483647].
This smells like a premature termination of a recursion by accepting a mapping that should be rejected (a test for up+in is missing).
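One way to spot this kind of result mechanically is to compare the raw and up lists on a mapping line. The sketch below is illustrative only: osd_list is a hypothetical helper, not part of Ceph, and it assumes the line format pasted above. A difference between raw and up might also be caused by pg-upmap entries, so a mismatch is a hint rather than proof. The value 2147483647 is 0x7fffffff, the placeholder CRUSH uses for an empty slot (CRUSH_ITEM_NONE).

```shell
# Hypothetical helper: extract one bracketed OSD list ("raw", "up" or
# "acting") from a mapping line as printed by `osdmaptool --test-map-pg`.
osd_list() {  # usage: osd_list <raw|up|acting> <line>
  printf '%s\n' "$2" | sed -n "s/.* $1 (\[\([0-9,]*\)].*/\1/p"
}

# Mapping line from osdmap-more-tries.bin, copied from this report.
line="4.1c raw ([6,1,4,5,3,7], p6) up ([6,1,4,5,3,2147483647], p6) acting ([6,1,4,5,3,1], p6)"

raw=$(osd_list raw "$line")
up=$(osd_list up "$line")

# If raw and up differ, the raw result contained an OSD that was filtered
# out afterwards (assumption: no pg-upmap entries apply to this PG).
if [ "$raw" != "$up" ]; then
  echo "raw [$raw] differs from up [$up]"
fi

# Show the placeholder value in hex.
printf '0x%x\n' 2147483647   # prints 0x7fffffff
```

For the line from osdmap-orig.bin the two lists are equal, so nothing is reported.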
How to reproduce
1. Create EC profile and pool:
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=osd
ceph osd pool create fs-data 128 erasure ec-4-2
2. This automatically creates the following crush rule using "step choose":
rule fs-data {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 0 type osd
        step emit
}
3. Create this OSD tree (all osdmaps attached):
# osdmaptool osdmap-orig.bin --tree
osdmaptool: osdmap file 'osdmap-orig.bin'
ID  CLASS  WEIGHT   TYPE NAME          STATUS     REWEIGHT  PRI-AFF
-1         2.44798  root default
-7         0.81599      host tceph-01
 0    hdd  0.27199          osd.0             up   0.87999  1.00000
 3    hdd  0.27199          osd.3             up   0.98000  1.00000
 6    hdd  0.27199          osd.6             up   0.92999  1.00000
-3         0.81599      host tceph-02
 2    hdd  0.27199          osd.2             up   0.95999  1.00000
 4    hdd  0.27199          osd.4             up   0.89999  1.00000
 8    hdd  0.27199          osd.8             up   0.89999  1.00000
-5         0.81599      host tceph-03
 1    hdd  0.27199          osd.1             up   0.89999  1.00000
 5    hdd  0.27199          osd.5             up   1.00000  1.00000
 7    hdd  0.27199          osd.7      destroyed         0  1.00000
4. Create and import variations of the default rule (all osdmaps attached):
Diffs between individual crush rules:
# diff crush-orig.txt crush-chooseleaf.txt
96c96
< step choose indep 0 type osd
---
> step chooseleaf indep 0 type osd
# diff crush-chooseleaf.txt crush-more-tries.txt
94c94
< step set_choose_tries 100
---
> step set_choose_tries 200
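The report does not list the exact commands used to build the variants, so the following is only a sketch of one possible workflow. It assumes osdmap-orig.bin was saved from the cluster (for example with "ceph osd getmap -o osdmap-orig.bin") and that each sed pattern matches exactly one line of the decompiled map, as in the diffs above. The function names to_chooseleaf, to_more_tries and build_osdmap are made up for illustration.

```shell
# Text edits matching the two diffs above. Each reads a decompiled CRUSH
# map and writes the modified text to stdout. Note: sed changes every
# matching line, so check the result with diff if the map has several rules.
to_chooseleaf() { sed 's/step choose indep 0 type osd/step chooseleaf indep 0 type osd/' "$1"; }
to_more_tries() { sed 's/step set_choose_tries 100/step set_choose_tries 200/' "$1"; }

# Compile crush-<variant>.txt and import it into a copy of the original osdmap.
build_osdmap() {  # usage: build_osdmap <variant>
  crushtool -c "crush-$1.txt" -o "crush-$1.bin" &&
  cp osdmap-orig.bin "osdmap-$1.bin" &&
  osdmaptool "osdmap-$1.bin" --import-crush "crush-$1.bin"
}

# Run the sequence only when the original osdmap is present.
if [ -f osdmap-orig.bin ]; then
  osdmaptool osdmap-orig.bin --export-crush crush-orig.bin   # extract CRUSH map
  crushtool -d crush-orig.bin -o crush-orig.txt              # decompile to text
  to_chooseleaf crush-orig.txt > crush-chooseleaf.txt
  to_more_tries crush-chooseleaf.txt > crush-more-tries.txt  # builds on chooseleaf
  build_osdmap chooseleaf
  build_osdmap more-tries
fi
```

The more-tries variant is derived from the chooseleaf variant, as the second diff indicates, so it carries both changes.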
5. Run the "--test-map-pg 4.1c" command on all osdmaps:
# for map in orig chooseleaf more-tries; do osdmaptool --test-map-pg 4.1c "osdmap-$map.bin" ; done
# Output pasted above.
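To compare the raw mappings of the three osdmaps side by side, the output of the loop can be reduced to one line per map. This is a rough sketch, not a Ceph tool: summarize_raw is a hypothetical helper that assumes the "osdmap file '...'" banner and the "raw ([...]" field appear as in the output pasted in this report.

```shell
# Hypothetical helper: print "<osdmap file> <raw OSD list>" for every
# mapping found on stdin. \047 is the octal escape for a single quote.
summarize_raw() {
  awk '
    match($0, /osdmap file \047[^\047]*\047/) { file = substr($0, RSTART + 13, RLENGTH - 14) }
    match($0, /raw \(\[[0-9,]*]/)             { print file, substr($0, RSTART + 6, RLENGTH - 7) }
  '
}

# Run against the attached osdmaps when osdmaptool and the files are available.
if command -v osdmaptool >/dev/null 2>&1 && [ -f osdmap-orig.bin ]; then
  for map in orig chooseleaf more-tries; do
    osdmaptool --test-map-pg 4.1c "osdmap-$map.bin" 2>&1
  done | summarize_raw
fi
```

If choose and chooseleaf behaved identically for type osd, the lines for osdmap-orig.bin and osdmap-chooseleaf.bin would show the same list.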