Bug #57348

crush map fails: (1) choose and chooseleaf for type OSD are not identical and (2) returns a mapping containing a down+out OSD

Added by Frank Schilder over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

We observe two issues with CRUSH. Related ceph-users thread (split into two threads for some reason): https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LCHPEKXS6OXKMPDR3BDMJM6MQZX3F3WL and https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AF3HPNYV7RTDZCDB4A5DVFNVUKVM537E

1. For type osd, "step choose" and "step chooseleaf" return different mappings, although they should be identical:

osdmaptool: osdmap file 'osdmap-orig.bin'
 parsed '4.1c' -> 4.1c
4.1c raw ([6,1,4,5,3,2147483647], p6) up ([6,1,4,5,3,2147483647], p6) acting ([6,1,4,5,3,1], p6)
osdmaptool: osdmap file 'osdmap-chooseleaf.bin'
 parsed '4.1c' -> 4.1c
4.1c raw ([6,1,4,5,3,8], p6) up ([6,1,4,5,3,8], p6) acting ([6,1,4,5,3,1], p6)

Expected behaviour: both rules always return the same mapping.

Related to issue: https://tracker.ceph.com/issues/55169
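The difference between the two outputs above can be made explicit with a small bash sketch. It assumes the osdmaptool output format shown above and treats 2147483647 (0x7fffffff, CRUSH_ITEM_NONE) as "no OSD found for this position". The functions raw_of, name_of and diff_raw are hypothetical helpers, not Ceph tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ceph) for comparing osdmaptool output.

# Extract the comma-separated raw OSD list from one --test-map-pg output line.
raw_of() {
  sed -n 's/.* raw (\[\([^]]*\)].*/\1/p' <<< "$1"
}

# Render one raw entry; 2147483647 (0x7fffffff) is CRUSH_ITEM_NONE.
name_of() {
  if [[ "$1" == 2147483647 ]]; then echo NONE; else echo "osd.$1"; fi
}

# Print every position at which two raw lists differ.
diff_raw() {
  local -a a b
  local i
  IFS=, read -r -a a <<< "$1"
  IFS=, read -r -a b <<< "$2"
  for i in "${!a[@]}"; do
    if [[ "${a[i]}" != "${b[i]}" ]]; then
      echo "pos $i: $(name_of "${a[i]}") vs $(name_of "${b[i]}")"
    fi
  done
}
```

Fed the "orig" and "chooseleaf" lines above, diff_raw "$(raw_of "$orig")" "$(raw_of "$leaf")" prints a single line, "pos 5: NONE vs osd.8": only the last shard differs, and only the chooseleaf rule finds an OSD for it.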

2. CRUSH can emit a mapping containing a down+out OSD, although such an OSD should always be rejected:

osdmaptool: osdmap file 'osdmap-more-tries.bin'
 parsed '4.1c' -> 4.1c
4.1c raw ([6,1,4,5,3,7], p6) up ([6,1,4,5,3,2147483647], p6) acting ([6,1,4,5,3,1], p6)

Expected behaviour: either return a valid mapping in which all OSDs are up+in, or raw=[6,1,4,5,3,2147483647].

This smells like a premature termination of the recursion: a mapping that should be rejected is accepted (a test for up+in seems to be missing).
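A sketch of the acceptance test that, per this report, appears to be skipped: every entry of a raw mapping must be either CRUSH_ITEM_NONE (2147483647) or an OSD that is up+in. This is not Ceph code; bad_in_raw is a hypothetical helper, and the list of down/out OSD ids is assumed to be supplied by the caller (here osd.7, which the OSD tree in the reproduction steps shows as destroyed with reweight 0).

```shell
#!/usr/bin/env bash
# Hypothetical check (not Ceph code): print every OSD in a raw mapping that
# appears in the caller-supplied list of down/out OSD ids.
# Usage: bad_in_raw <comma-separated raw list> <down/out id>...
bad_in_raw() {
  local raw="$1"
  shift
  local -a osds
  local o d
  IFS=, read -r -a osds <<< "$raw"
  for o in "${osds[@]}"; do
    # CRUSH_ITEM_NONE (2147483647) is an allowed hole, not an OSD.
    [[ "$o" == 2147483647 ]] && continue
    for d in "$@"; do
      if [[ "$o" == "$d" ]]; then echo "osd.$o"; fi
    done
  done
  return 0
}
```

With the "more-tries" result, bad_in_raw 6,1,4,5,3,7 7 prints osd.7, i.e. the mapping should have been rejected. Both expected outcomes listed above (a mapping with only up+in OSDs, or one with CRUSH_ITEM_NONE in the last position) print nothing.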

How to reproduce

1. Create an EC profile and pool:

ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=osd
ceph osd pool create fs-data 128 erasure ec-4-2

2. This automatically creates the following crush rule, which uses "step choose":

rule fs-data {
    id 1
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 0 type osd
    step emit
}

3. Create this OSD tree (all osdmaps attached):

# osdmaptool osdmap-orig.bin --tree
osdmaptool: osdmap file 'osdmap-orig.bin'
ID  CLASS  WEIGHT   TYPE NAME          STATUS     REWEIGHT  PRI-AFF
-1         2.44798  root default                                   
-7         0.81599      host tceph-01                              
 0    hdd  0.27199          osd.0             up   0.87999  1.00000
 3    hdd  0.27199          osd.3             up   0.98000  1.00000
 6    hdd  0.27199          osd.6             up   0.92999  1.00000
-3         0.81599      host tceph-02                              
 2    hdd  0.27199          osd.2             up   0.95999  1.00000
 4    hdd  0.27199          osd.4             up   0.89999  1.00000
 8    hdd  0.27199          osd.8             up   0.89999  1.00000
-5         0.81599      host tceph-03                              
 1    hdd  0.27199          osd.1             up   0.89999  1.00000
 5    hdd  0.27199          osd.5             up   1.00000  1.00000
 7    hdd  0.27199          osd.7      destroyed         0  1.00000
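The OSDs that CRUSH should never place data on can be read off this tree mechanically. The awk filter below is a sketch that assumes the column layout shown above (ID, CLASS, WEIGHT, NAME, STATUS, REWEIGHT, PRI-AFF for OSD rows) and would misparse OSD rows without a device class. unusable_osds is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `osdmaptool <map> --tree` output on stdin and
# print the ids of OSD rows whose STATUS is not "up" or whose REWEIGHT is 0.
# Assumes OSD rows have the columns ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF.
unusable_osds() {
  awk '$4 ~ /^osd\./ && ($5 != "up" || $6 + 0 == 0) { print $1 }'
}
```

For the tree above, osdmaptool osdmap-orig.bin --tree | unusable_osds should print only 7: osd.7 is the one OSD that is both not up (destroyed) and out (reweight 0).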

4. Create and import variations of the default rule (all osdmaps attached):

Diffs between individual crush rules:

# diff crush-orig.txt crush-chooseleaf.txt
96c96
<     step choose indep 0 type osd
---
>     step chooseleaf indep 0 type osd
# diff crush-chooseleaf.txt crush-more-tries.txt 
94c94
<     step set_choose_tries 100
---
>     step set_choose_tries 200
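The two rule variants differ from the original only by the single-line changes shown in the diffs, so they can be derived with stream filters. This is a sketch, not necessarily how the attached files were produced. It assumes the matched lines occur only in the fs-data rule of the decompiled map; the function names are hypothetical, and the crushtool/osdmaptool commands in the trailing comments are an assumed workflow, not taken from this report.

```shell
#!/usr/bin/env bash
# Hypothetical stdin -> stdout filters reproducing the two diffs above.

# Variant 1: "step choose indep 0 type osd" -> "step chooseleaf indep 0 type osd".
to_chooseleaf() {
  sed 's/step choose indep 0 type osd/step chooseleaf indep 0 type osd/'
}

# Variant 2: raise set_choose_tries from 100 to 200.
more_tries() {
  sed 's/step set_choose_tries 100/step set_choose_tries 200/'
}

# Possible workflow (assumed, not from the report):
#   osdmaptool osdmap-orig.bin --export-crush crush-orig.bin
#   crushtool -d crush-orig.bin -o crush-orig.txt
#   to_chooseleaf < crush-orig.txt > crush-chooseleaf.txt
#   more_tries < crush-chooseleaf.txt > crush-more-tries.txt
#   crushtool -c crush-chooseleaf.txt -o crush-chooseleaf.bin
#   cp osdmap-orig.bin osdmap-chooseleaf.bin
#   osdmaptool osdmap-chooseleaf.bin --import-crush crush-chooseleaf.bin
```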

5. Run the "--test-map-pg 4.1c" command on all osdmaps:

# for map in orig chooseleaf more-tries; do osdmaptool --test-map-pg 4.1c "osdmap-$map.bin" ; done
# Output pasted above.

osdmaps.tgz (3.05 KB): all osdmaps used, plus text versions of the crush rules. Frank Schilder, 08/31/2022 12:14 PM
