Bug #51729

Upmap verification fails for multi-level crush rule

Added by Andras Pataki almost 3 years ago. Updated 29 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 70%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): CRUSH
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a 6+3 EC crush rule that looks like this:

rule cephfs_ec63 {
        id 2
        type erasure
        min_size 3
        max_size 9
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take root-disk
        step choose indep 3 type pod
        step choose indep 3 type rack
        step chooseleaf indep 1 type osd
        step emit
}
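
For reference, this rule fans out to 3 pods, then 3 racks within each pod, then 1 OSD per rack, i.e. 9 OSDs per PG; the "9" in the monitor error below appears to be that total, while the "desired 3" matches the count of a single choose step. One way to double-check the rule as the cluster sees it (a hedged example; only the rule name comes from the paste above, and osdmap.bin stands in for an exported map such as the attached one):

# dump the rule in JSON form to confirm the step sequence
ceph osd crush rule dump cephfs_ec63

# or extract and decompile the CRUSH map from an exported osdmap
osdmaptool osdmap.bin --export-crush crush.bin
crushtool -d crush.bin -o crush.txt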


When I try to add any upmap for PGs that use this crush rule, I get a verification error from ceph-mon:
Jul 19 12:37:21 cephmon00 ceph-mon[16894]: 2021-07-19 12:37:21.856 7fffe1e96700 -1 verify_upmap number of buckets 9 exceeds desired 3
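
It may also be possible to exercise the same check offline against an exported osdmap (a sketch, assuming osdmaptool's --upmap-cleanup option runs the same upmap verification as the monitor; file names are placeholders):

ceph osd getmap -o osdmap.bin
# writes rm-pg-upmap-items commands to cleanup.txt for any upmap entries
# the verification considers invalid
osdmaptool osdmap.bin --upmap-cleanup cleanup.txt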

I've verified that the upmap is correct. An example PG:

9.7ef4 13774 0 0 13740 0 336168034922 0 0 3091 3091 active+remapped+backfill_wait 2021-07-19 12:30:36.136523 1884465'416385 1884831:2962474 [64,2354,1364,3718,252,1265,2505,1093,2759] 64 [64,2354,1364,3718,252,1265,2505,1122,2759] 64 1883485'416346 2021-07-17 22:49:20.660394 1883485'416346 2021-07-17 22:49:20.660394 0

with upmap:

ceph osd pg-upmap-items 9.7ef4 1093 1122

which is correct since osd.1093 and osd.1122 live on the same host (i.e. applying this upmap does not violate the CRUSH rule).
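
One way to confirm that both OSDs sit under the same host (a hedged example using the OSD ids from the upmap above):

# each command prints the OSD's crush_location; the "host" entries should match
ceph osd find 1093
ceph osd find 1122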

The cluster is on 14.2.20.


Files

osdmap.bin.gz (281 KB) - binary OSD map - Andras Pataki, 07/23/2021 10:09 PM
osdmap.bad.bin (14.1 KB) - bad osdmap preventing upmap insertion - Chris Durham, 10/25/2022 11:09 PM
osdmap.good.bin (14.1 KB) - good osdmap with upmap inserted - Chris Durham, 10/25/2022 11:09 PM

Related issues (1 open, 0 closed)

Related to RADOS - Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log (Need More Info)
