Bug #48065
Status: closed
"ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree
Description
We noticed that if one sets an osd crush weight using the command
ceph osd crush set $id $weight host=$host
it updates the osd weight in the $host bucket, but does not update it in the device class bucket (${host}~hdd or ${host}~ssd), and as a result the old weight is still used until one runs `ceph osd crush reweight-all` or makes some other change that causes a crushmap recalculation.
The same behavior applies to the `ceph osd crush reweight-subtree <name> <weight>` command.
At the moment I am not sure if this is a bug; I would just like to report it for discussion. The current behavior might be acceptable if there were a way to set the desired weight on a device class subtree directly, but I don't know of one. When I try the above commands with host=${host}~ssd they complain about the invalid character "~" in the bucket name.
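For context, CRUSH keeps a per-device-class "shadow" bucket (e.g. ${host}~ssd) alongside each base bucket, and class-filtered rules consult the shadow copy. The following is a minimal illustrative Python model of the reported behavior (simplified assumptions, not Ceph code): updating an item's weight only in the base bucket leaves the shadow bucket stale.

```python
# Illustrative model of a CRUSH base bucket vs. its device-class
# "shadow" bucket. Names and structure are simplified assumptions,
# not actual Ceph internals.

class Bucket:
    def __init__(self, name):
        self.name = name
        self.items = {}  # osd id -> crush weight

    def set_item_weight(self, osd_id, weight):
        self.items[osd_id] = weight

    def weight(self):
        # A bucket's weight is the sum of its item weights.
        return sum(self.items.values())

# Base bucket and its per-class shadow bucket list the same osds.
host = Bucket("adonis")
shadow = Bucket("adonis~ssd")
for osd in (0, 1, 2):
    host.set_item_weight(osd, 0.09859)
    shadow.set_item_weight(osd, 0.09859)

# "ceph osd crush set 0 0.666 host=adonis" updates only the base bucket:
host.set_item_weight(0, 0.666)

print(host.items[0])    # 0.666
print(shadow.items[0])  # 0.09859 -- stale, as described above
```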
Updated by Neha Ojha over 3 years ago
- Status changed from New to Need More Info
- Priority changed from Normal to High
This does sound like a bug. Can you please share the osdmap?
Updated by Mykola Golub over 3 years ago
Actually, the problem with the weight not being updated on the class subtree is easily reproducible on a vstart cluster (see the details below). But it turns out that on my vstart cluster the problem looks rather cosmetic: although the weight is not updated on the class subtree (the adonis~ssd bucket in my case), the new weight is actually used to distribute pgs (according to `ceph osd df`).
This is not exactly what we observed for our customer (running nautilus). In their case they were redeploying osds (two hosts at once) with "osd crush initial weight = 0" in the config. They then used the "ceph osd crush set|reweight-subtree" commands to set a non-zero weight, but observed that the osds were still not used. Only after they made some modifications to the crush map (redeployed other osds, or just created/deleted a fake bucket in the crush map) did the osds start to be used.

Since we noticed that the "ceph osd crush set|reweight-subtree" commands did not change the weight on the class subtree, we decided that this was why the osds were not used in their case, and we recommended that they use the "ceph osd crush reweight" command, which properly updates the weight on all subtrees. Unfortunately they do not need to redeploy in the near future, so we cannot verify this at the moment. And as I have failed to reproduce the case locally so far, I am not entirely sure the problem was due to the weight not being updated on the class subtree. I am going to dig into this further and will ask the customer if it is ok to share details about their cluster (osdmap) here.
Steps to reproduce on a vstart cluster:
    adonis:~/ceph/ceph/build% ../src/vstart.sh -n ...
    adonis:~/ceph/ceph/build% ceph osd tree
    ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
    -1       0.29576 root default
    -3       0.29576     host adonis
     0   ssd 0.09859         osd.0       up  1.00000 1.00000
     1   ssd 0.09859         osd.1       up  1.00000 1.00000
     2   ssd 0.09859         osd.2       up  1.00000 1.00000
    adonis:~/ceph/ceph/build% ceph osd crush set 0 0.666 host=adonis
    set item id 0 name 'osd.0' weight 0.666 at location {host=adonis} to crush map
    adonis:~/ceph/ceph/build% ceph osd tree
    ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
    -1       0.86316 root default
    -3       0.86316     host adonis
     0   ssd 0.66599         osd.0       up  1.00000 1.00000
     1   ssd 0.09859         osd.1       up  1.00000 1.00000
     2   ssd 0.09859         osd.2       up  1.00000 1.00000
    adonis:~/ceph/ceph/build% ceph osd crush dump
    {
        ...
        "buckets": [
            ...
            {
                "id": -3,
                "name": "adonis",
                "type_id": 1,
                "type_name": "host",
                "weight": 56568,
                "alg": "straw2",
                "hash": "rjenkins1",
                "items": [
                    { "id": 0, "weight": 43646, "pos": 0 },
                    { "id": 1, "weight": 6461, "pos": 1 },
                    { "id": 2, "weight": 6461, "pos": 2 }
                ]
            },
            {
                "id": -4,
                "name": "adonis~ssd",
                "type_id": 1,
                "type_name": "host",
                "weight": 19383,
                "alg": "straw2",
                "hash": "rjenkins1",
                "items": [
                    { "id": 0, "weight": 6461, "pos": 0 },
                    { "id": 1, "weight": 6461, "pos": 1 },
                    { "id": 2, "weight": 6461, "pos": 2 }
                ]
            }
        ],
Note that the weight for the item id=0 is updated in the "adonis" bucket but not in the "adonis~ssd" bucket.
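As a sanity check on the dump above: the integer item weights are CRUSH's 16.16 fixed-point encoding (the floating-point weight multiplied by 0x10000 and truncated), and bucket weights are the sums of their items. A quick verification against the values in the dump:

```python
# CRUSH stores bucket item weights as 16.16 fixed-point integers:
# floating-point weight * 0x10000, truncated.

def to_crush_weight(w):
    return int(w * 0x10000)

print(to_crush_weight(0.666))    # 43646 -> osd.0 in the "adonis" bucket
print(to_crush_weight(0.09859))  # 6461  -> osd.1 / osd.2

# Bucket weights are the sums of their item weights:
print(43646 + 6461 + 6461)       # 56568 -> "adonis" bucket weight
print(6461 * 3)                  # 19383 -> stale "adonis~ssd" weight
```

This confirms that the adonis~ssd bucket still carries three items of the old 0.09859 weight while the base bucket already holds the new 0.666.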
Updated by Mykola Golub over 3 years ago
- File 48065.tar.gz 48065.tar.gz added
I have eventually got approval from the customer to publish their data.

I have attached a tarball that includes `ceph report`, `ceph osd dump`, and `ceph osd df tree` output, collected after two nodes (carl and hugo) had been redeployed. It also includes an extract from the audit log listing the operations executed during the redeploy.

The osds were deployed with an initial weight of 0 (osd_crush_initial_weight=0 in ceph.conf), and after all osds had been deployed their crush weight was updated to the target value with the `ceph osd crush set` command. From `ceph osd df tree` you can see that, although it reports a non-zero weight for these osds, they are not used. And in the crush map (found in the `ceph report` output) you can see the expected non-zero weight for the osds in the "carl" and "hugo" buckets but a zero weight in the "carl~hdd" and "hugo~hdd" buckets.

Eventually, after some other hosts were redeployed, the weights in the "carl~hdd" and "hugo~hdd" buckets were updated and these osds started to be used all right (though I don't claim that the second was a consequence of the first, because I was not able to reproduce the situation on a simple test setup).
ceph version 14.2.10-408-gdd63475ce0
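To make the expected behavior concrete, here is a minimal sketch (my own illustration with made-up bucket contents and weights, not the actual Ceph patch) of what keeping the trees consistent would look like: when an item's weight changes in a base bucket, the change is mirrored into every "<base>~<class>" shadow bucket that holds the same item.

```python
# Sketch of mirroring a base-bucket weight change into the matching
# device-class shadow buckets, so class-filtered rules see the new
# weight too. Hypothetical names and weights for illustration only.

def set_item_weight(buckets, base_name, osd_id, weight):
    # Update the item in the base bucket.
    buckets[base_name][osd_id] = weight
    # Shadow buckets are named "<base>~<class>"; update any that
    # contain the same osd.
    for name, items in buckets.items():
        if name.startswith(base_name + "~") and osd_id in items:
            items[osd_id] = weight

# Freshly deployed osds with osd_crush_initial_weight=0:
buckets = {
    "carl":     {3: 0.0, 4: 0.0},
    "carl~hdd": {3: 0.0, 4: 0.0},
}

set_item_weight(buckets, "carl", 3, 3.63869)

print(buckets["carl"][3])      # 3.63869
print(buckets["carl~hdd"][3])  # 3.63869 -- shadow stays in sync
```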
Updated by Mykola Golub over 3 years ago
- Status changed from Need More Info to New
Updated by Neha Ojha about 3 years ago
- Assignee set to Sage Weil
- Priority changed from High to Urgent
Updated by Sage Weil about 3 years ago
- Status changed from New to Fix Under Review
- Backport set to pacific,octopus,nautilus
- Pull request ID set to 39629
Updated by Sage Weil about 3 years ago
BTW Mykola I would suggest using 'ceph osd crush reweight osd.N' (which works fine already) instead of the 'ceph osd crush set ...' syntax (which is harder to use and suffers from this bug)
Updated by Mykola Golub about 3 years ago
Sage Weil wrote:
BTW Mykola I would suggest using 'ceph osd crush reweight osd.N' (which works fine already) instead of the 'ceph osd crush set ...' syntax (which is harder to use and suffers from this bug)
Yes, that is what we recommended to the customer, as I wrote in comment #2. And that is why I was not sure it was a bug for those low-level commands. Thank you for fixing this!
Updated by Sage Weil about 3 years ago
- Status changed from Fix Under Review to Pending Backport
pacific backport: https://github.com/ceph/ceph/pull/39736
Updated by Backport Bot about 3 years ago
- Copied to Backport #49528: pacific: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49529: nautilus: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree added
Updated by Backport Bot about 3 years ago
- Copied to Backport #49530: octopus: "ceph osd crush set|reweight-subtree" commands do not set weight on device class subtree added
Updated by Loïc Dachary about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".