Bug #17138 (closed)

crush: inconsistent ruleset/ruled_id are difficult to figure out

Added by Michael Hackett over 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Low
Assignee:
-
Category:
Administration/Usability
Target version:
-
% Done:

0%

Source:
Support
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Component(RADOS):
CRUSH
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Attempting to unprotect a Ceph RBD snapshot that has no children with 'rbd -p <pool name> snap unprotect' causes the command to hang, and the snapshot remains protected so that it cannot be deleted.
This issue is seen when a pool created after the CRUSH map was manually edited to remove a ruleset ends up referencing a rule_id that no longer exists. The unprotect likely fails because snap_unprotect() in librbd/internal.cc loops over all existing pools to look for children; when it reaches the broken pool with the invalid rule_id, the loop gets stuck.
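
For reference, a sketch of the client-side sequence that hits this path (image and snapshot names are placeholders, not taken from the report):

rbd -p rbd create --size 128 --image-format 2 testimg
rbd -p rbd snap create testimg@snap1
rbd -p rbd snap protect testimg@snap1
# hangs if any pool in the cluster has a crush_ruleset that no longer exists
rbd -p rbd snap unprotect testimg@snap1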

Actions #1

Updated by Michael Hackett over 7 years ago

I was able to reproduce the issue by manually manipulating the CRUSH map:

- Details:

[admin@admin ceph-config]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)

[admin@admin ceph-config]$ uname -r
3.10.0-327.el7.x86_64

[admin@admin ceph-config]$ ceph -v
ceph version 0.94.5-14.el7cp (ff6967ce0543fb0b60fe23f10da7b4a35bf046a0)

- Cluster status prior to creation of new pool and rulesets:
[admin@admin tmp]$ ceph -s
    cluster f41252bd-9952-4ca0-9808-297500ef3385
     health HEALTH_OK
     monmap e4: 3 mons at {mon1=10.18.49.60:6789/0,mon2=10.18.49.39:6789/0,mon3=10.18.49.62:6789/0}
            election epoch 30, quorum 0,1,2 mon2,mon1,mon3
     osdmap e664: 12 osds: 12 up, 12 in
      pgmap v20668: 584 pgs, 10 pools, 1816 bytes data, 51 objects
            603 MB used, 12164 GB / 12164 GB avail
                 584 active+clean

[admin@admin tmp]$ ceph osd dump |grep pool
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 123 flags hashpspool stripe_width 0
pool 3 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 349 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 641 flags hashpspool stripe_width 0
pool 5 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 643 flags hashpspool stripe_width 0
pool 6 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 646 flags hashpspool stripe_width 0
pool 7 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 647 flags hashpspool stripe_width 0
pool 8 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 648 flags hashpspool stripe_width 0
pool 9 '.users' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 651 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 10 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 owner 18446744073709551615 flags hashpspool stripe_width 0

[admin@admin tmp]$ ceph osd crush rule ls
[
    "replicated_ruleset" 
]

[admin@admin tmp]$ ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -1,
                "item_name": "default" 
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host" 
            },
            {
                "op": "emit" 
            }
        ]
    }
]

- Creation of two new rulesets:
[admin@admin tmp]$ ceph osd crush rule create-simple replicated_ruleset1 default host firstn
[admin@admin tmp]$ ceph osd crush rule create-simple replicated_ruleset2 default host firstn

[admin@admin tmp]$ ceph osd crush rule ls
[
    "replicated_ruleset",
    "replicated_ruleset1",
    "replicated_ruleset2" 
]

[admin@admin tmp]$ rados lspools
rbd
.log
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users
.users.swift
.rgw.buckets.index

[admin@admin tmp]$ rados df
pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
.log                       0            0            0            0           0            0            0            0            0
.rgw                       1            2            0            0           0            3            2            5            2
.rgw.buckets.index            0            1            0            0           0            5            4            1            0
.rgw.control               0            8            0            0           0            0            0            0            0
.rgw.gc                    0           32            0            0           0        16128        16096        10752            0
.rgw.root                  1            3            0            0           0           54           36            3            3
.users                     1            2            0            0           0            2            1            2            2
.users.swift               1            1            0            0           0            2            1            1            1
.users.uid                 1            2            0            0           0           32           28           11            3
rbd                        0            0            0            0           0            0            0            2            1
  total used          617956           51
  total avail    12755159996
  total space    12755777952

[admin@admin tmp]$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    12164G     12164G         603M             0
POOLS:
    NAME                   ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd                    2         0         0         4054G           0
    .log                   3         0         0         4054G           0
    .rgw.root              4       848         0         4054G           3
    .rgw.control           5         0         0         4054G           8
    .rgw                   6       364         0         4054G           2
    .rgw.gc                7         0         0         4054G          32
    .users.uid             8       568         0         4054G           2
    .users                 9        24         0         4054G           2
    .users.swift           10       12         0         4054G           1
    .rgw.buckets.index     11        0         0         4054G           1

- Manually manipulating the CRUSH map to remove replicated_ruleset1
[admin@osd1 tmp]$ ceph osd getcrushmap -o test_crushmap

got crush map from osdmap epoch 666
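
(The decompile step is not shown here; given the test_crushmap_d filename used below, it was presumably something like:)

crushtool -d test_crushmap -o test_crushmap_d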

Manually removed replicated_ruleset1 from CRUSH map.

[admin@osd1 tmp]$ crushtool -c test_crushmap_d -o test_crushmap_c

[admin@osd1 tmp]$ ceph osd setcrushmap -i test_crushmap_c
set crush map

[admin@admin tmp]$ ceph osd crush rule ls
[
    "replicated_ruleset",
    "replicated_ruleset2" 
]

[admin@admin tmp]$ ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "set_choose_tries",
                "num": 100
            },
            {
                "op": "take",
                "item": -1,
                "item_name": "default" 
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host" 
            },
            {
                "op": "emit" 
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_ruleset2",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default" 
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host" 
            },
            {
                "op": "emit" 
            }
        ]
    }
]

- Create test pool with ruleset defined as replicated_ruleset2
[admin@admin tmp]$ ceph osd pool create test 16 16 replicated replicated_ruleset2
pool 'test' created

- Pool 13 'test' is seen with crush_ruleset 1, which doesn't exist (see the cross-check after the output below)
[admin@admin tmp]$ ceph osd dump |grep pool
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 123 flags hashpspool stripe_width 0
pool 3 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 349 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 641 flags hashpspool stripe_width 0
pool 5 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 643 flags hashpspool stripe_width 0
pool 6 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 646 flags hashpspool stripe_width 0
pool 7 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 647 flags hashpspool stripe_width 0
pool 8 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 648 flags hashpspool stripe_width 0
pool 9 '.users' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 651 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 10 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 13 'test' replicated size 3 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 16 pgp_num 16 last_change 668 flags hashpspool stripe_width 0
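
One way to spot the dangling reference (a sketch, not part of the original session) is to compare the pool's crush_ruleset with the ruleset numbers that actually exist in the CRUSH map:

ceph osd dump | grep "^pool 13 "
ceph osd crush rule dump | grep '"ruleset"'
# pool 13 references crush_ruleset 1, while only rulesets 0 and 2 exist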

- Ceph df and rados df don't show pool 'test', as there are no on-disk details for the pool.
[admin@admin tmp]$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED 
    12164G     12164G         603M             0 
POOLS:
    NAME                   ID     USED     %USED     MAX AVAIL     OBJECTS 
    rbd                    2         0         0         4054G           0 
    .log                   3         0         0         4054G           0 
    .rgw.root              4       848         0         4054G           3 
    .rgw.control           5         0         0         4054G           8 
    .rgw                   6       364         0         4054G           2 
    .rgw.gc                7         0         0         4054G          32 
    .users.uid             8       568         0         4054G           2 
    .users                 9        24         0         4054G           2 
    .users.swift           10       12         0         4054G           1 
    .rgw.buckets.index     11        0         0         4054G           1 

[admin@admin tmp]$ rados df
pool name                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
.log                       0            0            0            0           0            0            0            0            0
.rgw                       1            2            0            0           0            3            2            5            2
.rgw.buckets.index            0            1            0            0           0            5            4            1            0
.rgw.control               0            8            0            0           0            0            0            0            0
.rgw.gc                    0           32            0            0           0        16128        16096        10752            0
.rgw.root                  1            3            0            0           0           54           36            3            3
.users                     1            2            0            0           0            2            1            2            2
.users.swift               1            1            0            0           0            2            1            1            1
.users.uid                 1            2            0            0           0           32           28           11            3
rbd                        0            0            0            0           0            0            0            2            1
  total used          618380           51
  total avail    12755159572
  total space    12755777952

- rados lspools does show the pool 'test', as this information is pulled from the monitor.
[admin@admin tmp]$ rados lspools
rbd
.log
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users
.users.swift
.rgw.buckets.index
test

- You can see in the ceph -s output below that the new pool's PGs were never actually created: I still have 584 PGs (the 16 new ones are missing). You can also see that the status shows only 10 pools, not 11 (see the check after the output).
[admin@admin tmp]$ ceph -s
    cluster f41252bd-9952-4ca0-9808-297500ef3385
     health HEALTH_OK
     monmap e4: 3 mons at {mon1=10.18.49.60:6789/0,mon2=10.18.49.39:6789/0,mon3=10.18.49.62:6789/0}
            election epoch 30, quorum 0,1,2 mon2,mon1,mon3
     osdmap e668: 12 osds: 12 up, 12 in
      pgmap v20797: 584 pgs, 10 pools, 1816 bytes data, 51 objects
            604 MB used, 12164 GB / 12164 GB avail
                 584 active+clean
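
To confirm that the PGs for pool 13 were never created (a sketch; this check was not run in the original session):

ceph pg dump pgs_brief | grep '^13\.'
# returns nothing: the 16 PGs for pool 'test' do not exist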

- This issue is ONLY seen when manually editing the CRUSH map to remove a ruleset; I could not reproduce it when using 'ceph osd crush rule rm' to remove the ruleset (example below).
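
For comparison, the non-reproducing path removes the rule through the monitor instead of editing the map by hand:

ceph osd crush rule rm replicated_ruleset1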

Actions #2

Updated by Jason Dillaman over 7 years ago

  • Priority changed from Normal to Low
Actions #3

Updated by Loïc Dachary over 7 years ago

I think the problem is that

        "rule_id": 1,
        "rule_name": "replicated_ruleset2",
        "ruleset": 2,

was manually edited, producing a rule whose rule_id does not match its ruleset. This is not forbidden because, prior to https://github.com/ceph/ceph/pull/2288, they could legitimately differ and some people/scripts may rely on that. If someone manually edits the crushmap, it seems reasonable to assume that they can also figure out why a pool does not get created when the crushmap does not provide the expected rules/mappings.

What do you think?
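
As an aside (not part of the original comment): if a pool has already been created against the missing ruleset, one possible way out (hammer-era syntax; not tested as part of this report) is to point the pool at a ruleset that does exist, e.g.:

ceph osd pool set test crush_ruleset 2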

Actions #4

Updated by Loïc Dachary over 7 years ago

  • Project changed from rbd to Ceph
  • Subject changed from 'When attempting to unprotect a RBD snapshot the command hangs' to 'crush: inconsistent ruleset/ruled_id are difficult to figure out'
Actions #5

Updated by Josh Durgin almost 7 years ago

Some work in progress on this here: https://github.com/ceph/ceph/pull/13683

Actions #6

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category set to Administration/Usability
  • Component(RADOS) CRUSH added
Actions #7

Updated by Josh Durgin over 6 years ago

  • Status changed from New to Resolved