Bug #17138 (Closed)
crush: inconsistent ruleset/rule_id are difficult to figure out
Added by Michael Hackett over 7 years ago. Updated over 6 years ago.
Description
Attempting to unprotect a Ceph RBD snapshot with 'rbd -p <pool name> snap unprotect' causes the command to hang, and the snapshot remains in a protected state so that it cannot be deleted, even though the snapshot does not have any children.
This issue is seen when a pool, created after the CRUSH map was manually edited to remove a ruleset, ends up referencing a rule_id that does not exist because of the manual edit. The unprotect likely fails because snap_unprotect() in librbd/internal.cc loops over all existing pools to look for children; when it reaches the broken pool with the invalid rule_id, it gets stuck in the loop.
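The failure mode can be sketched with a small Python simulation (the real implementation is C++ in librbd/internal.cc; all names below are illustrative, not the actual Ceph code). The scan over pools blocks as soon as it touches a pool whose crush_ruleset has no backing ruleset, because I/O against such a pool never completes:

```python
# Hypothetical simulation of the snap_unprotect child scan.
# The real logic lives in librbd/internal.cc (C++); these names are
# illustrative only.

def find_hang_pool(pools, crush_rules):
    """Walk pools in order, as snap_unprotect does when looking for clone
    children. Return the name of the first pool whose crush_ruleset has no
    matching CRUSH ruleset -- I/O against such a pool never completes, so
    the real scan would block there indefinitely."""
    rulesets = {r["ruleset"] for r in crush_rules}
    for pool in pools:
        if pool["crush_ruleset"] not in rulesets:
            return pool["pool_name"]   # the scan would hang here
    return None                        # scan completes; no hang

# State after the manual CRUSH edit described in the reproducer below:
# rulesets 0 and 2 exist, but pool 'test' references ruleset 1.
rules = [
    {"rule_id": 0, "rule_name": "replicated_ruleset",  "ruleset": 0},
    {"rule_id": 1, "rule_name": "replicated_ruleset2", "ruleset": 2},
]
pools = [
    {"pool_name": "rbd",  "crush_ruleset": 0},
    {"pool_name": "test", "crush_ruleset": 1},
]
find_hang_pool(pools, rules)  # -> 'test'
```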
Updated by Michael Hackett over 7 years ago
I was able to successfully reproduce the issue by manually manipulating the CRUSH map:
- Details:
[admin@admin ceph-config]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[admin@admin ceph-config]$ uname -r
3.10.0-327.el7.x86_64
[admin@admin ceph-config]$ ceph -v
ceph version 0.94.5-14.el7cp (ff6967ce0543fb0b60fe23f10da7b4a35bf046a0)
- Cluster status prior to creation of new pool and rulesets:
[admin@admin tmp]$ ceph -s
    cluster f41252bd-9952-4ca0-9808-297500ef3385
     health HEALTH_OK
     monmap e4: 3 mons at {mon1=10.18.49.60:6789/0,mon2=10.18.49.39:6789/0,mon3=10.18.49.62:6789/0}
            election epoch 30, quorum 0,1,2 mon2,mon1,mon3
     osdmap e664: 12 osds: 12 up, 12 in
      pgmap v20668: 584 pgs, 10 pools, 1816 bytes data, 51 objects
            603 MB used, 12164 GB / 12164 GB avail
                 584 active+clean
[admin@admin tmp]$ ceph osd dump |grep pool
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 123 flags hashpspool stripe_width 0
pool 3 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 349 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 641 flags hashpspool stripe_width 0
pool 5 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 643 flags hashpspool stripe_width 0
pool 6 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 646 flags hashpspool stripe_width 0
pool 7 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 647 flags hashpspool stripe_width 0
pool 8 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 648 flags hashpspool stripe_width 0
pool 9 '.users' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 651 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 10 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 owner 18446744073709551615 flags hashpspool stripe_width 0
[admin@admin tmp]$ ceph osd crush rule ls
[
    "replicated_ruleset"
]
[admin@admin tmp]$ ceph osd crush rule dump
[
    { "rule_id": 0,
      "rule_name": "replicated_ruleset",
      "ruleset": 0,
      "type": 1,
      "min_size": 1,
      "max_size": 10,
      "steps": [
            { "op": "set_choose_tries", "num": 100 },
            { "op": "take", "item": -1, "item_name": "default" },
            { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
            { "op": "emit" } ] } ]
- Creation of two new rulesets:
[admin@admin tmp]$ ceph osd crush rule create-simple replicated_ruleset1 default host firstn
[admin@admin tmp]$ ceph osd crush rule create-simple replicated_ruleset2 default host firstn
[admin@admin tmp]$ ceph osd crush rule ls
[
    "replicated_ruleset",
    "replicated_ruleset1",
    "replicated_ruleset2"
]
[admin@admin tmp]$ rados lspools
rbd
.log
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users
.users.swift
.rgw.buckets.index
[admin@admin tmp]$ rados df
pool name                 KB  objects  clones  degraded  unfound     rd  rd KB     wr  wr KB
.log                       0        0       0         0        0      0      0      0      0
.rgw                       1        2       0         0        0      3      2      5      2
.rgw.buckets.index         0        1       0         0        0      5      4      1      0
.rgw.control               0        8       0         0        0      0      0      0      0
.rgw.gc                    0       32       0         0        0  16128  16096  10752      0
.rgw.root                  1        3       0         0        0     54     36      3      3
.users                     1        2       0         0        0      2      1      2      2
.users.swift               1        1       0         0        0      2      1      1      1
.users.uid                 1        2       0         0        0     32     28     11      3
rbd                        0        0       0         0        0      0      0      2      1
total used            617956       51
total avail      12755159996
total space      12755777952
[admin@admin tmp]$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    12164G     12164G     603M         0
POOLS:
    NAME                   ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd                    2      0        0         4054G         0
    .log                   3      0        0         4054G         0
    .rgw.root              4      848      0         4054G         3
    .rgw.control           5      0        0         4054G         8
    .rgw                   6      364      0         4054G         2
    .rgw.gc                7      0        0         4054G         32
    .users.uid             8      568      0         4054G         2
    .users                 9      24       0         4054G         2
    .users.swift           10     12       0         4054G         1
    .rgw.buckets.index     11     0        0         4054G         1
- Manually manipulating the CRUSH map to remove replicated_ruleset1
[admin@osd1 tmp]$ ceph osd getcrushmap -o test_crushmap
got crush map from osdmap epoch 666
Manually removed replicated_ruleset1 from the decompiled CRUSH map.
[admin@osd1 tmp]$ crushtool -c test_crushmap_d -o test_crushmap_c
[admin@osd1 tmp]$ ceph osd setcrushmap -i test_crushmap_c
set crush map
[admin@admin tmp]$ ceph osd crush rule ls
[
    "replicated_ruleset",
    "replicated_ruleset2"
]
[admin@admin tmp]$ ceph osd crush rule dump
[
    { "rule_id": 0,
      "rule_name": "replicated_ruleset",
      "ruleset": 0,
      "type": 1,
      "min_size": 1,
      "max_size": 10,
      "steps": [
            { "op": "set_choose_tries", "num": 100 },
            { "op": "take", "item": -1, "item_name": "default" },
            { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
            { "op": "emit" } ] },
    { "rule_id": 1,
      "rule_name": "replicated_ruleset2",
      "ruleset": 2,
      "type": 1,
      "min_size": 1,
      "max_size": 10,
      "steps": [
            { "op": "take", "item": -1, "item_name": "default" },
            { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
            { "op": "emit" } ] } ]
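The dump above now contains a rule whose rule_id (1) no longer matches its ruleset (2). A quick, hypothetical way to spot this kind of drift is to feed the JSON from `ceph osd crush rule dump` through a small script (a sketch, not part of Ceph):

```python
import json

def inconsistent_rules(rule_dump_json):
    """Return the names of CRUSH rules whose rule_id no longer matches
    their ruleset -- the situation left behind when a rule is deleted
    from a manually decompiled CRUSH map."""
    rules = json.loads(rule_dump_json)
    return [r["rule_name"] for r in rules if r["rule_id"] != r["ruleset"]]

# Abbreviated dump matching the state above (extra fields omitted):
dump = '''[
  {"rule_id": 0, "rule_name": "replicated_ruleset",  "ruleset": 0},
  {"rule_id": 1, "rule_name": "replicated_ruleset2", "ruleset": 2}
]'''
inconsistent_rules(dump)  # -> ['replicated_ruleset2']
```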
- Create test pool with ruleset defined as replicated_ruleset2
[admin@admin tmp]$ ceph osd pool create test 16 16 replicated replicated_ruleset2
pool 'test' created
- Pool 13 'test' is seen with crush_ruleset 1, which does not correspond to any existing ruleset (only rulesets 0 and 2 exist)
[admin@admin tmp]$ ceph osd dump |grep pool
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 123 flags hashpspool stripe_width 0
pool 3 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 349 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 641 flags hashpspool stripe_width 0
pool 5 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 643 flags hashpspool stripe_width 0
pool 6 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 646 flags hashpspool stripe_width 0
pool 7 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 647 flags hashpspool stripe_width 0
pool 8 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 648 flags hashpspool stripe_width 0
pool 9 '.users' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 651 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 10 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 13 'test' replicated size 3 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 16 pgp_num 16 last_change 668 flags hashpspool stripe_width 0
- ceph df and rados df do not show pool 'test', as there are no on-disk details about the pool.
[admin@admin tmp]$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    12164G     12164G     603M         0
POOLS:
    NAME                   ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd                    2      0        0         4054G         0
    .log                   3      0        0         4054G         0
    .rgw.root              4      848      0         4054G         3
    .rgw.control           5      0        0         4054G         8
    .rgw                   6      364      0         4054G         2
    .rgw.gc                7      0        0         4054G         32
    .users.uid             8      568      0         4054G         2
    .users                 9      24       0         4054G         2
    .users.swift           10     12       0         4054G         1
    .rgw.buckets.index     11     0        0         4054G         1
[admin@admin tmp]$ rados df
pool name                 KB  objects  clones  degraded  unfound     rd  rd KB     wr  wr KB
.log                       0        0       0         0        0      0      0      0      0
.rgw                       1        2       0         0        0      3      2      5      2
.rgw.buckets.index         0        1       0         0        0      5      4      1      0
.rgw.control               0        8       0         0        0      0      0      0      0
.rgw.gc                    0       32       0         0        0  16128  16096  10752      0
.rgw.root                  1        3       0         0        0     54     36      3      3
.users                     1        2       0         0        0      2      1      2      2
.users.swift               1        1       0         0        0      2      1      1      1
.users.uid                 1        2       0         0        0     32     28     11      3
rbd                        0        0       0         0        0      0      0      2      1
total used            618380       51
total avail      12755159572
total space      12755777952
- rados lspools does show the pool 'test', as this information is pulled from the monitor
[admin@admin tmp]$ rados lspools
rbd
.log
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users
.users.swift
.rgw.buckets.index
test
- You can see in the ceph -s output that the new pool's PGs were never actually created: the total is still 584 (the 16 new PGs are missing). The status also shows only 10 pools, not 11.
[admin@admin tmp]$ ceph -s
    cluster f41252bd-9952-4ca0-9808-297500ef3385
     health HEALTH_OK
     monmap e4: 3 mons at {mon1=10.18.49.60:6789/0,mon2=10.18.49.39:6789/0,mon3=10.18.49.62:6789/0}
            election epoch 30, quorum 0,1,2 mon2,mon1,mon3
     osdmap e668: 12 osds: 12 up, 12 in
      pgmap v20797: 584 pgs, 10 pools, 1816 bytes data, 51 objects
            604 MB used, 12164 GB / 12164 GB avail
                 584 active+clean
- This issue is ONLY seen when manually editing the CRUSH map to remove a ruleset; I could not reproduce it when using 'ceph osd crush rule rm' to remove the ruleset.
Updated by Loïc Dachary over 7 years ago
I think the problem is that
"rule_id": 1, "rule_name": "replicated_ruleset2", "ruleset": 2,
was manually edited, creating a rule whose rule_id does not match its ruleset. This is not forbidden because, prior to https://github.com/ceph/ceph/pull/2288, they could be different and some people / scripts may rely on that. If someone manually edits the crushmap, it seems reasonable to assume that they are also able to figure out why a pool does not get created when the crushmap does not provide the expected rules / mappings.
What do you think?
Updated by Loïc Dachary over 7 years ago
- Project changed from rbd to Ceph
- Subject changed from When attempting to unprotect a RBD snapshot the command hangs to crush: inconsistent ruleset/rule_id are difficult to figure out
Updated by Josh Durgin almost 7 years ago
Some work in progress on this here: https://github.com/ceph/ceph/pull/13683
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category set to Administration/Usability
- Component(RADOS) CRUSH added