Bug #43124
closed
Probably legal crush rules cause upmaps to be cleaned
Description
I've seen multiple user sites whose crush rules for EC pools cause verify_upmap() to report an error. When that happens, the clean upmaps mechanism purges all upmaps from the PGs of their EC pools.
Pull request https://github.com/ceph/ceph/pull/31131
commit 712a39e5c9d9848f618ad55a768103d84c0a460f "crush: remove invalid upmap items"
{
    "rule_id": 5,
    "rule_name": "ecrule",
    "ruleset": 5,
    "type": 3,
    "min_size": 1,
    "max_size": 15,
    "steps": [
        {
            "op": "take",
            "item": -417,
            "item_name": "default~ssd"
        },
        {
            "op": "choose_firstn",
            "num": 4,
            "type": "rack"
        },
        {
            "op": "chooseleaf_indep",
            "num": 3,    <<<<<< This triggers the problem
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
I added some extra logging. The output below is what happens for every upmap on a PG in this pool; the -22 (EINVAL) return triggers the removal of the upmap.
2019-12-03 18:57:33.919715 7f57c70a63c0 10 verify_upmap rule_id 5 pool_size 11
2019-12-03 18:57:33.919717 7f57c70a63c0 10 verify_upmap step 0 op 1 arg1 -417 arg2 0
2019-12-03 18:57:33.919718 7f57c70a63c0 10 verify_upmap step 1 op 2 arg1 4 arg2 3
2019-12-03 18:57:33.919754 7f57c70a63c0 10 verify_upmap step 2 op 7 arg1 3 arg2 1
2019-12-03 18:57:33.919905 7f57c70a63c0 10 verify_upmap osds_by_parent {-633=2190,-618=2084,-582=1775,-579=1754,-561=1607,-468=2580,-438=2374,-432=2331,-72=588,-60=582,-3=20}
2019-12-03 18:57:33.919937 7f57c70a63c0 -1 verify_upmap expected 3 items in bucket -417 real 11
2019-12-03 18:57:33.919939 7f57c70a63c0 0 check_pg_upmaps verify_upmap of pg 7.23 returning -22
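As a rough illustration of the numbers in the log above (a sketch only, not the Ceph implementation): the "expected 3" matches the num of the final chooseleaf_indep step, while "real 11" matches the number of host buckets listed in osds_by_parent. The two helper functions below are hypothetical and exist only to contrast that comparison with one that multiplies the counts of all choose steps (4 racks x 3 hosts = 12). They assume every num is positive and do not handle the num <= 0 case.

```python
# Values copied from the rule dump and the log lines in this report.
rule_steps = [
    {"op": "take", "item": -417},
    {"op": "choose_firstn", "num": 4, "type": "rack"},
    {"op": "chooseleaf_indep", "num": 3, "type": "host"},
    {"op": "emit"},
]

# host bucket id -> OSD id, from the osds_by_parent log line
osds_by_parent = {
    -633: 2190, -618: 2084, -582: 1775, -579: 1754, -561: 1607, -468: 2580,
    -438: 2374, -432: 2331, -72: 588, -60: 582, -3: 20,
}


def last_step_expectation(steps):
    """Hypothetical helper: use only the num of the final choose* step."""
    nums = [s["num"] for s in steps if s["op"].startswith("choose")]
    return nums[-1]


def product_expectation(steps):
    """Hypothetical helper: multiply the num of every choose* step."""
    total = 1
    for s in steps:
        if s["op"].startswith("choose"):
            total *= s["num"]
    return total


real = len(osds_by_parent)
print("last step only:", last_step_expectation(rule_steps), "real:", real)
print("product of steps:", product_expectation(rule_steps), "real:", real)
```

Under these assumptions, comparing against the last step alone rejects the mapping (3 versus 11), whereas the product of the steps leaves room for all 11 OSDs of the pool.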
Updated by David Zafman over 4 years ago
We are reverting the original pull request that changed verify_upmap(): https://github.com/ceph/ceph/pull/31131
This tracker could be used to track a re-implementation of that change.