Bug #57937
pg autoscaler of rgw pools doesn't work after creating otp pool
Description
This is about the following post of mine to the ceph-users ML:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Q7A3TM6Z3XMRJPRBSHWGGACR653ICWXT/
The pg autoscaler was disabled for the RGW-related pools in my clusters.
This problem was caused by overlapping roots. I created two device classes, "hdd" and "ssd",
and use the former for data and the latter for the bucket index.
After running `radosgw-admin mfa list --uid=rgw-admin-ops-user`, the pg autoscaler for the RGW-related pools
was disabled due to overlapping roots. This is because `radosgw-admin mfa list` created a pool suffixed
with ".rgw.otp", and the root of that pool's CRUSH rule is not one of the roots of the {hdd,ssd} device classes but the plain default root.
A possible fix is to create the "*.rgw.otp" pool with a CRUSH rule whose root corresponds to a device class.
In addition, I would appreciate a workaround for my situation, e.g. changing the root of the CRUSH rule
of the "*.rgw.otp" pool. Since I only issued `radosgw-admin mfa list` to see its output and I am not planning to use MFA,
I am also fine with deleting the "*.rgw.otp" pool if necessary.
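For reference, here is a rough, untested sketch of the workaround I have in mind. The rule name "rgw-otp-ssd" is just a placeholder I made up; the commands go through the Rook toolbox as in the outputs below.
# Sketch only: create a replicated rule restricted to the ssd device class ...
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd crush rule create-replicated rgw-otp-ssd default host ssd
# ... and point the otp pool at it, so its rule takes the ssd shadow root instead of the plain "default" root.
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd pool set ceph-poc-object-store-ssd-index.rgw.otp crush_rule rgw-otp-ssd
# Alternatively, since I don't need MFA, delete the pool entirely (requires mon_allow_pool_delete=true).
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd pool rm ceph-poc-object-store-ssd-index.rgw.otp ceph-poc-object-store-ssd-index.rgw.otp --yes-i-really-really-mean-it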
Here are the details (mostly the same as the above-mentioned post):
- software versions
- ceph: 16.2.10
- rook: 1.9.6
The result of `ceph osd pool ls`.
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd pool ls
ceph-poc-block-pool
ceph-poc-object-store-ssd-index.rgw.control
ceph-poc-object-store-ssd-index.rgw.meta
ceph-poc-object-store-ssd-index.rgw.log
ceph-poc-object-store-ssd-index.rgw.buckets.index
ceph-poc-object-store-ssd-index.rgw.buckets.non-ec
.rgw.root
ceph-poc-object-store-ssd-index.rgw.buckets.data
ceph-poc-object-store-hdd-index.rgw.control
ceph-poc-object-store-hdd-index.rgw.meta
ceph-poc-object-store-hdd-index.rgw.log
ceph-poc-object-store-hdd-index.rgw.buckets.index
ceph-poc-object-store-hdd-index.rgw.buckets.non-ec
ceph-poc-object-store-hdd-index.rgw.buckets.data
device_health_metrics
ceph-poc-object-store-ssd-index.rgw.otp
Some pools are missing from the output of `ceph osd pool autoscale-status`.
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd pool autoscale-status
POOL                                                 SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
ceph-poc-object-store-ssd-index.rgw.control          0                    3.0   6144G         0.0000                                 1.0   8                   on         False
ceph-poc-object-store-ssd-index.rgw.meta             3910                 3.0   6144G         0.0000                                 1.0   8                   on         False
ceph-poc-object-store-ssd-index.rgw.log              29328M               3.0   6144G         0.0140                                 1.0   8                   on         False
ceph-poc-object-store-ssd-index.rgw.buckets.index    4042                 3.0   6144G         0.0000                                 1.0   128     8           off        False
ceph-poc-object-store-ssd-index.rgw.buckets.non-ec   0                    3.0   6144G         0.0000                                 1.0   8                   on         False
.rgw.root                                            9592                 3.0   6144G         0.0000                                 1.0   8                   on         False
device_health_metrics                                8890k                3.0   6144G         0.0000                                 1.0   32                  on         False
The CRUSH tree is as follows.
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd crush tree --show-shadow
ID   CLASS  WEIGHT     TYPE NAME
 -3  ssd      6.00000  root default~ssd
 -9  ssd      2.00000      zone rack0~ssd
 -8  ssd      1.00000          host 10-69-0-10~ssd
 14  ssd      1.00000              osd.14
-51  ssd            0          host 10-69-0-22~ssd
...
...
 -2  hdd    781.32037  root default~hdd
 -7  hdd    130.99301      zone rack0~hdd
 -6  hdd            0          host 10-69-0-10~hdd
-50  hdd     14.55478          host 10-69-0-22~hdd
  8  hdd      7.27739              osd.8
...
 -1         787.32037  root default
 -5         132.99301      zone rack0
 -4           1.00000          host 10-69-0-10
 14  ssd      1.00000              osd.14
-49          14.55478          host 10-69-0-22
  8  hdd      7.27739              osd.8
...
The CRUSH rule of the `...rgw.otp` pool is "replicated_rule".
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd pool get ceph-poc-object-store-ssd-index.rgw.otp crush_rule
crush_rule: replicated_rule
This rule takes the plain "default" root.
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
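For comparison, the other RGW pools are reported with root_id -2 or -3 (the per-device-class roots) in the mgr log below, so this otp pool appears to be the only one whose rule takes the plain "default" root (item -1). A quick way to see which CRUSH rule each pool uses is just a loop over the commands shown above:
# Prints "<pool> crush_rule: <rule>" for every pool.
$ kubectl -n ceph-poc exec deploy/rook-ceph-tools -- bash -c \
    'for p in $(ceph osd pool ls); do echo -n "$p "; ceph osd pool get "$p" crush_rule; done'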
The mgr daemon complains about the overlapping roots. Here is the mgr log.
...
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.720+0000 7fe64d5d5700 0 [progress INFO root] Processing OSDMap change 175926..175926
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.445+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 17 contains an overlapping root -1... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.444+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool 'device_health_metrics' root_id -3 using 4.139842076256173e-06 of space, bias 1.0, pg target 0.000816928836381218 quantized to 32 (current 32)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.444+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.442+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 15 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.441+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 14 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.440+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 13 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.438+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 12 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.437+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 11 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.436+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 10 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.434+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 9 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.433+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool '.rgw.root' root_id -3 using 4.361936589702964e-09 of space, bias 1.0, pg target 8.607554870347182e-07 quantized to 8 (current 8)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.433+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.432+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool 'ceph-poc-object-store-ssd-index.rgw.buckets.non-ec' root_id -3 using 0.0 of space, bias 1.0, pg target 0.0 quantized to 8 (current 8)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.432+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.430+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool 'ceph-poc-object-store-ssd-index.rgw.buckets.index' root_id -3 using 1.838088792283088e-09 of space, bias 1.0, pg target 3.627161883438627e-07 quantized to 8 (current 128)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.430+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.429+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool 'ceph-poc-object-store-ssd-index.rgw.log' root_id -3 using 0.013985070909257047 of space, bias 1.0, pg target 2.7970141818514094 quantized to 8 (current 8)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.429+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.427+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool 'ceph-poc-object-store-ssd-index.rgw.meta' root_id -3 using 1.7780621419660747e-09 of space, bias 1.0, pg target 3.5561242839321494e-07 quantized to 8 (current 8)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.427+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.426+0000 7fe64de96700 0 [pg_autoscaler INFO root] Pool 'ceph-poc-object-store-ssd-index.rgw.control' root_id -3 using 0.0 of space, bias 1.0, pg target 0.0 quantized to 8 (current 8)
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.426+0000 7fe64de96700 0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 6597069766656
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.425+0000 7fe64de96700 0 [pg_autoscaler WARNING root] pool 2 contains an overlapping root -2... skipping scaling
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.421+0000 7fe64de96700 0 [pg_autoscaler ERROR root] pool 17 has overlapping roots: {-2, -1}
2022-10-24T10:42:31+09:00 debug 2022-10-24T01:42:31.356+0000 7fe64de96700 0 [pg_autoscaler INFO root] _maybe_adjust
...
History
#1 Updated by Casey Bodley 3 months ago
- Project changed from rgw to RADOS
#2 Updated by Kamoltat (Junior) Sirivadhna 3 months ago
- Assignee set to Kamoltat (Junior) Sirivadhna
#3 Updated by Satoru Takeuchi 3 months ago
Are there any updates? Please let me know if I can do anything to help.
#4 Updated by Satoru Takeuchi about 2 months ago
This problem was fixed in Rook v1.10.2. I updated my Rook/Ceph cluster to v1.10.5 and confirmed that this problem disappeared.
Please close this ticket.
#5 Updated by Radoslaw Zarzynski about 1 month ago
- Status changed from New to Rejected
Not a Ceph issue per the last comment.