Bug #42111
max_size from crushmap ignored when increasing size on pool
% Done:
0%
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Component(RADOS):
CRUSH
Pull request ID:
30723
Description
Hello,
When the crush rule has "max_size 2", for example, and you set size=3 on the pool, all I/O stops without any error; `ceph -s` still reports that everything is fine.
When the pool's size is set back to 2, I/O resumes, along with rebalancing operations.
When trying to create a new pool with the same crush rule (and a size larger than max_size), an error is thrown which correctly reports that max_size is too small for this operation.
Kind regards
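A quick way to confirm the mismatch on a running cluster (a sketch using the rule and pool names from this report; jq is assumed to be available):

ceph osd crush rule dump replicated_rule | jq '.max_size'   # 2 in this case
ceph osd pool get testpool size                             # I/O stalls silently once this exceeds max_size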
Related issues
Copied to Backport #42326: nautilus: max_size from crushmap ignored when increasing size on pool (Resolved)
History
#1 Updated by Greg Farnum over 4 years ago
- Project changed from Ceph to RADOS
- Component(RADOS) CRUSH added
#2 Updated by Vikhyat Umrao over 4 years ago
- Status changed from New to In Progress
- Assignee set to Vikhyat Umrao
#3 Updated by Vikhyat Umrao over 4 years ago
- I was able to reproduce this on the master branch in a `vstart` cluster.
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 2
    step take default
    step choose firstn 0 type osd
    step emit
}

[root@3cfea3e7c7e2 build]# bin/ceph osd pool create testpool --pg_num 8 --size 2
pool 'testpool' created

[root@3cfea3e7c7e2 build]# bin/rados bench -p testpool 300 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 300 seconds or 0 objects
Object prefix: benchmark_data_3cfea3e7c7e2_15620
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        16         0         0         0           -           0
    2      16        26        10   19.9985        20     1.94489     1.62174
    3      16        41        25   33.3302        60     1.12973     1.63792
    4      16        53        37   36.9962        48    0.984256     1.49435
    5      16        67        51   40.7958        56     0.99228     1.41103
    6      16        80        64   42.6621        52     1.19066     1.36211
    7      16        89        73   41.7098        36     1.32314     1.33608
    8      16       104        88   43.9951        60      1.3878     1.35979
    9      16       114        98   43.5507        40     1.21856      1.3341
   10      16       131       115    45.995        68    0.766124     1.31445
   11      16       150       134   48.7219        76    0.514468     1.28399
   12      16       151       135   44.9951         4     1.20535     1.28341
   13      16       167       151   46.4565        64     0.74922     1.32088
   14      16       175       159   45.4237        32    0.952676     1.31934
   15      16       188       172   45.8618        52    0.975865     1.31626
   16      16       204       188    46.995        64     1.22408     1.31506
   17      16       212       196   46.1127        32     1.84238     1.30969
   18      16       226       210   46.6617        56     2.08285     1.33022
   19      16       239       223   46.9424        52     1.13944     1.33001
2019-10-04T16:00:57.486199+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.32044
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       250       234    46.795        44    0.978277     1.32044
   21      16       264       248    47.233        56    0.607187     1.32294
   22      16       271       255   46.3586        28     1.30823     1.32669
   23      16       284       268   46.6037        52     1.15358     1.33424
   24      16       297       281   46.8282        52     1.22483     1.33498
   25      16       307       291   46.5549        40      1.0493     1.33528
   26      16       322       306   47.0717        60    0.928293     1.33525
   27      16       333       317   46.9579        44     1.51883     1.33139
   28      16       344       328   46.8521        44     1.67149     1.33666
   29      16       357       341   47.0294        52     1.03467     1.33586
   30      16       368       352   46.9282        44     1.39664     1.33083
   31      16       380       364   46.9626        48     1.46466     1.33254

[root@3cfea3e7c7e2 build]# bin/ceph osd pool set testpool size 3
set pool 3 size to 3

   32      16       380       364    45.495         0           -     1.33254
   33      16       380       364   44.1163         0           -     1.33254
   34      16       380       364   42.8188         0           -     1.33254
   35      16       380       364   41.5954         0           -     1.33254
   36      16       380       364     40.44         0           -     1.33254
   37      16       380       364    39.347         0           -     1.33254
   38      16       380       364   38.3115         0           -     1.33254
   39      16       380       364   37.3292         0           -     1.33254
2019-10-04T16:01:17.488557+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.33254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       380       364   36.3959         0           -     1.33254
   41      16       380       364   35.5082         0           -     1.33254
   42      16       380       364   34.6628         0           -     1.33254
   43      16       380       364   33.8566         0           -     1.33254
   44      16       380       364   33.0872         0           -     1.33254
   45      16       380       364   32.3519         0           -     1.33254
   46      16       380       364   31.6486         0           -     1.33254
   47      16       380       364   30.9752         0           -     1.33254
   48      16       380       364   30.3299         0           -     1.33254
   49      16       380       364   29.7109         0           -     1.33254
   50      16       380       364   29.1167         0           -     1.33254
   51      16       380       364   28.5458         0           -     1.33254
   52      16       380       364   27.9968         0           -     1.33254
   53      16       380       364   27.4686         0           -     1.33254
   54      16       380       364   26.9599         0           -     1.33254
   55      16       380       364   26.4698         0           -     1.33254
   56      16       380       364   25.9971         0           -     1.33254
   57      16       380       364    25.541         0           -     1.33254
   58      16       380       364   25.1006         0           -     1.33254
   59      16       380       364   24.6752         0           -     1.33254
2019-10-04T16:01:37.490856+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.33254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16       380       364   24.2639         0           -     1.33254
   61      16       380       364   23.8662         0           -     1.33254
   62      16       380       364   23.4812         0           -     1.33254
   63      16       380       364   23.1085         0           -     1.33254
   64      16       380       364   22.7474         0           -     1.33254
   65      16       380       364   22.3975         0           -     1.33254
   66      16       380       364   22.0581         0           -     1.33254
   67      16       380       364   21.7289         0           -     1.33254
   68      16       380       364   21.4093         0           -     1.33254
   69      16       380       364   21.0991         0           -     1.33254
   70      16       380       364   20.7977         0           -     1.33254
   71      16       380       364   20.5047         0           -     1.33254
   72      16       380       364   20.2199         0           -     1.33254
   73      16       380       364    19.943         0           -     1.33254
   74      16       380       364   19.6735         0           -     1.33254
   75      16       380       364   19.4111         0           -     1.33254
   76      16       380       364   19.1557         0           -     1.33254
   77      16       380       364    18.907         0           -     1.33254
   78      16       380       364   18.6646         0           -     1.33254
   79      16       380       364   18.4283         0           -     1.33254
2019-10-04T16:01:57.493135+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.33254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   80      16       380       364   18.1979         0           -     1.33254
   81      16       380       364   17.9733         0           -     1.33254
   82      16       380       364   17.7541         0           -     1.33254
   83      16       380       364   17.5402         0           -     1.33254
   84      16       380       364   17.3314         0           -     1.33254
   85      16       380       364   17.1275         0           -     1.33254
   86      16       380       364   16.9283         0           -     1.33254
   87      16       380       364   16.7337         0           -     1.33254

[root@3cfea3e7c7e2 build]# bin/ceph -s
  cluster:
    id:     60d38ef0-8889-4fdc-bee7-91436a098a70
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled

  services:
    mon: 3 daemons, quorum a,b,c (age 12h)
    mgr: x(active, since 12h)
    mds: a:1 {0=a=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 12h), 3 in (since 12h)

  task status:
    scrub status:
        mds.0: idle

  data:
    pools:   3 pools, 24 pgs
    objects: 350 objects, 1.3 GiB
    usage:   9.0 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     24 active+clean

[root@3cfea3e7c7e2 build]# bin/ceph osd pool set testpool size 2
set pool 3 size to 2

   88      16       389       373   16.9526  0.631579     57.6862     2.69472
   89      16       391       375    16.852         8     58.3044     2.99112
   90      16       405       389   17.2869        56     58.9121     3.67076
   91      16       417       401   17.6244        48     1.49254     3.60002
   92      16       433       417   18.1284        64     1.34342     3.50895
   93      16       449       433   18.6216        64    0.963417     3.42239
   94      16       453       437   18.5936        16     1.24534     3.40225
   95      16       467       451   18.9873        56     1.24581     3.34853
   96      16       480       464   19.3311        52     1.14728     3.28931
   97      16       489       473   19.5029        36     1.39027     3.25261
   98      16       495       479   19.5488        24     1.96242     3.23465
   99      16       505       489   19.7553        40     1.91222     3.20803
2019-10-04T16:02:17.495392+0000 min lat: 0.514468 max lat: 58.9121 avg lat: 3.15763
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  100      16       519       503   20.1177        56    0.721501     3.15763
  101      16       528       512   20.2749        36     1.09394     3.12528
  102      16       536       520   20.3898        32     1.79339     3.10347
  103      16       548       532   20.6578        48     1.42166     3.07108
  104      16       565       549    21.113        68    0.760797     3.00833
  105      16       573       557   21.2166        32     1.32553     2.98381
  106      16       590       574   21.6579        68    0.788553     2.93913
  107      16       600       584   21.8293        40     1.30258     2.91203
  108      16       612       596   22.0716        48      0.8999     2.88111
  109      16       624       608   22.3094        48     1.69133     2.85472
  110      16       637       621   22.5792        52    0.998792      2.8219
  111      16       651       635   22.8791        56     1.46139     2.78935
  112      16       658       642   22.9248        28     0.89574     2.77032
  113      16       667       651   23.0405        36     1.50485     2.75217
  114      16       682       666   23.3646        60     1.27387     2.72094
  115      16       695       679   23.6136        52     1.18959     2.69588
^C
[root@3cfea3e7c7e2 build]#
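Note that `ceph -s` above stays clean (apart from the pre-existing pool-application warning) for the entire stall; only the pool and rule metadata expose the mismatch. A sketch of what to inspect while writes are hung (the `pool 3` line is illustrative output, not copied from this run):

[root@3cfea3e7c7e2 build]# bin/ceph osd pool ls detail
pool 3 'testpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 ...
[root@3cfea3e7c7e2 build]# bin/ceph osd crush rule dump replicated_rule | jq '.max_size'
2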
#4 Updated by Vikhyat Umrao over 4 years ago
- Pull request ID set to 30723
#5 Updated by Vikhyat Umrao over 4 years ago
- With the fix:

# bin/ceph-mon -v
ceph version v15.0.0-5750-gdc473ec733 (dc473ec73336ca4c55bcac5c43496d63ca43f0d1) octopus (dev)

# bin/ceph osd pool set testpool size 3
Error EINVAL: pool size is bigger than the crush rule max size
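On builds without the fix, a possible workaround (a sketch; `wide_rule` is a hypothetical name) is to repoint the pool at a rule whose max_size covers the desired replica count before raising size; rules generated by `ceph osd crush rule create-replicated` typically come with min_size 1 / max_size 10 on these releases:

# bin/ceph osd crush rule create-replicated wide_rule default osd
# bin/ceph osd pool set testpool crush_rule wide_rule
# bin/ceph osd pool set testpool size 3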
#6 Updated by Vikhyat Umrao over 4 years ago
- Status changed from In Progress to Fix Under Review
#7 Updated by Vikhyat Umrao over 4 years ago
- Backport set to nautilus
#8 Updated by Vikhyat Umrao over 4 years ago
- Status changed from Fix Under Review to 17
#9 Updated by Kefu Chai over 4 years ago
- Status changed from 17 to Pending Backport
#10 Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42326: nautilus: max_size from crushmap ignored when increasing size on pool added
#11 Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".