Bug #42111

max_size from crushmap ignored when increasing size on pool

Added by Alex Masteo over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
CRUSH
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

When the CRUSH rule has, for example, "max_size=2" and you set size=3 on the pool, all I/O stops without any error, while `ceph -s` still reports everything as fine.
When size is set back to 2 on the pool, I/O resumes, along with rebalancing operations.

When trying to create a pool with the same CRUSH rule, an error is thrown that correctly reports that max_size is too small for this operation.

kind regards


Related issues

Copied to RADOS - Backport #42326: nautilus: max_size from crushmap ignored when increasing size on pool Resolved

History

#1 Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) CRUSH added

#2 Updated by Vikhyat Umrao over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to Vikhyat Umrao

#3 Updated by Vikhyat Umrao over 4 years ago

- I was able to reproduce this on the master branch in a `vstart` cluster.


# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 2
        step take default
        step choose firstn 0 type osd
        step emit
}

[root@3cfea3e7c7e2 build]# bin/ceph osd pool create testpool --pg_num 8 --size 2
pool 'testpool' created

[root@3cfea3e7c7e2 build]# bin/rados bench -p testpool 300 write --nocleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 300 seconds or 0 objects
Object prefix: benchmark_data_3cfea3e7c7e2_15620
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        16         0         0         0           -           0
    2      16        26        10   19.9985        20     1.94489     1.62174
    3      16        41        25   33.3302        60     1.12973     1.63792
    4      16        53        37   36.9962        48    0.984256     1.49435
    5      16        67        51   40.7958        56     0.99228     1.41103
    6      16        80        64   42.6621        52     1.19066     1.36211
    7      16        89        73   41.7098        36     1.32314     1.33608
    8      16       104        88   43.9951        60      1.3878     1.35979
    9      16       114        98   43.5507        40     1.21856      1.3341
   10      16       131       115    45.995        68    0.766124     1.31445
   11      16       150       134   48.7219        76    0.514468     1.28399
   12      16       151       135   44.9951         4     1.20535     1.28341
   13      16       167       151   46.4565        64     0.74922     1.32088
   14      16       175       159   45.4237        32    0.952676     1.31934
   15      16       188       172   45.8618        52    0.975865     1.31626
   16      16       204       188    46.995        64     1.22408     1.31506
   17      16       212       196   46.1127        32     1.84238     1.30969
   18      16       226       210   46.6617        56     2.08285     1.33022
   19      16       239       223   46.9424        52     1.13944     1.33001
2019-10-04T16:00:57.486199+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.32044
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       250       234    46.795        44    0.978277     1.32044
   21      16       264       248    47.233        56    0.607187     1.32294
   22      16       271       255   46.3586        28     1.30823     1.32669
   23      16       284       268   46.6037        52     1.15358     1.33424
   24      16       297       281   46.8282        52     1.22483     1.33498
   25      16       307       291   46.5549        40      1.0493     1.33528
   26      16       322       306   47.0717        60    0.928293     1.33525
   27      16       333       317   46.9579        44     1.51883     1.33139
   28      16       344       328   46.8521        44     1.67149     1.33666
   29      16       357       341   47.0294        52     1.03467     1.33586
   30      16       368       352   46.9282        44     1.39664     1.33083
   31      16       380       364   46.9626        48     1.46466     1.33254

[root@3cfea3e7c7e2 build]# bin/ceph osd pool set testpool size 3
set pool 3 size to 3

   32      16       380       364    45.495         0           -     1.33254
   33      16       380       364   44.1163         0           -     1.33254
   34      16       380       364   42.8188         0           -     1.33254
   35      16       380       364   41.5954         0           -     1.33254
   36      16       380       364     40.44         0           -     1.33254
   37      16       380       364    39.347         0           -     1.33254
   38      16       380       364   38.3115         0           -     1.33254
   39      16       380       364   37.3292         0           -     1.33254
2019-10-04T16:01:17.488557+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.33254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       380       364   36.3959         0           -     1.33254
   41      16       380       364   35.5082         0           -     1.33254
   42      16       380       364   34.6628         0           -     1.33254
   43      16       380       364   33.8566         0           -     1.33254
   44      16       380       364   33.0872         0           -     1.33254
   45      16       380       364   32.3519         0           -     1.33254
   46      16       380       364   31.6486         0           -     1.33254
   47      16       380       364   30.9752         0           -     1.33254
   48      16       380       364   30.3299         0           -     1.33254
   49      16       380       364   29.7109         0           -     1.33254
   50      16       380       364   29.1167         0           -     1.33254
   51      16       380       364   28.5458         0           -     1.33254
   52      16       380       364   27.9968         0           -     1.33254
   53      16       380       364   27.4686         0           -     1.33254
   54      16       380       364   26.9599         0           -     1.33254
   55      16       380       364   26.4698         0           -     1.33254
   56      16       380       364   25.9971         0           -     1.33254
   57      16       380       364    25.541         0           -     1.33254
   58      16       380       364   25.1006         0           -     1.33254
   59      16       380       364   24.6752         0           -     1.33254
2019-10-04T16:01:37.490856+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.33254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16       380       364   24.2639         0           -     1.33254
   61      16       380       364   23.8662         0           -     1.33254
   62      16       380       364   23.4812         0           -     1.33254
   63      16       380       364   23.1085         0           -     1.33254
   64      16       380       364   22.7474         0           -     1.33254
   65      16       380       364   22.3975         0           -     1.33254
   66      16       380       364   22.0581         0           -     1.33254
   67      16       380       364   21.7289         0           -     1.33254
   68      16       380       364   21.4093         0           -     1.33254
   69      16       380       364   21.0991         0           -     1.33254
   70      16       380       364   20.7977         0           -     1.33254
   71      16       380       364   20.5047         0           -     1.33254
   72      16       380       364   20.2199         0           -     1.33254
   73      16       380       364    19.943         0           -     1.33254
   74      16       380       364   19.6735         0           -     1.33254
   75      16       380       364   19.4111         0           -     1.33254
   76      16       380       364   19.1557         0           -     1.33254
   77      16       380       364    18.907         0           -     1.33254
   78      16       380       364   18.6646         0           -     1.33254
   79      16       380       364   18.4283         0           -     1.33254
2019-10-04T16:01:57.493135+0000 min lat: 0.514468 max lat: 2.86502 avg lat: 1.33254
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   80      16       380       364   18.1979         0           -     1.33254
   81      16       380       364   17.9733         0           -     1.33254
   82      16       380       364   17.7541         0           -     1.33254
   83      16       380       364   17.5402         0           -     1.33254
   84      16       380       364   17.3314         0           -     1.33254
   85      16       380       364   17.1275         0           -     1.33254
   86      16       380       364   16.9283         0           -     1.33254
   87      16       380       364   16.7337         0           -     1.33254

[root@3cfea3e7c7e2 build]# bin/ceph -s
  cluster:
    id:     60d38ef0-8889-4fdc-bee7-91436a098a70
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled

  services:
    mon: 3 daemons, quorum a,b,c (age 12h)
    mgr: x(active, since 12h)
    mds: a:1 {0=a=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 12h), 3 in (since 12h)

  task status:
    scrub status:
        mds.0: idle

  data:
    pools:   3 pools, 24 pgs
    objects: 350 objects, 1.3 GiB
    usage:   9.0 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     24 active+clean

[root@3cfea3e7c7e2 build]# bin/ceph osd pool set testpool size 2
set pool 3 size to 2

   88      16       389       373   16.9526  0.631579     57.6862     2.69472
   89      16       391       375    16.852         8     58.3044     2.99112
   90      16       405       389   17.2869        56     58.9121     3.67076
   91      16       417       401   17.6244        48     1.49254     3.60002
   92      16       433       417   18.1284        64     1.34342     3.50895
   93      16       449       433   18.6216        64    0.963417     3.42239
   94      16       453       437   18.5936        16     1.24534     3.40225
   95      16       467       451   18.9873        56     1.24581     3.34853
   96      16       480       464   19.3311        52     1.14728     3.28931
   97      16       489       473   19.5029        36     1.39027     3.25261
   98      16       495       479   19.5488        24     1.96242     3.23465
   99      16       505       489   19.7553        40     1.91222     3.20803
2019-10-04T16:02:17.495392+0000 min lat: 0.514468 max lat: 58.9121 avg lat: 3.15763
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  100      16       519       503   20.1177        56    0.721501     3.15763
  101      16       528       512   20.2749        36     1.09394     3.12528
  102      16       536       520   20.3898        32     1.79339     3.10347
  103      16       548       532   20.6578        48     1.42166     3.07108
  104      16       565       549    21.113        68    0.760797     3.00833
  105      16       573       557   21.2166        32     1.32553     2.98381
  106      16       590       574   21.6579        68    0.788553     2.93913
  107      16       600       584   21.8293        40     1.30258     2.91203
  108      16       612       596   22.0716        48      0.8999     2.88111
  109      16       624       608   22.3094        48     1.69133     2.85472
  110      16       637       621   22.5792        52    0.998792      2.8219
  111      16       651       635   22.8791        56     1.46139     2.78935
  112      16       658       642   22.9248        28     0.89574     2.77032
  113      16       667       651   23.0405        36     1.50485     2.75217
  114      16       682       666   23.3646        60     1.27387     2.72094
  115      16       695       679   23.6136        52     1.18959     2.69588
^C
[root@3cfea3e7c7e2 build]# 
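
The stall is also visible in the avg MB/s column: while no new writes complete, the average is simply the total data written divided by the elapsed time, so it keeps falling even though cur MB/s is 0. A minimal Python check of that arithmetic, using the 4194304-byte object size and the finished counts from the output above. The function name is made up for illustration, and the small differences from the logged values are assumed to come from the elapsed time being slightly more than a whole second:

```python
# Object size from the bench header: 4194304 bytes = 4 MiB per object.
OBJECT_SIZE_MB = 4194304 / (1024 * 1024)


def avg_mb_per_s(finished: int, elapsed_s: float) -> float:
    """Average throughput: total data written divided by elapsed time."""
    return finished * OBJECT_SIZE_MB / elapsed_s


# (sec, finished, avg MB/s as reported by rados bench) from the stalled window.
samples = [(32, 364, 45.495), (40, 364, 36.3959), (87, 364, 16.7337)]

for sec, finished, reported in samples:
    # Print the recomputed average next to the logged one for comparison.
    print(sec, round(avg_mb_per_s(finished, sec), 4), reported)
```

The finished count stays at 364 from second 31 to second 87, which is why only the denominator changes in that window.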

#4 Updated by Vikhyat Umrao over 4 years ago

  • Pull request ID set to 30723

#5 Updated by Vikhyat Umrao over 4 years ago

- With fix:

# bin/ceph-mon -v
ceph version v15.0.0-5750-gdc473ec733 (dc473ec73336ca4c55bcac5c43496d63ca43f0d1) octopus (dev)

# bin/ceph osd pool set testpool size 3
Error EINVAL: pool size is bigger than the crush rule max size
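
The behaviour with the fix can be pictured roughly as follows. This is a hypothetical Python sketch, not the code from the pull request: the `CrushRule` class and the `validate_pool_size` function are invented for illustration, and only the `max_size` comparison, the EINVAL result and the error text are taken from this report.

```python
import errno
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CrushRule:
    """Hypothetical stand-in for the size limits of the rule in comment #3."""
    min_size: int
    max_size: int


def validate_pool_size(rule: CrushRule, new_size: int) -> Tuple[int, str]:
    """Return (0, "") if new_size fits the rule, else (-EINVAL, message).

    Only the upper bound is checked here, because that is the case
    described in this report.
    """
    if new_size > rule.max_size:
        return -errno.EINVAL, "pool size is bigger than the crush rule max size"
    return 0, ""


rule = CrushRule(min_size=1, max_size=2)
print(validate_pool_size(rule, 3))  # rejected: size 3 exceeds max_size 2
print(validate_pool_size(rule, 2))  # accepted
```

Rejecting the change up front, instead of accepting it and leaving the pool unable to serve I/O, matches what pool creation already did according to the description.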

#6 Updated by Vikhyat Umrao over 4 years ago

  • Status changed from In Progress to Fix Under Review

#7 Updated by Vikhyat Umrao over 4 years ago

  • Backport set to nautilus

#8 Updated by Vikhyat Umrao over 4 years ago

  • Status changed from Fix Under Review to 17

#9 Updated by Kefu Chai over 4 years ago

  • Status changed from 17 to Pending Backport

#10 Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #42326: nautilus: max_size from crushmap ignored when increasing size on pool added

#11 Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
