Bug #47174

[BlueStore] Pool/PG deletion (space reclamation) is very slow

Added by Vikhyat Umrao over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Version: 14.2.8; also reproduced in 12.2.12.

- We use cosbench to fill the cluster with an RGW workload. Five buckets were filled with the following object-size distribution.

config="containers=r(1,5);objects=r(1,374000);sizes=h(1|1|50,64|64|15,8192|8192|15,65536|65536|15,1048576|1048576|5)KB" >

- Then buckets 6 through 35 were filled with a fixed 256 KB object size.

config="containers=u(6,15);objects=u(1,1000000);sizes=c(256)KB" 
config="containers=u(16,25);objects=u(1,1000000);sizes=c(256)KB" 
config="containers=u(26,35);objects=u(1,1000000);sizes=c(256)KB" 

- Current space-reclamation status after the above workload testing:

Thu Aug 27 13:16:27 UTC 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       361 TiB     202 TiB     158 TiB      159 TiB         43.96
    TOTAL     361 TiB     202 TiB     158 TiB      159 TiB         43.96

Thu Aug 27 16:57:09 UTC 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED       RAW USED     %RAW USED 
    hdd       361 TiB     283 TiB     77 TiB       78 TiB         21.49 
    TOTAL     361 TiB     283 TiB     77 TiB       78 TiB         21.49 
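
- The two snapshots above are ceph df RAW STORAGE output taken about 3.5 hours apart. A simple way to keep sampling raw usage while deletion is in progress (the interval and log file name below are arbitrary placeholders):

# Sample raw usage every 10 minutes while reclamation is running.
while true; do
    date -u
    ceph df | grep -A 4 'RAW STORAGE'
    sleep 600
done | tee -a reclaim-progress.log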

- Going by the stats above, and in general with these object sizes, it takes 8+ hours for usage to return to where it was before the workload started.

- The reclamation time increases further when the cluster is filled with more small objects, for example:

config="containers=u(1,5);objects=u(1,50000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" 

- It appears that the more tiny objects there are, the longer it takes pool/PG deletion to reclaim the space.
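
- One way to watch deletion progress on an individual OSD is via its perf counters over the admin socket; the counters below exist in recent releases (names may vary by version), and osd.173 is just an example id from this node:

# Run on the OSD host; osd.173 is only an example id.
# numpg_removing/numpg_stray shrink as PG deletion progresses, and the
# BlueStore onode/stored counters shrink as object data is reclaimed.
ceph daemon osd.173 perf dump | \
    grep -E '"numpg_removing"|"numpg_stray"|"bluestore_onodes"|"bluestore_stored"'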

- The cluster has 192 OSDs; the DB is on NVMe and the block device is on HDD.
+ DB partition is 62G on NVMe
+ OSD block disk is 1.8T
+ Each node has 24 OSDs
+ 12 OSDs' DBs per NVMe, so 2 NVMe disks in total
+ 251G total RAM in each node
+ 56 CPU cores
+ From one of the OSD nodes: # uptime
17:24:24 up 2 days, 4:55, 2 users, load average: 81.69, 81.59, 81.82

top - 17:25:35 up 2 days,  4:56,  2 users,  load average: 83.08, 82.01, 81.95
Tasks: 644 total,   1 running, 643 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.9 us,  2.8 sy,  0.0 ni, 52.5 id, 37.7 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 26385555+total, 69740848 free, 18721012+used,  6904572 buff/cache
KiB Swap:   524284 total,   524284 free,        0 used. 66270048 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                       
  84346 ceph      20   0 9735624   7.4g  22612 S  41.2  2.9 489:58.54 /usr/bin/ceph-osd -f --cluster ceph --id 173 --setuser ceph --setgroup ceph                                   
  84266 ceph      20   0 9649848   7.4g  22448 S  29.4  2.9 394:04.31 /usr/bin/ceph-osd -f --cluster ceph --id 181 --setuser ceph --setgroup ceph                                   
  84315 ceph      20   0 9559208   7.4g  22624 S  29.4  2.9 475:31.08 /usr/bin/ceph-osd -f --cluster ceph --id 75 --setuser ceph --setgroup ceph                                    
  84357 ceph      20   0 9622716   7.4g  22636 S  29.4  2.9 413:25.13 /usr/bin/ceph-osd -f --cluster ceph --id 117 --setuser ceph --setgroup ceph                                   
  84281 ceph      20   0 9798076   7.4g  22696 S  23.5  2.9 450:28.81 /usr/bin/ceph-osd -f --cluster ceph --id 52 --setuser ceph --setgroup ceph                                    
  84291 ceph      20   0 9796536   7.4g  22688 S  23.5  2.9 341:54.12 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph                                     
  84302 ceph      20   0 9716028   7.4g  22792 S  23.5  2.9 438:46.45 /usr/bin/ceph-osd -f --cluster ceph --id 21 --setuser ceph --setgroup ceph                                    
  84313 ceph      20   0 9694608   7.3g  22644 S  23.5  2.9 346:54.91 /usr/bin/ceph-osd -f --cluster ceph --id 141 --setuser ceph --setgroup ceph                                   
  84317 ceph      20   0 9736432   7.4g  22676 S  23.5  2.9 459:17.50 /usr/bin/ceph-osd -f --cluster ceph --id 30 --setuser ceph --setgroup ceph                                    
  84336 ceph      20   0 9633048   7.4g  22784 S  23.5  2.9 469:43.33 /usr/bin/ceph-osd -f --cluster ceph --id 107 --setuser ceph --setgroup ceph                                   
  84342 ceph      20   0 9402956   7.4g  22640 S  23.5  2.9 452:44.92 /usr/bin/ceph-osd -f --cluster ceph --id 46 --setuser ceph --setgroup ceph                                    
  84351 ceph      20   0 9543764   7.4g  22604 S  23.5  2.9 334:12.42 /usr/bin/ceph-osd -f --cluster ceph --id 125 --setuser ceph --setgroup ceph                                   
  84362 ceph      20   0 9514564   7.4g  22556 S  23.5  2.9 341:06.37 /usr/bin/ceph-osd -f --cluster ceph --id 91 --setuser ceph --setgroup ceph                                    
  87544 ceph      20   0 9642692   7.4g  22600 S  23.5  2.9 345:25.25 /usr/bin/ceph-osd -f --cluster ceph --id 68 --setuser ceph --setgroup ceph                                    
  84247 ceph      20   0 9511292   7.4g  22680 S  17.6  2.9 405:16.52 /usr/bin/ceph-osd -f --cluster ceph --id 164 --setuser ceph --setgroup ceph                                   
  84254 ceph      20   0 9597888   7.4g  22604 S  17.6  2.9 404:31.53 /usr/bin/ceph-osd -f --cluster ceph --id 99 --setuser ceph --setgroup ceph                                    
  84323 ceph      20   0 9652036   7.4g  22592 S  17.6  2.9 489:35.24 /usr/bin/ceph-osd -f --cluster ceph --id 190 --setuser ceph --setgroup ceph                                   
  84341 ceph      20   0 9624968   7.4g  22484 S  17.6  2.9 369:05.30 /usr/bin/ceph-osd -f --cluster ceph --id 13 --setuser ceph --setgroup ceph                                    
  84353 ceph      20   0 9656128   7.4g  22648 S  17.6  2.9 391:18.63 /usr/bin/ceph-osd -f --cluster ceph --id 83 --setuser ceph --setgroup ceph                                    
  84354 ceph      20   0 9419564   7.4g  22628 S  17.6  2.9 365:26.11 /usr/bin/ceph-osd -f --cluster ceph --id 38 --setuser ceph --setgroup ceph                                    
 154813 root      20   0  162696   2856   1612 R  17.6  0.0   0:00.04 top -c -n1                                                                                                    
  84277 ceph      20   0 9713448   7.4g  22472 S  11.8  2.9 400:50.81 /usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph                                   
  84320 ceph      20   0 9736748   7.4g  22636 S  11.8  2.9 409:17.71 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph                                   
  84355 ceph      20   0 9749624   6.8g  22588 S  11.8  2.7 359:17.82 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph                                   
  84314 ceph      20   0 9561324   6.3g  22576 S   5.9  2.5 391:15.68 /usr/bin/ceph-osd -f --cluster ceph --id 59 --setuser ceph --setgroup ceph  
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1       28250.00   31.00 2369532.00    244.00     0.00     0.00   0.00   0.00    2.89    2.26  83.16    83.88     7.87   0.04 100.70
nvme1n1       28957.00   36.00 2411584.00    288.00     0.00     0.00   0.00   0.00    2.64    1.81  78.04    83.28     8.00   0.03 101.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-2          2776.00    4.00 232480.00     32.00     0.00     0.00   0.00   0.00    2.89    3.00   8.04    83.75     8.00   0.36 100.00
dm-3          2849.00    2.00 241796.00     16.00     0.00     0.00   0.00   0.00    2.81    2.00   8.01    84.87     8.00   0.35 100.00
dm-4          2259.00    1.00 177868.00      8.00     0.00     0.00   0.00   0.00    2.82    2.00   6.37    78.74     8.00   0.44 100.00
dm-5          2499.00    3.00 209644.00     24.00     0.00     0.00   0.00   0.00    2.93    1.00   7.32    83.89     8.00   0.40 100.00
dm-6          3115.00    7.00 260588.00     56.00     0.00     0.00   0.00   0.00    2.84    3.29   8.89    83.66     8.00   0.32 100.00
dm-7          2293.00    3.00 191508.00     24.00     0.00     0.00   0.00   0.00    2.89    1.00   6.64    83.52     8.00   0.44 100.00
dm-8          2592.00    1.00 221416.00     12.00     0.00     0.00   0.00   0.00    2.97    1.00   7.71    85.42    12.00   0.39 100.00
dm-9           690.00    1.00  58056.00      8.00     0.00     0.00   0.00   0.00    2.86    1.00   1.98    84.14     8.00   1.38  95.50
dm-10         2176.00    3.00 181016.00     16.00     0.00     0.00   0.00   0.00    2.90    1.33   6.30    83.19     5.33   0.46 100.00
dm-11         1462.00    0.00 125620.00      0.00     0.00     0.00   0.00   0.00    2.82    0.00   4.12    85.92     0.00   0.68 100.00
dm-12         3186.00    4.00 268904.00     32.00     0.00     0.00   0.00   0.00    2.91    3.00   9.30    84.40     8.00   0.31 100.00
dm-13         2379.00    1.00 203524.00      8.00     0.00     0.00   0.00   0.00    2.95    2.00   7.04    85.55     8.00   0.42 100.00
dm-14         1801.00    4.00 150780.00     32.00     0.00     0.00   0.00   0.00    2.70    1.25   4.87    83.72     8.00   0.55  99.90
dm-15         2654.00    2.00 220952.00     16.00     0.00     0.00   0.00   0.00    2.60    2.50   6.92    83.25     8.00   0.38 100.00
dm-16         2364.00    7.00 197504.00     56.00     0.00     0.00   0.00   0.00    2.63    1.29   6.22    83.55     8.00   0.42  99.90
dm-17         3057.00    7.00 225848.00     56.00     0.00     0.00   0.00   0.00    2.58    2.43   7.89    73.88     8.00   0.33 100.00
dm-18          963.00    0.00  83284.00      0.00     0.00     0.00   0.00   0.00    2.75    0.00   2.65    86.48     0.00   1.02  98.60
dm-19         2823.00    4.00 236264.00     32.00     0.00     0.00   0.00   0.00    2.61    1.50   7.38    83.69     8.00   0.35  99.90
dm-20         3670.00    3.00 311164.00     24.00     0.00     0.00   0.00   0.00    2.62    1.67   9.61    84.79     8.00   0.27  99.90
dm-21         1604.00    0.00 136908.00      0.00     0.00     0.00   0.00   0.00    2.79    0.00   4.48    85.35     0.00   0.62 100.00
dm-22         2286.00    5.00 192028.00     40.00     0.00     0.00   0.00   0.00    2.56    2.40   5.87    84.00     8.00   0.44 100.00
dm-23         2817.00    3.00 237992.00     24.00     0.00     0.00   0.00   0.00    2.63    1.33   7.40    84.48     8.00   0.35  99.90
dm-24         3108.00    1.00 263720.00      8.00     0.00     0.00   0.00   0.00    2.69    3.00   8.38    84.85     8.00   0.32 100.10
dm-25         1809.00    0.00 155404.00      0.00     0.00     0.00   0.00   0.00    2.74    0.00   4.95    85.91     0.00   0.55  99.90
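
- The iostat sample above shows both NVMe devices (which hold the OSD DBs, per the layout above) at ~100% utilization doing almost exclusively reads while reclamation runs. A snapshot like this can be reproduced on an OSD host with sysstat's iostat, and the busy dm-* devices can be mapped back to their LVM volumes and OSDs if needed:

# Capture per-device I/O once per second while deletion is running.
iostat -x 1

# Map dm-* devices back to their LVM volumes / OSDs.
lsblk -o NAME,KNAME,TYPE,SIZE,MOUNTPOINT
ceph-volume lvm list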

- OSD memory target value
osd memory target = 7880472439
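
- 7880472439 bytes is roughly 7.3 GiB, which matches the ~7.4g RES of the OSD processes in the top output above. The value in effect on a running OSD can be confirmed as follows (osd.173 is just an example id):

# Value currently in effect on the daemon; run on the OSD's host.
ceph daemon osd.173 config get osd_memory_target

# Value stored in the cluster configuration database, if set there.
ceph config get osd.173 osd_memory_target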

- Cluster status, with essentially zero client I/O:

# ceph -s
  cluster:
    id:     78dbd3fd-75z2-47a7-848a-0137ba6ace9a
    health: HEALTH_WARN
            5 daemons have recently crashed
            2 slow ops, oldest one blocked for 84839 sec, daemons [osd.147,mon.f18-h09-000-r620] have slow ops.

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 5d)
    mgr: mon1(active, since 5d), standbys: mon2, mon3
    osd: 192 osds: 192 up (since 109m), 192 in (since 7h)

  data:
    pools:   16 pools, 7040 pgs
    objects: 49 objects, 6.2 KiB
    usage:   74 TiB used, 287 TiB / 361 TiB avail
    pgs:     7032 active+clean
             7    active+clean+scrubbing
             1    active+clean+scrubbing+deep

  io:
    client:   253 B/s rd, 0 op/s rd, 0 op/s wr
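
- The crash and slow-op warnings in the status output can be inspected further; osd.147 is the daemon named in the HEALTH_WARN text above:

# List the recently reported daemon crashes.
ceph crash ls

# Look at the ops currently blocked on the OSD named in the warning
# (run on the host where osd.147 lives).
ceph daemon osd.147 dump_ops_in_flight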

- The data pool is an EC pool, 4+2:

# ceph osd erasure-code-profile get myprofile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
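
- For reference, a profile and data pool with this layout would typically be created along these lines; the pool name and PG count below are placeholders, not the values used on this cluster:

# 4+2 jerasure profile with per-OSD failure domain, matching the dump above.
ceph osd erasure-code-profile set myprofile \
    k=4 m=2 plugin=jerasure technique=reed_sol_van crush-failure-domain=osd

# Pool name and PG count are placeholders.
ceph osd pool create ecpool 4096 4096 erasure myprofile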

Related issues: 1 (0 open, 1 closed)

Is duplicate of: RADOS - Bug #47044: PG::_delete_some isn't optimal iterating objects (Resolved, assigned to Igor Fedotov)
