Bug #47174
[BlueStore] Pool/PG deletion (space reclamation) is very slow
Status: Closed
Description
Version: 14.2.8 (also reproduced in 12.2.12)
- We use COSBench to fill the cluster with an RGW workload. Five buckets were filled with the following object size histogram (50% 1 KB, 15% 64 KB, 15% 8 MB, 15% 64 MB, 5% 1 GB objects):
config="containers=r(1,5);objects=r(1,374000);sizes=h(1|1|50,64|64|15,8192|8192|15,65536|65536|15,1048576|1048576|5)KB" >
- Then 35 buckets were filled with a fixed object size of 256 KB, in three runs:
config="containers=u(6,15);objects=u(1,1000000);sizes=c(256)KB" config="containers=u(16,25);objects=u(1,1000000);sizes=c(256)KB" config="containers=u(26,35);objects=u(1,1000000);sizes=c(256)KB"
- Space reclamation status observed during the above workload testing:
Thu Aug 27 13:16:27 UTC 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       361 TiB     202 TiB     158 TiB     159 TiB          43.96
    TOTAL     361 TiB     202 TiB     158 TiB     159 TiB          43.96

Thu Aug 27 16:57:09 UTC 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       361 TiB     283 TiB     77 TiB      78 TiB           21.49
    TOTAL     361 TiB     283 TiB     77 TiB      78 TiB           21.49
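(Snapshots like the two above can be collected with a simple polling loop along these lines; this is just a sketch, and the 10-minute interval and log path are arbitrary choices, not necessarily what we ran:)
# while true; do date -u; ceph df | grep -A 3 'RAW STORAGE'; sleep 600; done >> /tmp/ceph-df-watch.log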
- Going by the stats above, and in general with these object sizes, it takes 8+ hours to reclaim the space back to the level it was at before the workload started.
- Reclamation takes even longer when the cluster is filled with more small objects, for example:
config="containers=u(1,5);objects=u(1,50000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB"
- It appears that the more tiny objects there are, the longer it takes to reclaim space after pool/PG deletion.
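(For reference, the deletion that triggers reclamation is a plain pool delete; a sketch of the sequence, where the pool name is a placeholder and not our actual pool:)
# ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
# ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it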
- The cluster has 192 OSDs, with DB on NVMe and block on HDD:
+ DB partition: 62 GB on NVMe
+ OSD block disk: 1.8 TB HDD
+ 24 OSDs per node
+ 12 OSD DBs per NVMe, i.e. 2 NVMe disks per node
+ 251 GB total RAM per node
+ 56 CPU cores per node
+ From one of the OSD nodes:
# uptime
17:24:24 up 2 days, 4:55, 2 users, load average: 81.69, 81.59, 81.82
top - 17:25:35 up 2 days, 4:56, 2 users, load average: 83.08, 82.01, 81.95
Tasks: 644 total, 1 running, 643 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.9 us, 2.8 sy, 0.0 ni, 52.5 id, 37.7 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 26385555+total, 69740848 free, 18721012+used, 6904572 buff/cache
KiB Swap: 524284 total, 524284 free, 0 used. 66270048 avail Mem

   PID USER  PR NI    VIRT  RES   SHR S %CPU %MEM     TIME+ COMMAND
 84346 ceph  20  0 9735624 7.4g 22612 S 41.2  2.9 489:58.54 /usr/bin/ceph-osd -f --cluster ceph --id 173 --setuser ceph --setgroup ceph
 84266 ceph  20  0 9649848 7.4g 22448 S 29.4  2.9 394:04.31 /usr/bin/ceph-osd -f --cluster ceph --id 181 --setuser ceph --setgroup ceph
 84315 ceph  20  0 9559208 7.4g 22624 S 29.4  2.9 475:31.08 /usr/bin/ceph-osd -f --cluster ceph --id 75 --setuser ceph --setgroup ceph
 84357 ceph  20  0 9622716 7.4g 22636 S 29.4  2.9 413:25.13 /usr/bin/ceph-osd -f --cluster ceph --id 117 --setuser ceph --setgroup ceph
 84281 ceph  20  0 9798076 7.4g 22696 S 23.5  2.9 450:28.81 /usr/bin/ceph-osd -f --cluster ceph --id 52 --setuser ceph --setgroup ceph
 84291 ceph  20  0 9796536 7.4g 22688 S 23.5  2.9 341:54.12 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph
 84302 ceph  20  0 9716028 7.4g 22792 S 23.5  2.9 438:46.45 /usr/bin/ceph-osd -f --cluster ceph --id 21 --setuser ceph --setgroup ceph
 84313 ceph  20  0 9694608 7.3g 22644 S 23.5  2.9 346:54.91 /usr/bin/ceph-osd -f --cluster ceph --id 141 --setuser ceph --setgroup ceph
 84317 ceph  20  0 9736432 7.4g 22676 S 23.5  2.9 459:17.50 /usr/bin/ceph-osd -f --cluster ceph --id 30 --setuser ceph --setgroup ceph
 84336 ceph  20  0 9633048 7.4g 22784 S 23.5  2.9 469:43.33 /usr/bin/ceph-osd -f --cluster ceph --id 107 --setuser ceph --setgroup ceph
 84342 ceph  20  0 9402956 7.4g 22640 S 23.5  2.9 452:44.92 /usr/bin/ceph-osd -f --cluster ceph --id 46 --setuser ceph --setgroup ceph
 84351 ceph  20  0 9543764 7.4g 22604 S 23.5  2.9 334:12.42 /usr/bin/ceph-osd -f --cluster ceph --id 125 --setuser ceph --setgroup ceph
 84362 ceph  20  0 9514564 7.4g 22556 S 23.5  2.9 341:06.37 /usr/bin/ceph-osd -f --cluster ceph --id 91 --setuser ceph --setgroup ceph
 87544 ceph  20  0 9642692 7.4g 22600 S 23.5  2.9 345:25.25 /usr/bin/ceph-osd -f --cluster ceph --id 68 --setuser ceph --setgroup ceph
 84247 ceph  20  0 9511292 7.4g 22680 S 17.6  2.9 405:16.52 /usr/bin/ceph-osd -f --cluster ceph --id 164 --setuser ceph --setgroup ceph
 84254 ceph  20  0 9597888 7.4g 22604 S 17.6  2.9 404:31.53 /usr/bin/ceph-osd -f --cluster ceph --id 99 --setuser ceph --setgroup ceph
 84323 ceph  20  0 9652036 7.4g 22592 S 17.6  2.9 489:35.24 /usr/bin/ceph-osd -f --cluster ceph --id 190 --setuser ceph --setgroup ceph
 84341 ceph  20  0 9624968 7.4g 22484 S 17.6  2.9 369:05.30 /usr/bin/ceph-osd -f --cluster ceph --id 13 --setuser ceph --setgroup ceph
 84353 ceph  20  0 9656128 7.4g 22648 S 17.6  2.9 391:18.63 /usr/bin/ceph-osd -f --cluster ceph --id 83 --setuser ceph --setgroup ceph
 84354 ceph  20  0 9419564 7.4g 22628 S 17.6  2.9 365:26.11 /usr/bin/ceph-osd -f --cluster ceph --id 38 --setuser ceph --setgroup ceph
154813 root  20  0  162696 2856  1612 R 17.6  0.0   0:00.04 top -c -n1
 84277 ceph  20  0 9713448 7.4g 22472 S 11.8  2.9 400:50.81 /usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph
 84320 ceph  20  0 9736748 7.4g 22636 S 11.8  2.9 409:17.71 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
 84355 ceph  20  0 9749624 6.8g 22588 S 11.8  2.7 359:17.82 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
 84314 ceph  20  0 9561324 6.3g 22576 S  5.9  2.5 391:15.68 /usr/bin/ceph-osd -f --cluster ceph --id 59 --setuser ceph --setgroup ceph
Device       r/s       w/s     rkB/s       wkB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm   %util
nvme0n1  28250.00     31.00  2369532.00   244.00    0.00    0.00   0.00   0.00     2.89     2.26   83.16     83.88      7.87   0.04  100.70
nvme1n1  28957.00     36.00  2411584.00   288.00    0.00    0.00   0.00   0.00     2.64     1.81   78.04     83.28      8.00   0.03  101.00
sda          0.00      0.00        0.00     0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
sdb          0.00      0.00        0.00     0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
dm-0         0.00      0.00        0.00     0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
dm-1         0.00      0.00        0.00     0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00    0.00
dm-2      2776.00      4.00   232480.00    32.00    0.00    0.00   0.00   0.00     2.89     3.00    8.04     83.75      8.00   0.36  100.00
dm-3      2849.00      2.00   241796.00    16.00    0.00    0.00   0.00   0.00     2.81     2.00    8.01     84.87      8.00   0.35  100.00
dm-4      2259.00      1.00   177868.00     8.00    0.00    0.00   0.00   0.00     2.82     2.00    6.37     78.74      8.00   0.44  100.00
dm-5      2499.00      3.00   209644.00    24.00    0.00    0.00   0.00   0.00     2.93     1.00    7.32     83.89      8.00   0.40  100.00
dm-6      3115.00      7.00   260588.00    56.00    0.00    0.00   0.00   0.00     2.84     3.29    8.89     83.66      8.00   0.32  100.00
dm-7      2293.00      3.00   191508.00    24.00    0.00    0.00   0.00   0.00     2.89     1.00    6.64     83.52      8.00   0.44  100.00
dm-8      2592.00      1.00   221416.00    12.00    0.00    0.00   0.00   0.00     2.97     1.00    7.71     85.42     12.00   0.39  100.00
dm-9       690.00      1.00    58056.00     8.00    0.00    0.00   0.00   0.00     2.86     1.00    1.98     84.14      8.00   1.38   95.50
dm-10     2176.00      3.00   181016.00    16.00    0.00    0.00   0.00   0.00     2.90     1.33    6.30     83.19      5.33   0.46  100.00
dm-11     1462.00      0.00   125620.00     0.00    0.00    0.00   0.00   0.00     2.82     0.00    4.12     85.92      0.00   0.68  100.00
dm-12     3186.00      4.00   268904.00    32.00    0.00    0.00   0.00   0.00     2.91     3.00    9.30     84.40      8.00   0.31  100.00
dm-13     2379.00      1.00   203524.00     8.00    0.00    0.00   0.00   0.00     2.95     2.00    7.04     85.55      8.00   0.42  100.00
dm-14     1801.00      4.00   150780.00    32.00    0.00    0.00   0.00   0.00     2.70     1.25    4.87     83.72      8.00   0.55   99.90
dm-15     2654.00      2.00   220952.00    16.00    0.00    0.00   0.00   0.00     2.60     2.50    6.92     83.25      8.00   0.38  100.00
dm-16     2364.00      7.00   197504.00    56.00    0.00    0.00   0.00   0.00     2.63     1.29    6.22     83.55      8.00   0.42   99.90
dm-17     3057.00      7.00   225848.00    56.00    0.00    0.00   0.00   0.00     2.58     2.43    7.89     73.88      8.00   0.33  100.00
dm-18      963.00      0.00    83284.00     0.00    0.00    0.00   0.00   0.00     2.75     0.00    2.65     86.48      0.00   1.02   98.60
dm-19     2823.00      4.00   236264.00    32.00    0.00    0.00   0.00   0.00     2.61     1.50    7.38     83.69      8.00   0.35   99.90
dm-20     3670.00      3.00   311164.00    24.00    0.00    0.00   0.00   0.00     2.62     1.67    9.61     84.79      8.00   0.27   99.90
dm-21     1604.00      0.00   136908.00     0.00    0.00    0.00   0.00   0.00     2.79     0.00    4.48     85.35      0.00   0.62  100.00
dm-22     2286.00      5.00   192028.00    40.00    0.00    0.00   0.00   0.00     2.56     2.40    5.87     84.00      8.00   0.44  100.00
dm-23     2817.00      3.00   237992.00    24.00    0.00    0.00   0.00   0.00     2.63     1.33    7.40     84.48      8.00   0.35   99.90
dm-24     3108.00      1.00   263720.00     8.00    0.00    0.00   0.00   0.00     2.69     3.00    8.38     84.85      8.00   0.32  100.10
dm-25     1809.00      0.00   155404.00     0.00    0.00    0.00   0.00   0.00     2.74     0.00    4.95     85.91      0.00   0.55   99.90
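(The per-device stats above are extended iostat output; a snapshot like this can be captured with something like the following, where the exact flags and interval are an assumption:)
# iostat -x 1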
- OSD memory target value:
osd memory target = 7880472439
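The effective value can be confirmed per daemon via the admin socket, e.g. against osd.173 from the top output above; it should return the configured value:
# ceph daemon osd.173 config get osd_memory_target
{
    "osd_memory_target": "7880472439"
}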
- Cluster status (zero client I/O):
# ceph -s
  cluster:
    id:     78dbd3fd-75z2-47a7-848a-0137ba6ace9a
    health: HEALTH_WARN
            5 daemons have recently crashed
            2 slow ops, oldest one blocked for 84839 sec, daemons [osd.147,mon.f18-h09-000-r620] have slow ops.

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 5d)
    mgr: mon1(active, since 5d), standbys: mon2, mon3
    osd: 192 osds: 192 up (since 109m), 192 in (since 7h)

  data:
    pools:   16 pools, 7040 pgs
    objects: 49 objects, 6.2 KiB
    usage:   74 TiB used, 287 TiB / 361 TiB avail
    pgs:     7032 active+clean
             7    active+clean+scrubbing
             1    active+clean+scrubbing+deep

  io:
    client: 253 B/s rd, 0 op/s rd, 0 op/s wr
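(The slow ops called out on osd.147 can be inspected via its admin socket, e.g.:)
# ceph daemon osd.147 dump_blocked_ops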
- The data pool is an EC 4+2 pool:
# ceph osd erasure-code-profile get myprofile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
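(For reference, an equivalent profile and pool can be created roughly as follows; the pool name and PG counts here are placeholders, not our actual values:)
# ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=osd
# ceph osd pool create <pool-name> 2048 2048 erasure myprofile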