Bug #17989

closed

BlueStore Ceph OSD disk periodically overloaded by reads during RocksDB table generation, causing slow reads

Added by maribel alonso over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

For several minutes we observed disk sday overloaded by read operations (see the statistics below, taken over a 10-second period). This causes slow reads on osd24, which reads from this device:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sday 0.00 0.00 1729.50 69.00 13913.60 7574.00 23.90 2.86 1.59 0.80 21.31 0.55 99.22

[ ceph] # grep "enerated table" ceph-osd.24.log
2016-11-17 10:13:08.817002 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #360: 688733 keys, 67950830 bytes
2016-11-17 10:13:30.481400 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #361: 190476 keys, 67788987 bytes
2016-11-17 10:13:49.547932 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #362: 210835 keys, 67814227 bytes
2016-11-17 10:14:07.874568 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #363: 190522 keys, 67789341 bytes
2016-11-17 10:14:26.955161 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #364: 190447 keys, 67787553 bytes
2016-11-17 10:14:48.679477 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #365: 212809 keys, 67816325 bytes
2016-11-17 10:15:08.890487 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #366: 190522 keys, 67788531 bytes
2016-11-17 10:15:27.643525 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #367: 190511 keys, 67787365 bytes
2016-11-17 10:15:48.921387 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #368: 204626 keys, 67805722 bytes
2016-11-17 10:16:10.291240 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #369: 190443 keys, 67788767 bytes
2016-11-17 10:16:27.327313 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #370: 202913 keys, 67802523 bytes
2016-11-17 10:16:49.084237 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #371: 190472 keys, 67787510 bytes
2016-11-17 10:17:06.571193 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #372: 190468 keys, 67787108 bytes
2016-11-17 10:17:18.020120 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #373: 2301296 keys, 13661386 bytes
[ ceph] # grep "enerated table" ceph-osd.23.log
2016-11-17 09:49:38.028023 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #263: 634528 keys, 67928524 bytes
2016-11-17 09:49:50.307365 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #264: 190890 keys, 67788090 bytes
2016-11-17 09:50:02.812456 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #265: 190829 keys, 67787458 bytes
2016-11-17 09:50:15.563327 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #266: 214706 keys, 67819357 bytes
2016-11-17 09:50:28.467158 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #267: 190829 keys, 67788428 bytes
2016-11-17 09:50:41.023930 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #268: 213193 keys, 67816081 bytes
2016-11-17 09:50:53.834008 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #269: 190821 keys, 67789966 bytes
2016-11-17 09:51:06.621466 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #270: 210275 keys, 67814895 bytes
2016-11-17 09:51:19.109983 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #271: 190838 keys, 67788071 bytes
2016-11-17 09:51:32.298792 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #272: 211110 keys, 67814543 bytes
2016-11-17 09:51:45.747544 7f6e99dc8700 4 rocksdb: [default] [JOB 65] Generated table #273: 2050660 keys, 64594228 bytes
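
To put numbers on the log excerpts above, the "Generated table" lines can be grouped per compaction job. The following is a minimal sketch, not part of the original report: the regular expression is an assumption based only on the line layout shown here, and the helper name `summarize_jobs` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above:
# <date> <time> <thread-id> <level> rocksdb: [<cf>] [JOB n] Generated table #t: k keys, b bytes
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ \d+ rocksdb: "
    r"\[(?P<cf>[^\]]+)\] \[JOB (?P<job>\d+)\] Generated table #(?P<table>\d+): "
    r"(?P<keys>\d+) keys, (?P<bytes>\d+) bytes"
)


def summarize_jobs(lines):
    """Return {job_id: {first, last, tables, bytes, seconds}} for matching lines."""
    jobs = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a "Generated table" line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        job = jobs.setdefault(
            int(m["job"]), {"first": ts, "last": ts, "tables": 0, "bytes": 0}
        )
        job["first"] = min(job["first"], ts)
        job["last"] = max(job["last"], ts)
        job["tables"] += 1
        job["bytes"] += int(m["bytes"])
    for job in jobs.values():
        # Time between the first and the last table written by this job.
        job["seconds"] = (job["last"] - job["first"]).total_seconds()
    return jobs


# First and last line of JOB 89 from ceph-osd.24.log, copied from above.
SAMPLE = [
    "2016-11-17 10:13:08.817002 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #360: 688733 keys, 67950830 bytes",
    "2016-11-17 10:17:18.020120 7fd57f6c9700 4 rocksdb: [default] [JOB 89] Generated table #373: 2301296 keys, 13661386 bytes",
]
jobs = summarize_jobs(SAMPLE)
print(jobs[89]["tables"], jobs[89]["bytes"], jobs[89]["seconds"])
```

Feeding it the full grep output for each OSD gives the duration of each compaction job, which can then be compared against the window in which the disk was saturated.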

====================

avg-cpu: %user %nice %system %iowait %steal %idle
36.78 0.00 7.03 21.75 0.00 34.45

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 9.80 0.00 33.40 0.00 1778.35 106.49 0.44 13.10 0.00 13.10 4.64 15.49
sdc 0.00 0.00 13.50 20.70 1432.80 3291.20 276.26 0.47 13.69 9.76 16.26 9.16 31.33
sdb 0.00 0.00 13.50 21.40 1363.20 3594.40 284.10 0.48 13.72 10.61 15.68 9.06 31.61
sde 0.00 0.00 17.20 16.80 1667.60 2722.00 258.21 0.42 12.23 9.13 15.40 8.52 28.96
sdd 0.00 0.00 23.00 18.20 2070.80 3664.40 278.41 0.50 12.12 8.14 17.15 7.77 32.01
sdh 0.00 0.00 24.20 22.20 2318.80 4220.40 281.86 0.66 14.22 9.96 18.86 8.33 38.65
sdi 0.00 0.00 29.10 20.40 3884.00 3006.00 278.38 0.69 13.99 11.07 18.16 8.60 42.58
sdf 0.00 0.00 25.90 19.40 2819.60 3332.80 271.63 0.59 13.11 9.35 18.12 7.97 36.09
sdg 0.00 0.00 19.00 22.60 2671.60 3713.20 306.96 0.65 15.86 13.28 18.02 9.05 37.63
sdj 0.00 0.00 16.30 18.70 2215.60 3236.80 311.57 0.47 13.33 10.07 16.16 8.67 30.33
sdk 0.00 0.00 12.60 18.10 1290.40 3018.00 280.68 0.41 13.43 9.64 16.06 8.95 27.47
sdl 0.00 0.00 18.90 17.40 2342.00 2903.60 289.01 0.47 12.99 8.84 17.51 8.22 29.83
sdm 0.00 0.00 21.70 19.70 2387.20 3302.00 274.84 0.56 13.51 10.04 17.32 7.85 32.51
sdn 0.00 0.00 20.80 22.40 3127.20 3423.20 303.26 0.64 14.94 11.68 17.96 8.97 38.77
sdo 0.00 0.00 27.60 20.70 3210.80 3135.20 262.77 0.63 12.95 9.43 17.64 7.91 38.21
sdp 0.00 0.00 18.10 18.10 2171.60 3375.20 306.45 0.46 12.71 8.70 16.72 8.37 30.29
sdq 0.00 0.00 26.40 25.60 4394.40 4774.00 352.63 0.78 14.97 11.08 18.98 8.94 46.48
sdr 0.00 0.00 17.20 21.30 2512.00 3795.60 327.67 0.56 14.55 11.23 17.23 8.79 33.84
sds 0.00 0.00 20.70 18.30 2060.80 2920.00 255.43 0.55 14.15 10.53 18.24 8.81 34.34
sdt 0.00 0.00 29.80 21.30 3124.40 4155.60 284.93 0.74 14.52 10.29 20.43 8.35 42.67
sdu 0.00 0.00 17.20 22.70 2791.20 3490.40 314.87 0.62 15.42 12.36 17.74 9.47 37.80
sdx 0.00 0.00 26.20 24.20 3612.40 4004.80 302.27 0.75 14.80 11.51 18.36 8.55 43.08
sdv 0.00 0.00 18.80 17.40 3485.20 2840.00 349.46 0.53 14.56 11.08 18.33 9.38 33.95
sdy 0.00 0.00 32.00 27.00 3643.20 5691.60 316.43 0.93 15.72 11.19 21.09 7.98 47.10
sdw 0.00 0.00 17.20 20.80 2084.40 3560.80 297.12 0.53 13.79 10.33 16.64 9.03 34.31
sdz 0.00 0.00 25.00 21.30 3990.00 3772.80 335.33 0.61 13.22 9.98 17.04 8.55 39.58
sdaa 0.00 0.00 18.00 18.00 2243.60 2927.20 287.27 0.46 12.90 10.64 15.16 9.02 32.46
sdab 0.00 0.00 16.10 19.30 3064.80 3258.00 357.22 0.48 13.45 11.11 15.40 8.97 31.74
sdac 0.00 0.00 17.60 22.00 3193.60 3957.20 361.15 0.62 15.56 12.35 18.13 9.24 36.59
sdad 0.00 0.00 23.10 22.80 2745.60 3646.80 278.54 0.60 13.05 9.69 16.46 8.62 39.56
sdae 0.00 0.00 31.30 24.20 2756.80 3169.20 213.55 0.66 11.83 8.11 16.64 7.36 40.87
sdaf 0.00 0.00 14.90 18.60 2032.40 2656.00 279.90 0.47 13.97 10.96 16.39 9.30 31.15
sdag 0.00 0.00 18.10 22.30 2116.00 3843.60 295.03 0.55 13.72 10.67 16.18 9.10 36.76
sdah 0.00 0.00 26.20 20.90 2964.40 3594.00 278.49 0.62 13.26 9.65 17.78 8.18 38.51
sdai 0.00 0.00 22.10 23.20 3995.20 3602.40 335.43 0.70 15.49 12.26 18.57 9.31 42.16
sdaj 0.00 0.00 18.10 21.70 2143.60 3860.80 301.73 0.55 13.87 10.06 17.05 8.70 34.61
sdak 0.00 0.00 30.40 21.90 1991.60 4090.00 232.57 0.68 12.94 8.17 19.56 7.23 37.82
sdal 0.00 0.00 23.00 16.20 2121.60 2458.80 233.69 0.44 11.21 8.05 15.70 7.93 31.08
sdam 0.00 0.00 22.30 21.10 3192.80 3669.60 316.24 0.62 14.39 10.95 18.02 8.76 38.00
sdan 0.00 0.00 24.60 25.60 2774.40 4842.40 303.46 0.76 15.07 11.52 18.48 8.44 42.38
sdao 0.00 0.00 20.40 16.40 2511.20 3216.00 311.26 0.45 12.19 8.76 16.46 8.32 30.61
sdaq 0.00 0.00 26.40 18.10 2908.40 2907.20 261.38 0.53 11.82 8.69 16.38 7.70 34.26
sdap 0.00 0.00 26.60 19.30 2670.80 2679.60 233.13 0.51 10.98 7.99 15.10 7.60 34.89
sdar 0.00 0.00 22.20 19.30 3346.40 3536.00 331.68 0.64 15.38 12.05 19.21 8.37 34.72
sdas 0.00 0.00 16.40 17.90 2066.80 3620.80 331.64 0.44 12.71 9.12 16.01 8.77 30.09
sdat 0.00 0.00 24.40 21.20 2992.00 3650.40 291.33 0.62 13.50 10.66 16.77 8.00 36.46
sdau 0.00 0.00 22.90 23.60 2450.00 3878.40 272.19 0.73 15.96 12.32 19.48 8.82 41.00
sdav 0.00 0.00 31.70 23.30 2336.40 3269.60 203.85 0.73 13.27 10.10 17.57 8.13 44.70
sdaw 0.00 0.00 24.30 22.60 3710.40 4053.20 331.07 0.69 14.62 10.44 19.12 8.54 40.04
sdax 0.00 0.00 20.00 17.30 2994.80 2882.00 315.11 0.53 14.12 11.86 16.73 9.35 34.87
sday 0.00 0.00 1729.50 69.00 13913.60 7574.00 23.90 2.86 1.59 0.80 21.31 0.55 99.22
sdaz 0.00 0.00 39.50 25.10 3975.20 3897.60 243.74 0.87 13.46 10.62 17.92 8.04 51.91
sdba 0.00 0.00 17.00 17.40 2262.40 2654.00 285.84 0.48 13.87 10.84 16.83 8.76 30.14
sdbb 0.00 0.00 24.00 21.60 2581.20 3735.60 277.05 0.63 13.71 10.05 17.79 8.14 37.12
sdbc 0.00 0.00 22.50 21.90 2078.80 2773.20 218.56 0.58 13.13 10.04 16.31 8.56 38.01
sdbd 0.00 0.00 20.20 23.70 3117.20 3987.20 323.66 0.64 14.65 11.01 17.75 8.77 38.51
sdbe 0.00 0.00 19.60 21.70 3070.40 3597.20 322.89 0.61 14.80 11.60 17.70 8.85 36.57
sdbf 0.00 0.00 22.70 20.80 2842.40 3435.60 288.64 0.60 13.83 10.56 17.39 8.94 38.88
sdbg 0.00 0.00 19.10 25.30 2669.20 5004.00 345.64 0.69 15.57 12.07 18.22 9.58 42.53
sdbh 0.00 0.00 30.40 22.00 3420.80 3722.80 272.66 0.72 13.65 10.66 17.78 7.88 41.29
sdbi 0.00 0.00 23.60 22.00 3511.20 3434.80 304.65 0.68 15.01 11.59 18.68 8.51 38.81
sdbj 0.00 0.00 15.30 19.00 2132.40 3526.80 329.98 0.51 14.92 11.06 18.02 9.21 31.60
sdbk 0.00 0.00 16.40 20.10 2594.40 4025.20 362.72 0.50 13.80 10.18 16.76 9.07 33.12
sdbl 0.00 0.00 18.80 18.80 3314.80 3059.20 339.04 0.49 13.13 10.66 15.59 8.60 32.34
sdbm 0.00 0.00 21.00 20.30 2722.40 3318.40 292.53 0.54 13.01 9.60 16.54 8.69 35.87
sdbn 0.00 0.00 19.50 22.00 2277.20 4048.80 304.87 0.57 13.70 10.09 16.90 8.61 35.73
sdbo 0.00 0.00 22.30 23.80 2409.60 3954.00 276.08 0.73 15.74 11.44 19.77 9.05 41.74
sdbp 0.00 0.00 19.80 21.80 2134.80 3473.60 269.63 0.61 14.62 11.14 17.78 9.24 38.45
sdbq 0.00 0.00 26.90 25.00 2989.60 4289.60 280.51 0.85 16.28 13.26 19.54 8.88 46.10
dm-0 0.00 0.00 0.00 22.30 0.00 90.40 8.11 0.29 13.01 0.00 13.01 1.38 3.08
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 6.10 0.00 71.60 23.48 0.09 15.15 0.00 15.15 5.25 3.20
dm-3 0.00 0.00 0.00 1.30 0.00 6.75 10.38 0.02 11.92 0.00 11.92 5.69 0.74
dm-4 0.00 0.00 0.00 13.50 0.00 1609.60 238.46 0.16 12.16 0.00 12.16 6.59 8.89
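
With this many devices, the saturated disk is easier to spot programmatically. Below is a minimal sketch, not part of the original report, that parses extended iostat output in the column layout shown above and flags devices whose read rate is far above the median. The function names and the factor of 10 are arbitrary assumptions for illustration.

```python
from statistics import median


def parse_iostat(text):
    """Parse 'Device:' sections of extended iostat output into {device: {column: value}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            header = fields[1:]  # column names follow the 'Device:' label
            continue
        if header and len(fields) == len(header) + 1:
            try:
                rows[fields[0]] = dict(zip(header, map(float, fields[1:])))
            except ValueError:
                continue  # not a numeric device row
    return rows


def hot_devices(rows, column="r/s", factor=10.0):
    """Return devices whose value in `column` exceeds `factor` times the median."""
    base = median(r[column] for r in rows.values())
    return sorted(d for d, r in rows.items() if base > 0 and r[column] > factor * base)


# Three rows copied from the table above.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdax 0.00 0.00 20.00 17.30 2994.80 2882.00 315.11 0.53 14.12 11.86 16.73 9.35 34.87
sday 0.00 0.00 1729.50 69.00 13913.60 7574.00 23.90 2.86 1.59 0.80 21.31 0.55 99.22
sdaz 0.00 0.00 39.50 25.10 3975.20 3897.60 243.74 0.87 13.46 10.62 17.92 8.04 51.91
"""
rows = parse_iostat(SAMPLE)
hot = hot_devices(rows)
print(hot)
```

The median is used as the baseline rather than the mean so that a single saturated disk does not raise the threshold it is compared against.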

Updated by Mark Nelson over 7 years ago

  • Status changed from New to Closed

Hi,

Thanks for the report! Unfortunately, this is a side effect of using rocksdb for metadata. If you look at the compaction statistics in the logs, you'll probably see a fair amount of both read and write traffic associated with compaction. This traffic is primarily due to pglog updates and object metadata updates. You can tune rocksdb in various ways to balance read amplification, write amplification, and size amplification, but it's very tricky to hit the right balance, and the result is hardware dependent, so our current tunings might not be ideal for your devices. You can see a discussion regarding this here:

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

Another thought: You might try playing with bluestore's min_alloc_size setting. Larger settings will cause an additional WAL write for writes up to the min_alloc_size (similar to the filestore journal write), but will reduce the amount of metadata stored in rocksdb. It might be that a larger value will help reduce read traffic during compaction and, in general, require reading less metadata. In the long run, switching to something like SanDisk's ZetaScale object store may result in better amplification characteristics and higher performance.
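
The trade-off described above can be illustrated with some rough arithmetic. This is a deliberately simplified model and an assumption, not BlueStore's actual accounting: it treats the number of allocation units an object can occupy as an upper bound on the extent metadata to track, and it treats any write smaller than min_alloc_size as taking the extra WAL write, as the reply describes. The function names and the example sizes are hypothetical.

```python
import math


def allocation_units(object_bytes, min_alloc_size):
    """Upper bound on allocation units needed for an object (simplified model)."""
    return math.ceil(object_bytes / min_alloc_size)


def takes_wal_path(write_bytes, min_alloc_size):
    """Per the reply above: writes below min_alloc_size incur an additional WAL write."""
    return write_bytes < min_alloc_size


OBJECT = 4 * 1024 * 1024   # hypothetical 4 MiB object
WRITE = 16 * 1024          # hypothetical 16 KiB client write

for size in (4 * 1024, 64 * 1024):
    print(
        f"min_alloc_size={size}: "
        f"units<={allocation_units(OBJECT, size)}, "
        f"extra WAL write for {WRITE}-byte write: {takes_wal_path(WRITE, size)}"
    )
```

Under these assumptions, a larger min_alloc_size means fewer units to track per object but more small writes paying the extra WAL cost, which is the balance the reply points at.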
