Bug #41027
rbd bench write does not properly count in flight IOs
Status: Closed
Description
While benchmarking a heavily IO-saturated HDD Ceph cluster, I noticed that the `rbd bench --io-type=write` mode does not seem to wait for outstanding writes to be synced to the OSDs.
First, as proof that the cluster is IO saturated, here is rbd bench in read mode:
# rbd bench test/testdan --io-type=read --io-size=4096 --io-threads=1 --io-total=1M --io-pattern rand
bench  type read io_size 4096 io_threads 1 bytes 1048576 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1         7      6.19    25361.76
    2        17      8.64    35377.47
    3        39     12.00    49170.77
    4        52     13.23    54189.72
    5        81     15.22    62324.34
    6       102     20.16    82579.13
    7       116     19.65    80487.61
    8       138     21.05    86239.17
    9       159     20.56    84216.75
   10       169     19.06    78068.31
   11       185     16.39    67119.80
   12       200     17.09    70001.57
   13       232     18.81    77065.04
elapsed: 14  ops: 256  ops/sec: 17.98  bytes/sec: 73644.95
But writing to this same cluster I get ~15k IOPS, which is clearly wrong:
# rbd bench test/testdan --io-type=write --io-size=4096 --io-threads=1 --io-total=100M --io-pattern rand
bench  type write io_size 4096 io_threads 1 bytes 104857600 pattern random
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     15397  15413.13  63132185.43
elapsed: 2  ops: 25600  ops/sec: 9906.94  bytes/sec: 40578823.57
Here, with `--debug_ms=1` I see that, even though I passed `--io-threads=1`, the client sends five writes before the first osd_op_reply comes in:
[15:17][root@p05517715y58557 (production:ceph/beesly/mon*2:peon) ~]# rbd --debug_ms=1 bench test/testdan --io-type=write --io-size=4096 --io-threads=1 --io-total=20000 --io-pattern rand 2>&1 | grep write
bench  type write io_size 4096 io_threads 1 bytes 20000 pattern random
2019-07-31 15:17:59.393 7f987affd700  1 -- 188.184.36.164:0/3854274056 --> 128.142.162.23:6884/927919 -- osd_op(unknown.0.0:16 78.333 78:ccc2236b:::rbd_data.33b6d06b8b4567.0000000000000321:head [write 3112960~4096] snapc 0=[] ondisk+write+known_if_redirected e2428362) v8 -- 0x7f98700121d0 con 0
2019-07-31 15:17:59.393 7f987affd700  1 -- 188.184.36.164:0/3854274056 --> 128.142.161.230:6824/3839248 -- osd_op(unknown.0.0:18 78.3df 78:fbeced39:::rbd_data.33b6d06b8b4567.0000000000000592:head [write 425984~4096] snapc 0=[] ondisk+write+known_if_redirected e2428362) v8 -- 0x7f9870016d80 con 0
2019-07-31 15:17:59.393 7f987affd700  1 -- 188.184.36.164:0/3854274056 --> 128.142.161.213:6852/3058740 -- osd_op(unknown.0.0:19 78.3b1 78:8df00f11:::rbd_data.33b6d06b8b4567.00000000000007f4:head [write 3579904~4096] snapc 0=[] ondisk+write+known_if_redirected e2428362) v8 -- 0x7f987001be20 con 0
2019-07-31 15:17:59.393 7f987affd700  1 -- 188.184.36.164:0/3854274056 --> 128.142.162.25:6820/897497 -- osd_op(unknown.0.0:20 78.3d 78:bc39328a:::rbd_data.33b6d06b8b4567.0000000000000620:head [write 3719168~4096] snapc 0=[] ondisk+write+known_if_redirected e2428362) v8 -- 0x7f9870020ec0 con 0
2019-07-31 15:17:59.393 7f987affd700  1 -- 188.184.36.164:0/3854274056 --> 128.142.161.221:6888/3981167 -- osd_op(unknown.0.0:21 78.3ff 78:ffd7215d:::rbd_data.33b6d06b8b4567.0000000000000073:head [write 1892352~4096] snapc 0=[] ondisk+write+known_if_redirected e2428362) v8 -- 0x7f9870026040 con 0
2019-07-31 15:17:59.396 7f9894ddf700  1 -- 188.184.36.164:0/3854274056 <== osd.751 128.142.162.25:6820/897497 1 ==== osd_op_reply(20 rbd_data.33b6d06b8b4567.0000000000000620 [write 3719168~4096] v2428362'96100 uv96100 ondisk = 0) v8 ==== 184+0+0 (2063549112 0 0) 0x7f988c0054e0 con 0x7f987001d750
2019-07-31 15:17:59.396 7f98945de700  1 -- 188.184.36.164:0/3854274056 <== osd.1390 128.142.161.221:6888/3981167 1 ==== osd_op_reply(21 rbd_data.33b6d06b8b4567.0000000000000073 [write 1892352~4096] v2428362'95936 uv95936 ondisk = 0) v8 ==== 184+0+0 (4292782517 0 0) 0x7f9884007d30 con 0x7f9870022810
2019-07-31 15:17:59.397 7f9893ddd700  1 -- 188.184.36.164:0/3854274056 <== osd.534 128.142.161.213:6852/3058740 1 ==== osd_op_reply(19 rbd_data.33b6d06b8b4567.00000000000007f4 [write 3579904~4096] v2428362'96979 uv96979 ondisk = 0) v8 ==== 184+0+0 (253385391 0 0) 0x7f9888005430 con 0x7f98700186b0
2019-07-31 15:17:59.398 7f9894ddf700  1 -- 188.184.36.164:0/3854274056 <== osd.664 128.142.162.23:6884/927919 1 ==== osd_op_reply(16 rbd_data.33b6d06b8b4567.0000000000000321 [write 3112960~4096] v2428362'96098 uv96098 ondisk = 0) v8 ==== 184+0+0 (1504253969 0 0) 0x7f988c0054e0 con 0x7f987000eac0
2019-07-31 15:17:59.512 7f98945de700  1 -- 188.184.36.164:0/3854274056 <== osd.497 128.142.161.230:6824/3839248 1 ==== osd_op_reply(18 rbd_data.33b6d06b8b4567.0000000000000592 [write 425984~4096] v2428362'96712 uv96712 ondisk = 0) v8 ==== 184+0+0 (1159898189 0 0) 0x7f9884007d30 con 0x7f9870013670
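For context, the behavior I expected from `--io-threads=1` is a bounded in-flight count: a new write is only submitted once a previous one has been acknowledged. Here is a minimal sketch of that pattern in Python with a semaphore and a simulated completion; this is purely illustrative and assumes nothing about the actual rbd bench implementation:

```python
import threading
import time

def bench_write(n_ops, io_threads):
    """Submit n_ops simulated async writes, never allowing more than
    io_threads operations in flight at once (illustrative model only)."""
    in_flight = threading.Semaphore(io_threads)  # free submission slots
    lock = threading.Lock()
    current = 0   # writes currently in flight
    peak = 0      # highest concurrency observed
    workers = []

    def complete():
        nonlocal current
        time.sleep(0.001)          # stand-in for OSD commit latency
        with lock:
            current -= 1
        in_flight.release()        # completion frees a submission slot

    for _ in range(n_ops):
        in_flight.acquire()        # block until a slot is free
        with lock:
            current += 1
            peak = max(peak, current)
        t = threading.Thread(target=complete)
        t.start()
        workers.append(t)

    for t in workers:
        t.join()
    return peak
```

With this structure, `bench_write(50, 1)` never has more than one write outstanding; the five back-to-back osd_op sends in the log above are what made it look like this accounting was missing.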
Thanks
Updated by Jason Dillaman over 4 years ago
- Status changed from New to Need More Info
Do you have the RBD in-memory writeback cache enabled?
Updated by Dan van der Ster over 4 years ago
Jason Dillaman wrote:
Do you have the RBD in-memory writeback cache enabled?
Doh.. with `rbd cache = false` I get the expected (low) IOPS.
My config before had the default `rbd cache writethrough until flush = true`... is rbd bench sending a flush early on to turn on writeback?
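For anyone hitting the same confusion: the cache can be disabled for a single run via a command-line config override, the same mechanism as the `--debug_ms=1` flag above. A sketch, assuming the standard `rbd_cache` option name and the pool/image from this report:

```shell
# Bypass the librbd in-memory cache for this invocation only,
# so each write is acknowledged by the OSDs rather than from memory.
rbd --rbd_cache=false bench test/testdan --io-type=write \
    --io-size=4096 --io-threads=1 --io-total=1M --io-pattern rand
```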
Updated by Jason Dillaman over 4 years ago
Dan van der Ster wrote:
My config before had the default `rbd cache writethrough until flush = true`... is rbd bench sending a flush early on to turn on writeback?
Yes, `rbd bench` will send a flush prior to starting the benchmark to ensure the cache is in writeback mode.
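The interaction described above can be summarized as a toy state machine (this is an illustrative model, not the actual librbd code): with `rbd cache writethrough until flush = true`, the cache stays in writethrough mode until the first flush proves the client issues flushes safely, after which writes are acknowledged from memory.

```python
class WritebackCache:
    """Toy model of the rbd_cache_writethrough_until_flush behavior
    (illustrative only; not the librbd implementation)."""

    def __init__(self):
        self.writeback = False  # start in writethrough mode

    def flush(self):
        # The first flush shows the client flushes correctly, so it is
        # safe to start acknowledging writes from the in-memory cache.
        self.writeback = True

    def write(self):
        # Writeback: acked immediately from memory.
        # Writethrough: acked only after the OSD commits.
        return "acked-from-cache" if self.writeback else "acked-after-osd-commit"

cache = WritebackCache()
before = cache.write()   # writethrough: waits for the OSD
cache.flush()            # what rbd bench does before starting
after = cache.write()    # now acked from memory at cache speed
```

This is why the write benchmark reported ~15k IOPS on an IO-saturated HDD cluster: after the initial flush, writes were being acknowledged from the in-memory cache.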
Updated by Dan van der Ster over 4 years ago
Jason Dillaman wrote:
Dan van der Ster wrote:
My config before had the default `rbd cache writethrough until flush = true`... is rbd bench sending a flush early on to turn on writeback?
Yes, `rbd bench` will send a flush prior to starting the benchmark to ensure the cache is in writeback mode.
OK, apologies for the ticket in that case. Thanks for your time, and please close.
Updated by Jason Dillaman over 4 years ago
- Status changed from Need More Info to Closed