Bug #21375
Average latency output in rados bench is arithmetically flawed
Status: Closed
Description
Seen on v12.2.0-296-g706f78da1a on a virtualized test system:
sudo rados -p bench bench 30 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_deploy_22718
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        58        42   167.914       168    0.734898    0.300907
    2      16       105        89   177.941       188    0.358432    0.322523
    3      16       154       138    183.95       196    0.351548    0.322912
    4      16       206       190   189.954       208    0.141235    0.319715
    5      16       254       238   190.358       192    0.156576     0.31991
    6      16       304       288   191.961       200    0.231245    0.323936
    7      16       348       332   189.678       176    0.120068    0.323501
    8      16       400       384   191.963       208    0.763719    0.323801
    9      16       452       436   193.742       208    0.336659    0.324155
   10      16       502       486   194.366       200    0.412668    0.323478
   11      16       550       534   194.135       192    0.318614    0.323487
   12      16       601       585   194.954       204    0.330679    0.323183
   13      16       650       634   195.032       196    0.546288    0.322944
   14      16       695       679   193.958       180    0.139749    0.322562
   15      16       779       763   203.424       336    0.638265  9.00648e+07
   16      16       828       812   202.958       196    0.492445  8.46299e+07
   17      16       877       861   202.548       196    0.164353  7.98136e+07
   18      16       923       907   201.516       184    0.254678  7.57657e+07
   19      16       976       960   202.066       212    0.237107  7.15828e+07
2017-09-12 16:20:08.253560 min lat: 0.0693709 max lat: 4.29497e+09 avg lat: 6.79718e+07
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16      1027      1011   202.162       204    0.219402  6.79718e+07
   21      16      1074      1058   201.485       188    0.194049  6.49522e+07
   22      16      1124      1108   201.416       200    0.254869  6.20212e+07
   23      16      1175      1159   201.527       204    0.530085   5.9292e+07
   24      16      1213      1197   199.462       152      0.5128  5.74098e+07
   25      16      1270      1254   200.602       228    0.212382  5.48002e+07
   26      16      1317      1301   200.116       188    0.161623  5.28205e+07
   27      16      1366      1350   199.963       196    0.183112  5.09033e+07
   28      16      1414      1398   199.677       192    0.281947  4.91556e+07
   29      16      1453      1437   198.171       156    0.362389  4.78215e+07
   30      16      1499      1483   197.698       184    0.186634  4.63382e+07
Total time run:         30.448048
Total writes made:      1500
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     197.057
Stddev Bandwidth:       30.4936
Max bandwidth (MB/sec): 336
Min bandwidth (MB/sec): 152
Average IOPS:           49
Stddev IOPS:            7
Max IOPS:               84
Min IOPS:               38
Average Latency(s):     4.5813e+07
Stddev Latency(s):      4.41357e+08
Max latency(s):         4.29497e+09
Min latency(s):         0.0693709
None of the numbers here would be realistic if running on real hardware, but the latency from second 15 onward just looks implausible. :)
Updated by Florian Haas over 6 years ago
- Subject changed from Latency output in rados bench is arithmetically flawed to Average latency output in rados bench is arithmetically flawed
Dropped an important word here: it's obviously the average latency from second 15 onward that looks off.
Updated by Florian Haas over 6 years ago
- Affected Versions v12.2.1 added
Just checking: is anyone looking into this? This still exists on 12.2.1, and unfortunately makes rados bench results rather unsuitable for consumption...
Below is reference output from 12.2.1:
# rados bench -p bench 30 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_daisy_35973
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        32        16   63.9748        64    0.524363    0.380571
    2      16        59        43   85.9721       108    0.491057    0.596051
    3      16       109        93   123.964       200    0.502876   1.8473e+08
    4      16       129       113   112.971        80    0.748528  1.52034e+08
    5      16       141       125   99.9765        48    0.102788  1.37439e+08
    6      16       170       154   102.644       116    0.175975  1.11558e+08
    7      16       206       190   108.548       144    0.526126  9.04204e+07
    8      16       239       223   111.477       132    0.634028  7.70398e+07
    9      16       279       263   116.866       160   0.0811387  6.53227e+07
   10      16       291       275   109.979        48    0.396706  6.24723e+07
   11      16       324       308   111.979       132   0.0802478  5.57788e+07
   12      16       337       321    106.98        52    0.138103  5.35198e+07
   13      16       355       339   104.289        72    0.226504  5.06781e+07
   14      16       393       377   107.695       152     0.27853  4.55699e+07
   15      16       461       445   118.646       272    0.081326  3.86064e+07
   16      16       507       491   122.728       184    0.330384  3.49896e+07
   17      16       578       562   132.212       284    0.538753  3.05692e+07
   18      16       639       623    138.42       244      0.5365   2.7576e+07
   19      16       689       673   141.483       200    0.492813  2.55273e+07
2017-10-12 09:38:20.735721 min lat: 0.0502576 max lat: 4.29497e+09 avg lat: 2.3216e+07
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       756       740   147.799       268    0.350281   2.3216e+07
   21      16       820       804   152.943       256    0.161733   2.1368e+07
   22      16       878       862    156.53       232    0.239297  1.99302e+07
   23      16       937       921    159.98       236   0.0916112  1.86535e+07
   24      16       974       958    159.48       148     0.24256  1.79331e+07
   25      16      1008       992   158.424       136    0.319072  1.73184e+07
   26      16      1053      1037   159.252       180    0.142151  1.65669e+07
   27      16      1101      1085   160.462       192     0.62604   1.5834e+07
   28      16      1132      1116   159.161       124    0.124207  1.53941e+07
   29      16      1168      1152   158.638       144    0.149339  1.49131e+07
   30      16      1170      1154   153.563         8     1.33382  1.48872e+07
Total time run:         30.220721
Total writes made:      1170
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     154.861
Stddev Bandwidth:       74.9491
Max bandwidth (MB/sec): 284
Min bandwidth (MB/sec): 8
Average IOPS:           38
Stddev IOPS:            18
Max IOPS:               71
Min IOPS:               2
Average Latency(s):     1.46836e+07
Stddev Latency(s):      2.50807e+08
Max latency(s):         4.29497e+09
Min latency(s):         0.0502576
Cleaning up (deleting benchmark objects)
Removed 1170 objects
Clean up completed and total clean up time :0.472365
Updated by Shinobu Kinjo over 6 years ago
I don't see the same result on a physical machine with:
$ ceph -v
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
How did you build a virtualized test system?
Updated by Florian Haas over 6 years ago
Shinobu Kinjo wrote:
How did you build a virtualized test system?
It's three nodes in an OpenStack public cloud, but how exactly would that make a difference? Is the latency being calculated based on system clock time? Would an adjustment in the system clock cause this issue?
Updated by Shinobu Kinjo over 6 years ago
Have you ever tried Kraken or Jewel? If not, can you try? And what's your exact kernel version?
Updated by Florian Haas over 6 years ago
Shinobu Kinjo wrote:
Have you ever tried Kraken or Jewel? If not, can you try? And what's your exact kernel version?
I've been using Ceph since before Argonaut. :) I've never seen rados bench act up in this particular way before.
This issue definitely did not exist in Jewel. I can't say for sure whether it existed in Kraken, because I don't have Kraken "rados bench" records to go back to.
Updated by Mohamad Gebai over 6 years ago
Looking at the code, it seems that clock_gettime() with the REALTIME clock is used to calculate the latency. The reference for this clock is the epoch, which means that any change made to the system's clock will affect the REALTIME clock. Is it possible that ntp kicked in during the rados bench? Alternatively, maybe the paravirtualization clock (kvm clock) made changes to the system's clock during the benchmark? dmesg would tell you that.
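To make the failure mode concrete (an illustrative sketch, not the actual rados bench code): intervals measured with CLOCK_REALTIME absorb any step of the system clock that happens between the two samples, while CLOCK_MONOTONIC intervals do not.

#include <ctime>
#include <cstdio>

// Elapsed seconds between a saved start sample and now, on the given clock.
// With CLOCK_REALTIME, a step of the system clock (ntpdate, kvm-clock
// resync, manual `date -s`, ...) inside the window is folded into the
// result and can even make it negative; with CLOCK_MONOTONIC it cannot.
static double elapsed(clockid_t clk, const timespec& start) {
    timespec end;
    clock_gettime(clk, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main() {
    timespec rt_start, mono_start;
    clock_gettime(CLOCK_REALTIME, &rt_start);
    clock_gettime(CLOCK_MONOTONIC, &mono_start);

    // ... do the timed work here; step the system clock in this window
    // to see the two results diverge ...

    printf("realtime:  %f s\n", elapsed(CLOCK_REALTIME, rt_start));
    printf("monotonic: %f s\n", elapsed(CLOCK_MONOTONIC, mono_start));
    return 0;
}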
Updated by Mohamad Gebai over 6 years ago
But again, since you don't have any issue with the 'last lat', it might not be that...
Updated by Florian Haas over 6 years ago
Mohamad Gebai wrote:
But again, since you don't have any issue with the 'last lat', it might not be that...
Right. In addition, look at the following line:
2017-10-12 09:38:20.735721 min lat: 0.0502576 max lat: 4.29497e+09 avg lat: 2.3216e+07
The excessive value for max lat is never actually found in any prior value of last lat.
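Notably, 4.29497e+09 is 2^32 to the printed precision, which is exactly what a slightly negative interval becomes if it wraps through an unsigned 32-bit seconds field. Under that (unconfirmed) reading, the inflated averages in the first run are accounted for by the 16 concurrent ops in flight at the moment of the jump each recording one wrapped latency. A quick check against the numbers printed above:

#include <cstdint>
#include <cstdio>

int main() {
    // A slightly negative interval (clock stepped back) stored via an
    // unsigned 32-bit seconds field wraps to roughly 2^32 seconds:
    int64_t delta_sec = -1;  // hypothetical -1 s interval
    uint32_t wrapped = static_cast<uint32_t>(delta_sec);
    printf("wrapped interval: %.6g s\n", (double)wrapped);      // 4.29497e+09

    // If the 16 in-flight ops each recorded such a wrapped latency, the
    // excess in the running average should be 16 * 2^32 / N finished ops:
    double excess = 16.0 * 4294967296.0;
    printf("avg at sec 15 (763 ops):  %.6g\n", excess / 763);   // ~9.00648e+07
    printf("avg at sec 30 (1483 ops): %.6g\n", excess / 1483);  // ~4.63382e+07
    printf("final avg (1500 ops):     %.6g\n", excess / 1500);  // ~4.5813e+07
    return 0;
}

The same accounting fits the 12.2.1 run: 1.46836e+07 multiplied by 1170 writes is about 4 x 2^32, i.e. four wrapped samples (only 16 ops, hence 16 wrapped samples, were in flight in the first run).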
Updated by Mohamad Gebai over 6 years ago
Right, interesting. After trying a bunch of things, I was able to reproduce this:
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 16 0 0 0 - 0
2 16 16 0 0 0 - 0
3 16 19 3 3.99923 4 2.91914 2.6029
4 16 19 3 2.99939 0 - 2.6029
5 16 33 17 13.5974 28 2.32565 3.89453
6 16 34 18 11.9978 4 5.88878 4.00532
7 16 35 19 10.8552 4 3.97716 4.00384
8 16 45 29 12.7 40 4.29492e+09 1.48101e+09
9 16 51 35 13.8149 24 47428.7 1.22712e+09
10 16 53 37 13.2924 8 47429.3 1.16079e+09
11 16 59 43 14.1746 24 2.3183 9.98825e+08
12 16 62 46 14.0089 12 3.26318 9.33685e+08
13 16 68 52 14.7156 24 2.67895 8.25955e+08
14 16 75 59 15.5932 28 3.74444 7.27961e+08
15 16 79 63 15.6182 16 2.65534 6.81741e+08
16 16 82 66 15.4069 12 2.549 6.50753e+08
17 16 83 67 14.7777 4 3.879 6.4104e+08
18 16 90 74 15.4686 28 2.82302 5.80401e+08
19 16 96 80 15.8921 24 2.88408 5.36871e+08
2017-10-16 21:26:40.002227 min lat: 1.73544 max lat: 4.29492e+09 avg lat: 5.17466e+08
[...]
I also don't have the 'min lat' anywhere. The only way I was able to reproduce this was to start rados bench, go back in time (which makes rados bench quiet because it's waiting for the right time to output), then immediately go forward to the correct time. In other words, the slightest time travel backwards followed by a readjustment causes this. The readjustment is necessary as 'rados bench' doesn't return otherwise (it's waiting on the wall clock to complete).
Anyway, this might be a different problem than what you're experiencing. How easily can you reproduce this issue? And is there anything on your system that might be playing with the clock? Or maybe something that might cause the clock to fall behind for a moment? Are you using the kvm clock? Does your CPU have constant_tsc? Are you using a tickless kernel?
Updated by Mohamad Gebai over 6 years ago
PS: I know that 'last lat' was also wrong in the example I just pasted, but that's not always the case:
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
[...]
11 16 51 35 12.7248 32 3.00379 4.40251
12 16 55 39 12.9975 16 3.91942 4.32376
13 16 60 44 13.5357 20 2.54609 9.76118e+07
14 16 63 47 13.4259 12 3.11633 9.13813e+07
[...]
Updated by Florian Haas over 6 years ago
Mohamad Gebai wrote:
Anyway, this might be a different problem than what you're experiencing. How easily can you reproduce this issue? And is there anything on your system that might be playing with the clock? Or maybe something that might cause the clock to fall behind for a moment? Are you using the kvm clock? Does your CPU have constant_tsc? Are you using a tickless kernel?
Yes, this is a VM using the KVM clock source. I'll try to find some time in the next few days to reproduce this (currently traveling), but assuming this is indeed linked to the clock jumping around (which sounds entirely plausible!), maybe the best option is to just add a warning to the output, saying something to the effect of "clock adjusted, results will be unreliable"? Or would you expect other Ceph components, apart from `rados bench`, to also have a problem with this?
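For illustration, a minimal sketch of what such a check could look like (hypothetical helper, not necessarily how a real fix would be structured): sample both the realtime and the monotonic clock at the start and end of the run, and warn if the two deltas disagree.

#include <ctime>
#include <cmath>
#include <cstdio>

// Hypothetical guard: detect whether the wall clock was stepped during a
// benchmark run by comparing the realtime delta to the monotonic one.
struct ClockGuard {
    timespec rt0, mono0;
    ClockGuard() {
        clock_gettime(CLOCK_REALTIME, &rt0);
        clock_gettime(CLOCK_MONOTONIC, &mono0);
    }
    static double delta(const timespec& a, const timespec& b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }
    // True if the wall clock moved by more than `tol` seconds relative
    // to the monotonic clock since construction.
    bool clock_was_adjusted(double tol = 0.05) const {
        timespec rt1, mono1;
        clock_gettime(CLOCK_REALTIME, &rt1);
        clock_gettime(CLOCK_MONOTONIC, &mono1);
        return std::fabs(delta(rt0, rt1) - delta(mono0, mono1)) > tol;
    }
};

int main() {
    ClockGuard guard;
    // ... run the benchmark ...
    if (guard.clock_was_adjusted())
        fprintf(stderr, "warning: clock adjusted during run, results will be unreliable\n");
    return 0;
}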
Updated by Mohamad Gebai over 6 years ago
- Assignee changed from Shinobu Kinjo to Mohamad Gebai
Updated by Mohamad Gebai over 6 years ago
PR submitted: https://github.com/ceph/ceph/pull/18588
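The patch itself isn't quoted here, but given the diagnosis above, the natural fix is to take per-op timestamps from a monotonic clock. A minimal sketch of that approach using std::chrono::steady_clock (illustrative only, not the actual change in the PR):

#include <chrono>
#include <cstdio>

int main() {
    // std::chrono::steady_clock is monotonic: it is unaffected by NTP
    // steps or manual clock changes, so a per-op latency measured with
    // it can never go negative and wrap.
    auto start = std::chrono::steady_clock::now();
    // ... issue the write and wait for its completion ...
    std::chrono::duration<double> lat = std::chrono::steady_clock::now() - start;
    printf("last lat(s): %f\n", lat.count());
    return 0;
}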