Bug #21375
closed
Average latency output in rados bench is arithmetically flawed
Added by Florian Haas over 6 years ago.
Updated over 6 years ago.
Description
Seen on v12.2.0-296-g706f78da1a on a virtualized test system:
sudo rados -p bench bench 30 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_deploy_22718
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 58 42 167.914 168 0.734898 0.300907
2 16 105 89 177.941 188 0.358432 0.322523
3 16 154 138 183.95 196 0.351548 0.322912
4 16 206 190 189.954 208 0.141235 0.319715
5 16 254 238 190.358 192 0.156576 0.31991
6 16 304 288 191.961 200 0.231245 0.323936
7 16 348 332 189.678 176 0.120068 0.323501
8 16 400 384 191.963 208 0.763719 0.323801
9 16 452 436 193.742 208 0.336659 0.324155
10 16 502 486 194.366 200 0.412668 0.323478
11 16 550 534 194.135 192 0.318614 0.323487
12 16 601 585 194.954 204 0.330679 0.323183
13 16 650 634 195.032 196 0.546288 0.322944
14 16 695 679 193.958 180 0.139749 0.322562
15 16 779 763 203.424 336 0.638265 9.00648e+07
16 16 828 812 202.958 196 0.492445 8.46299e+07
17 16 877 861 202.548 196 0.164353 7.98136e+07
18 16 923 907 201.516 184 0.254678 7.57657e+07
19 16 976 960 202.066 212 0.237107 7.15828e+07
2017-09-12 16:20:08.253560 min lat: 0.0693709 max lat: 4.29497e+09 avg lat: 6.79718e+07
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
20 16 1027 1011 202.162 204 0.219402 6.79718e+07
21 16 1074 1058 201.485 188 0.194049 6.49522e+07
22 16 1124 1108 201.416 200 0.254869 6.20212e+07
23 16 1175 1159 201.527 204 0.530085 5.9292e+07
24 16 1213 1197 199.462 152 0.5128 5.74098e+07
25 16 1270 1254 200.602 228 0.212382 5.48002e+07
26 16 1317 1301 200.116 188 0.161623 5.28205e+07
27 16 1366 1350 199.963 196 0.183112 5.09033e+07
28 16 1414 1398 199.677 192 0.281947 4.91556e+07
29 16 1453 1437 198.171 156 0.362389 4.78215e+07
30 16 1499 1483 197.698 184 0.186634 4.63382e+07
Total time run: 30.448048
Total writes made: 1500
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 197.057
Stddev Bandwidth: 30.4936
Max bandwidth (MB/sec): 336
Min bandwidth (MB/sec): 152
Average IOPS: 49
Stddev IOPS: 7
Max IOPS: 84
Min IOPS: 38
Average Latency(s): 4.5813e+07
Stddev Latency(s): 4.41357e+08
Max latency(s): 4.29497e+09
Min latency(s): 0.0693709
None of the numbers here would be realistic on real hardware, but the latency from second 15 onward just looks implausible. :)
- Subject changed from Latency output in rados bench is arithmetically flawed to Average latency output in rados bench is arithmetically flawed
Dropped an important word here: it's obviously the average latency from second 15 onward that looks off.
- Affected Versions v12.2.1 added
Just checking: is anyone looking into this? This still exists on 12.2.1, and unfortunately makes rados bench results rather unsuitable for consumption...
Below is reference output from 12.2.1:
# rados bench -p bench 30 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_daisy_35973
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 32 16 63.9748 64 0.524363 0.380571
2 16 59 43 85.9721 108 0.491057 0.596051
3 16 109 93 123.964 200 0.502876 1.8473e+08
4 16 129 113 112.971 80 0.748528 1.52034e+08
5 16 141 125 99.9765 48 0.102788 1.37439e+08
6 16 170 154 102.644 116 0.175975 1.11558e+08
7 16 206 190 108.548 144 0.526126 9.04204e+07
8 16 239 223 111.477 132 0.634028 7.70398e+07
9 16 279 263 116.866 160 0.0811387 6.53227e+07
10 16 291 275 109.979 48 0.396706 6.24723e+07
11 16 324 308 111.979 132 0.0802478 5.57788e+07
12 16 337 321 106.98 52 0.138103 5.35198e+07
13 16 355 339 104.289 72 0.226504 5.06781e+07
14 16 393 377 107.695 152 0.27853 4.55699e+07
15 16 461 445 118.646 272 0.081326 3.86064e+07
16 16 507 491 122.728 184 0.330384 3.49896e+07
17 16 578 562 132.212 284 0.538753 3.05692e+07
18 16 639 623 138.42 244 0.5365 2.7576e+07
19 16 689 673 141.483 200 0.492813 2.55273e+07
2017-10-12 09:38:20.735721 min lat: 0.0502576 max lat: 4.29497e+09 avg lat: 2.3216e+07
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
20 16 756 740 147.799 268 0.350281 2.3216e+07
21 16 820 804 152.943 256 0.161733 2.1368e+07
22 16 878 862 156.53 232 0.239297 1.99302e+07
23 16 937 921 159.98 236 0.0916112 1.86535e+07
24 16 974 958 159.48 148 0.24256 1.79331e+07
25 16 1008 992 158.424 136 0.319072 1.73184e+07
26 16 1053 1037 159.252 180 0.142151 1.65669e+07
27 16 1101 1085 160.462 192 0.62604 1.5834e+07
28 16 1132 1116 159.161 124 0.124207 1.53941e+07
29 16 1168 1152 158.638 144 0.149339 1.49131e+07
30 16 1170 1154 153.563 8 1.33382 1.48872e+07
Total time run: 30.220721
Total writes made: 1170
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 154.861
Stddev Bandwidth: 74.9491
Max bandwidth (MB/sec): 284
Min bandwidth (MB/sec): 8
Average IOPS: 38
Stddev IOPS: 18
Max IOPS: 71
Min IOPS: 2
Average Latency(s): 1.46836e+07
Stddev Latency(s): 2.50807e+08
Max latency(s): 4.29497e+09
Min latency(s): 0.0502576
Cleaning up (deleting benchmark objects)
Removed 1170 objects
Clean up completed and total clean up time :0.472365
- Assignee set to Shinobu Kinjo
I don't see the same result on a physical machine with:
$ ceph -v
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
How did you build a virtualized test system?
Shinobu Kinjo wrote:
How did you build a virtualized test system?
It's three nodes in an OpenStack public cloud, but how exactly would that make a difference? Is the latency being calculated based on system clock time? Would an adjustment in the system clock cause this issue?
Have you ever tried Kraken or Jewel? If not, can you try? And what's your exact kernel version?
Shinobu Kinjo wrote:
Have you ever tried Kraken or Jewel? If not, can you try? And what's your exact kernel version?
I've been using Ceph since before Argonaut. :) I've never seen rados bench act up in this particular way before.
This issue definitely did not exist in Jewel. I can't say for sure whether it existed in Kraken, because I don't have Kraken "rados bench" records to go back to.
Looking at the code, it seems that clock_gettime() with the REALTIME clock is used to calculate the latency. The reference for this clock is the epoch, which means that any changes made to the system clock will affect the REALTIME clock. Is it possible that ntp kicked in during the rados bench? Alternatively, maybe the paravirtualized clock (kvm-clock) made changes to the system clock during the benchmark? dmesg would tell you that.
But again, since you don't have any issue with the 'last lat', it might not be that...
Mohamad Gebai wrote:
But again, since you don't have any issue with the 'last lat', it might not be that...
Right. In addition, look at the following line:
2017-10-12 09:38:20.735721 min lat: 0.0502576 max lat: 4.29497e+09 avg lat: 2.3216e+07
The excessive value for max lat is never actually found in any prior value of last lat.
Right, interesting. After trying a bunch of things, I was able to reproduce this:
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 16 0 0 0 - 0
2 16 16 0 0 0 - 0
3 16 19 3 3.99923 4 2.91914 2.6029
4 16 19 3 2.99939 0 - 2.6029
5 16 33 17 13.5974 28 2.32565 3.89453
6 16 34 18 11.9978 4 5.88878 4.00532
7 16 35 19 10.8552 4 3.97716 4.00384
8 16 45 29 12.7 40 4.29492e+09 1.48101e+09
9 16 51 35 13.8149 24 47428.7 1.22712e+09
10 16 53 37 13.2924 8 47429.3 1.16079e+09
11 16 59 43 14.1746 24 2.3183 9.98825e+08
12 16 62 46 14.0089 12 3.26318 9.33685e+08
13 16 68 52 14.7156 24 2.67895 8.25955e+08
14 16 75 59 15.5932 28 3.74444 7.27961e+08
15 16 79 63 15.6182 16 2.65534 6.81741e+08
16 16 82 66 15.4069 12 2.549 6.50753e+08
17 16 83 67 14.7777 4 3.879 6.4104e+08
18 16 90 74 15.4686 28 2.82302 5.80401e+08
19 16 96 80 15.8921 24 2.88408 5.36871e+08
2017-10-16 21:26:40.002227 min lat: 1.73544 max lat: 4.29492e+09 avg lat: 5.17466e+08
[...]
I also don't have the 'min lat' anywhere. The only way I was able to reproduce this was to start rados bench, go back in time (which makes rados bench go quiet because it's waiting for the right time to output), then immediately go forward to the correct time. In other words, the slightest time travel backwards followed by a readjustment causes this. The readjustment is necessary because 'rados bench' doesn't return otherwise (it's waiting on the wall clock to complete).
Anyway, this might be a different problem than what you're experiencing. How easily can you reproduce this issue? And is there anything on your system that might be playing with the clock? Or maybe something that might cause the clock to fall behind for a moment? Are you using the kvm clock? Does your CPU have constant_tsc? Are you using a tickless kernel?
PS: I know that 'last lat' was also wrong in the example I just pasted, but that's not always the case:
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
[...]
11 16 51 35 12.7248 32 3.00379 4.40251
12 16 55 39 12.9975 16 3.91942 4.32376
13 16 60 44 13.5357 20 2.54609 9.76118e+07
14 16 63 47 13.4259 12 3.11633 9.13813e+07
[...]
Mohamad Gebai wrote:
Anyway, this might be a different problem than what you're experiencing. How easily can you reproduce this issue? And is there anything on your system that might be playing with the clock? Or maybe something that might cause the clock to fall behind for a moment? Are you using the kvm clock? Does your CPU have constant_tsc? Are you using a tickless kernel?
Yes, this is a VM using the KVM clock source. I'll try to find some time in the next few days to reproduce this (currently traveling), but assuming this is indeed linked to the clock jumping around (which sounds entirely plausible!), maybe the best option is to just add a warning to the output, saying something to the effect of "clock adjusted, results will be unreliable"? Or would you expect other Ceph components, apart from `rados bench`, to also have a problem with this?
- Assignee changed from Shinobu Kinjo to Mohamad Gebai
- Status changed from New to Resolved