Bug #3936
closedrbd: Strange dd speed behaviour (server side issue?)
0%
Description
I have 3 node/15 osds (5 on each), every on separate drive installation (with SSD cache), journal in RAMFS. XFS as backing store for OSD. Also have 3 mons and 3 mds on separate 3 nodes. Every mon writes to SSD. Hosts are connected using 1G/10G mixed. All pools are size=3
I've created RBD device and trying to test it. And have some strange behaviour.
1st, I'm launching fio and got pretty nice results about concurrency and IOPS:
root@hosting-cloud1-s1:~# fio --filename=/dev/rbd/rbd/test --direct=1 --rw=randrw --bs=4k --size=1G --numjobs=50 --runtime=100 --group_reporting --name=test --rwmixread=95 --thread --ioengine=aio test: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1 ... test: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1 Starting 50 threads Jobs: 39 (f=39): [mmmmmm_m_mmmmmm_mmmmmmm_mm__mmmmmmmm_mm_m___mmmmmm] [8.4% done] [3452K/249K /s] [843/61 iops] [eta 18m:19s] test: (groupid=0, jobs=50): err= 0: pid=25476 read : io=633276KB, bw=6276KB/s, iops=1569, runt=100903msec slat (usec): min=14, max=1385, avg=61.85, stdev= 3.27 clat (usec): min=57, max=4817K, avg=28138.66, stdev=13854.88 bw (KB/s) : min= 0, max= 580, per=2.31%, avg=145.27, stdev=14.12 write: io=34012KB, bw=345166B/s, iops=84, runt=100903msec slat (usec): min=18, max=820, avg=63.80, stdev= 3.08 clat (msec): min=1, max=4169, avg=41.27, stdev=14.50 bw (KB/s) : min= 0, max= 50, per=2.63%, avg= 8.85, stdev= 1.05 cpu : usr=1.88%, sys=12.59%, ctx=9516717, majf=156, minf=253109 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued r/w: total=158319/8503, short=0/0 lat (usec): 100=0.01%, 250=0.01%, 500=14.07%, 750=15.61%, 1000=8.70% lat (msec): 2=14.11%, 4=9.83%, 10=6.89%, 20=7.53%, 50=12.96% lat (msec): 100=3.23%, 250=4.12%, 500=2.01%, 750=0.53%, 1000=0.24% lat (msec): 2000=0.17%, >=2000=0.01% Run status group 0 (all jobs): READ: io=633276KB, aggrb=6276KB/s, minb=6426KB/s, maxb=6426KB/s, mint=100903msec, maxt=100903msec WRITE: io=34012KB, aggrb=337KB/s, minb=345KB/s, maxb=345KB/s, mint=100903msec, maxt=100903msec
Next, I've tried dbench and got next results:
DBench 20 SAS Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 1636786 0.225 493.473 Close 1202468 0.003 4.339 Rename 69322 0.746 168.787 Unlink 330314 1.272 506.259 Deltree 40 24.670 112.694 Mkdir 20 0.007 0.007 Qpathinfo 1483642 0.044 548.973 Qfileinfo 260015 0.002 3.554 Qfsinfo 271935 0.007 9.574 Sfileinfo 133431 0.472 505.952 Find 573572 0.060 120.191 WriteX 815868 0.285 687.965 ReadX 2565845 0.034 402.321 LockX 5328 0.007 3.633 UnlockX 5328 0.004 0.092 Flush 114709 92.376 993.116 -------------------------------------------------------- DBench 20 Ceph Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 3031192 0.297 1848.309 Close 2226672 0.004 5.188 Rename 128357 0.633 447.750 Unlink 612116 1.365 1702.513 Deltree 80 37.245 216.251 Mkdir 40 0.007 0.016 Qpathinfo 2747388 0.059 980.984 Qfileinfo 481600 0.002 3.049 Qfsinfo 503764 0.008 4.435 Sfileinfo 246912 0.421 1705.006 Find 1062128 0.070 339.131 WriteX 1511925 0.317 777.022 ReadX 4751626 0.041 773.618 LockX 9874 0.007 1.624 UnlockX 9874 0.004 0.136 Flush 212445 42.397 2963.149
Speed was twice faster than RAID1 2xSAS 10K of 73GB but with higher latency. About 200MB/sec.
Next, I'm trying to do just:
dd if=/dev/rbd/rbd/test of=/var/www-cache/test bs=4M
And it is strange:
1st 1.5G is copied very fast, but next speed moves to almost zero, and when I'm viewing ifstat on DD host it shows almost 0 incoming traffic. And furthermore, when I'm trying to see iostat on OSD host it also shows no activity.
First of all, I've thought that ceph handles bad sparse volumes created and with no data, then I've created Ext4 on RBD and filled with data. Filling was very impressive about speed, but when I've tried to DD volume again, the same picture occured as before - 1st 1.5GB is OK (ifstat/iostat) and next almost to 0.
Updated by Ivan Kudryavtsev about 11 years ago
I also tried to do:
dd if=/dev/rbd/rbd/test of=/dev/null bs=4M - the same situation.
Updated by Ivan Kudryavtsev about 11 years ago
rados -p rbd bench 120 write -t 16
shows about 90-110 MB/sec.
Updated by Ian Colle about 11 years ago
- Project changed from CephFS to rbd
- Target version deleted (
v0.56)
Updated by Ian Colle about 11 years ago
- Assignee set to Alex Elder
- Priority changed from Normal to High
Updated by Greg Farnum about 11 years ago
It's pretty likely that this is a server-side behavior rather than a client-side one. Keep that in mind when reproducing and debugging!
Updated by Ivan Kudryavtsev about 11 years ago
Ifstat output looks like:
eth1 eth0 ib0 KB/s in KB/s out KB/s in KB/s out KB/s in KB/s out 8057.82 134.23 0.34 0.76 21019.68 71.07 12888.93 211.60 0.19 0.16 25221.22 82.05 10526.51 176.11 0.37 0.16 25223.97 79.15 10956.70 189.17 0.07 0.16 29428.05 92.74 12485.04 221.08 0.13 0.16 19706.17 63.39 4700.10 78.09 0.07 0.16 30744.86 87.90 7519.63 126.90 0.26 0.16 21019.66 65.64 12887.68 214.49 0.13 0.16 11487.29 36.72 9669.09 157.11 0.26 0.16 22144.58 72.16 8594.75 146.85 0.07 0.16 31353.90 110.93 ===== 0.07 0.08 0.59 0.16 34.26 34.85 38.20 41.09 0.19 0.16 30.57 34.62 38.12 40.51 0.19 0.16 34.08 36.34 17.35 18.32 0.41 0.16 24.78 26.09 25.69 27.41 0.13 0.16 31.61 33.74 8.54 9.09 0.07 0.16 25.55 26.07 17.15 18.25 0.19 0.16 25.06 29.07 25.49 27.28 0.13 0.16 24.29 25.20 8.54 9.09 0.13 0.16 30.98 30.74 25.56 27.28 0.07 0.16 28.17 29.90 0.07 0.07 0.29 0.25 29.29 33.94 8.47 9.16 0.16 0.16 30.95 33.54 8.47 9.09 0.26 0.16 30.18 29.92 16.95 18.26 0.07 0.16 49.13 52.17 8.61 9.16 0.19 0.16 27.07 29.92 16.95 18.12 0.19 0.16 31.00 33.98 16.95 18.19 0.19 0.16 37.69 38.46 8.41 9.03 0.13 0.16 31.27 34.21 16.95 18.19 0.13 0.16 29.32 33.75 ===== 17.79 44.42 0.13 0.16 12620.13 48.53 eth1 eth0 ib0 KB/s in KB/s out KB/s in KB/s out KB/s in KB/s out 7256.63 129.10 0.40 0.76 4203.80 12.96 12759.52 217.86 0.19 0.16 7170.64 34.16 5781.17 106.97 0.13 0.16 32.63 34.18 25.62 27.24 0.16 0.16 30.66 33.99 14.83 16.01 0.32 0.16 34.98 25.65 27.86 29.60 0.07 0.16 8412.56 31.97 12889.14 219.37 0.19 0.16 37835.11 115.14 12888.38 216.01 0.07 0.16 21019.31 67.72 17182.41 296.69 0.19 0.16 21020.33 63.20 17182.55 305.12 0.07 0.16 32241.23 102.97
and some time after in the same DD:
24783.31 412.76 0.13 0.16 12611.68 40.37 18053.93 297.39 0.13 0.16 21019.87 65.68 19858.86 321.00 0.12 0.16 21020.61 64.59 20273.00 333.89 0.21 0.32 21020.08 65.88 15707.43 274.34 0.07 0.16 16829.48 60.19 eth1 eth0 ib0 KB/s in KB/s out KB/s in KB/s out KB/s in KB/s out 2684.44 45.98 0.32 0.39 36.19 38.44 4305.16 80.63 0.26 0.42 7540.23 37.67 4313.70 84.18 0.27 0.16 32.71 34.81 8.67 9.16 0.07 0.16 30.40 33.99 0.06 0.00 0.13 0.16 31.59 33.74 0.00 0.00 0.13 0.16 29.42 30.12 34.08 36.34 0.19 0.16 24.92 25.63 17.08 18.18 0.32 0.16 27.13 29.50 8.61 9.16 0.19 0.16 38.85 42.75 8.48 9.09 0.07 0.16 28.98 29.27 8.47 9.09 0.32 0.16 23.29 25.40 0.00 0.00 0.07 0.16 29.18 28.64 0.00 0.00 0.13 0.16 24.52 29.65 8.61 9.23 0.50 0.16 23.54 21.58 17.21 18.24 0.13 0.16 30.48 33.71 0.07 0.07 0.44 0.16 32.17 33.97 8.41 9.03 0.38 0.16 26.41 29.03 8.54 9.16 0.32 0.16 39.22 42.34 8.72 9.34 0.26 0.16 40.70 42.13 25.49 27.27 0.13 0.16 20.67 21.98 0.00 0.00 0.61 0.16 30.96 33.71 23.69 25.52 0.07 0.16 21.02 22.03 10.41 10.95 0.49 0.16 26.15 29.86 16.88 18.18 0.13 0.16 28.37 27.13 25.76 27.42 0.43 0.16 30.27 33.75 12.58 13.61 0.32 0.16 32.64 33.80 80.81 86.37 0.49 0.16 14.56 15.82 34.42 36.47 0.13 0.16 20.93 23.29 49.19 52.67 0.56 0.16 24.35 26.94 18.82 20.23 0.28 0.16 19.71 21.58 eth1 eth0 ib0 KB/s in KB/s out KB/s in KB/s out KB/s in KB/s out 26.07 27.67 0.56 0.48 28.89 30.34 8.22 8.89 0.19 0.16 23.76 25.19 8.54 9.09 0.19 0.16 27.69 30.10 0.07 0.07 0.13 0.16 26.15 29.26 25.29 27.20 0.13 0.16 41.69 43.37 0.20 0.15 0.19 0.16 35.14 37.83 12.46 13.45 0.19 0.16 23.49 25.41 13.17 13.96 0.13 0.16 37.74 38.72 17.25 18.49 0.07 0.16 31.33 34.59 29.96 32.03 0.19 0.16 20.07 22.25 21.18 22.62 0.13 0.16 40.74 42.95 25.37 27.14 0.13 0.16 37.95 39.29 0.00 0.00 0.07 0.16 29.92 33.94 8.61 9.16 0.13 0.16 29.41 29.50 16.95 18.19 0.13 0.16 22.48 25.62 17.28 18.46 0.19 0.16 26.59 25.64 16.76 17.91 0.13 0.16 26.56 29.64 17.21 18.24 0.19 0.16 28.89 30.34 0.07 0.07 0.13 0.16 38.22 40.15 53.68 57.29 0.13 0.16 24.69 27.34 6.30 6.74 0.07 0.16 36.93 41.68 16.81 18.10 0.19 0.16 24.21 25.21 0.27 0.15 0.07 0.16 26.45 25.22 16.50 17.95 0.19 0.16 21.93 25.65 26.07 27.64 0.13 0.16 35.15 38.67 22.48 24.36 0.13 0.16 35.78 35.05 11.50 12.02 0.13 0.16 24.05 25.41 8.54 9.16 0.13 0.16 32.96 37.61 17.32 18.49 0.07 0.16 40.01 30.39 33.96 36.29 0.13 0.16 30.93 30.34 eth1 eth0 ib0 KB/s in KB/s out KB/s in KB/s out KB/s in KB/s out 37.74 40.52 0.51 0.64 17.17 18.55 21.90 23.25 0.13 0.25 33.52 38.02 8.41 9.03 0.07 0.16 34.22 34.40 8.61 9.16 0.19 0.16 29.31 33.32 11.17 12.00 0.07 0.16 27.41 29.48 31.07 33.27 0.13 0.16 22.01 21.37 17.00 18.10 0.19 0.16 30.50 33.77 0.00 0.00 7.62 0.16 33.19 33.76 17.08 18.32 0.07 0.16 33.05 38.47 33.97 36.36 0.19 0.16 25.11 25.63 8.60 9.14 0.13 0.16 25.07 25.68 17.15 18.33 0.19 0.16 18.99 21.77 8.54 9.15 0.32 0.16 36.89 38.69 8.61 9.16 0.26 0.22 29.54 32.89 8.66 9.36 0.07 0.16 21.62 21.38 17.78 18.38 0.19 0.16 31.17 35.04 4297.43 70.61 0.13 0.16 21019.40 65.80 8593.37 147.89 0.13 0.16 8414.72 34.85 2052.93 76.16 0.13 0.16 27.79 29.70 42.50 45.38 0.13 0.16 28.30 30.77 0.00 0.00 0.13 0.16 6596.35 48.00 4305.66 78.92 0.32 0.16 25486.20 82.40 8594.28 137.26 0.19 0.16 18391.33 57.92 7923.04 135.33 0.19 0.16 12612.10 38.68 9265.37 154.34 0.32 0.16 29427.89 86.30 4297.46 71.44 0.32 0.16 41776.19 125.82 11680.27 190.47 0.07 0.16 25486.26 79.69 2573.42 45.14 0.13 0.16 27063.31 80.92 9780.42 159.76 0.28 0.16 16158.55 52.03 16240.82 280.61 0.13 0.16 24040.91 73.10 eth1 eth0 ib0 KB/s in KB/s out KB/s in KB/s out KB/s in KB/s out 15569.96 268.50 0.34 0.76 29419.37 93.12 4295.88 74.65 0.19 0.16 38359.55 116.29 12888.82 214.54 0.07 0.16 24699.23 74.49 11280.13 180.17 0.19 0.16 26799.46 86.48 2693.27 43.31 0.13 0.16 23648.67 68.44 14161.33 231.31 0.19 0.16 20874.91 62.78 14164.57 237.77 0.19 0.16 17749.25 56.70 12622.60 212.63 0.13 0.16 11823.39 38.79
Updated by Ivan Kudryavtsev about 11 years ago
eth1 and ib0 connected to ceph cloud.
Updated by Alex Elder about 11 years ago
I'm sorry if I missed it, but can you tell me the version
of the kernel you are working with? Running "uname -a"
might be enough. I'd like to try to narrow down which
recent changes might (or might not) have contributed to
this behavior. Thanks.
Updated by Ivan Kudryavtsev about 11 years ago
Linux hosting-cloud1-s1.zzzing.ru 3.7.2-ceph #1 SMP Wed Jan 16 23:25:11 NOVT 2013 x86_64 GNU/Linux
Kernel config: http://hosting-cloud1-s1.zzzing.ru/config-3.7.2-ceph.txt
Updated by Ivan Kudryavtsev about 11 years ago
During slow periods iostat shows no operations on OSDs. First of all I've thought about sceduler and iowait problems, but seems it's somewhere inside ceph.
Updated by Alex Elder about 11 years ago
3.7.2-ceph
That tells me a lot, and in fact makes me suspect it might
not be rbd that's the cause.
What can you tell me about the version of ceph you have
running on an osd node?
Updated by Sage Weil about 11 years ago
It may be a network issue as well. Is it easy for you by chance to try with regular ethernet (say, gig instead of 10gig) on the client side?
Updated by Josh Durgin about 11 years ago
- Subject changed from rbd: Strange dd speed behaviour to rbd: Strange dd speed behaviour (server side issue?)
- Assignee changed from Alex Elder to Josh Durgin
What size is the journal on your osds? You may just be seeing a slowdown when the journals fill up, and must be flushed to the main osd data store.
Also, could you try the dd with conv=fsync and oflag=sync so that the client page cache doesn't confuse things?
Updated by Ivan Kudryavtsev about 11 years ago
Tried on regular 1G ethernet (other client server). Works well. Speed is stable and no such effect.
I don't understand how it could be connected to network? Since it works almost
good on high speed IB, but slow down for 5-15 seconds from time to time.
Will try at the same server with IB network disabled.
Ceph version is 0.56.2.
Updated by Ivan Kudryavtsev about 11 years ago
Journal is 1G in RAM. Since there are no writers at all yet, I don't think that journal is filled and causes device syncing. As I wrote before, no valuable iops activity detected during slow down.