Bug #3936 (closed)

rbd: Strange dd speed behaviour (server side issue?)

Added by Ivan Kudryavtsev over 11 years ago. Updated about 11 years ago.

Status: Rejected
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source: Development
Severity: 2 - major

Description

I have a 3-node / 15-OSD installation (5 OSDs per node, each on a separate drive with an SSD cache), with journals in RAMFS and XFS as the backing store for the OSDs. There are also 3 mons and 3 mds on 3 separate nodes; every mon writes to SSD. Hosts are connected with a mix of 1G and 10G links. All pools have size=3.
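
For reference, the layout and replication factor above can be checked with the standard ceph CLI; this is only a sketch, and the pool name "rbd" is the assumed default:

ceph -s                      # overall health, mon/osd counts
ceph osd tree                # should show 3 hosts with 5 osds each
ceph osd pool get rbd size   # should report size: 3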

I've created an RBD device and am trying to test it, and I see some strange behaviour.
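
Creating and mapping such a device with the rbd CLI looks roughly like this; the image size here is an assumption, not the exact value I used:

rbd create test --size 20480   # image in the default rbd pool, size is an assumption
rbd map test                   # shows up as /dev/rbd/rbd/test via udev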

First, I launched fio and got pretty nice results in terms of concurrency and IOPS:

root@hosting-cloud1-s1:~# fio --filename=/dev/rbd/rbd/test --direct=1 --rw=randrw --bs=4k --size=1G --numjobs=50 --runtime=100 --group_reporting --name=test --rwmixread=95 --thread --ioengine=aio
test: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
test: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
Starting 50 threads
Jobs: 39 (f=39): [mmmmmm_m_mmmmmm_mmmmmmm_mm__mmmmmmmm_mm_m___mmmmmm] [8.4% done] [3452K/249K /s] [843/61 iops] [eta 18m:19s]    
test: (groupid=0, jobs=50): err= 0: pid=25476
  read : io=633276KB, bw=6276KB/s, iops=1569, runt=100903msec
    slat (usec): min=14, max=1385, avg=61.85, stdev= 3.27
    clat (usec): min=57, max=4817K, avg=28138.66, stdev=13854.88
    bw (KB/s) : min=    0, max=  580, per=2.31%, avg=145.27, stdev=14.12
  write: io=34012KB, bw=345166B/s, iops=84, runt=100903msec
    slat (usec): min=18, max=820, avg=63.80, stdev= 3.08
    clat (msec): min=1, max=4169, avg=41.27, stdev=14.50
    bw (KB/s) : min=    0, max=   50, per=2.63%, avg= 8.85, stdev= 1.05
  cpu          : usr=1.88%, sys=12.59%, ctx=9516717, majf=156, minf=253109
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=158319/8503, short=0/0
     lat (usec): 100=0.01%, 250=0.01%, 500=14.07%, 750=15.61%, 1000=8.70%
     lat (msec): 2=14.11%, 4=9.83%, 10=6.89%, 20=7.53%, 50=12.96%
     lat (msec): 100=3.23%, 250=4.12%, 500=2.01%, 750=0.53%, 1000=0.24%
     lat (msec): 2000=0.17%, >=2000=0.01%

Run status group 0 (all jobs):
   READ: io=633276KB, aggrb=6276KB/s, minb=6426KB/s, maxb=6426KB/s, mint=100903msec, maxt=100903msec
  WRITE: io=34012KB, aggrb=337KB/s, minb=345KB/s, maxb=345KB/s, mint=100903msec, maxt=100903msec

Next, I tried dbench (invocation sketched after the tables below) and got the following results:

DBench 20 SAS
Operation      Count    AvgLat    MaxLat
----------------------------------------
NTCreateX    1636786     0.225   493.473
Close        1202468     0.003     4.339
Rename         69322     0.746   168.787
Unlink        330314     1.272   506.259
Deltree           40    24.670   112.694
Mkdir             20     0.007     0.007
Qpathinfo    1483642     0.044   548.973
Qfileinfo     260015     0.002     3.554
Qfsinfo       271935     0.007     9.574
Sfileinfo     133431     0.472   505.952
Find          573572     0.060   120.191
WriteX        815868     0.285   687.965
ReadX        2565845     0.034   402.321
LockX           5328     0.007     3.633
UnlockX         5328     0.004     0.092
Flush         114709    92.376   993.116

--------------------------------------------------------
DBench 20 Ceph
Operation      Count    AvgLat    MaxLat
----------------------------------------
NTCreateX    3031192     0.297  1848.309
Close        2226672     0.004     5.188
Rename        128357     0.633   447.750
Unlink        612116     1.365  1702.513
Deltree           80    37.245   216.251
Mkdir             40     0.007     0.016
Qpathinfo    2747388     0.059   980.984
Qfileinfo     481600     0.002     3.049
Qfsinfo       503764     0.008     4.435
Sfileinfo     246912     0.421  1705.006
Find         1062128     0.070   339.131
WriteX       1511925     0.317   777.022
ReadX        4751626     0.041   773.618
LockX           9874     0.007     1.624
UnlockX         9874     0.004     0.136
Flush         212445    42.397  2963.149

Throughput was about twice that of a RAID1 of two 73GB 10K SAS drives, roughly 200 MB/s, but with higher latency.
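
For reference, a dbench run of the shape shown in the tables above looks roughly like this; the mount points and runtime are assumptions:

dbench -D /mnt/rbd-ext4 -t 600 20    # 20 clients against a filesystem on the RBD device
dbench -D /mnt/sas-raid1 -t 600 20   # same run against the SAS RAID1 for comparison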

Next, I tried just running:

dd if=/dev/rbd/rbd/test of=/var/www-cache/test bs=4M

And here the behaviour is strange:

The first ~1.5 GB is copied very fast, but then the speed drops to almost zero. When I watch ifstat on the host running dd, it shows almost no incoming traffic, and iostat on the OSD hosts likewise shows no activity.
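
A minimal sketch of the kind of monitoring referred to above; the interface name and interval are assumptions:

ifstat -i eth0 1    # on the dd host: per-second in/out traffic
iostat -x 1         # on each OSD host: per-device utilisation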

At first I thought Ceph was handling freshly created sparse volumes with no data poorly, so I created an ext4 filesystem on the RBD image and filled it with data. Filling it was very impressive speed-wise, but when I tried to dd the volume again, the same picture occurred as before: the first ~1.5 GB is fine (per ifstat/iostat), then the rate drops to almost zero.
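
The ext4 test amounts to roughly the following steps; the mount point and fill size are assumptions:

mkfs.ext4 /dev/rbd/rbd/test
mount /dev/rbd/rbd/test /mnt/rbd-ext4
dd if=/dev/zero of=/mnt/rbd-ext4/fill bs=4M count=2048   # ~8 GB of real data so the image is no longer sparse
umount /mnt/rbd-ext4
dd if=/dev/rbd/rbd/test of=/var/www-cache/test bs=4M     # repeat the read test against the populated image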
