Bug #752
High I/O wait when writing data
Status: Closed
Description
Like I said on IRC, I'm seeing a high load on my machine "noisy".
The setup is:
- Intel(R) Xeon(R) CPU 5110 1.6GHz (Dual-Core)
- 6GB DDR2
- 1x 80GB (OS / Logging)
- 4x WD 2TB (OSD data)
I have 4 OSDs running (on the 2TB disks), with their journals on tmpfs (200MB per OSD).
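For reference, a journal-on-tmpfs setup along these lines can be expressed roughly as follows (the paths are only illustrative, not my exact ones):

    # mount a small tmpfs for each OSD journal (illustrative path)
    mount -t tmpfs -o size=200m tmpfs /srv/ceph/journal/osd.0

    # and point the OSD at it in ceph.conf:
    # [osd.0]
    #     osd journal = /srv/ceph/journal/osd.0/journal
    #     osd journal size = 200    # journal size in MB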
With Qemu-RBD I have a VM "beta" running; this VM has a 1.5TB disk and I'm rsyncing 1.1TB of data to it.
The replication level of the "rbd" pool is at 3:
pg_pool 3 'rbd' pg_pool(rep pg_size 3 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 9 owner 0)
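For reference, the replication level can be inspected and changed per pool with something along these lines (exact syntax may vary between Ceph versions):

    # show the pool settings, including the replication size
    ceph osd dump        # older versions may need: ceph osd dump -o -

    # set the replication level of the rbd pool to 3
    ceph osd pool set rbd size 3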
The status of my Ceph cluster:
2011-01-28 22:12:37.084838    pg v123162: 8224 pgs: 8219 active+clean, 5 active+clean+inconsistent; 745 GB data, 2298 GB used, 5147 GB / 7452 GB avail
2011-01-28 22:12:37.104838   mds e66: 1/1/1 up {0=up:active}
2011-01-28 22:12:37.128499   osd e2064: 4 osds: 4 up, 4 in
2011-01-28 22:12:37.128602   log 2011-01-28 22:12:11.549246 osd1 [2a00:f10:113:1:230:48ff:fe8d:a21e]:6804/2622 721 : [INF] 3.db scrub ok
2011-01-28 22:12:37.139256   class rbd (v1.3 [x86-64])
2011-01-28 22:12:37.147454   mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21e]:6789/0}
In the VM I'm downloading the data with about 100Mbit, but the speed fluctuates between 5MB and 9MB/sec.
The load on noisy is rather high, a sysstat tells me:
04:55:01 PM  CPU   %user   %nice  %system  %iowait  %steal   %idle
05:05:02 PM  all   26.10    0.00    42.90    24.24    0.00    6.76
05:15:01 PM  all   25.54    0.00    42.41    24.20    0.00    7.85
05:25:02 PM  all   25.75    0.00    42.64    23.59    0.00    8.02
05:35:01 PM  all   24.13    0.00    40.21    27.19    0.00    8.47
05:45:02 PM  all   24.81    0.00    41.38    26.04    0.00    7.77
05:55:02 PM  all   22.08    0.00    36.88    28.23    0.00   12.82
06:05:01 PM  all   23.39    0.00    38.34    27.97    0.00   10.31
06:15:02 PM  all   25.14    0.00    41.46    25.76    0.00    7.64
06:25:02 PM  all   25.22    0.00    41.45    25.28    0.00    8.05
06:35:02 PM  all   24.22    0.00    40.70    25.31    0.00    9.77
06:45:01 PM  all   25.08    0.00    41.44    25.11    0.00    8.37
06:55:02 PM  all   24.03    0.00    39.25    27.08    0.00    9.65
07:05:01 PM  all   24.48    0.00    39.72    25.35    0.00   10.46
07:15:01 PM  all   23.48    0.00    38.83    27.38    0.00   10.31
07:25:02 PM  all   22.52    0.00    37.76    29.40    0.00   10.32
07:35:01 PM  all   22.48    0.00    36.90    29.93    0.00   10.69
07:45:01 PM  all   21.72    0.00    35.86    31.07    0.00   11.35
07:55:02 PM  all   22.54    0.00    38.56    29.19    0.00    9.71
08:05:02 PM  all   23.96    0.00    40.21    27.77    0.00    8.06
08:15:02 PM  all   22.71    0.00    37.98    31.26    0.00    8.05
08:25:02 PM  all   20.86    0.00    34.94    30.19    0.00   14.00
08:35:01 PM  all   24.32    0.00    40.13    27.69    0.00    7.86
08:45:02 PM  all   25.04    0.00    41.97    27.34    0.00    5.64
08:55:01 PM  all   24.07    0.00    40.50    27.60    0.00    7.82
09:05:01 PM  all   23.60    0.00    39.41    29.10    0.00    7.90
09:15:01 PM  all   18.31    0.00    31.59    33.26    0.00   16.84
09:25:01 PM  all   24.57    0.00    41.64    28.17    0.00    5.62
09:35:02 PM  all   23.22    0.00    39.63    28.83    0.00    8.32
09:45:01 PM  all   20.53    0.00    35.36    34.28    0.00    9.83
09:55:02 PM  all   23.67    0.00    39.81    26.93    0.00    9.59
10:05:02 PM  all   23.44    0.00    40.71    28.88    0.00    6.97
Average:     all   24.71    0.00    39.26    24.85    0.00   11.19
As you can see, the I/O wait is high, but system time also consumes a lot of CPU.
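A per-device breakdown would show whether the wait is concentrated on one disk or spread over the four OSD data disks; the sysstat tools can provide that, for example:

    # extended per-device statistics every 5 seconds (sysstat)
    iostat -x 5

    # or per-device history collected by sar
    sar -d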
At Sage's request I set debug-ms to 1 for osd.0; the last 10,000 lines of the log are attached to this issue.
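For reference, the debug level can be raised either in ceph.conf or at runtime, roughly along these lines (the exact invocation varies by version):

    # in ceph.conf, under the OSD's section:
    # [osd.0]
    #     debug ms = 1

    # or injected at runtime:
    ceph tell osd.0 injectargs '--debug-ms 1'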
Is this a normal load? With replication at three, 100Mbit of incoming traffic would generate roughly 300Mbit of writes spread over the 4 disks, which seems to be in the range of what these disks should be able to handle.
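A back-of-the-envelope version of that:

    100 Mbit/s incoming    ~ 12.5 MB/s
    x 3 replication        ~ 37.5 MB/s of writes cluster-wide
    / 4 OSD data disks     ~  9.4 MB/s per disk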
The rsync just finished, and once it did the load dropped sharply. The rsync stats:
Number of files: 932
Number of files transferred: 473
Total file size: 775568137247 bytes
Total transferred file size: 670012521514 bytes
Literal data: 670012521514 bytes
Matched data: 0 bytes
File list size: 51337
File list generation time: 0.199 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10432
Total bytes received: 670094382303

sent 10432 bytes  received 670094382303 bytes  7369464.94 bytes/sec
total size is 775568137247  speedup is 1.16
I had to restart the rsync a few times since the VM crashed due to some btrfs bugs (which are fixed now).
P.S.: I saw #563 a lot during this rsync.