Bug #752

High I/O wait when writing data

Added by Wido den Hollander over 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%

Description

Like I said on IRC, I'm seeing a high load on my machine "noisy".

The setup is:
  • Intel(R) Xeon(R) CPU 5110 1.6GHz (Dual-Core)
  • 6GB DDR2
  • 1x 80GB (OS / Logging)
  • 4x WD 2TB (OSD data)

I have 4 OSDs running (on the 2TB disks), with their journals on tmpfs (200MB per OSD).
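
(For reference, a tmpfs journal like this can be set up roughly as follows; this is only a sketch, and the mount point and $id-based path are illustrative rather than the actual paths used on noisy:)

# one 200MB tmpfs per OSD journal (illustrative mount point)
mount -t tmpfs -o size=200m tmpfs /mnt/journal-osd0

# ceph.conf: per-OSD journal path and size (in MB)
[osd]
    osd journal = /mnt/journal-osd$id/journal
    osd journal size = 200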

With Qemu-RBD I have a VM "beta" running; this VM has a 1.5TB disk, and I'm rsyncing 1.1TB of data to it.
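
(For context, a 1.5TB qemu-rbd disk like this is typically created and attached roughly as follows; these commands are only a sketch, not necessarily the exact ones used here:)

# create a 1.5TB image named "beta" in the "rbd" pool (rbd takes the size in MB)
rbd create beta --size 1572864

# attach it to the guest through qemu's rbd driver
qemu-system-x86_64 ... -drive file=rbd:rbd/beta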

The replication level of the "rbd" pool is at 3:

pg_pool 3 'rbd' pg_pool(rep pg_size 3 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 9 owner 0)
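
(For reference, a pool's replica count can be inspected and changed with the ceph tool; the exact syntax varies between versions, but roughly:)

ceph osd dump -o - | grep 'pg_pool'   # older CLI syntax; newer versions use plain "ceph osd dump"
ceph osd pool set rbd size 3          # set the replication level of the "rbd" pool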

The status of my Ceph cluster:

2011-01-28 22:12:37.084838    pg v123162: 8224 pgs: 8219 active+clean, 5 active+clean+inconsistent; 745 GB data, 2298 GB used, 5147 GB / 7452 GB avail
2011-01-28 22:12:37.104838   mds e66: 1/1/1 up {0=up:active}
2011-01-28 22:12:37.128499   osd e2064: 4 osds: 4 up, 4 in
2011-01-28 22:12:37.128602   log 2011-01-28 22:12:11.549246 osd1 [2a00:f10:113:1:230:48ff:fe8d:a21e]:6804/2622 721 : [INF] 3.db scrub ok
2011-01-28 22:12:37.139256   class rbd (v1.3 [x86-64])
2011-01-28 22:12:37.147454   mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21e]:6789/0}
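
That block is the cluster's regular status report and can be reproduced at any time:

ceph -s    # one-shot status summary
ceph -w    # status summary plus a live stream of cluster log messages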

In the VM I'm downloading the data at about 100Mbit/s, but the write speed fluctuates between 5MB/s and 9MB/s.

The load on noisy is rather high; sysstat reports:

04:55:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:05:02 PM     all     26.10      0.00     42.90     24.24      0.00      6.76
05:15:01 PM     all     25.54      0.00     42.41     24.20      0.00      7.85
05:25:02 PM     all     25.75      0.00     42.64     23.59      0.00      8.02
05:35:01 PM     all     24.13      0.00     40.21     27.19      0.00      8.47
05:45:02 PM     all     24.81      0.00     41.38     26.04      0.00      7.77
05:55:02 PM     all     22.08      0.00     36.88     28.23      0.00     12.82
06:05:01 PM     all     23.39      0.00     38.34     27.97      0.00     10.31
06:15:02 PM     all     25.14      0.00     41.46     25.76      0.00      7.64
06:25:02 PM     all     25.22      0.00     41.45     25.28      0.00      8.05
06:35:02 PM     all     24.22      0.00     40.70     25.31      0.00      9.77
06:45:01 PM     all     25.08      0.00     41.44     25.11      0.00      8.37
06:55:02 PM     all     24.03      0.00     39.25     27.08      0.00      9.65
07:05:01 PM     all     24.48      0.00     39.72     25.35      0.00     10.46
07:15:01 PM     all     23.48      0.00     38.83     27.38      0.00     10.31
07:25:02 PM     all     22.52      0.00     37.76     29.40      0.00     10.32
07:35:01 PM     all     22.48      0.00     36.90     29.93      0.00     10.69
07:45:01 PM     all     21.72      0.00     35.86     31.07      0.00     11.35
07:55:02 PM     all     22.54      0.00     38.56     29.19      0.00      9.71
08:05:02 PM     all     23.96      0.00     40.21     27.77      0.00      8.06
08:15:02 PM     all     22.71      0.00     37.98     31.26      0.00      8.05
08:25:02 PM     all     20.86      0.00     34.94     30.19      0.00     14.00
08:35:01 PM     all     24.32      0.00     40.13     27.69      0.00      7.86
08:45:02 PM     all     25.04      0.00     41.97     27.34      0.00      5.64
08:55:01 PM     all     24.07      0.00     40.50     27.60      0.00      7.82
09:05:01 PM     all     23.60      0.00     39.41     29.10      0.00      7.90
09:15:01 PM     all     18.31      0.00     31.59     33.26      0.00     16.84
09:25:01 PM     all     24.57      0.00     41.64     28.17      0.00      5.62
09:35:02 PM     all     23.22      0.00     39.63     28.83      0.00      8.32
09:45:01 PM     all     20.53      0.00     35.36     34.28      0.00      9.83
09:55:02 PM     all     23.67      0.00     39.81     26.93      0.00      9.59
10:05:02 PM     all     23.44      0.00     40.71     28.88      0.00      6.97
Average:        all     24.71      0.00     39.26     24.85      0.00     11.19
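
(The table above is standard sysstat CPU accounting; the same view can be pulled from the collected data with something like the following, where the data file path is distro-dependent:)

sar -u                            # CPU utilization from today's sysstat data file
sar -u -f /var/log/sysstat/sa28   # or from a specific day's file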

As you can see, the I/O wait is high, but system time also consumes a lot of CPU.

At Sage's request I set debug-ms to 1 for osd.0; the last 10,000 lines of the log are attached to this issue.
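
(For reference, that debug level can be raised either in ceph.conf followed by a daemon restart, or injected at runtime; the exact syntax depends on the Ceph version:)

# ceph.conf
[osd.0]
    debug ms = 1

# or inject into the running daemon (older CLI form; newer versions use "ceph tell osd.0 injectargs ...")
ceph osd tell 0 injectargs '--debug-ms 1'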

Is this a normal load? With replication at three, 100Mbit/s of incoming traffic generates about 300Mbit/s of writes spread over the 4 disks, which seems to be within the range these disks should be able to handle.
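
To spell that out: 100Mbit/s is roughly 12.5MB/s of incoming data; with 3 replicas that becomes about 37.5MB/s of writes, or roughly 9-10MB/s per disk when spread evenly over the 4 OSD disks, which these drives should sustain comfortably for sequential writes.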

The rsync just finished, and after it did the load dropped sharply. The rsync stats:

Number of files: 932
Number of files transferred: 473
Total file size: 775568137247 bytes
Total transferred file size: 670012521514 bytes
Literal data: 670012521514 bytes
Matched data: 0 bytes
File list size: 51337
File list generation time: 0.199 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10432
Total bytes received: 670094382303

sent 10432 bytes  received 670094382303 bytes  7369464.94 bytes/sec
total size is 775568137247  speedup is 1.16
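
(The statistics block above is what rsync prints when run with --stats; the invocation was roughly of the following shape, with placeholder paths rather than the real ones:)

rsync -a --stats user@source:/path/to/data/ /mnt/data/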

I had to restart the rsync a few times since the VM crashed due to some btrfs bugs (which are fixed now).

P.S.: I saw #563 a lot during this rsync.


Files

osd.0.log.gz (264 KB) - osd0 log with debug-ms=1 - Wido den Hollander, 01/28/2011 01:17 PM
bonnie.csv (2.79 KB) - Wido den Hollander, 01/31/2011 11:22 AM