Bug #752


High I/O wait when writing data

Added by Wido den Hollander about 13 years ago. Updated about 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%


Description

Like I said on IRC, I'm seeing a high load on my machine "noisy".

The setup is:
  • Intel(R) Xeon(R) CPU 5110 1.6GHz (Dual-Core)
  • 6GB DDR2
  • 1x 80GB (OS / Logging)
  • 4x WD 2TB (OSD data)

I have 4 OSDs running (on the 2TB disks), with their journals on tmpfs (200MB per OSD).
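
For context, this is roughly what that journal-on-tmpfs setup looks like; the mount point, tmpfs size and paths below are illustrative assumptions, not my exact config:

mount -t tmpfs -o size=220m tmpfs /srv/ceph/journal

# ceph.conf then points each OSD's journal at the tmpfs, along the lines of:
#   [osd]
#       osd journal = /srv/ceph/journal/osd$id
#       osd journal size = 200    ; journal size in MB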

With Qemu-RBD I have a VM "beta" running; this VM has a 1.5TB disk to which I'm rsyncing 1.1TB of data.

The replication level of the "rbd" pool is set to 3:

pg_pool 3 'rbd' pg_pool(rep pg_size 3 crush_ruleset 3 object_hash rjenkins pg_num 2048 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 9 owner 0)

The status of my Ceph cluster:

2011-01-28 22:12:37.084838    pg v123162: 8224 pgs: 8219 active+clean, 5 active+clean+inconsistent; 745 GB data, 2298 GB used, 5147 GB / 7452 GB avail
2011-01-28 22:12:37.104838   mds e66: 1/1/1 up {0=up:active}
2011-01-28 22:12:37.128499   osd e2064: 4 osds: 4 up, 4 in
2011-01-28 22:12:37.128602   log 2011-01-28 22:12:11.549246 osd1 [2a00:f10:113:1:230:48ff:fe8d:a21e]:6804/2622 721 : [INF] 3.db scrub ok
2011-01-28 22:12:37.139256   class rbd (v1.3 [x86-64])
2011-01-28 22:12:37.147454   mon e1: 1 mons at {noisy=[2a00:f10:113:1:230:48ff:fe8d:a21e]:6789/0}

In the VM I'm downloading the data at about 100Mbit/s, but the write speed fluctuates between 5MB/s and 9MB/s.

The load on noisy is rather high; sar (sysstat) reports:

04:55:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:05:02 PM     all     26.10      0.00     42.90     24.24      0.00      6.76
05:15:01 PM     all     25.54      0.00     42.41     24.20      0.00      7.85
05:25:02 PM     all     25.75      0.00     42.64     23.59      0.00      8.02
05:35:01 PM     all     24.13      0.00     40.21     27.19      0.00      8.47
05:45:02 PM     all     24.81      0.00     41.38     26.04      0.00      7.77
05:55:02 PM     all     22.08      0.00     36.88     28.23      0.00     12.82
06:05:01 PM     all     23.39      0.00     38.34     27.97      0.00     10.31
06:15:02 PM     all     25.14      0.00     41.46     25.76      0.00      7.64
06:25:02 PM     all     25.22      0.00     41.45     25.28      0.00      8.05
06:35:02 PM     all     24.22      0.00     40.70     25.31      0.00      9.77
06:45:01 PM     all     25.08      0.00     41.44     25.11      0.00      8.37
06:55:02 PM     all     24.03      0.00     39.25     27.08      0.00      9.65
07:05:01 PM     all     24.48      0.00     39.72     25.35      0.00     10.46
07:15:01 PM     all     23.48      0.00     38.83     27.38      0.00     10.31
07:25:02 PM     all     22.52      0.00     37.76     29.40      0.00     10.32
07:35:01 PM     all     22.48      0.00     36.90     29.93      0.00     10.69
07:45:01 PM     all     21.72      0.00     35.86     31.07      0.00     11.35
07:55:02 PM     all     22.54      0.00     38.56     29.19      0.00      9.71
08:05:02 PM     all     23.96      0.00     40.21     27.77      0.00      8.06
08:15:02 PM     all     22.71      0.00     37.98     31.26      0.00      8.05
08:25:02 PM     all     20.86      0.00     34.94     30.19      0.00     14.00
08:35:01 PM     all     24.32      0.00     40.13     27.69      0.00      7.86
08:45:02 PM     all     25.04      0.00     41.97     27.34      0.00      5.64
08:55:01 PM     all     24.07      0.00     40.50     27.60      0.00      7.82
09:05:01 PM     all     23.60      0.00     39.41     29.10      0.00      7.90
09:15:01 PM     all     18.31      0.00     31.59     33.26      0.00     16.84
09:25:01 PM     all     24.57      0.00     41.64     28.17      0.00      5.62
09:35:02 PM     all     23.22      0.00     39.63     28.83      0.00      8.32
09:45:01 PM     all     20.53      0.00     35.36     34.28      0.00      9.83
09:55:02 PM     all     23.67      0.00     39.81     26.93      0.00      9.59
10:05:02 PM     all     23.44      0.00     40.71     28.88      0.00      6.97
Average:        all     24.71      0.00     39.26     24.85      0.00     11.19

As you can see, the I/O wait is high, but the system time also consumes a lot of CPU.

On Sage's request I set debug-ms to 1 for osd.0; the last 10,000 lines of the log are attached to this issue.

Is this a normal load? With replication at 3, 100Mbit/s of incoming traffic would generate about 300Mbit/s of writes spread over the 4 disks, which seems to be well within what these disks should be able to handle.
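
As a rough back-of-the-envelope check (my own arithmetic, nothing measured here):

awk 'BEGIN {
  client  = 100 / 8      # 100Mbit/s of client traffic is ~12.5MB/s
  raw     = client * 3   # replication 3 triples the raw writes
  perdisk = raw / 4      # spread over the 4 OSD disks
  printf "client %.1f MB/s -> raw %.1f MB/s -> %.1f MB/s per disk\n", client, raw, perdisk
}'

That comes out to roughly 9.4MB/s per disk.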

The rsync just finished, and as soon as it did, the load dropped sharply. The rsync stats:

Number of files: 932
Number of files transferred: 473
Total file size: 775568137247 bytes
Total transferred file size: 670012521514 bytes
Literal data: 670012521514 bytes
Matched data: 0 bytes
File list size: 51337
File list generation time: 0.199 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10432
Total bytes received: 670094382303

sent 10432 bytes  received 670094382303 bytes  7369464.94 bytes/sec
total size is 775568137247  speedup is 1.16

I had to restart the rsync a few times since the VM crashed due to some btrfs bugs (which have since been fixed).

P.S.: I saw #563 a lot during this rsync.


Files

osd.0.log.gz (264 KB): osd0 log with debug-ms=1. Wido den Hollander, 01/28/2011 01:17 PM
bonnie.csv (2.79 KB). Wido den Hollander, 01/31/2011 11:22 AM
Actions #1

Updated by Wido den Hollander about 13 years ago

I've done some benchmarks today in the VM and noticed something weird.

In the VM I ran:

dd if=/dev/zero of=1GB.bin bs=1024k count=1024 conv=sync

This gave me really poor performance:

root@beta:~# dd if=/dev/zero of=1GB.bin bs=1024k count=1024 conv=sync && sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 57.4975 s, 18.7 MB/s
root@beta:~#

But what struck me even more is the amount of data written to the disks:

root@noisy:~# iostat -k sdb sdc sdd sde
Linux 2.6.38-999-generic (noisy)     01/31/2011     _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.51    0.00   11.86   18.09    0.00   62.55

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb             157.47       110.50      2448.63   40500708  897484716
sdd             149.80        42.17      2409.39   15456072  883103944
sdc             160.24        48.25      2526.69   17683264  926094568
sde             163.65        76.56      2439.40   28060388  894102580

root@noisy:~# iostat -k sdb sdc sdd sde
Linux 2.6.38-999-generic (noisy)     01/31/2011     _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.51    0.00   11.86   18.09    0.00   62.54

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb             157.52       110.48      2450.70   40503424  898471392
sdd             149.88        42.16      2411.66   15458072  884157636
sdc             160.30        48.24      2528.73   17685404  927080064
sde             163.73        76.55      2441.53   28063280  895109236

root@noisy:~#

The first iostat output was captured before the dd, the second after.

By my calculations a total of 3938MB was written, while my replication level is 3. In theory only 3*1024 = 3072MB should have been written, so an extra 28% of data went to the disks.
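
For the record, the arithmetic, using the kB_wrtn deltas between the two iostat samples above:

awk 'BEGIN {
  # kB written per disk between the two samples (sdb, sdd, sdc, sde)
  written = (898471392 - 897484716) + (884157636 - 883103944)
  written = written + (927080064 - 926094568) + (895109236 - 894102580)
  expected = 3 * 1024 * 1024    # 1GB from dd, times replication 3, in kB
  printf "written %.0f MB, expected %.0f MB, overhead %.0f%%\n",
         written / 1024, expected / 1024, (written / expected - 1) * 100
}'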

This does not explain the very low performance of only 18MB/sec, but an extra 28% of data being written seems like a lot to me.

I benchmarked the individual OSDs and they all do around 55MB/sec, so that is fine.
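
(For reference, the kind of per-disk sequential write test I mean; the target path is an illustrative example:)

# write 1GB to one OSD's data filesystem, bypassing the page cache
dd if=/dev/zero of=/srv/osd0/bench.tmp bs=1M count=1024 oflag=direct
rm /srv/osd0/bench.tmp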

Does Qemu-RBD issue a lot of small writes (which could explain the extra 28% overhead), bringing the performance down and the I/O wait up?

Actions #2

Updated by Wido den Hollander about 13 years ago

Oh, I forgot my bonnie++ results.

Actions #3

Updated by Wido den Hollander about 13 years ago

While running rsync I sometimes see these messages appearing inside the VM:

Feb  1 17:47:17 beta kernel: [ 7126.530741] flush-251:2: page allocation failure. order:0, mode:0x20
Feb  1 17:47:17 beta kernel: [ 7126.530750] Pid: 1086, comm: flush-251:2 Not tainted 2.6.32-28-server #55-Ubuntu
Feb  1 17:47:17 beta kernel: [ 7126.530755] Call Trace:
Feb  1 17:47:17 beta kernel: [ 7126.530759]  <IRQ>  [<ffffffff810fa099>] __alloc_pages_slowpath+0x4a9/0x590
Feb  1 17:47:17 beta kernel: [ 7126.530780]  [<ffffffff810fa2f1>] __alloc_pages_nodemask+0x171/0x180
Feb  1 17:47:17 beta kernel: [ 7126.530789]  [<ffffffff8112d427>] alloc_pages_current+0x87/0xd0
Feb  1 17:47:17 beta kernel: [ 7126.530796]  [<ffffffff813dbab2>] try_fill_recv+0x182/0x200
Feb  1 17:47:17 beta kernel: [ 7126.530801]  [<ffffffff813dbcdd>] virtnet_poll+0x10d/0x160
Feb  1 17:47:17 beta kernel: [ 7126.530807]  [<ffffffff8147427f>] net_rx_action+0x10f/0x250
Feb  1 17:47:17 beta kernel: [ 7126.530814]  [<ffffffff8106d637>] __do_softirq+0xb7/0x1e0
Feb  1 17:47:17 beta kernel: [ 7126.530823]  [<ffffffff810936ba>] ? tick_program_event+0x2a/0x30
Feb  1 17:47:17 beta kernel: [ 7126.530829]  [<ffffffff810132ec>] call_softirq+0x1c/0x30
Feb  1 17:47:17 beta kernel: [ 7126.530833]  [<ffffffff81014cb5>] do_softirq+0x65/0xa0
Feb  1 17:47:17 beta kernel: [ 7126.530836]  [<ffffffff8106d4d5>] irq_exit+0x85/0x90
Feb  1 17:47:17 beta kernel: [ 7126.530844]  [<ffffffff815608f1>] smp_apic_timer_interrupt+0x71/0x9c
Feb  1 17:47:17 beta kernel: [ 7126.530848]  [<ffffffff81012cb3>] apic_timer_interrupt+0x13/0x20
Feb  1 17:47:17 beta kernel: [ 7126.530850]  <EOI>  [<ffffffff812a0d95>] ? __make_request+0x165/0x4a0
Feb  1 17:47:17 beta kernel: [ 7126.530861]  [<ffffffff8129f421>] ? generic_make_request+0x1b1/0x4f0
Feb  1 17:47:17 beta kernel: [ 7126.530865]  [<ffffffff810f5f05>] ? mempool_alloc_slab+0x15/0x20
Feb  1 17:47:17 beta kernel: [ 7126.530871]  [<ffffffff81436f11>] ? dm_merge_bvec+0xc1/0x140
Feb  1 17:47:17 beta kernel: [ 7126.530875]  [<ffffffff8129f7e0>] ? submit_bio+0x80/0x110
Feb  1 17:47:17 beta kernel: [ 7126.530882]  [<ffffffff81167382>] ? __mark_inode_dirty+0x42/0x1e0
Feb  1 17:47:17 beta kernel: [ 7126.530928]  [<ffffffffa00f6e74>] ? xfs_submit_ioend_bio+0x54/0x70 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.530945]  [<ffffffffa00f6f6b>] ? xfs_submit_ioend+0xdb/0xf0 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.530962]  [<ffffffffa00f7e2f>] ? xfs_page_state_convert+0x39f/0x720 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.530979]  [<ffffffffa00f830a>] ? xfs_vm_writepage+0x7a/0x130 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.530986]  [<ffffffff8110d265>] ? __dec_zone_page_state+0x35/0x40
Feb  1 17:47:17 beta kernel: [ 7126.530991]  [<ffffffff810fc0d7>] ? __writepage+0x17/0x40
Feb  1 17:47:17 beta kernel: [ 7126.530997]  [<ffffffff810fd257>] ? write_cache_pages+0x1d7/0x3e0
Feb  1 17:47:17 beta kernel: [ 7126.531001]  [<ffffffff810fc0c0>] ? __writepage+0x0/0x40
Feb  1 17:47:17 beta kernel: [ 7126.531006]  [<ffffffff810fd484>] ? generic_writepages+0x24/0x30
Feb  1 17:47:17 beta kernel: [ 7126.531023]  [<ffffffffa00f70fd>] ? xfs_vm_writepages+0x5d/0x80 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.531028]  [<ffffffff810fd4b1>] ? do_writepages+0x21/0x40
Feb  1 17:47:17 beta kernel: [ 7126.531032]  [<ffffffff81166676>] ? writeback_single_inode+0xf6/0x3d0
Feb  1 17:47:17 beta kernel: [ 7126.531037]  [<ffffffff81166da5>] ? writeback_sb_inodes+0x195/0x280
Feb  1 17:47:17 beta kernel: [ 7126.531044]  [<ffffffff81061671>] ? dequeue_entity+0x1a1/0x1e0
Feb  1 17:47:17 beta kernel: [ 7126.531048]  [<ffffffff811675c0>] ? writeback_inodes_wb+0xa0/0x1b0
Feb  1 17:47:17 beta kernel: [ 7126.531052]  [<ffffffff8116790b>] ? wb_writeback+0x23b/0x2a0
Feb  1 17:47:17 beta kernel: [ 7126.531057]  [<ffffffff81075fbc>] ? lock_timer_base+0x3c/0x70
Feb  1 17:47:17 beta kernel: [ 7126.531061]  [<ffffffff81076ad2>] ? del_timer_sync+0x22/0x30
Feb  1 17:47:17 beta kernel: [ 7126.531065]  [<ffffffff81167a19>] ? wb_do_writeback+0xa9/0x190
Feb  1 17:47:17 beta kernel: [ 7126.531068]  [<ffffffff810760d0>] ? process_timeout+0x0/0x10
Feb  1 17:47:17 beta kernel: [ 7126.531072]  [<ffffffff81167b53>] ? bdi_writeback_task+0x53/0xf0
Feb  1 17:47:17 beta kernel: [ 7126.531077]  [<ffffffff8110efa6>] ? bdi_start_fn+0x86/0x100
Feb  1 17:47:17 beta kernel: [ 7126.531081]  [<ffffffff8110ef20>] ? bdi_start_fn+0x0/0x100
Feb  1 17:47:17 beta kernel: [ 7126.531086]  [<ffffffff81084086>] ? kthread+0x96/0xa0
Feb  1 17:47:17 beta kernel: [ 7126.531090]  [<ffffffff810131ea>] ? child_rip+0xa/0x20
Feb  1 17:47:17 beta kernel: [ 7126.531094]  [<ffffffff81083ff0>] ? kthread+0x0/0xa0
Feb  1 17:47:17 beta kernel: [ 7126.531098]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
Feb  1 17:47:17 beta kernel: [ 7126.531100] Mem-Info:
Feb  1 17:47:17 beta kernel: [ 7126.531104] Node 0 DMA per-cpu:
Feb  1 17:47:17 beta kernel: [ 7126.531108] CPU    0: hi:    0, btch:   1 usd:   0
Feb  1 17:47:17 beta kernel: [ 7126.531110] Node 0 DMA32 per-cpu:
Feb  1 17:47:17 beta kernel: [ 7126.531113] CPU    0: hi:  186, btch:  31 usd:  30
Feb  1 17:47:17 beta kernel: [ 7126.531120] active_anon:1457 inactive_anon:1616 isolated_anon:0
Feb  1 17:47:17 beta kernel: [ 7126.531122]  active_file:4171 inactive_file:229442 isolated_file:0
Feb  1 17:47:17 beta kernel: [ 7126.531123]  unevictable:0 dirty:9308 writeback:15301 unstable:0
Feb  1 17:47:17 beta kernel: [ 7126.531124]  free:1340 slab_reclaimable:8181 slab_unreclaimable:1908
Feb  1 17:47:17 beta kernel: [ 7126.531125]  mapped:1407 shmem:55 pagetables:330 bounce:0
Feb  1 17:47:17 beta kernel: [ 7126.531128] Node 0 DMA free:3996kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11548kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:332kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  1 17:47:17 beta kernel: [ 7126.531142] lowmem_reserve[]: 0 994 994 994
Feb  1 17:47:17 beta kernel: [ 7126.531147] Node 0 DMA32 free:1364kB min:4000kB low:5000kB high:6000kB active_anon:5828kB inactive_anon:6464kB active_file:16684kB inactive_file:906220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018060kB mlocked:0kB dirty:37232kB writeback:61204kB mapped:5628kB shmem:220kB slab_reclaimable:32392kB slab_unreclaimable:7608kB kernel_stack:632kB pagetables:1320kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  1 17:47:17 beta kernel: [ 7126.531160] lowmem_reserve[]: 0 0 0 0
Feb  1 17:47:17 beta kernel: [ 7126.531165] Node 0 DMA: 1*4kB 1*8kB 3*16kB 1*32kB 5*64kB 10*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3996kB
Feb  1 17:47:17 beta kernel: [ 7126.531177] Node 0 DMA32: 1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1364kB
Feb  1 17:47:17 beta kernel: [ 7126.531188] 233671 total pagecache pages
Feb  1 17:47:17 beta kernel: [ 7126.531191] 0 pages in swap cache
Feb  1 17:47:17 beta kernel: [ 7126.531193] Swap cache stats: add 0, delete 0, find 0/0
Feb  1 17:47:17 beta kernel: [ 7126.531196] Free swap  = 659448kB
Feb  1 17:47:17 beta kernel: [ 7126.531197] Total swap = 659448kB
Feb  1 17:47:17 beta kernel: [ 7126.537113] 262139 pages RAM
Feb  1 17:47:17 beta kernel: [ 7126.537118] 6224 pages reserved
Feb  1 17:47:17 beta kernel: [ 7126.537120] 235593 pages shared
Feb  1 17:47:17 beta kernel: [ 7126.537122] 22041 pages non-shared
Feb  1 17:47:17 beta kernel: [ 7126.537435] flush-251:2: page allocation failure. order:0, mode:0x4020
Feb  1 17:47:17 beta kernel: [ 7126.537441] Pid: 1086, comm: flush-251:2 Not tainted 2.6.32-28-server #55-Ubuntu
Feb  1 17:47:17 beta kernel: [ 7126.537444] Call Trace:
Feb  1 17:47:17 beta kernel: [ 7126.537458]  [<ffffffff810fa099>] __alloc_pages_slowpath+0x4a9/0x590
Feb  1 17:47:17 beta kernel: [ 7126.537463]  [<ffffffff810fa2f1>] __alloc_pages_nodemask+0x171/0x180
Feb  1 17:47:17 beta kernel: [ 7126.537470]  [<ffffffff8112d427>] alloc_pages_current+0x87/0xd0
Feb  1 17:47:17 beta kernel: [ 7126.537475]  [<ffffffff81133467>] new_slab+0x2f7/0x310
Feb  1 17:47:17 beta kernel: [ 7126.537480]  [<ffffffff81135d11>] __slab_alloc+0x201/0x2d0
Feb  1 17:47:17 beta kernel: [ 7126.537488]  [<ffffffff8131d4b4>] ? vring_add_indirect+0x34/0x1c0
Feb  1 17:47:17 beta kernel: [ 7126.537492]  [<ffffffff811370ad>] __kmalloc+0x16d/0x1a0
Feb  1 17:47:17 beta kernel: [ 7126.537496]  [<ffffffff8131d4b4>] vring_add_indirect+0x34/0x1c0
Feb  1 17:47:17 beta kernel: [ 7126.537500]  [<ffffffff8131d868>] vring_add_buf+0x228/0x280
Feb  1 17:47:17 beta kernel: [ 7126.537506]  [<ffffffff81381509>] do_req+0x1a9/0x2c0
Feb  1 17:47:17 beta kernel: [ 7126.537510]  [<ffffffff8138165e>] do_virtblk_request+0x3e/0x90
Feb  1 17:47:17 beta kernel: [ 7126.537517]  [<ffffffff8129e9b3>] __generic_unplug_device+0x33/0x40
Feb  1 17:47:17 beta kernel: [ 7126.537521]  [<ffffffff81295e0a>] elv_insert+0x8a/0x200
Feb  1 17:47:17 beta kernel: [ 7126.537524]  [<ffffffff81295ff2>] __elv_add_request+0x72/0xd0
Feb  1 17:47:17 beta kernel: [ 7126.537529]  [<ffffffff812a0d5f>] __make_request+0x12f/0x4a0
Feb  1 17:47:17 beta kernel: [ 7126.537532]  [<ffffffff8129f421>] generic_make_request+0x1b1/0x4f0
Feb  1 17:47:17 beta kernel: [ 7126.537537]  [<ffffffff810f5f05>] ? mempool_alloc_slab+0x15/0x20
Feb  1 17:47:17 beta kernel: [ 7126.537542]  [<ffffffff81436f11>] ? dm_merge_bvec+0xc1/0x140
Feb  1 17:47:17 beta kernel: [ 7126.537546]  [<ffffffff8129f7e0>] submit_bio+0x80/0x110
Feb  1 17:47:17 beta kernel: [ 7126.537552]  [<ffffffff81167382>] ? __mark_inode_dirty+0x42/0x1e0
Feb  1 17:47:17 beta kernel: [ 7126.537578]  [<ffffffffa00f6e74>] xfs_submit_ioend_bio+0x54/0x70 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.537594]  [<ffffffffa00f6f6b>] xfs_submit_ioend+0xdb/0xf0 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.537609]  [<ffffffffa00f7e2f>] xfs_page_state_convert+0x39f/0x720 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.537625]  [<ffffffffa00f830a>] xfs_vm_writepage+0x7a/0x130 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.537631]  [<ffffffff8110d265>] ? __dec_zone_page_state+0x35/0x40
Feb  1 17:47:17 beta kernel: [ 7126.537636]  [<ffffffff810fc0d7>] __writepage+0x17/0x40
Feb  1 17:47:17 beta kernel: [ 7126.537640]  [<ffffffff810fd257>] write_cache_pages+0x1d7/0x3e0
Feb  1 17:47:17 beta kernel: [ 7126.537645]  [<ffffffff810fc0c0>] ? __writepage+0x0/0x40
Feb  1 17:47:17 beta kernel: [ 7126.537650]  [<ffffffff810fd484>] generic_writepages+0x24/0x30
Feb  1 17:47:17 beta kernel: [ 7126.537665]  [<ffffffffa00f70fd>] xfs_vm_writepages+0x5d/0x80 [xfs]
Feb  1 17:47:17 beta kernel: [ 7126.537670]  [<ffffffff810fd4b1>] do_writepages+0x21/0x40
Feb  1 17:47:17 beta kernel: [ 7126.537674]  [<ffffffff81166676>] writeback_single_inode+0xf6/0x3d0
Feb  1 17:47:17 beta kernel: [ 7126.537679]  [<ffffffff81166da5>] writeback_sb_inodes+0x195/0x280
Feb  1 17:47:17 beta kernel: [ 7126.537685]  [<ffffffff81061671>] ? dequeue_entity+0x1a1/0x1e0
Feb  1 17:47:17 beta kernel: [ 7126.537689]  [<ffffffff811675c0>] writeback_inodes_wb+0xa0/0x1b0
Feb  1 17:47:17 beta kernel: [ 7126.537692]  [<ffffffff8116790b>] wb_writeback+0x23b/0x2a0
Feb  1 17:47:17 beta kernel: [ 7126.537698]  [<ffffffff81075fbc>] ? lock_timer_base+0x3c/0x70
Feb  1 17:47:17 beta kernel: [ 7126.537702]  [<ffffffff81076ad2>] ? del_timer_sync+0x22/0x30
Feb  1 17:47:17 beta kernel: [ 7126.537706]  [<ffffffff81167a19>] wb_do_writeback+0xa9/0x190
Feb  1 17:47:17 beta kernel: [ 7126.537709]  [<ffffffff810760d0>] ? process_timeout+0x0/0x10
Feb  1 17:47:17 beta kernel: [ 7126.537713]  [<ffffffff81167b53>] bdi_writeback_task+0x53/0xf0
Feb  1 17:47:17 beta kernel: [ 7126.537718]  [<ffffffff8110efa6>] bdi_start_fn+0x86/0x100
Feb  1 17:47:17 beta kernel: [ 7126.537721]  [<ffffffff8110ef20>] ? bdi_start_fn+0x0/0x100
Feb  1 17:47:17 beta kernel: [ 7126.537726]  [<ffffffff81084086>] kthread+0x96/0xa0
Feb  1 17:47:17 beta kernel: [ 7126.537732]  [<ffffffff810131ea>] child_rip+0xa/0x20
Feb  1 17:47:17 beta kernel: [ 7126.537736]  [<ffffffff81083ff0>] ? kthread+0x0/0xa0
Feb  1 17:47:17 beta kernel: [ 7126.537739]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
Feb  1 17:47:17 beta kernel: [ 7126.537741] Mem-Info:
Feb  1 17:47:17 beta kernel: [ 7126.537743] Node 0 DMA per-cpu:
Feb  1 17:47:17 beta kernel: [ 7126.537747] CPU    0: hi:    0, btch:   1 usd:   0
Feb  1 17:47:17 beta kernel: [ 7126.537749] Node 0 DMA32 per-cpu:
Feb  1 17:47:17 beta kernel: [ 7126.537752] CPU    0: hi:  186, btch:  31 usd:  30
Feb  1 17:47:17 beta kernel: [ 7126.537759] active_anon:1457 inactive_anon:1616 isolated_anon:0
Feb  1 17:47:17 beta kernel: [ 7126.537760]  active_file:4171 inactive_file:229442 isolated_file:0
Feb  1 17:47:17 beta kernel: [ 7126.537762]  unevictable:0 dirty:9308 writeback:15301 unstable:0
Feb  1 17:47:17 beta kernel: [ 7126.537763]  free:1340 slab_reclaimable:8181 slab_unreclaimable:1908
Feb  1 17:47:17 beta kernel: [ 7126.537764]  mapped:1407 shmem:55 pagetables:330 bounce:0
Feb  1 17:47:17 beta kernel: [ 7126.537767] Node 0 DMA free:3996kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11548kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:332kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  1 17:47:17 beta kernel: [ 7126.537780] lowmem_reserve[]: 0 994 994 994
Feb  1 17:47:17 beta kernel: [ 7126.537785] Node 0 DMA32 free:1364kB min:4000kB low:5000kB high:6000kB active_anon:5828kB inactive_anon:6464kB active_file:16684kB inactive_file:906220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018060kB mlocked:0kB dirty:37232kB writeback:61204kB mapped:5628kB shmem:220kB slab_reclaimable:32392kB slab_unreclaimable:7608kB kernel_stack:632kB pagetables:1320kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  1 17:47:17 beta kernel: [ 7126.537799] lowmem_reserve[]: 0 0 0 0
Feb  1 17:47:17 beta kernel: [ 7126.537803] Node 0 DMA: 1*4kB 1*8kB 3*16kB 1*32kB 5*64kB 10*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3996kB
Feb  1 17:47:17 beta kernel: [ 7126.537815] Node 0 DMA32: 1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1364kB
Feb  1 17:47:17 beta kernel: [ 7126.537827] 233671 total pagecache pages
Feb  1 17:47:17 beta kernel: [ 7126.537829] 0 pages in swap cache
Feb  1 17:47:17 beta kernel: [ 7126.537831] Swap cache stats: add 0, delete 0, find 0/0
Feb  1 17:47:17 beta kernel: [ 7126.537834] Free swap  = 659448kB
Feb  1 17:47:17 beta kernel: [ 7126.537835] Total swap = 659448kB
Feb  1 17:47:17 beta kernel: [ 7126.540126] 262139 pages RAM
Feb  1 17:47:17 beta kernel: [ 7126.540126] 6224 pages reserved
Feb  1 17:47:17 beta kernel: [ 7126.540126] 235593 pages shared
Feb  1 17:47:17 beta kernel: [ 7126.540126] 22041 pages non-shared
Feb  1 17:47:17 beta kernel: [ 7126.540126] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
Feb  1 17:47:17 beta kernel: [ 7126.540126]   cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min order: 0
Feb  1 17:47:17 beta kernel: [ 7126.540126]   node 0: slabs: 31, objs: 248, free: 0

It seems that XFS can't flush quickly enough at some point and the kernel runs out of memory for these allocations.

I could try raising "vm.min_free_kbytes", but these messages should not be appearing in the first place.
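
For reference, that tuning would look something like this (the value is only an example, not a recommendation):

# check the current value
sysctl vm.min_free_kbytes
# raise it for the running kernel
sysctl -w vm.min_free_kbytes=16384
# persist it across reboots
echo "vm.min_free_kbytes = 16384" >> /etc/sysctl.conf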

This could be related to the slow write performance I'm seeing, but I just wanted to report it.

Actions #4

Updated by Wido den Hollander about 13 years ago

I've spent a lot of time testing and trying to find out where this could come from, but the problem now seems to be gone, for no apparent reason.

I'm running 2.6.38-rc5 on all my OSDs (2 physical machines at the moment) with the btrfs code included in rc5. The Ceph version I'm using is 4231cef68f534364767e98570e9b0785c4cc18e3 (Merge remote branch 'origin/max_commit_size').

Right now the write speeds seem OK (about 70MB/sec with replication = 3), but even better is the CPU usage. Where my machine "noisy" used to have a load of about 7 or 8 during the rsync (see above), it's now down to 2~3.

The virtual machine is also working much, much better. No more stalling in the VM, and no messages in dmesg either; rsync runs fine. By now I've rsynced about 400GB of data multiple times and it is working fine.

Recovery also works much better: it runs smoothly and no longer takes the whole system down. The cluster actually keeps working during a recovery, a bit slower, but it works!

It took a lot of time, but this combination seems to be working fine.

Actions #5

Updated by Sage Weil about 13 years ago

  • Status changed from New to Resolved

That's good news. I'm not really sure what was going wrong here before either. Let's see if this comes back.

