Bug #150

closed

order:1 page allocation failure

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

Workload was an rsync to a Ceph mount (ceph3 mounting cosd0:/).
Not sure which version; probably unstable from last week.

[336121.073871] ceph-msgr/0: page allocation failure. order:1, mode:0x20
[336121.073874] Pid: 2640, comm: ceph-msgr/0 Not tainted 2.6.34 #29
[336121.073876] Call Trace:
[336121.073878]  <IRQ>  [<ffffffff81083eae>] __alloc_pages_nodemask+0x635/0x673
[336121.073889]  [<ffffffff81056810>] ? put_lock_stats+0x25/0x27
[336121.073894]  [<ffffffff810a7e7e>] kmem_getpages+0x64/0x12f
[336121.073897]  [<ffffffff810a8906>] fallback_alloc+0x127/0x1a4
[336121.073901]  [<ffffffff810a8ac8>] ____cache_alloc_node+0x145/0x15a
[336121.073904]  [<ffffffff810a83da>] kmem_cache_alloc_node+0xf8/0x159
[336121.073908]  [<ffffffff810a8488>] ? __kmalloc_node_track_caller+0x24/0x29
[336121.073912]  [<ffffffff810a8488>] __kmalloc_node_track_caller+0x24/0x29
[336121.073916]  [<ffffffff8138ff10>] __alloc_skb+0x6f/0x15e
[336121.073920]  [<ffffffff813c9480>] tcp_send_ack+0x29/0xd4
[336121.073923]  [<ffffffff813cc664>] tcp_delack_timer+0x16d/0x1c8
[336121.073927]  [<ffffffff8103fa92>] run_timer_softirq+0x1e8/0x275
[336121.073930]  [<ffffffff8103fa03>] ? run_timer_softirq+0x159/0x275
[336121.073934]  [<ffffffff813cc4f7>] ? tcp_delack_timer+0x0/0x1c8
[336121.073939]  [<ffffffff8103ab68>] ? __do_softirq+0x68/0x156
[336121.073943]  [<ffffffff8103abbc>] __do_softirq+0xbc/0x156
[336121.073948]  [<ffffffff810038cc>] call_softirq+0x1c/0x34
[336121.073951]  [<ffffffff810056d5>] do_softirq+0x38/0x83
[336121.073955]  [<ffffffff8103aa98>] irq_exit+0x45/0x51
[336121.073959]  [<ffffffff8101814c>] smp_apic_timer_interrupt+0x86/0x96
[336121.073963]  [<ffffffff81003393>] apic_timer_interrupt+0x13/0x20
[336121.073965]  <EOI>  [<ffffffff81060ef0>] ? generic_exec_single+0x39/0x8b
[336121.073972]  [<ffffffff81060f29>] ? generic_exec_single+0x72/0x8b
[336121.073976]  [<ffffffff810610c1>] smp_call_function_single+0x101/0x13a
[336121.073980]  [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073984]  [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073988]  [<ffffffff810611cf>] smp_call_function_many+0xd5/0x197
[336121.073992]  [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073995]  [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073999]  [<ffffffff810612c9>] smp_call_function+0x38/0x63
[336121.074003]  [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.074006]  [<ffffffff8103a595>] on_each_cpu+0x2c/0x6b
[336121.074010]  [<ffffffff81083877>] drain_all_pages+0x17/0x19
[336121.074014]  [<ffffffff81083d1c>] __alloc_pages_nodemask+0x4a3/0x673
[336121.074018]  [<ffffffff810a7e7e>] kmem_getpages+0x64/0x12f
[336121.074022]  [<ffffffff810a8906>] fallback_alloc+0x127/0x1a4
[336121.074026]  [<ffffffff810a8ac8>] ____cache_alloc_node+0x145/0x15a
[336121.074029]  [<ffffffff810a83da>] kmem_cache_alloc_node+0xf8/0x159
[336121.074033]  [<ffffffff810a8488>] ? __kmalloc_node_track_caller+0x24/0x29
[336121.074036]  [<ffffffff810a8488>] __kmalloc_node_track_caller+0x24/0x29
[336121.074040]  [<ffffffff8138ff10>] __alloc_skb+0x6f/0x15e
[336121.074044]  [<ffffffff813be43b>] sk_stream_alloc_skb+0x38/0xed
[336121.074048]  [<ffffffff813bf1ec>] tcp_sendpage+0x14b/0x5d4
[336121.074053]  [<ffffffff81387081>] kernel_sendpage+0x16/0x1f
[336121.074081]  [<ffffffffa008ec41>] try_write+0x710/0x10eb [ceph]
[336121.074103]  [<ffffffffa0090135>] con_work+0x135/0x6b2 [ceph]
[336121.074108]  [<ffffffff8104786b>] worker_thread+0x1e8/0x2fa
[336121.074112]  [<ffffffff81047812>] ? worker_thread+0x18f/0x2fa
[336121.074133]  [<ffffffffa0090000>] ? con_work+0x0/0x6b2 [ceph]
[336121.074137]  [<ffffffff8104a990>] ? autoremove_wake_function+0x0/0x38
[336121.074141]  [<ffffffff81047683>] ? worker_thread+0x0/0x2fa
[336121.074145]  [<ffffffff8104a65e>] kthread+0x7d/0x85
[336121.074150]  [<ffffffff810037d4>] kernel_thread_helper+0x4/0x10
[336121.074155]  [<ffffffff81429380>] ? restore_args+0x0/0x30
[336121.074158]  [<ffffffff8104a5e1>] ? kthread+0x0/0x85
[336121.074162]  [<ffffffff810037d0>] ? kernel_thread_helper+0x0/0x10
[336121.074164] Mem-Info:
[336121.074166] Node 0 DMA per-cpu:
[336121.074168] CPU    0: hi:    0, btch:   1 usd:   0
[336121.074171] CPU    1: hi:    0, btch:   1 usd:   0
[336121.074172] Node 0 DMA32 per-cpu:
[336121.074175] CPU    0: hi:  186, btch:  31 usd: 185
[336121.074178] CPU    1: hi:  186, btch:  31 usd:  33
[336121.074183] active_anon:4954 inactive_anon:1010 isolated_anon:0
[336121.074185]  active_file:1896 inactive_file:583520 isolated_file:0
[336121.074186]  unevictable:0 dirty:61301 writeback:2023 unstable:0
[336121.074187]  free:28448 slab_reclaimable:4350 slab_unreclaimable:371260
[336121.074189]  mapped:1396 shmem:1129 pagetables:536 bounce:0
[336121.074190] Node 0 DMA free:15816kB min:252kB low:312kB high:376kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15708kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:72kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[336121.074201] lowmem_reserve[]: 0 3929 3929 3929
[336121.074205] Node 0 DMA32 free:97976kB min:65280kB low:81600kB high:97920kB active_anon:19816kB inactive_anon:4040kB active_file:7584kB inactive_file:2334080kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4023776kB mlocked:0kB dirty:245204kB writeback:8092kB mapped:5584kB shmem:4516kB slab_reclaimable:17400kB slab_unreclaimable:1484968kB kernel_stack:848kB pagetables:2144kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[336121.074217] lowmem_reserve[]: 0 0 0 0
[336121.074221] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15816kB
[336121.074233] Node 0 DMA32: 22730*4kB 562*8kB 22*16kB 7*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 97976kB
[336121.074245] 586781 total pagecache pages
[336121.074247] 216 pages in swap cache
[336121.074249] Swap cache stats: add 3407, delete 3191, find 5721/5843
[336121.074251] Free swap  = 9760388kB
[336121.074253] Total swap = 9767484kB
[336121.077856] 1023968 pages RAM
[336121.077856] 20170 pages reserved
[336121.077856] 12344 pages shared
[336121.077856] 783435 pages non-shared
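
The buddy-allocator line for DMA32 in the dump above can be checked by hand. The following is a minimal sketch, not part of the original report: it parses that line, confirms the per-block counts add up to the reported free total, and splits the total into single 4kB pages versus larger contiguous blocks. The helper name `parse_buddy_line` is hypothetical, and the 4kB page size is taken from the smallest block size in the log.

```python
import re


def parse_buddy_line(line):
    """Parse one 'Node N ZONE: a*4kB b*8kB ... = TOTALkB' line from Mem-Info.

    Returns (zone, {block_size_kb: count}, reported_total_kb).
    Hypothetical helper for illustration only.
    """
    zone = re.search(r"Node \d+ (\w+):", line).group(1)
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}
    reported = int(re.search(r"= (\d+)kB", line).group(1))
    return zone, blocks, reported


# DMA32 line copied verbatim from the log above.
DMA32 = ("[336121.074233] Node 0 DMA32: 22730*4kB 562*8kB 22*16kB 7*32kB "
         "1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 97976kB")

zone, blocks, reported = parse_buddy_line(DMA32)

# Sum of count * block size must match the total the kernel printed.
total_kb = sum(size * count for size, count in blocks.items())

# Free memory held as isolated 4kB pages (order 0) versus anything larger.
order0_kb = blocks[4] * 4
higher_kb = total_kb - order0_kb

print(zone, total_kb, reported, order0_kb, higher_kb)
```

For this report the counts sum to the reported 97976kB, of which 90920kB sits in single 4kB pages and only 7056kB in blocks of 8kB or more. An order:1 request needs two contiguous pages (8kB with 4kB pages), so it can only be served from that smaller portion.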
Actions #1

Updated by Sage Weil almost 14 years ago

  • Subject changed from page allocation failure to order:1 page allocation failure
Actions #2

Updated by Yehuda Sadeh almost 14 years ago

Too many dirty pages? Too many pending OSD requests?
We should probably try to find out how many OSD requests were in flight, and also collect the bdi info.
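
A minimal sketch of gathering that information on the client, under stated assumptions: debugfs is mounted at /sys/kernel/debug, the Ceph kernel client exposes an `osdc` file per mount under `ceph/`, and each backing device has a `stats` file under `bdi/`. The function name `collect_debug_info` is hypothetical. The layout of `osdc` varies between kernel versions, so the sketch returns the raw lines rather than claiming an exact request count.

```python
from pathlib import Path


def collect_debug_info(debugfs_root="/sys/kernel/debug"):
    """Read Ceph osdc files and bdi stats below a debugfs mount point.

    Assumes the paths <root>/ceph/<client>/osdc and <root>/bdi/<name>/stats.
    Returns a dict with raw contents so nothing is lost to format guesses.
    """
    root = Path(debugfs_root)
    info = {"osdc": {}, "bdi": {}}

    # One osdc file per mounted Ceph client; keep its non-empty lines.
    for osdc in sorted(root.glob("ceph/*/osdc")):
        lines = [ln for ln in osdc.read_text().splitlines() if ln.strip()]
        info["osdc"][osdc.parent.name] = lines

    # One stats file per backing device (dirty/writeback counters).
    for stats in sorted(root.glob("bdi/*/stats")):
        info["bdi"][stats.parent.name] = stats.read_text()

    return info
```

Calling `collect_debug_info()` as root right after the next allocation failure, and attaching the result alongside the dmesg output, would give a rough view of outstanding OSD requests and of the writeback state at that moment.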

Actions #3

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Can't reproduce

We've fixed a bunch of memory leaks and haven't seen this recently.
