Bug #150
order:1 page allocation failure
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Description
Workload was an rsync to a ceph mount.
ceph3 was mounting cosd0:/
Not sure which version; probably unstable from last week.
[336121.073871] ceph-msgr/0: page allocation failure. order:1, mode:0x20
[336121.073874] Pid: 2640, comm: ceph-msgr/0 Not tainted 2.6.34 #29
[336121.073876] Call Trace:
[336121.073878] <IRQ> [<ffffffff81083eae>] __alloc_pages_nodemask+0x635/0x673
[336121.073889] [<ffffffff81056810>] ? put_lock_stats+0x25/0x27
[336121.073894] [<ffffffff810a7e7e>] kmem_getpages+0x64/0x12f
[336121.073897] [<ffffffff810a8906>] fallback_alloc+0x127/0x1a4
[336121.073901] [<ffffffff810a8ac8>] ____cache_alloc_node+0x145/0x15a
[336121.073904] [<ffffffff810a83da>] kmem_cache_alloc_node+0xf8/0x159
[336121.073908] [<ffffffff810a8488>] ? __kmalloc_node_track_caller+0x24/0x29
[336121.073912] [<ffffffff810a8488>] __kmalloc_node_track_caller+0x24/0x29
[336121.073916] [<ffffffff8138ff10>] __alloc_skb+0x6f/0x15e
[336121.073920] [<ffffffff813c9480>] tcp_send_ack+0x29/0xd4
[336121.073923] [<ffffffff813cc664>] tcp_delack_timer+0x16d/0x1c8
[336121.073927] [<ffffffff8103fa92>] run_timer_softirq+0x1e8/0x275
[336121.073930] [<ffffffff8103fa03>] ? run_timer_softirq+0x159/0x275
[336121.073934] [<ffffffff813cc4f7>] ? tcp_delack_timer+0x0/0x1c8
[336121.073939] [<ffffffff8103ab68>] ? __do_softirq+0x68/0x156
[336121.073943] [<ffffffff8103abbc>] __do_softirq+0xbc/0x156
[336121.073948] [<ffffffff810038cc>] call_softirq+0x1c/0x34
[336121.073951] [<ffffffff810056d5>] do_softirq+0x38/0x83
[336121.073955] [<ffffffff8103aa98>] irq_exit+0x45/0x51
[336121.073959] [<ffffffff8101814c>] smp_apic_timer_interrupt+0x86/0x96
[336121.073963] [<ffffffff81003393>] apic_timer_interrupt+0x13/0x20
[336121.073965] <EOI> [<ffffffff81060ef0>] ? generic_exec_single+0x39/0x8b
[336121.073972] [<ffffffff81060f29>] ? generic_exec_single+0x72/0x8b
[336121.073976] [<ffffffff810610c1>] smp_call_function_single+0x101/0x13a
[336121.073980] [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073984] [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073988] [<ffffffff810611cf>] smp_call_function_many+0xd5/0x197
[336121.073992] [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073995] [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.073999] [<ffffffff810612c9>] smp_call_function+0x38/0x63
[336121.074003] [<ffffffff810831ab>] ? drain_local_pages+0x0/0x12
[336121.074006] [<ffffffff8103a595>] on_each_cpu+0x2c/0x6b
[336121.074010] [<ffffffff81083877>] drain_all_pages+0x17/0x19
[336121.074014] [<ffffffff81083d1c>] __alloc_pages_nodemask+0x4a3/0x673
[336121.074018] [<ffffffff810a7e7e>] kmem_getpages+0x64/0x12f
[336121.074022] [<ffffffff810a8906>] fallback_alloc+0x127/0x1a4
[336121.074026] [<ffffffff810a8ac8>] ____cache_alloc_node+0x145/0x15a
[336121.074029] [<ffffffff810a83da>] kmem_cache_alloc_node+0xf8/0x159
[336121.074033] [<ffffffff810a8488>] ? __kmalloc_node_track_caller+0x24/0x29
[336121.074036] [<ffffffff810a8488>] __kmalloc_node_track_caller+0x24/0x29
[336121.074040] [<ffffffff8138ff10>] __alloc_skb+0x6f/0x15e
[336121.074044] [<ffffffff813be43b>] sk_stream_alloc_skb+0x38/0xed
[336121.074048] [<ffffffff813bf1ec>] tcp_sendpage+0x14b/0x5d4
[336121.074053] [<ffffffff81387081>] kernel_sendpage+0x16/0x1f
[336121.074081] [<ffffffffa008ec41>] try_write+0x710/0x10eb [ceph]
[336121.074103] [<ffffffffa0090135>] con_work+0x135/0x6b2 [ceph]
[336121.074108] [<ffffffff8104786b>] worker_thread+0x1e8/0x2fa
[336121.074112] [<ffffffff81047812>] ? worker_thread+0x18f/0x2fa
[336121.074133] [<ffffffffa0090000>] ? con_work+0x0/0x6b2 [ceph]
[336121.074137] [<ffffffff8104a990>] ? autoremove_wake_function+0x0/0x38
[336121.074141] [<ffffffff81047683>] ? worker_thread+0x0/0x2fa
[336121.074145] [<ffffffff8104a65e>] kthread+0x7d/0x85
[336121.074150] [<ffffffff810037d4>] kernel_thread_helper+0x4/0x10
[336121.074155] [<ffffffff81429380>] ? restore_args+0x0/0x30
[336121.074158] [<ffffffff8104a5e1>] ? kthread+0x0/0x85
[336121.074162] [<ffffffff810037d0>] ? kernel_thread_helper+0x0/0x10
[336121.074164] Mem-Info:
[336121.074166] Node 0 DMA per-cpu:
[336121.074168] CPU 0: hi: 0, btch: 1 usd: 0
[336121.074171] CPU 1: hi: 0, btch: 1 usd: 0
[336121.074172] Node 0 DMA32 per-cpu:
[336121.074175] CPU 0: hi: 186, btch: 31 usd: 185
[336121.074178] CPU 1: hi: 186, btch: 31 usd: 33
[336121.074183] active_anon:4954 inactive_anon:1010 isolated_anon:0
[336121.074185] active_file:1896 inactive_file:583520 isolated_file:0
[336121.074186] unevictable:0 dirty:61301 writeback:2023 unstable:0
[336121.074187] free:28448 slab_reclaimable:4350 slab_unreclaimable:371260
[336121.074189] mapped:1396 shmem:1129 pagetables:536 bounce:0
[336121.074190] Node 0 DMA free:15816kB min:252kB low:312kB high:376kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15708kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:72kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[336121.074201] lowmem_reserve[]: 0 3929 3929 3929
[336121.074205] Node 0 DMA32 free:97976kB min:65280kB low:81600kB high:97920kB active_anon:19816kB inactive_anon:4040kB active_file:7584kB inactive_file:2334080kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4023776kB mlocked:0kB dirty:245204kB writeback:8092kB mapped:5584kB shmem:4516kB slab_reclaimable:17400kB slab_unreclaimable:1484968kB kernel_stack:848kB pagetables:2144kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[336121.074217] lowmem_reserve[]: 0 0 0 0
[336121.074221] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15816kB
[336121.074233] Node 0 DMA32: 22730*4kB 562*8kB 22*16kB 7*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 97976kB
[336121.074245] 586781 total pagecache pages
[336121.074247] 216 pages in swap cache
[336121.074249] Swap cache stats: add 3407, delete 3191, find 5721/5843
[336121.074251] Free swap = 9760388kB
[336121.074253] Total swap = 9767484kB
[336121.077856] 1023968 pages RAM
[336121.077856] 20170 pages reserved
[336121.077856] 12344 pages shared
[336121.077856] 783435 pages non-shared
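For anyone reading the Mem-Info dump above, here is a minimal sketch (not part of the original report) that breaks down the DMA32 free-list line by block size. The helper name `parse_buddy_line` is hypothetical, and the input string is copied from the log. It only shows how much of the free memory sits in single 4kB pages, which cannot back an order:1 (8kB contiguous) request. It does not model the allocator's watermark checks, so it is an aid for reading the log and not a diagnosis.

```python
import re

def parse_buddy_line(line):
    """Parse a 'Node N ZONE: a*4kB b*8kB ... = TkB' line.

    Returns a tuple (blocks, total_kb) where blocks maps block size in kB
    to the number of free blocks of that size.
    """
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}
    total_kb = int(re.search(r"=\s*(\d+)kB", line).group(1))
    return blocks, total_kb

# Copied verbatim from the log in this report.
DMA32 = ("Node 0 DMA32: 22730*4kB 562*8kB 22*16kB 7*32kB 1*64kB 1*128kB "
         "1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 97976kB")

blocks, total_kb = parse_buddy_line(DMA32)

# Sum of (block size * count) should match the total the kernel printed.
free_kb = sum(size * count for size, count in blocks.items())

# Memory held in order-0 (4kB) blocks versus blocks of 8kB and larger.
order0_kb = blocks[4] * 4
order1_plus_kb = free_kb - order0_kb

print(f"free: {free_kb} kB (kernel reported {total_kb} kB)")
print(f"in 4kB blocks: {order0_kb} kB")
print(f"in blocks of 8kB or larger: {order1_plus_kb} kB")
```

The same helper works on the Node 0 DMA line. Feeding it the output of a later dump would show whether the share of memory in 4kB blocks changes after the memory leak fixes.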
Updated by Sage Weil almost 14 years ago
- Subject changed from page allocation failure to order:1 page allocation failure
Updated by Yehuda Sadeh almost 14 years ago
Too many dirty pages? Too many pending OSD requests?
We should probably find out how many OSD requests were in flight, and also collect the bdi info.
Updated by Sage Weil almost 14 years ago
- Status changed from New to Can't reproduce
We've fixed a bunch of memory leaks and haven't seen this recently.