Bug #252

closed

GPF at tcp_sendpage+0x327/0x5d3

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
% Done:

0%


Description

Just saw this on both ceph2 and ceph4 while running bonnie.sh and (possibly) iozone. It also happened a few times earlier this week.

The client is on the master branch at eb2d68b502a3460a28ec21076ba759d38023131c, plus Yehuda's module loading fix.

[254410.244871] ceph: osd1 10.3.14.128:6800 connection failed
[254413.248422] ceph: osd1 10.3.14.128:6800 connection failed
[254417.499979] ceph: osd1 10.3.14.128:6800 connection failed
[254420.503820] ceph: osd1 10.3.14.128:6800 connection failed
[254427.503082] ceph: osd1 10.3.14.128:6800 connection failed
[254438.510184] ceph: osd1 10.3.14.128:6800 connection failed
[254457.548634] ceph: osd1 10.3.14.128:6800 connection failed
[254492.553782] ceph: osd1 10.3.14.128:6800 connection failed
[254557.464357] ceph:  tid 441784 timed out on osd1, will reset osd
[254559.556324] ceph: osd1 10.3.14.128:6800 connection failed
[254611.507868] ceph: mds0 hung
[254622.563055] ceph:  tid 441784 timed out on osd1, will reset osd
[254682.654162] ceph:  tid 441784 timed out on osd1, will reset osd
[254688.691700] ceph: mon0 10.3.14.136:6789 socket closed
[254688.697331] ceph: mon0 10.3.14.136:6789 session lost, hunting for new mon
[254688.950176] ceph: osd2 down
[254688.953154] ceph: osd8 down
[254688.956312] ceph: mon0 10.3.14.136:6789 session established
[254697.685943] ceph: osd3 down
[254697.688915] ceph: osd4 down
[254704.047740] ceph: osd8 up
[254704.050559] ceph: osd8 weight 0x10000 (in)
[254707.701106] ceph: osd2 up
[254707.703880] ceph: osd2 weight 0x10000 (in)
[254709.429926] ceph: get_reply unknown tid 441781 from osd7
[254712.649652] ceph: mds0 came back
[254712.653038] ceph: mds0 caps went stale, renewing
[254712.712709] ceph: osd4 up
[254712.715494] ceph: osd4 weight 0x10000 (in)
[254717.724337] ceph: osd3 up
[254717.727128] ceph: osd3 weight 0x10000 (in)
[254735.795747] general protection fault: 0000 [#1] PREEMPT SMP 
[254735.797730] last sysfs file: /sys/kernel/uevent_seqnum
[254735.797730] CPU 0 
[254735.797730] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery ehci_hcd uhci_hcd container thermal processor button
[254735.797730] 
[254735.797730] Pid: 2859, comm: ceph-msgr/0 Not tainted 2.6.35-rc3+ #44 PDSMi+/PDSMi
[254735.797730] RIP: 0010:[<ffffffff813df46b>]  [<ffffffff813df46b>] tcp_sendpage+0x327/0x5d3
[254735.797730] RSP: 0018:ffff88011bccbbd0  EFLAGS: 00010246
[254735.797730] RAX: ffffffff8171b390 RBX: ffff88011bed2aa8 RCX: 000000000000fe88
[254735.797730] RDX: 6b6b6b6b6b6b6b6b RSI: 00000000000001c8 RDI: ffff88007dacd3e0
[254735.797730] RBP: ffff88011bccbc60 R08: 0000000000000000 R09: ffffffff813df19d
[254735.797730] R10: ffffffff810aa8dc R11: ffff8800a2ee69b8 R12: ffff88011c4adde0
[254735.797730] R13: 0000000000000d4c R14: 0000000000000000 R15: 00000000000002b4
[254735.797730] FS:  0000000000000000(0000) GS:ffff880002600000(0000) knlGS:0000000000000000
[254735.797730] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[254735.797730] CR2: 00007ff945586250 CR3: 000000011bde6000 CR4: 00000000000006f0
[254735.797730] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[254735.797730] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[254735.797730] Process ceph-msgr/0 (pid: 2859, threadinfo ffff88011bcca000, task ffff88011bcd03d0)
[254735.797730] Stack:
[254735.797730]  0000000000000000 0000000000000000 0000000000000000 ffff88011bed2c80
[254735.797730] <0> 0000c04000000000 0000000000000d4c 000002b400004040 0000000000000000
[254735.797730] <0> 00008800000005a8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0000000000000000
[254735.797730] Call Trace:
[254735.797730]  [<ffffffff813a49bd>] kernel_sendpage+0x16/0x1f
[254735.797730]  [<ffffffffa00f8758>] try_write+0x649/0xff4 [ceph]
[254735.797730]  [<ffffffffa00f9bfa>] con_work+0x135/0x6b2 [ceph]
[254735.797730]  [<ffffffff81048406>] worker_thread+0x1e8/0x2fa
[254735.797730]  [<ffffffff810483ad>] ? worker_thread+0x18f/0x2fa
[254735.797730]  [<ffffffff8102ce5c>] ? sub_preempt_count+0x92/0x9e
[254735.797730]  [<ffffffffa00f9ac5>] ? con_work+0x0/0x6b2 [ceph]
[254735.797730]  [<ffffffff8104b4c8>] ? autoremove_wake_function+0x0/0x38
[254735.797730]  [<ffffffff8104821e>] ? worker_thread+0x0/0x2fa
[254735.797730]  [<ffffffff8104b196>] kthread+0x7d/0x85
[254735.797730]  [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[254735.797730]  [<ffffffff8144fc80>] ? restore_args+0x0/0x30
[254735.797730]  [<ffffffff8104b119>] ? kthread+0x0/0x85
[254735.797730]  [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[254735.797730] Code: 00 45 85 c0 74 21 41 8b 94 24 a8 00 00 00 41 8d 46 ff 49 03 94 24 b0 00 00 00 48 98 48 c1 e0 04 44 01 6c 02 3c eb 61 48 8b 55 b8 <66> 83 3a 00 79 04 48 8b 52 10 8b 42 08 85 c0 75 04 0f 0b eb fe 
[254735.797730] RIP  [<ffffffff813df46b>] tcp_sendpage+0x327/0x5d3
[254735.797730]  RSP <ffff88011bccbbd0>
[254736.074181] ---[ end trace a23ea86bf8dbfa65 ]---
[254778.094378] ceph:  tid 424933 timed out on osd3, will reset osd
[254838.185488] ceph:  tid 424933 timed out on osd3, will reset osd
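
A triage note on the trace above, offered as a reading rather than a confirmed diagnosis: RDX holds 6b6b6b6b6b6b6b6b, which matches the slab free-poison byte (POISON_FREE, 0x6b, in include/linux/poison.h), and the faulting instruction in the Code: line (66 83 3a 00) reads through RDX. That is consistent with tcp_sendpage following a pointer loaded from already-freed memory, though the report itself does not state this. The minimal Python sketch below scans a register dump for that pattern; the helper name poisoned_registers and the regular expression are assumptions made for illustration and are not part of any kernel tooling.

```python
import re

# Byte that the slab allocator writes over freed objects when slab
# debugging/poisoning is enabled (POISON_FREE in include/linux/poison.h).
POISON_FREE = 0x6B

# Matches "RAX: ffffffff8171b390"-style pairs from an x86_64 oops register
# dump. Entries with a segment prefix (RIP: 0010:..., RSP: 0018:...) are
# skipped on purpose because their value is not a bare 16-digit hex word.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return names of registers whose eight bytes all equal POISON_FREE.

    Hypothetical helper for illustration only.
    """
    pattern = f"{POISON_FREE:02x}" * 8
    return [name for name, value in REG_RE.findall(oops_text) if value == pattern]


# Two register lines copied from the oops in this report.
sample = """\
RAX: ffffffff8171b390 RBX: ffff88011bed2aa8 RCX: 000000000000fe88
RDX: 6b6b6b6b6b6b6b6b RSI: 00000000000001c8 RDI: ffff88007dacd3e0
"""

print(poisoned_registers(sample))  # ['RDX']
```

A hit only shows that a register contains the poison pattern. Whether the freed object was a page, an skb, or something else still has to come from the code path (here try_write calling kernel_sendpage).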
Actions #1

Updated by Sage Weil almost 14 years ago

  • Target version set to v2.6.35
Actions #2

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Resolved

Ah, finally. Fixed by commit:ed98adad3d87594c55347824e85137d1829c9e70
