Bug #304
GPF in writepages_finish (closed)
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Description
untar_snap_rm.sh
[ 5046.221121] general protection fault: 0000 [#1] PREEMPT SMP
[ 5046.222540] last sysfs file: /sys/kernel/uevent_seqnum
[ 5046.222540] CPU 0
[ 5046.222540] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery container ehci_hcd uhci_hcd thermal processor button
[ 5046.222540]
[ 5046.222540] Pid: 2578, comm: ceph-msgr/0 Not tainted 2.6.35-rc6+ #52 PDSMi+/PDSMi
[ 5046.222540] RIP: 0010:[<ffffffffa00e6eab>] [<ffffffffa00e6eab>] writepages_finish+0x140/0x3dd [ceph]
[ 5046.222540] RSP: 0018:ffff88011dd3fab0 EFLAGS: 00010202
[ 5046.222540] RAX: ffff8800badca000 RBX: 6b6b6b6b6b6b6b6b RCX: ffff88011c19eeb8
[ 5046.222540] RDX: 0000000000000400 RSI: 000000000000027d RDI: ffff88011dc31000
[ 5046.222540] RBP: ffff88011dd3fbb0 R08: 0000000000000000 R09: 0000000000000002
[ 5046.222540] R10: ffff88011dd3fb80 R11: ffff88011ded8048 R12: 0000000000000000
[ 5046.222540] R13: 0000000000000002 R14: ffff88011dc31000 R15: ffff8800cfc42ce0
[ 5046.222540] FS: 0000000000000000(0000) GS:ffff880002a00000(0000) knlGS:0000000000000000
[ 5046.222540] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5046.222540] CR2: 00007f2043671210 CR3: 000000011dccf000 CR4: 00000000000006f0
[ 5046.222540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5046.222540] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5046.222540] Process ceph-msgr/0 (pid: 2578, threadinfo ffff88011dd3e000, task ffff88011c19e890)
[ 5046.222540] Stack:
[ 5046.222540] ffff88011dd3fad0 ffffffff8102bd1d ffffffff81d10050 000004001dd3ffd8
[ 5046.222540] <0> ffff88011dc31b48 ffff8800adeefa10 ffff8800adeef5c0 000000001c19e890
[ 5046.222540] <0> ffff88011c2974c8 00003f5500000007 0000000000000006 ffff88011d3771a8
[ 5046.222540] Call Trace:
[ 5046.222540] [<ffffffff8102bd1d>] ? get_parent_ip+0x11/0x41
[ 5046.222540] [<ffffffff81058893>] ? mark_held_locks+0x49/0x64
[ 5046.222540] [<ffffffff8144d1e5>] ? __mutex_unlock_slowpath+0x10d/0x130
[ 5046.222540] [<ffffffff81058a2a>] ? trace_hardirqs_on+0xd/0xf
[ 5046.222540] [<ffffffffa0102cc8>] dispatch+0x257/0x429 [ceph]
[ 5046.222540] [<ffffffffa00f4b2c>] try_read+0xc44/0x129b [ceph]
[ 5046.222540] [<ffffffffa00f6bf1>] ? con_work+0xad/0x6b2 [ceph]
[ 5046.222540] [<ffffffff8144d65e>] ? mutex_lock_nested+0x2f7/0x314
[ 5046.222540] [<ffffffffa00f6bf1>] ? con_work+0xad/0x6b2 [ceph]
[ 5046.222540] [<ffffffffa00f6c6d>] con_work+0x129/0x6b2 [ceph]
[ 5046.222540] [<ffffffff810484f6>] worker_thread+0x1e8/0x2fa
[ 5046.222540] [<ffffffff8104849d>] ? worker_thread+0x18f/0x2fa
[ 5046.222540] [<ffffffff8102ce8f>] ? sub_preempt_count+0x92/0x9e
[ 5046.222540] [<ffffffffa00f6b44>] ? con_work+0x0/0x6b2 [ceph]
[ 5046.222540] [<ffffffff8104b5b8>] ? autoremove_wake_function+0x0/0x38
[ 5046.222540] [<ffffffff8104830e>] ? worker_thread+0x0/0x2fa
[ 5046.222540] [<ffffffff8104b286>] kthread+0x7d/0x85
[ 5046.222540] [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[ 5046.222540] [<ffffffff8144fc40>] ? restore_args+0x0/0x30
[ 5046.222540] [<ffffffff8104b209>] ? kthread+0x0/0x85
[ 5046.222540] [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[ 5046.222540] Code: e2 00 04 00 00 89 95 1c ff ff ff e9 d6 00 00 00 4a 8d 04 e5 00 00 00 00 49 03 87 20 02 00 00 48 8b 18 48 85 db 75 04 0f 0b eb fe <f6> 03 08 0f 84 7e 02 00 00 48 83 c9 ff f0 49 0f c1 8e 40 0b 00
[ 5046.222540] RIP [<ffffffffa00e6eab>] writepages_finish+0x140/0x3dd [ceph]
[ 5046.222540] RSP <ffff88011dd3fab0>
[ 5046.541042] ---[ end trace 8c2df756baa8bb35 ]---
Updated by Sage Weil almost 14 years ago
another node got this, probably the same bug?
[ 5037.964266] general protection fault: 0000 [#1] PREEMPT SMP
[ 5037.964422] last sysfs file: /sys/kernel/uevent_seqnum
[ 5037.964422] CPU 0
[ 5037.964422] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery container ehci_hcd uhci_hcd thermal processor button
[ 5037.964422]
[ 5037.964422] Pid: 2571, comm: ceph-msgr/0 Not tainted 2.6.35-rc6+ #52 PDSMi+/PDSMi
[ 5037.964422] RIP: 0010:[<ffffffff813df433>] [<ffffffff813df433>] tcp_sendpage+0x327/0x5d3
[ 5037.964422] RSP: 0018:ffff88011c5f3bd0 EFLAGS: 00010246
[ 5037.964422] RAX: 0000000000000001 RBX: ffff88010a441688 RCX: 0000000000000a9a
[ 5037.964422] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000a9a RDI: ffff88010a441688
[ 5037.964422] RBP: ffff88011c5f3c60 R08: 0000000000000000 R09: 0000000000000004
[ 5037.964422] R10: ffffffff813df165 R11: ffff88011b847108 R12: ffff88011df38de0
[ 5037.964422] R13: 0000000000000a9a R14: 0000000000000001 R15: 0000000000000000
[ 5037.964422] FS: 0000000000000000(0000) GS:ffff880002a00000(0000) knlGS:0000000000000000
[ 5037.964422] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5037.964422] CR2: 00007f05c5314000 CR3: 000000011c611000 CR4: 00000000000006f0
[ 5037.964422] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5037.964422] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5037.964422] Process ceph-msgr/0 (pid: 2571, threadinfo ffff88011c5f2000, task ffff88011d3506d0)
[ 5037.964422] Stack:
[ 5037.964422] 0000000000000000 ffff880100000000 0000000000000000 ffff88010a441860
[ 5037.964422] <0> 0000c0401c5f3c10 0000000000001000 00000000000000b6 0000000000000000
[ 5037.964422] <0> 00018801000005a8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0000000000000000
[ 5037.964422] Call Trace:
[ 5037.964422] [<ffffffff813a476d>] kernel_sendpage+0x16/0x1f
[ 5037.964422] [<ffffffffa00e97cc>] try_write+0x649/0xff4 [ceph]
[ 5037.964422] [<ffffffffa00eac79>] con_work+0x135/0x6b2 [ceph]
[ 5037.964422] [<ffffffff810484f6>] worker_thread+0x1e8/0x2fa
[ 5037.964422] [<ffffffff8104849d>] ? worker_thread+0x18f/0x2fa
[ 5037.964422] [<ffffffff8102ce8f>] ? sub_preempt_count+0x92/0x9e
[ 5037.964422] [<ffffffffa00eab44>] ? con_work+0x0/0x6b2 [ceph]
[ 5037.964422] [<ffffffff8104b5b8>] ? autoremove_wake_function+0x0/0x38
[ 5037.964422] [<ffffffff8104830e>] ? worker_thread+0x0/0x2fa
[ 5037.964422] [<ffffffff8104b286>] kthread+0x7d/0x85
[ 5037.964422] [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[ 5037.964422] [<ffffffff8144fc40>] ? restore_args+0x0/0x30
[ 5037.964422] [<ffffffff8104b209>] ? kthread+0x0/0x85
[ 5037.964422] [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[ 5037.964422] Code: 00 45 85 c0 74 21 41 8b 94 24 a8 00 00 00 41 8d 46 ff 49 03 94 24 b0 00 00 00 48 98 48 c1 e0 04 44 01 6c 02 3c eb 61 48 8b 55 b8 <66> 83 3a 00 79 04 48 8b 52 10 8b 42 08 85 c0 75 04 0f 0b eb fe
[ 5037.964422] RIP [<ffffffff813df433>] tcp_sendpage+0x327/0x5d3
[ 5037.964422] RSP <ffff88011c5f3bd0>
[ 5038.238660] ---[ end trace 07169429531399da ]---
Updated by Sage Weil almost 14 years ago
the first crash is addr.c:534,
WARN_ON(!PageUptodate(page));
bad page pointer page=6b6b6b.. (RBX in the first oops), i=0.
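Both oopses show a register holding 6b6b6b6b6b6b6b6b (RBX in the first, RDX in the second). 0x6b is the byte the kernel's slab debugging writes into freed objects (POISON_FREE in include/linux/poison.h), which suggests, without proving it, that the page pointer was read from memory that had already been freed. A minimal sketch for spotting this in a saved oops follows; the helper name find_poisoned_registers and the regular expression are illustrative assumptions, not part of any Ceph or kernel tooling.

```python
import re

# Fill bytes used by the kernel's slab debugging (include/linux/poison.h).
POISON_BYTES = {
    0x6B: "POISON_FREE (object was already freed)",
    0x5A: "POISON_INUSE (allocated but never initialised)",
}

# Matches register dump entries such as "RBX: 6b6b6b6b6b6b6b6b".
# Hypothetical pattern written for the x86_64 oops layout shown in this report.
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def find_poisoned_registers(oops_text):
    """Return {register: description} for registers filled with one poison byte."""
    hits = {}
    for name, value in REGISTER_RE.findall(oops_text):
        raw = bytes.fromhex(value)
        # A poisoned pointer is the same poison byte repeated eight times.
        if len(set(raw)) == 1 and raw[0] in POISON_BYTES:
            hits[name] = POISON_BYTES[raw[0]]
    return hits


if __name__ == "__main__":
    line = ("[ 5046.222540] RAX: ffff8800badca000 "
            "RBX: 6b6b6b6b6b6b6b6b RCX: ffff88011c19eeb8")
    for reg, meaning in find_poisoned_registers(line).items():
        print(reg, meaning)
```

Run against the full text of either oops, this flags only the register that the faulting instruction dereferences, which is consistent with the two crashes sharing a cause.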
Updated by Sage Weil over 13 years ago
- Target version changed from v2.6.35 to v2.6.36
Updated by Sage Weil over 13 years ago
- Status changed from New to Can't reproduce