Bug #304

closed

GPF in writepages_finish

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

untar_snap_rm.sh

[ 5046.221121] general protection fault: 0000 [#1] PREEMPT SMP 
[ 5046.222540] last sysfs file: /sys/kernel/uevent_seqnum
[ 5046.222540] CPU 0 
[ 5046.222540] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery container ehci_hcd uhci_hcd thermal processor button
[ 5046.222540] 
[ 5046.222540] Pid: 2578, comm: ceph-msgr/0 Not tainted 2.6.35-rc6+ #52 PDSMi+/PDSMi
[ 5046.222540] RIP: 0010:[<ffffffffa00e6eab>]  [<ffffffffa00e6eab>] writepages_finish+0x140/0x3dd [ceph]
[ 5046.222540] RSP: 0018:ffff88011dd3fab0  EFLAGS: 00010202
[ 5046.222540] RAX: ffff8800badca000 RBX: 6b6b6b6b6b6b6b6b RCX: ffff88011c19eeb8
[ 5046.222540] RDX: 0000000000000400 RSI: 000000000000027d RDI: ffff88011dc31000
[ 5046.222540] RBP: ffff88011dd3fbb0 R08: 0000000000000000 R09: 0000000000000002
[ 5046.222540] R10: ffff88011dd3fb80 R11: ffff88011ded8048 R12: 0000000000000000
[ 5046.222540] R13: 0000000000000002 R14: ffff88011dc31000 R15: ffff8800cfc42ce0
[ 5046.222540] FS:  0000000000000000(0000) GS:ffff880002a00000(0000) knlGS:0000000000000000
[ 5046.222540] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5046.222540] CR2: 00007f2043671210 CR3: 000000011dccf000 CR4: 00000000000006f0
[ 5046.222540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5046.222540] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5046.222540] Process ceph-msgr/0 (pid: 2578, threadinfo ffff88011dd3e000, task ffff88011c19e890)
[ 5046.222540] Stack:
[ 5046.222540]  ffff88011dd3fad0 ffffffff8102bd1d ffffffff81d10050 000004001dd3ffd8
[ 5046.222540] <0> ffff88011dc31b48 ffff8800adeefa10 ffff8800adeef5c0 000000001c19e890
[ 5046.222540] <0> ffff88011c2974c8 00003f5500000007 0000000000000006 ffff88011d3771a8
[ 5046.222540] Call Trace:
[ 5046.222540]  [<ffffffff8102bd1d>] ? get_parent_ip+0x11/0x41
[ 5046.222540]  [<ffffffff81058893>] ? mark_held_locks+0x49/0x64
[ 5046.222540]  [<ffffffff8144d1e5>] ? __mutex_unlock_slowpath+0x10d/0x130
[ 5046.222540]  [<ffffffff81058a2a>] ? trace_hardirqs_on+0xd/0xf
[ 5046.222540]  [<ffffffffa0102cc8>] dispatch+0x257/0x429 [ceph]
[ 5046.222540]  [<ffffffffa00f4b2c>] try_read+0xc44/0x129b [ceph]
[ 5046.222540]  [<ffffffffa00f6bf1>] ? con_work+0xad/0x6b2 [ceph]
[ 5046.222540]  [<ffffffff8144d65e>] ? mutex_lock_nested+0x2f7/0x314
[ 5046.222540]  [<ffffffffa00f6bf1>] ? con_work+0xad/0x6b2 [ceph]
[ 5046.222540]  [<ffffffffa00f6c6d>] con_work+0x129/0x6b2 [ceph]
[ 5046.222540]  [<ffffffff810484f6>] worker_thread+0x1e8/0x2fa
[ 5046.222540]  [<ffffffff8104849d>] ? worker_thread+0x18f/0x2fa
[ 5046.222540]  [<ffffffff8102ce8f>] ? sub_preempt_count+0x92/0x9e
[ 5046.222540]  [<ffffffffa00f6b44>] ? con_work+0x0/0x6b2 [ceph]
[ 5046.222540]  [<ffffffff8104b5b8>] ? autoremove_wake_function+0x0/0x38
[ 5046.222540]  [<ffffffff8104830e>] ? worker_thread+0x0/0x2fa
[ 5046.222540]  [<ffffffff8104b286>] kthread+0x7d/0x85
[ 5046.222540]  [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[ 5046.222540]  [<ffffffff8144fc40>] ? restore_args+0x0/0x30
[ 5046.222540]  [<ffffffff8104b209>] ? kthread+0x0/0x85
[ 5046.222540]  [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[ 5046.222540] Code: e2 00 04 00 00 89 95 1c ff ff ff e9 d6 00 00 00 4a 8d 04 e5 00 00 00 00 49 03 87 20 02 00 00 48 8b 18 48 85 db 75 04 0f 0b eb fe <f6> 03 08 0f 84 7e 02 00 00 48 83 c9 ff f0 49 0f c1 8e 40 0b 00 
[ 5046.222540] RIP  [<ffffffffa00e6eab>] writepages_finish+0x140/0x3dd [ceph]
[ 5046.222540]  RSP <ffff88011dd3fab0>
[ 5046.541042] ---[ end trace 8c2df756baa8bb35 ]---

Actions #1

Updated by Sage Weil almost 14 years ago

Another node hit a similar GPF, this time in tcp_sendpage; probably the same bug?

[ 5037.964266] general protection fault: 0000 [#1] PREEMPT SMP 
[ 5037.964422] last sysfs file: /sys/kernel/uevent_seqnum
[ 5037.964422] CPU 0 
[ 5037.964422] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery container ehci_hcd uhci_hcd thermal processor button
[ 5037.964422] 
[ 5037.964422] Pid: 2571, comm: ceph-msgr/0 Not tainted 2.6.35-rc6+ #52 PDSMi+/PDSMi
[ 5037.964422] RIP: 0010:[<ffffffff813df433>]  [<ffffffff813df433>] tcp_sendpage+0x327/0x5d3
[ 5037.964422] RSP: 0018:ffff88011c5f3bd0  EFLAGS: 00010246
[ 5037.964422] RAX: 0000000000000001 RBX: ffff88010a441688 RCX: 0000000000000a9a
[ 5037.964422] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000a9a RDI: ffff88010a441688
[ 5037.964422] RBP: ffff88011c5f3c60 R08: 0000000000000000 R09: 0000000000000004
[ 5037.964422] R10: ffffffff813df165 R11: ffff88011b847108 R12: ffff88011df38de0
[ 5037.964422] R13: 0000000000000a9a R14: 0000000000000001 R15: 0000000000000000
[ 5037.964422] FS:  0000000000000000(0000) GS:ffff880002a00000(0000) knlGS:0000000000000000
[ 5037.964422] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5037.964422] CR2: 00007f05c5314000 CR3: 000000011c611000 CR4: 00000000000006f0
[ 5037.964422] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5037.964422] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5037.964422] Process ceph-msgr/0 (pid: 2571, threadinfo ffff88011c5f2000, task ffff88011d3506d0)
[ 5037.964422] Stack:
[ 5037.964422]  0000000000000000 ffff880100000000 0000000000000000 ffff88010a441860
[ 5037.964422] <0> 0000c0401c5f3c10 0000000000001000 00000000000000b6 0000000000000000
[ 5037.964422] <0> 00018801000005a8 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 0000000000000000
[ 5037.964422] Call Trace:
[ 5037.964422]  [<ffffffff813a476d>] kernel_sendpage+0x16/0x1f
[ 5037.964422]  [<ffffffffa00e97cc>] try_write+0x649/0xff4 [ceph]
[ 5037.964422]  [<ffffffffa00eac79>] con_work+0x135/0x6b2 [ceph]
[ 5037.964422]  [<ffffffff810484f6>] worker_thread+0x1e8/0x2fa
[ 5037.964422]  [<ffffffff8104849d>] ? worker_thread+0x18f/0x2fa
[ 5037.964422]  [<ffffffff8102ce8f>] ? sub_preempt_count+0x92/0x9e
[ 5037.964422]  [<ffffffffa00eab44>] ? con_work+0x0/0x6b2 [ceph]
[ 5037.964422]  [<ffffffff8104b5b8>] ? autoremove_wake_function+0x0/0x38
[ 5037.964422]  [<ffffffff8104830e>] ? worker_thread+0x0/0x2fa
[ 5037.964422]  [<ffffffff8104b286>] kthread+0x7d/0x85
[ 5037.964422]  [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[ 5037.964422]  [<ffffffff8144fc40>] ? restore_args+0x0/0x30
[ 5037.964422]  [<ffffffff8104b209>] ? kthread+0x0/0x85
[ 5037.964422]  [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[ 5037.964422] Code: 00 45 85 c0 74 21 41 8b 94 24 a8 00 00 00 41 8d 46 ff 49 03 94 24 b0 00 00 00 48 98 48 c1 e0 04 44 01 6c 02 3c eb 61 48 8b 55 b8 <66> 83 3a 00 79 04 48 8b 52 10 8b 42 08 85 c0 75 04 0f 0b eb fe 
[ 5037.964422] RIP  [<ffffffff813df433>] tcp_sendpage+0x327/0x5d3
[ 5037.964422]  RSP <ffff88011c5f3bd0>
[ 5038.238660] ---[ end trace 07169429531399da ]---

Actions #2

Updated by Sage Weil almost 14 years ago

The first crash is at addr.c:534,
WARN_ON(!PageUptodate(page));
with a bad page pointer: page=6b6b6b.., i=0.
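
Both register dumps in this report contain a value made entirely of 0x6b bytes (RBX in the writepages_finish trace, RDX in the tcp_sendpage trace). 0x6b is the slab POISON_FREE byte from the kernel's include/linux/poison.h, so a pointer like that usually means it was read out of memory that had already been freed. A minimal sketch for spotting such values in a pasted oops follows; the function name and the regex are illustrative assumptions, not an existing tool, and it only understands the x86-64 "REG: 16-hex-digits" layout shown above.

```python
import re

# Slab poison bytes as defined in the kernel's include/linux/poison.h.
POISON_BYTES = {
    0x6B: "POISON_FREE (object already freed)",
    0x5A: "POISON_INUSE (allocated, never initialised)",
}

# Matches pairs such as "RBX: 6b6b6b6b6b6b6b6b" in an x86-64 register dump.
# Entries like "RSP: 0018:ffff..." do not match because of the segment prefix.
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def find_poisoned_registers(oops_text):
    """Return {register: description} for registers holding a repeated poison byte.

    Hypothetical helper for illustration only.
    """
    hits = {}
    for name, value in REGISTER_RE.findall(oops_text):
        raw = bytes.fromhex(value)
        # A poisoned value is the same byte repeated across all 8 bytes.
        if len(set(raw)) == 1 and raw[0] in POISON_BYTES:
            hits[name] = POISON_BYTES[raw[0]]
    return hits
```

Fed the two dumps from this report, it flags RBX in the first and RDX in the second, both as POISON_FREE. That is consistent with the two crashes sharing a use-after-free cause, but it does not prove it.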

Actions #3

Updated by Sage Weil over 13 years ago

  • Target version changed from v2.6.35 to v2.6.36
Actions #4

Updated by Sage Weil over 13 years ago

  • Target version deleted (v2.6.36)
Actions #5

Updated by Sage Weil over 13 years ago

  • Status changed from New to Can't reproduce