Bug #210
GPF in ceph_con_revoke_message+0x2c/0x152
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Description
The OSD was being restarted repeatedly, and was probably doing weird things.
[63357.773459] general protection fault: 0000 [#1] PREEMPT SMP
[63357.773993] last sysfs file: /sys/class/net/lo/operstate
[63357.773993] CPU 0
[63357.773993] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery ehci_hcd uhci_hcd container processor thermal button
[63357.773993]
[63357.773993] Pid: 2845, comm: ceph-msgr/0 Not tainted 2.6.35-rc3+ #33 PDSMi+/PDSMi
[63357.773993] RIP: 0010:[<ffffffff8105b30d>] [<ffffffff8105b30d>] __lock_acquire+0x41b/0x87e
[63357.773993] RSP: 0018:ffff88011cab9a40 EFLAGS: 00010006
[63357.773993] RAX: 0000000000000002 RBX: 0000000000000246 RCX: 0000000000000000
[63357.773993] RDX: ffff8800b51d5038 RSI: 0000000000000000 RDI: ffff8800b51d5038
[63357.773993] RBP: ffff88011cab9aa0 R08: 0000000000000002 R09: 0000000000000000
[63357.773993] R10: 0000000000007d2b R11: ffffffff8144c593 R12: 5a5a5a5a5a5a5a5a
[63357.773993] R13: ffff88011c9e0750 R14: 0000000000000002 R15: 0000000000000000
[63357.773993] FS: 0000000000000000(0000) GS:ffff880002600000(0000) knlGS:0000000000000000
[63357.773993] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[63357.773993] CR2: 00007f0f10c02e30 CR3: 000000011da0c000 CR4: 00000000000006f0
[63357.773993] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[63357.773993] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[63357.773993] Process ceph-msgr/0 (pid: 2845, threadinfo ffff88011cab8000, task ffff88011c9e0750)
[63357.773993] Stack:
[63357.773993] ffff88011cab9a50 0000000044524298 0000000000000002 0000000000000000
[63357.773993] <0> ffff8800b51d5038 ffff88011c9e0db0 ffff88011cab9ae0 0000000000000246
[63357.773993] <0> ffff88011c9e0750 0000000000000000 0000000000000002 0000000000000000
[63357.773993] Call Trace:
[63357.773993] [<ffffffff8105b7f8>] lock_acquire+0x88/0xa5
[63357.773993] [<ffffffffa00450ee>] ? ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993] [<ffffffffa00450ee>] ? ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993] [<ffffffff8144d3f1>] mutex_lock_nested+0x62/0x314
[63357.773993] [<ffffffffa00450ee>] ? ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993] [<ffffffff8102ce5c>] ? sub_preempt_count+0x92/0x9e
[63357.773993] [<ffffffff8144d686>] ? mutex_lock_nested+0x2f7/0x314
[63357.773993] [<ffffffffa0053217>] ? alloc_msg+0x78/0x2e1 [ceph]
[63357.773993] [<ffffffffa00450ee>] ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993] [<ffffffffa0053290>] alloc_msg+0xf1/0x2e1 [ceph]
[63357.773993] [<ffffffffa0046807>] try_read+0x77f/0x129b [ceph]
[63357.773993] [<ffffffffa0048d86>] ? con_work+0xad/0x6b2 [ceph]
[63357.773993] [<ffffffff8144d686>] ? mutex_lock_nested+0x2f7/0x314
[63357.773993] [<ffffffffa0048d86>] ? con_work+0xad/0x6b2 [ceph]
[63357.773993] [<ffffffffa0048e02>] con_work+0x129/0x6b2 [ceph]
[63357.773993] [<ffffffff81048406>] worker_thread+0x1e8/0x2fa
[63357.773993] [<ffffffff810483ad>] ? worker_thread+0x18f/0x2fa
[63357.773993] [<ffffffff8102ce5c>] ? sub_preempt_count+0x92/0x9e
[63357.773993] [<ffffffffa0048cd9>] ? con_work+0x0/0x6b2 [ceph]
[63357.773993] [<ffffffff8104b4c8>] ? autoremove_wake_function+0x0/0x38
[63357.773993] [<ffffffff8104821e>] ? worker_thread+0x0/0x2fa
[63357.773993] [<ffffffff8104b196>] kthread+0x7d/0x85
[63357.773993] [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[63357.773993] [<ffffffff8144fc40>] ? restore_args+0x0/0x30
[63357.773993] [<ffffffff8104b119>] ? kthread+0x0/0x85
[63357.773993] [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[63357.773993] Code: f5 1c 00 85 c0 0f 84 72 04 00 00 83 3d fc e3 34 01 00 be 0b 03 00 00 0f 84 19 03 00 00 e9 5b 04 00 00 4d 85 e4 0f 84 52 04 00 00 <f0> 41 ff 84 24 98 01 00 00 8b 35 24 23 96 00 45 8b bd e8 05 00
[63357.773993] RIP [<ffffffff8105b30d>] __lock_acquire+0x41b/0x87e
[63357.773993] RSP <ffff88011cab9a40>
[63357.773993] ---[ end trace 1868120b9ce93406 ]---
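In the oops `Code:` line, the byte wrapped in angle brackets marks where RIP pointed at the time of the fault. As a rough sketch (the helper name `faulting_bytes` is made up for illustration and is not part of any kernel tool), the faulting instruction bytes can be pulled out like this and then handed to a disassembler:

```python
import re
from typing import List


def faulting_bytes(code_line: str) -> List[str]:
    """Return the opcode bytes from the <xx> marker onward.

    Hypothetical helper: the marker is the byte at RIP when the oops fired.
    """
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if m:
            # Strip the angle brackets and keep everything after the marker.
            return [m.group(1)] + tokens[i + 1:]
    return []  # no marker found


# Tail of the Code: line from the trace in this report.
code = "4d 85 e4 0f 84 52 04 00 00 <f0> 41 ff 84 24 98 01 00 00 8b 35"
print(" ".join(faulting_bytes(code)[:9]))
```

Here the first nine bytes after the marker are `f0 41 ff 84 24 98 01 00 00`, a locked increment through r12.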
Updated by Sage Weil almost 14 years ago
r12 is 5a5a5a.., on this code:
static inline void atomic_inc(atomic_t *v)
{
	asm volatile(LOCK_PREFIX "incl %0"
   32b9d:	f0 41 ff 84 24 98 01 	lock incl 0x198(%r12)
we're in
alloc_msg
-> ceph_con_revoke_message(req
AFAICS con is valid. Is this some -rc bug?
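For context, a register filled with 0x5a bytes matches the kernel's slab poison value POISON_INUSE (include/linux/poison.h), which is written into allocated-but-uninitialized objects when slab debugging is on; 0x6b (POISON_FREE) marks freed objects. A minimal sketch for spotting such values in an oops register dump follows. The function name and the regex are assumptions made for illustration, not an existing tool.

```python
import re

# Slab poison bytes as defined in include/linux/poison.h.
POISON = {
    0x5A: "POISON_INUSE (allocated but uninitialized)",
    0x6B: "POISON_FREE (use after free)",
}


def poisoned_registers(oops_text):
    """Map register name -> poison kind for registers holding a poison pattern.

    Hypothetical helper: scans 'NAME: <16 hex digits>' pairs in an x86-64 oops.
    """
    found = {}
    pattern = r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b"
    for name, value in re.findall(pattern, oops_text):
        raw = bytes.fromhex(value)
        # A poisoned pointer is the same poison byte repeated eight times.
        if len(set(raw)) == 1 and raw[0] in POISON:
            found[name] = POISON[raw[0]]
    return found


# Register line taken from the trace in this report.
sample = "R10: 0000000000007d2b R11: ffffffff8144c593 R12: 5a5a5a5a5a5a5a5a"
print(poisoned_registers(sample))
```

On the register line from this report, only R12 is flagged, which is consistent with the mutex pointer having been read from memory that was never initialized.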
Updated by Sage Weil almost 14 years ago
- Status changed from In Progress to Resolved
I think this is a different manifestation of #252, now fixed.