Bug #210

closed

GPF in ceph_con_revoke_message+0x2c/0x152

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

The OSD was repeatedly being restarted, probably doing weird things.

[63357.773459] general protection fault: 0000 [#1] PREEMPT SMP 
[63357.773993] last sysfs file: /sys/class/net/lo/operstate
[63357.773993] CPU 0 
[63357.773993] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery ehci_hcd uhci_hcd container processor thermal button
[63357.773993] 
[63357.773993] Pid: 2845, comm: ceph-msgr/0 Not tainted 2.6.35-rc3+ #33 PDSMi+/PDSMi
[63357.773993] RIP: 0010:[<ffffffff8105b30d>]  [<ffffffff8105b30d>] __lock_acquire+0x41b/0x87e
[63357.773993] RSP: 0018:ffff88011cab9a40  EFLAGS: 00010006
[63357.773993] RAX: 0000000000000002 RBX: 0000000000000246 RCX: 0000000000000000
[63357.773993] RDX: ffff8800b51d5038 RSI: 0000000000000000 RDI: ffff8800b51d5038
[63357.773993] RBP: ffff88011cab9aa0 R08: 0000000000000002 R09: 0000000000000000
[63357.773993] R10: 0000000000007d2b R11: ffffffff8144c593 R12: 5a5a5a5a5a5a5a5a
[63357.773993] R13: ffff88011c9e0750 R14: 0000000000000002 R15: 0000000000000000
[63357.773993] FS:  0000000000000000(0000) GS:ffff880002600000(0000) knlGS:0000000000000000
[63357.773993] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[63357.773993] CR2: 00007f0f10c02e30 CR3: 000000011da0c000 CR4: 00000000000006f0
[63357.773993] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[63357.773993] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[63357.773993] Process ceph-msgr/0 (pid: 2845, threadinfo ffff88011cab8000, task ffff88011c9e0750)
[63357.773993] Stack:
[63357.773993]  ffff88011cab9a50 0000000044524298 0000000000000002 0000000000000000
[63357.773993] <0> ffff8800b51d5038 ffff88011c9e0db0 ffff88011cab9ae0 0000000000000246
[63357.773993] <0> ffff88011c9e0750 0000000000000000 0000000000000002 0000000000000000
[63357.773993] Call Trace:
[63357.773993]  [<ffffffff8105b7f8>] lock_acquire+0x88/0xa5
[63357.773993]  [<ffffffffa00450ee>] ? ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993]  [<ffffffffa00450ee>] ? ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993]  [<ffffffff8144d3f1>] mutex_lock_nested+0x62/0x314
[63357.773993]  [<ffffffffa00450ee>] ? ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993]  [<ffffffff8102ce5c>] ? sub_preempt_count+0x92/0x9e
[63357.773993]  [<ffffffff8144d686>] ? mutex_lock_nested+0x2f7/0x314
[63357.773993]  [<ffffffffa0053217>] ? alloc_msg+0x78/0x2e1 [ceph]
[63357.773993]  [<ffffffffa00450ee>] ceph_con_revoke_message+0x2c/0x152 [ceph]
[63357.773993]  [<ffffffffa0053290>] alloc_msg+0xf1/0x2e1 [ceph]
[63357.773993]  [<ffffffffa0046807>] try_read+0x77f/0x129b [ceph]
[63357.773993]  [<ffffffffa0048d86>] ? con_work+0xad/0x6b2 [ceph]
[63357.773993]  [<ffffffff8144d686>] ? mutex_lock_nested+0x2f7/0x314
[63357.773993]  [<ffffffffa0048d86>] ? con_work+0xad/0x6b2 [ceph]
[63357.773993]  [<ffffffffa0048e02>] con_work+0x129/0x6b2 [ceph]
[63357.773993]  [<ffffffff81048406>] worker_thread+0x1e8/0x2fa
[63357.773993]  [<ffffffff810483ad>] ? worker_thread+0x18f/0x2fa
[63357.773993]  [<ffffffff8102ce5c>] ? sub_preempt_count+0x92/0x9e
[63357.773993]  [<ffffffffa0048cd9>] ? con_work+0x0/0x6b2 [ceph]
[63357.773993]  [<ffffffff8104b4c8>] ? autoremove_wake_function+0x0/0x38
[63357.773993]  [<ffffffff8104821e>] ? worker_thread+0x0/0x2fa
[63357.773993]  [<ffffffff8104b196>] kthread+0x7d/0x85
[63357.773993]  [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[63357.773993]  [<ffffffff8144fc40>] ? restore_args+0x0/0x30
[63357.773993]  [<ffffffff8104b119>] ? kthread+0x0/0x85
[63357.773993]  [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[63357.773993] Code: f5 1c 00 85 c0 0f 84 72 04 00 00 83 3d fc e3 34 01 00 be 0b 03 00 00 0f 84 19 03 00 00 e9 5b 04 00 00 4d 85 e4 0f 84 52 04 00 00 <f0> 41 ff 84 24 98 01 00 00 8b 35 24 23 96 00 45 8b bd e8 05 00 
[63357.773993] RIP  [<ffffffff8105b30d>] __lock_acquire+0x41b/0x87e
[63357.773993]  RSP <ffff88011cab9a40>
[63357.773993] ---[ end trace 1868120b9ce93406 ]---
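
One thing worth checking in a dump like this is whether any register holds a slab poison pattern. The sketch below is a hypothetical triage helper, not part of ceph or the kernel tree: the function name and regex are illustrative assumptions, and only the poison byte values (0x5a, 0x6b) come from include/linux/poison.h.

```python
import re

# Slab poison bytes as defined in include/linux/poison.h.
POISON_BYTES = {
    0x5A: "POISON_INUSE (allocated but never initialised)",
    0x6B: "POISON_FREE (freed object)",
}

# Matches "R12: 5a5a5a5a5a5a5a5a" style pairs in an x86-64 register dump.
# Segment-prefixed values such as "RSP: 0018:ffff..." are deliberately skipped.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return {register: description} for registers filled with one poison byte."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        raw = bytes.fromhex(value)
        if len(set(raw)) == 1 and raw[0] in POISON_BYTES:
            hits[name] = POISON_BYTES[raw[0]]
    return hits


# Two lines copied from the register dump above.
sample = """
RDX: ffff8800b51d5038 RSI: 0000000000000000 RDI: ffff8800b51d5038
R10: 0000000000007d2b R11: ffffffff8144c593 R12: 5a5a5a5a5a5a5a5a
"""
print(poisoned_registers(sample))
```

On the two sample lines this reports only R12, tagged as POISON_INUSE.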

Files

sym (4.68 MB) sym ceph.ko symbols Sage Weil, 06/18/2010 10:34 AM
Actions #1

Updated by Sage Weil almost 14 years ago

Actions #2

Updated by Sage Weil almost 14 years ago

r12 is 5a5a5a.., at this code:

static inline void atomic_inc(atomic_t *v)
{
        asm volatile(LOCK_PREFIX "incl %0" 
   32b9d:       f0 41 ff 84 24 98 01    lock incl 0x198(%r12)

we're in
alloc_msg
-> get_reply
-> ceph_con_revoke_message(req->r_con_filling_msg, req->r_reply)
-> mutex_lock(con->mutex)

AFAICS con is valid. Is this some -rc bug?
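
For reference, a small sketch of why this shows up as a general protection fault rather than a page fault. The faulting instruction dereferences r12 + 0x198, and with r12 = 5a5a5a.. that address is non-canonical on x86-64, which raises #GP instead of #PF (so CR2 in the dump is unrelated). The helper name is_canonical and the 48-bit virtual address width are assumptions for illustration. 0x5a is the slab POISON_INUSE byte, so one possible reading, not confirmed here, is that the pointer was loaded from memory that had been allocated but not yet initialised.

```python
def is_canonical(addr, va_bits=48):
    """True if addr is a canonical x86-64 virtual address for va_bits of VA."""
    # Bits va_bits-1 .. 63 must be all zeros or all ones.
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


R12 = 0x5A5A5A5A5A5A5A5A   # from the register dump
DISP = 0x198               # displacement in "lock incl 0x198(%r12)"

target = (R12 + DISP) & 0xFFFFFFFFFFFFFFFF
print(hex(target), is_canonical(target))

# For comparison, the stack pointer from the same dump is a normal kernel address.
print(is_canonical(0xFFFF88011CAB9A40))
```

The first line prints 0x5a5a5a5a5a5a5bf2 False; the second prints True.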

Actions #3

Updated by Sage Weil almost 14 years ago

  • Status changed from New to In Progress
Actions #4

Updated by Sage Weil almost 14 years ago

  • Status changed from In Progress to Resolved

I think this is a different manifestation of #252, now fixed.
