Bug #66

closed

BUG_ON(req->r_reply) at fs/ceph/mds_client.c:1841!

Added by Sage Weil about 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%


Description

[ 6447.063496] ------------[ cut here ]------------
[ 6447.065210] kernel BUG at fs/ceph/mds_client.c:1841!
[ 6447.065210] invalid opcode: 0000 [#1] PREEMPT SMP
[ 6447.065210] last sysfs file: /sys/kernel/uevent_seqnum
[ 6447.065210] CPU 0
[ 6447.065210] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery container ehci_hcd uhci_hcd thermal button processor
[ 6447.065210]
[ 6447.065210] Pid: 2706, comm: ceph-msgr/0 Not tainted 2.6.34-rc3 #26 PDSMi+/PDSMi
[ 6447.065210] RIP: 0010:[<ffffffffa006db82>] [<ffffffffa006db82>] dispatch+0x6cc/0x1461 [ceph]
[ 6447.065210] RSP: 0000:ffff88011cbbfb20 EFLAGS: 00010286
[ 6447.065210] RAX: ffff88011cbbffd8 RBX: 00000000000c87fb RCX: ffff88011cb3ea06
[ 6447.065210] RDX: ffffffff81cbf370 RSI: ffff88011cb3ea18 RDI: 0000000000000001
[ 6447.065210] RBP: ffff88011cbbfc30 R08: 0000000000000000 R09: 0000000000000003
[ 6447.065210] R10: ffff88011e20cbe0 R11: ffff88011cbbfb10 R12: ffff88011cb70508
[ 6447.065210] R13: ffff88011cb7f650 R14: ffff88011e20cbe0 R15: ffff880105949d68
[ 6447.065210] FS: 0000000000000000(0000) GS:ffff880002600000(0000) knlGS:0000000000000000
[ 6447.065210] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6447.065210] CR2: 00002adf9835f028 CR3: 00000000df979000 CR4: 00000000000006f0
[ 6447.065210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6447.065210] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6447.065210] Process ceph-msgr/0 (pid: 2706, threadinfo ffff88011cbbe000, task ffff88011cb3e380)
[ 6447.065210] Stack:
[ 6447.065210] 0000000000000000 00000000001d2840 ffff880105949d68 ffff88011cb70510

Actions #1

Updated by Sage Weil almost 14 years ago

unable to reproduce... but, see #113

Actions #2

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Resolved
  • Target version changed from v2.6.34 to v2.6.35

fixed by 'ceph: fix locking, error paths when waking reconnect requests'

Actions #3

Updated by Sage Weil almost 14 years ago

fixed by commit:9abf82b8bc93dd904738a71ca69aa5df356d4d24

Actions #4

Updated by Sage Weil almost 14 years ago

  • Status changed from Resolved to In Progress
  • Priority changed from Normal to High

hit this again, on commit:e84346b726ea90a8ed470bc81c4136a7b8710ea5

workload was kernel compilation.

[83869.489962] ------------[ cut here ]------------
[83869.492992] kernel BUG at fs/ceph/mds_client.c:1843!
[83869.492992] invalid opcode: 0000 [#1] PREEMPT SMP 
[83869.492992] last sysfs file: /sys/kernel/uevent_seqnum
[83869.492992] CPU 0 
[83869.492992] Modules linked in: aes_x86_64 aes_generic ceph fan ac battery container ehci_hcd uhci_hcd thermal button processor
[83869.492992] 
[83869.492992] Pid: 2818, comm: ceph-msgr/0 Not tainted 2.6.34-rc6 #27 PDSMi+/PDSMi
[83869.492992] RIP: 0010:[<ffffffffa006ebcd>]  [<ffffffffa006ebcd>] dispatch+0x6cc/0x1461 [ceph]
[83869.492992] RSP: 0000:ffff88011b5d7b20  EFLAGS: 00010286
[83869.492992] RAX: ffff88011b5d7fd8 RBX: 000000000017c24d RCX: ffff88011b66ef06
[83869.492992] RDX: ffffffff81cbb950 RSI: ffff88011b66ef98 RDI: 0000000000000001
[83869.492992] RBP: ffff88011b5d7c30 R08: 0000000000000000 R09: 0000000000000003
[83869.492992] R10: ffff88011e39e408 R11: ffff88011b5d7b10 R12: ffff88011a2a6508
[83869.492992] R13: ffff880043898258 R14: ffff88011e39e408 R15: ffff880105ae0ad0
[83869.492992] FS:  0000000000000000(0000) GS:ffff880002600000(0000) knlGS:0000000000000000
[83869.492992] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[83869.492992] CR2: 00002ad34b914000 CR3: 00000000438f0000 CR4: 00000000000006f0
[83869.492992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[83869.492992] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[83869.492992] Process ceph-msgr/0 (pid: 2818, threadinfo ffff88011b5d6000, task ffff88011b66e900)
[83869.492992] Stack:
[83869.492992]  0000000000000000 00000000001d2840 ffff880105ae0ad0 ffff88011a2a6510
[83869.492992] <0> ffff88011e39e408 000000000000a983 ffff88011b66ef98 0000000000000046
[83869.492992] <0> ffff88011b5d7b80 ffff88011b66e900 fffffffe00000000 ffff88011a2a6508
[83869.492992] Call Trace:
[83869.492992]  [<ffffffff81057f34>] ? mark_held_locks+0x49/0x64
[83869.492992]  [<ffffffff814265ac>] ? __mutex_unlock_slowpath+0x10d/0x130
[83869.492992]  [<ffffffff81058062>] ? trace_hardirqs_on_caller+0x113/0x13e
[83869.492992]  [<ffffffff8105809a>] ? trace_hardirqs_on+0xd/0xf
[83869.492992]  [<ffffffffa0066a5a>] try_read+0xd4a/0x1358 [ceph]
[83869.492992]  [<ffffffff81009a23>] ? native_sched_clock+0x37/0x71
[83869.492992]  [<ffffffff8104f3ba>] ? sched_clock_local+0x11/0x73
[83869.492992]  [<ffffffff8105aca5>] ? __lock_acquire+0x7ee/0x851
[83869.492992]  [<ffffffff81056811>] ? put_lock_stats+0xe/0x27
[83869.492992]  [<ffffffffa0068a14>] con_work+0x11a/0x6bc [ceph]
[83869.492992]  [<ffffffff810478be>] worker_thread+0x1e8/0x2fa
[83869.492992]  [<ffffffff81047865>] ? worker_thread+0x18f/0x2fa
[83869.492992]  [<ffffffffa00688fa>] ? con_work+0x0/0x6bc [ceph]
[83869.492992]  [<ffffffff8104a9a8>] ? autoremove_wake_function+0x0/0x38
[83869.492992]  [<ffffffff810476d6>] ? worker_thread+0x0/0x2fa
[83869.492992]  [<ffffffff8104a676>] kthread+0x7d/0x85
[83869.492992]  [<ffffffff810037d4>] kernel_thread_helper+0x4/0x10
[83869.492992]  [<ffffffff81429040>] ? restore_args+0x0/0x30
[83869.492992]  [<ffffffff8104a5f9>] ? kthread+0x0/0x85
[83869.492992]  [<ffffffff810037d0>] ? kernel_thread_helper+0x0/0x10
[83869.492992] Code: 89 b5 ff ff 48 85 c0 0f 85 ee fe ff ff 48 89 df 48 81 c7 a8 00 00 00 e8 95 9d fb e0 e9 da fe ff ff 49 83 bd 00 01 00 00 00 74 04 <0f> 0b eb fe 48 8b 85 58 ff ff ff 80 78 0c 00 75 29 49 8b 85 00 
[83869.492992] RIP  [<ffffffffa006ebcd>] dispatch+0x6cc/0x1461 [ceph]
[83869.492992]  RSP <ffff88011b5d7b20>
[83869.806557] ---[ end trace 01c1190c03223658 ]---
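
For anyone reading the oops above: the byte wrapped in `<..>` on the `Code:` line is the one at the faulting RIP. A minimal Python sketch (the helper name `split_code_line` is made up for illustration, not part of any kernel tooling) splits that line into the bytes before and at RIP and checks that the trap is `0f 0b`, the x86 `ud2` instruction that `BUG()` emits. The ten bytes just before it appear to decode as `cmpq $0x0,0x100(%r13)` followed by a short `je` over the trap, which would be consistent with a non-NULL pointer check such as `BUG_ON(req->r_reply)`; mapping offset 0x100 to `r_reply` is an assumption, not something the log states.

```python
import re

# "Code:" line copied from the second oops, without the timestamp prefix.
CODE_LINE = (
    "Code: 89 b5 ff ff 48 85 c0 0f 85 ee fe ff ff 48 89 df 48 81 c7 a8 "
    "00 00 00 e8 95 9d fb e0 e9 da fe ff ff 49 83 bd 00 01 00 00 00 74 04 "
    "<0f> 0b eb fe 48 8b 85 58 ff ff ff 80 78 0c 00 75 29 49 8b 85 00"
)


def split_code_line(line):
    """Split an oops 'Code:' line into (bytes before RIP, bytes from RIP on).

    Hypothetical helper: the byte wrapped in <..> marks the faulting RIP.
    """
    tokens = line.split("Code:", 1)[1].split()
    before, at_rip = [], []
    seen_marker = False
    for tok in tokens:
        marked = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if marked:
            seen_marker = True
            tok = marked.group(1)
        elif not re.fullmatch(r"[0-9a-f]{2}", tok):
            raise ValueError(f"unexpected token in Code: line: {tok!r}")
        (at_rip if seen_marker else before).append(int(tok, 16))
    if not seen_marker:
        raise ValueError("no <..> marker found")
    return bytes(before), bytes(at_rip)


before, at_rip = split_code_line(CODE_LINE)

# 0f 0b is ud2, the invalid-opcode trap behind "kernel BUG at ...".
is_ud2 = at_rip[:2] == b"\x0f\x0b"

# The compare-and-branch immediately ahead of the trap, as hex.
guard = before[-10:].hex(" ")

print("trap is ud2:", is_ud2)
print("bytes before trap:", guard)
```

Feeding the printed guard bytes to a disassembler (for example `objdump` on a raw binary) is a more reliable way to confirm the decoding than reading it by hand.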

Actions #5

Updated by Sage Weil almost 14 years ago

  • Assignee set to Sage Weil

Actions #6

Updated by Sage Weil almost 14 years ago

  • Status changed from In Progress to Resolved

Actions #7

Updated by Sage Weil almost 14 years ago

  • Status changed from Resolved to In Progress

Actions #8

Updated by Sage Weil almost 14 years ago

  • Status changed from In Progress to Resolved