Bug #784

kclient crash

Added by Greg Farnum about 13 years ago. Updated about 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I got this on a local rsync_better branch, which is built off of unstable commit 9c01177. As best I can tell it's not because of any of my changes.

I hit this during my usual rsync test: mount Ceph, then rsync a copy of ceph-client to the Ceph mount. The rsync was partway through when UML crashed.
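
For anyone trying to reproduce this, the test can be sketched roughly as below. This is only an illustration, not the exact commands used: the monitor address, mount point, source path, and the rsync `-a` flag are all placeholders I'm assuming here, and the mount options actually needed depend on the cluster's configuration.

```python
import subprocess

# Placeholder values, NOT taken from the original report; adjust for your setup.
MONITOR = "192.168.0.1:6789"        # hypothetical monitor address
MOUNT_POINT = "/mnt/ceph"           # hypothetical mount point inside the UML guest
SOURCE_TREE = "/root/ceph-client"   # hypothetical local checkout to copy


def build_repro_commands(monitor=MONITOR, mount_point=MOUNT_POINT, source=SOURCE_TREE):
    """Return the mount and rsync commands as argument lists, in run order."""
    return [
        # Mount the file system with the kernel client.
        ["mount", "-t", "ceph", f"{monitor}:/", mount_point],
        # Copy the source tree onto the Ceph mount (the "-a" flag is an assumption).
        ["rsync", "-a", source, mount_point + "/"],
    ]


def run_repro(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_repro_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_repro(dry_run=True)
```

With `dry_run=True` the script only prints the two commands, which is a safe way to check the paths before running it inside the UML guest.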

Backtrace via gdb:

Program terminated with signal 11, Segmentation fault.
#0  *__GI_abort () at abort.c:128
128     abort.c: No such file or directory.
        in abort.c
(gdb) bt
#0  *__GI_abort () at abort.c:128
#1  0x000000006002589a in os_dump_core () at arch/um/os-Linux/util.c:119
#2  0x0000000060017d5d in panic_exit (self=<value optimized out>, unused1=<value optimized out>, unused2=<value optimized out>) at arch/um/kernel/um_arch.c:233
#3  0x000000006004b172 in notifier_call_chain (nl=<value optimized out>, val=0, v=0x60413780, nr_to_call=-2, nr_calls=<value optimized out>) at kernel/notifier.c:93
#4  0x000000006004b1c0 in __atomic_notifier_call_chain (nh=<value optimized out>, val=18446744073709551615, v=0xffffffffffffffa8) at kernel/notifier.c:182
#5  atomic_notifier_call_chain (nh=<value optimized out>, val=18446744073709551615, v=0xffffffffffffffa8) at kernel/notifier.c:191
#6  0x000000006026ac73 in panic (fmt=0x602dcf2b "Kernel mode fault at addr 0x%lx, ip 0x%lx") at kernel/panic.c:99
#7  0x0000000060017b7b in segv (fi=..., ip=<value optimized out>, is_user=0, regs=0x60367600) at arch/um/kernel/trap.c:201
#8  0x0000000060017bda in segv_handler (sig=<value optimized out>, regs=0xffffffffffffffa8) at arch/um/kernel/trap.c:147
#9  0x0000000060024898 in sig_handler_common (sig=6, sc=0x603677d8) at arch/um/os-Linux/signal.c:49
#10 0x00000000600249de in sig_handler (sig=1, sc=0xffffffffffffffff) at arch/um/os-Linux/signal.c:226
#11 0x0000000060024c10 in handle_signal (sig=1, sc=0x603677d8) at arch/um/os-Linux/signal.c:158
#12 0x0000000060026684 in hard_handler (sig=1) at arch/um/os-Linux/sys-x86_64/signal.c:15
#13 <signal handler called>
#14 0x0000000000000000 in ?? ()
#15 0x000000006005b7fa in mask_ack_irq (irq=10, desc=0x6037d6c0) at kernel/irq/chip.c:387
#16 handle_edge_irq (irq=10, desc=0x6037d6c0) at kernel/irq/chip.c:622
#17 0x0000000060014ff9 in generic_handle_irq_desc (irq=10, regs=<value optimized out>) at include/linux/irqdesc.h:119
#18 generic_handle_irq (irq=10, regs=<value optimized out>) at include/linux/irqdesc.h:130
#19 do_IRQ (irq=10, regs=<value optimized out>) at arch/um/kernel/irq.c:337
#20 0x0000000060017614 in winch (sig=<value optimized out>, regs=0x6037d6c0) at arch/um/kernel/trap.c:246
#21 0x0000000060024898 in sig_handler_common (sig=1614192128, sc=0x60367c28) at arch/um/os-Linux/signal.c:49
#22 0x00000000600249de in sig_handler (sig=1614272192, sc=0x6037d6c0) at arch/um/os-Linux/signal.c:226
#23 0x0000000060024c10 in handle_signal (sig=1614272192, sc=0x60367c28) at arch/um/os-Linux/signal.c:158
#24 0x0000000060026684 in hard_handler (sig=1614272192) at arch/um/os-Linux/sys-x86_64/signal.c:15
#25 <signal handler called>
#26 0x00007fe1b8d4423c in ptrace (request=PTRACE_SETREGS) at ../sysdeps/unix/sysv/linux/ptrace.c:118
#27 0x0000000060027419 in userspace (regs=0xa020f428) at arch/um/os-Linux/skas/process.c:373
#28 0x0000000060015b9c in fork_handler () at arch/um/kernel/process.c:181
#29 0x0000000000000000 in ?? ()
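
Frame #14 sits at address 0x0 directly above mask_ack_irq, which matches the "Kernel mode fault at addr 0x0, ip 0x0" panic message and suggests (though it doesn't prove) a jump through a NULL pointer. A small sketch for picking such frames out of a gdb backtrace follows; the helper names are made up for illustration, and the regex only covers the frame formats seen in this trace.

```python
import re

# Matches gdb frame lines such as:
#   #14 0x0000000000000000 in ?? ()
#   #16 handle_edge_irq (irq=10, ...) at kernel/irq/chip.c:622
#   #13 <signal handler called>
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(.+?)\s*(?:\(|$)")


def parse_frames(text):
    """Return (frame_number, address_or_None, function_name) for each frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            number, addr, func = m.groups()
            frames.append((int(number), int(addr, 16) if addr else None, func))
    return frames


def null_pc_frames(frames):
    """Return (frame_number, caller_name) for frames whose address is 0.

    The caller is the next frame down the stack, or None at the bottom.
    """
    hits = []
    for i, (number, addr, _func) in enumerate(frames):
        if addr == 0:
            caller = frames[i + 1][2] if i + 1 < len(frames) else None
            hits.append((number, caller))
    return hits


if __name__ == "__main__":
    sample = """\
#13 <signal handler called>
#14 0x0000000000000000 in ?? ()
#15 0x000000006005b7fa in mask_ack_irq (irq=10, desc=0x6037d6c0) at kernel/irq/chip.c:387
#16 handle_edge_irq (irq=10, desc=0x6037d6c0) at kernel/irq/chip.c:622
"""
    print(null_pc_frames(parse_frames(sample)))  # -> [(14, 'mask_ack_irq')]
```

A zero-address frame at the very bottom of the stack (like #29 here) is reported with a caller of None and is usually just the end of the stack rather than a fault.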

The end of the kclient log:

[ 3268.080000] libceph:  osds timeout
[ 3268.080000] libceph:  __remove_old_osds 00000000a021a9f0
[ 3271.880000] libceph:  monc delayed_work
[ 3271.880000] libceph:  __send_subscribe sub_sent=0 exp=0 want_osd=2
[ 3271.880000] libceph:  __schedule_delayed after 2000
[ 3272.280000] 
[ 3272.280000] Modules linked in:
[ 3272.280000] Pid: 1214, comm: bash Not tainted 2.6.37-32890-g9651e88-dirty
[ 3272.280000] RIP: 0033:[<0000000000000000>]
[ 3272.280000] RSP: 00000000603679e8  EFLAGS: 00010246
[ 3272.280000] RAX: 0000000060369e00 RBX: 000000006037d6c0 RCX: 000000000000001c
[ 3272.280000] RDX: 0000000000000000 RSI: 000000006037d6c0 RDI: 000000006037d6c0
[ 3272.280000] RBP: 0000000060367a10 R08: 00000000a0254000 R09: 0000000010000000
[ 3272.280000] R10: 00000000a020f100 R11: 0000000000000206 R12: 000000006037d728
[ 3272.280000] R13: 000000000000000a R14: 0000000000000000 R15: 0000000010000000
[ 3272.280000] Call Trace: 
[ 3272.280000] 603674e8:  [<60017b5e>] segv+0x1f5/0x212
[ 3272.280000] 60367518:  [<6026d0a6>] _raw_spin_unlock_irqrestore+0x18/0x1c
[ 3272.280000] 60367538:  [<6002cfb7>] try_to_wake_up+0x86/0x98
[ 3272.280000] 603675c8:  [<60017bda>] segv_handler+0x5f/0x65
[ 3272.280000] 603675f8:  [<60024898>] sig_handler_common+0x84/0x98
[ 3272.280000] 603676a8:  [<601d76a9>] sk_reset_timer+0x17/0x23
[ 3272.280000] 603676c8:  [<60211493>] tcp_send_delayed_ack+0xb4/0xb6
[ 3272.280000] 60367728:  [<600249de>] sig_handler+0x30/0x3b
[ 3272.280000] 60367748:  [<60024c10>] handle_signal+0x6d/0xa3
[ 3272.280000] 60367798:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 603678b8:  [<601e2150>] process_backlog+0x116/0x128
[ 3272.280000] 603678e8:  [<60024963>] set_signals+0x1c/0x2e
[ 3272.280000] 60367908:  [<6005c43b>] rcu_qsctr_help+0x41/0x4a
[ 3272.280000] 60367918:  [<60036682>] __local_bh_enable+0x48/0x83
[ 3272.280000] 60367948:  [<600367cf>] __do_softirq+0x112/0x128
[ 3272.280000] 603679d8:  [<6026d122>] _raw_spin_lock+0x9/0xb
[ 3272.280000] 603679e8:  [<6005b7fa>] handle_edge_irq+0x5a/0x14c
[ 3272.280000] 60367a18:  [<60014ff9>] do_IRQ+0x2b/0x41
[ 3272.280000] 60367a38:  [<60017614>] winch+0xe/0x10
[ 3272.280000] 60367a48:  [<60024898>] sig_handler_common+0x84/0x98
[ 3272.280000] 60367a68:  [<60024945>] real_alarm_handler+0x3c/0x3e
[ 3272.280000] 60367af0:  [<60085e51>] fput+0x0/0x1ef
[ 3272.280000] 60367b38:  [<600248f7>] unblock_signals+0x4b/0x5d
[ 3272.280000] 60367b78:  [<600249de>] sig_handler+0x30/0x3b
[ 3272.280000] 60367b98:  [<60024c10>] handle_signal+0x6d/0xa3
[ 3272.280000] 60367be8:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 
[ 3272.280000] Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x0

[ 3272.280000] Call Trace: 
[ 3272.280000] 603673e8:  [<6026ac58>] panic+0xea/0x1dc
[ 3272.280000] 60367440:  [<6005523e>] __module_text_address+0xd/0x53
[ 3272.280000] 60367458:  [<6005528d>] is_module_text_address+0x9/0x11
[ 3272.280000] 60367468:  [<600455c4>] __kernel_text_address+0x65/0x6b
[ 3272.280000] 60367470:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 60367488:  [<6001694a>] show_trace+0x8e/0x95
[ 3272.280000] 603674b8:  [<6002a140>] show_regs+0x2b/0x2f
[ 3272.280000] 603674e8:  [<60017b7b>] segv_handler+0x0/0x65
[ 3272.280000] 60367518:  [<6026d0a6>] _raw_spin_unlock_irqrestore+0x18/0x1c
[ 3272.280000] 60367538:  [<6002cfb7>] try_to_wake_up+0x86/0x98
[ 3272.280000] 603675c8:  [<60017bda>] segv_handler+0x5f/0x65
[ 3272.280000] 603675f8:  [<60024898>] sig_handler_common+0x84/0x98
[ 3272.280000] 603676a8:  [<601d76a9>] sk_reset_timer+0x17/0x23
[ 3272.280000] 603676c8:  [<60211493>] tcp_send_delayed_ack+0xb4/0xb6
[ 3272.280000] 60367728:  [<600249de>] sig_handler+0x30/0x3b
[ 3272.280000] 60367748:  [<60024c10>] handle_signal+0x6d/0xa3
[ 3272.280000] 60367798:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 603678b8:  [<601e2150>] process_backlog+0x116/0x128
[ 3272.280000] 603678e8:  [<60024963>] set_signals+0x1c/0x2e
[ 3272.280000] 60367908:  [<6005c43b>] rcu_qsctr_help+0x41/0x4a
[ 3272.280000] 60367918:  [<60036682>] __local_bh_enable+0x48/0x83
[ 3272.280000] 60367948:  [<600367cf>] __do_softirq+0x112/0x128
[ 3272.280000] 603679d8:  [<6026d122>] _raw_spin_lock+0x9/0xb
[ 3272.280000] 603679e8:  [<6005b7fa>] handle_edge_irq+0x5a/0x14c
[ 3272.280000] 60367a18:  [<60014ff9>] do_IRQ+0x2b/0x41
[ 3272.280000] 60367a38:  [<60017614>] winch+0xe/0x10
[ 3272.280000] 60367a48:  [<60024898>] sig_handler_common+0x84/0x98
[ 3272.280000] 60367a68:  [<60024945>] real_alarm_handler+0x3c/0x3e
[ 3272.280000] 60367af0:  [<60085e51>] fput+0x0/0x1ef
[ 3272.280000] 60367b38:  [<600248f7>] unblock_signals+0x4b/0x5d
[ 3272.280000] 60367b78:  [<600249de>] sig_handler+0x30/0x3b
[ 3272.280000] 60367b98:  [<60024c10>] handle_signal+0x6d/0xa3
[ 3272.280000] 60367be8:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 
[ 3272.280000] 
[ 3272.280000] Modules linked in:
[ 3272.280000] Pid: 1214, comm: bash Not tainted 2.6.37-32890-g9651e88-dirty
[ 3272.280000] RIP: 0033:[<000000000047ed50>]
[ 3272.280000] RSP: 0000007fbfb83f58  EFLAGS: 00000246
[ 3272.280000] RAX: 0000000000000000 RBX: 000000004099f6a0 RCX: ffffffffffffffff
[ 3272.280000] RDX: 0000007fbfb83f60 RSI: 0000007fbfb84090 RDI: 000000000000001c
[ 3272.280000] RBP: 0000007fbfb843b7 R08: 0000000000000000 R09: 0000000000000001
[ 3272.280000] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
[ 3272.280000] R13: 00000000ffffffff R14: 0000007fbfb852e0 R15: 0000000000000000
[ 3272.280000] Call Trace: 
[ 3272.280000] 60367378:  [<60017d47>] panic_exit+0x2f/0x45
[ 3272.280000] 60367398:  [<6004b172>] notifier_call_chain+0x32/0x5e
[ 3272.280000] 603673d8:  [<6004b1c0>] atomic_notifier_call_chain+0x13/0x15
[ 3272.280000] 603673e8:  [<6026ac73>] panic+0x105/0x1dc
[ 3272.280000] 60367440:  [<6005523e>] __module_text_address+0xd/0x53
[ 3272.280000] 60367458:  [<6005528d>] is_module_text_address+0x9/0x11
[ 3272.280000] 60367468:  [<600455c4>] __kernel_text_address+0x65/0x6b
[ 3272.280000] 60367470:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 60367488:  [<6001694a>] show_trace+0x8e/0x95
[ 3272.280000] 603674b8:  [<6002a140>] show_regs+0x2b/0x2f
[ 3272.280000] 603674e8:  [<60017b7b>] segv_handler+0x0/0x65
[ 3272.280000] 60367518:  [<6026d0a6>] _raw_spin_unlock_irqrestore+0x18/0x1c

[ 3272.280000] 60367538:  [<6002cfb7>] try_to_wake_up+0x86/0x98
[ 3272.280000] 603675c8:  [<60017bda>] segv_handler+0x5f/0x65
[ 3272.280000] 603675f8:  [<60024898>] sig_handler_common+0x84/0x98
[ 3272.280000] 603676a8:  [<601d76a9>] sk_reset_timer+0x17/0x23
[ 3272.280000] 603676c8:  [<60211493>] tcp_send_delayed_ack+0xb4/0xb6
[ 3272.280000] 60367728:  [<600249de>] sig_handler+0x30/0x3b
[ 3272.280000] 60367748:  [<60024c10>] handle_signal+0x6d/0xa3
[ 3272.280000] 60367798:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 603678b8:  [<601e2150>] process_backlog+0x116/0x128
[ 3272.280000] 603678e8:  [<60024963>] set_signals+0x1c/0x2e
[ 3272.280000] 60367908:  [<6005c43b>] rcu_qsctr_help+0x41/0x4a
[ 3272.280000] 60367918:  [<60036682>] __local_bh_enable+0x48/0x83
[ 3272.280000] 60367948:  [<600367cf>] __do_softirq+0x112/0x128
[ 3272.280000] 603679d8:  [<6026d122>] _raw_spin_lock+0x9/0xb
[ 3272.280000] 603679e8:  [<6005b7fa>] handle_edge_irq+0x5a/0x14c
[ 3272.280000] 60367a18:  [<60014ff9>] do_IRQ+0x2b/0x41
[ 3272.280000] 60367a38:  [<60017614>] winch+0xe/0x10
[ 3272.280000] 60367a48:  [<60024898>] sig_handler_common+0x84/0x98
[ 3272.280000] 60367a68:  [<60024945>] real_alarm_handler+0x3c/0x3e
[ 3272.280000] 60367af0:  [<60085e51>] fput+0x0/0x1ef
[ 3272.280000] 60367b38:  [<600248f7>] unblock_signals+0x4b/0x5d
[ 3272.280000] 60367b78:  [<600249de>] sig_handler+0x30/0x3b
[ 3272.280000] 60367b98:  [<60024c10>] handle_signal+0x6d/0xa3
[ 3272.280000] 60367be8:  [<60026684>] hard_handler+0x10/0x14
[ 3272.280000] 
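
One quick way to check whether any Ceph code shows up in a console call trace like the one above is to pull out the symbol names and filter them. This is a rough sketch: matching on the substring "ceph" is a heuristic I'm assuming here, not a complete list of kernel client symbols, and the function names are invented for the example.

```python
import re

# Matches call trace entries such as "[<60017b5e>] segv+0x1f5/0x212".
# Ordinary log lines like "libceph:  osds timeout" do not match.
TRACE_RE = re.compile(
    r"\[<[0-9a-fA-F]+>\]\s+([A-Za-z0-9_.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def trace_symbols(log_text):
    """Return the symbol names of all call trace entries, in order of appearance."""
    return [m.group(1) for m in TRACE_RE.finditer(log_text)]


def ceph_symbols(symbols, needle="ceph"):
    """Return the symbols containing `needle` (a heuristic, not an exhaustive check)."""
    return [s for s in symbols if needle in s]


if __name__ == "__main__":
    sample = """\
[ 3268.080000] libceph:  osds timeout
[ 3272.280000] 603674e8:  [<60017b5e>] segv+0x1f5/0x212
[ 3272.280000] 603679e8:  [<6005b7fa>] handle_edge_irq+0x5a/0x14c
"""
    symbols = trace_symbols(sample)
    print(symbols)                # -> ['segv', 'handle_edge_irq']
    print(ceph_symbols(symbols))  # -> []
```

An empty result only says that no symbol with "ceph" in its name is on the stack at the time of the panic; it does not rule out earlier corruption caused by Ceph code.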

History

#1 Updated by Greg Farnum about 13 years ago

  • Project changed from Ceph to Linux kernel client

#2 Updated by Sage Weil about 13 years ago

if this was uml and you have a core file, gdb will give you a more useful backtrace...

#3 Updated by Greg Farnum about 13 years ago

That backtrace was via gdb. I kept the stuff around but I'm not much good at handling linux crashes so I just put it in here for now...

#4 Updated by Sage Weil about 13 years ago

Greg Farnum wrote:

That backtrace was via gdb. I kept the stuff around but I'm not much good at handling linux crashes so I just put it in here for now...

oh right.. nevermind :). frequently the console dump stops but the gdb one includes the real culprit. i don't see any ceph code at all in this case, though. let's see if it's reproducible?

#5 Updated by Sage Weil about 13 years ago

  • Status changed from New to Can't reproduce
