Bug #2260

closed

libceph: null pointer dereference at try_write+0x638/0xfb0

Added by Alex Elder about 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: libceph
Target version: -
% Done: 0%
Source: Development

Description

This isn't an exact match for bug 1793 or 1866, but it's close enough that
I wanted to reopen one of them. I found myself unable to, so here's a new one.

I hit this while running xfstests; test 49, I believe. I may have crashed on
that same test a few weeks ago, and now that I've got things running pretty
reliably I'll see whether it is a reliable reproducer for this problem.

Here's some info from /var/log/syslog with a little context.

[86710.152267] XFS (loop0): Ending clean mount
[86712.305386] libceph: osd1 10.214.132.32:6800 socket closed
[86712.695642] libceph: osd0 10.214.132.31:6800 socket closed
[86712.725309] libceph: osd1 10.214.132.32:6800 socket closed
[86715.310218] libceph: osd1 10.214.132.32:6800 socket closed
[86715.337194] libceph: osd0 10.214.132.31:6800 socket closed
[86717.356958] libceph: osd1 10.214.132.32:6800 socket closed
[86717.381901] libceph: osd0 10.214.132.31:6800 socket closed
[86719.222524] libceph: osd1 10.214.132.32:6800 socket closed
[86719.246804] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[86719.289091] IP: [<ffffffffa01c8468>] try_write+0x638/0xfb0 [libceph]
[86719.314297] PGD 0
[86719.334655] Oops: 0000 [#1] SMP
[86719.356339] CPU 0
[86719.358597] Modules linked in: xfs exportfs aesni_intel cryptd aes_x86_64 aes_generic rbd libceph ipmi_devintf ipmi_si ipmi_msghandler lp dcdbas parport i7core_edac edac_core joydev serio_raw hed usbhid hid ixgbe mptsas mptscsih mptbase dca mdio scsi_transport_sas bnx2 btrfs zlib_deflate crc32c libcrc32c
[86719.486831]
[86719.509731] Pid: 15025, comm: kworker/0:1 Not tainted 3.3.0-ceph-00067-gafede88 #1 Dell Inc. PowerEdge R410/01V648
[86719.566377] RIP: 0010:[<ffffffffa01c8468>] [<ffffffffa01c8468>] try_write+0x638/0xfb0 [libceph]
[86719.624149] RSP: 0018:ffff88021c9b1b20 EFLAGS: 00010246
[86719.654956] RAX: 000000000000c000 RBX: ffff880222155030 RCX: 0000000000000000
[86719.688686] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88020d893f00
[86719.722829] RBP: ffff88021c9b1c20 R08: 0000000000000000 R09: 0000000000000000
[86719.757110] R10: 0000000000000000 R11: 0000000000000003 R12: ffffea0008124580
[86719.790899] R13: 0000000000001000 R14: ffff88021d6b5000 R15: 000000000001a000
[86719.825199] FS: 0000000000000000(0000) GS:ffff880227200000(0000) knlGS:0000000000000000
[86719.889006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[86719.924552] CR2: 0000000000000048 CR3: 0000000001c05000 CR4: 00000000000006f0
[86719.962732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[86720.001125] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[86720.039250] Process kworker/0:1 (pid: 15025, threadinfo ffff88021c9b0000, task ffff88020d893f00)
[86720.112205] Stack:
[86720.145509] ffff88021c9b1bd0 0000000000000000 ffff88021c9b1b40 ffff8802221551a8
[86720.216362] ffff88021c9b1c00 ffff88021c9b1bb0 ffffffffe698179f 0000000000000000
[86720.289080] ffff880222155230 ffff88020ca4c000 ffff880222155240 ffff880222155410
[86720.365199] Call Trace:
[86720.400789] [<ffffffffa01c7328>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
[86720.441250] [<ffffffffa01ca0a0>] con_work+0xbe0/0x1b40 [libceph]
[86720.480431] [<ffffffff81615dc0>] ? _raw_spin_unlock_irq+0x30/0x40
[86720.519519] [<ffffffff8106b256>] process_one_work+0x1a6/0x520
[86720.557594] [<ffffffff8106b1e7>] ? process_one_work+0x137/0x520
[86720.595152] [<ffffffffa01c94c0>] ? ceph_con_revoke_message+0x130/0x130 [libceph]
[86720.663868] [<ffffffff8106d593>] worker_thread+0x173/0x400
[86720.700509] [<ffffffff8106d420>] ? manage_workers+0x210/0x210
[86720.737008] [<ffffffff8107280e>] kthread+0xbe/0xd0
[86720.772363] [<ffffffff8161f5f4>] kernel_thread_helper+0x4/0x10
[86720.808985] [<ffffffff81616134>] ? retint_restore_args+0x13/0x13
[86720.845766] [<ffffffff81072750>] ? __init_kthread_worker+0x70/0x70
[86720.882830] [<ffffffff8161f5f0>] ? gs_change+0x13/0x13
[86720.918532] Code: 84 36 fb ff ff e9 fa fa ff ff 49 83 be 90 00 00 00 00 90 0f 84 65 04 00 00 49 63 96 a0 00 00 00 49 8b 8e 98 00 00 00 48 c1 e2 04 <48> 03 51 48 4c 8b 22 8b 4a 0c 44 8b 6a 08 e9 d8 fb ff ff 49 8d
[86721.031469] RIP [<ffffffffa01c8468>] try_write+0x638/0xfb0 [libceph]
[86721.069757] RSP <ffff88021c9b1b20>
[86721.105146] CR2: 0000000000000048
[86721.186060] ---[ end trace f990acc1e958a213 ]---

That was followed by some other faults as well--recursive faults by
the same process.
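
For anyone comparing this trace against bug 1793 or 1866, here is a minimal sketch of pulling the key fields (fault address, faulting symbol, offset, module) out of oops lines like the ones above. The function name `summarize_oops` and the regular expressions are illustrative assumptions, not part of any kernel or Ceph tooling, and they only cover the 3.x-era line format shown in this report. The small fault address (0x48, with RCX zero in the register dump) is what one would typically expect from reading a struct field through a NULL base pointer.

```python
import re

# Sample lines copied verbatim from the syslog excerpt in this report.
OOPS_LINES = [
    "[86719.246804] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048",
    "[86719.289091] IP: [<ffffffffa01c8468>] try_write+0x638/0xfb0 [libceph]",
    "[86719.654956] RAX: 000000000000c000 RBX: ffff880222155030 RCX: 0000000000000000",
    "[86721.105146] CR2: 0000000000000048",
]

# Hypothetical patterns for this log format only.
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)


def summarize_oops(lines):
    """Return a dict with the fault address and faulting symbol, if present."""
    summary = {}
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            summary["fault_addr"] = int(fault.group(1), 16)
        ip = IP_RE.search(line)
        if ip and "symbol" not in summary:
            summary["ip"] = int(ip.group(1), 16)
            summary["symbol"] = ip.group(2)
            summary["offset"] = int(ip.group(3), 16)
            summary["size"] = int(ip.group(4), 16)
            summary["module"] = ip.group(5)  # None if no module is shown
    return summary


if __name__ == "__main__":
    info = summarize_oops(OOPS_LINES)
    print(
        "{symbol}+{offset:#x}/{size:#x} [{module}] faulted at {fault_addr:#x}".format(
            **info
        )
    )
```

Two oopses with the same symbol, offset, and fault address on the same kernel build are likely the same bug; a different offset within `try_write` would point at a different dereference.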


Related issues: 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #2287: rbd: crashes with 10Gbit network and fio (Resolved, Alex Elder, 04/13/2012)
