Bug #3204

rbd client kernel panic when osd connection is lost

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: Alex Elder
Target version: -
% Done: 0%
Source: Community (user)

Description

From ML:


Date: Mon, 24 Sep 2012 18:23:14 +0800
From: Christian Huang <ythuang@gmail.com>
To: ceph-devel@vger.kernel.org
Cc: sage@inktank.com
Subject: CEPH RBD client kernel panic when OSD connection is lost on kernel 3.2, 3.5, 3.5.4

Hi,
    we met the following issue while testing ceph cluster HA.
    We would appreciate it if anyone can shed some light.
    Could this be related to the configuration? (i.e., 2 OSD nodes only)

    Issue description:
    The ceph rbd client will kernel panic if an OSD server loses its network connectivity.
    So far, we can reproduce it with certainty.
    We have tried with the following kernels (moving to the 3.5 series as suggested in a previous mail by Sage):
    a. stock kernel from Ubuntu 12.04 (3.2 series)
    b. 3.5.0-15 from the quantal repo, git://kernel.ubuntu.com/ubuntu/ubuntu-quantal.git, tag Ubuntu-3.5.0-15.22
    c. v3.5.4-quantal, http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5.4-quantal/

    Environment:
    OS: Ubuntu 12.04 precise pangolin
    Ceph configuration:
        OSD nodes: 2 x 12 drives; 1 OS drive, 11 mapped to OSD 0-10; 10GbE link
        Monitor nodes: 3 x KVM virtual machines on an Ubuntu host
        Test client: fresh install of Ubuntu 12.04.1
        Ceph versions used: 0.48, 0.48.1, 0.48.2, 0.51
        All nodes have the same kernel version.

    Steps to reproduce:
    On the test client,
    1. load the rbd modules
    2. create an rbd device
    3. map the rbd device
    4. use the fio tool to create a workload on the device; 8 threads are used for the workload
       (we have also tried with iometer, 8 workers, 32k 50/50, with the same results)

    On one of the OSD nodes,
    1. sudo ifconfig eth0 down  # where eth0 is the primary interface configured for ceph
    2. within 30 seconds, the test client will panic.

    This happens when there is IO activity on the RBD device and one of the OSD nodes loses connectivity.

    The netconsole output is available from the following dropbox link,
    zip: goo.gl/LHytr

Actions #1

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to rbd
Actions #2

Updated by Alex Elder over 11 years ago

I believe I have reproduced this problem (or perhaps something
like it) using the stock stable kernel 3.5.4. I'm not 100% certain,
because the output I have isn't identical to what was originally
reported. In any case, the following is what I find in the crash
I hit.

The crash occurred in read_partial_message_bio():

static int read_partial_message_bio(struct ceph_connection *con,
                                    struct bio **bio_iter, int *bio_seg,
                                    unsigned int data_len, bool do_datacrc)
{
        struct bio_vec *bv = bio_iovec_idx(*bio_iter, *bio_seg);   <--- HERE
        void *p;
        int ret, left;
...

This is how bio_iovec_idx() is defined:
#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))

The problem that leads to the crash is that *bio_iter is a null pointer.
So when it is dereferenced to load its bi_io_vec pointer (at offset
0x48 = 72), we get an access violation.
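
To make the fault address concrete, here is a minimal userspace sketch
(my own illustration, not the libceph code; the struct layout is a
stand-in chosen so the member lands at offset 0x48, matching the oops):

#include <stdio.h>
#include <stddef.h>

/* Stand-in for struct bio: assume the bi_io_vec pointer sits at 0x48. */
struct fake_bio {
        char pad[0x48];
        void *bi_io_vec;        /* stands in for struct bio_vec * */
};

int main(void)
{
        struct fake_bio *bio = NULL;

        /*
         * bio_iovec_idx(bio, 0) expands to &bio->bi_io_vec[0]; computing
         * that address requires loading bio->bi_io_vec, i.e. reading
         * memory at NULL + 0x48.  In the kernel that load is the NULL
         * dereference reported at address 0000000000000048.
         */
        printf("the load would touch address %#zx\n",
               offsetof(struct fake_bio, bi_io_vec));
        return 0;
}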

Now I'll try to work backward to see why bio_iter (which comes
from &ceph_msg->bio_iter in the caller, read_partial_message())
is null at this point.

Actions #3

Updated by Alex Elder over 11 years ago

Here is an excerpt from the log that Chris originally provided.

[  203.119404] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  203.131376] IP: [<ffffffffa00dbe40>] write_partial_msg_pages+0x3f0/0x4b0 [libceph]
[  203.144437] PGD 0
[  203.151021] Oops: 0000 [#1] SMP
[  203.157738] CPU 0
[  203.157865] Modules linked in: rbd libceph libcrc32c vesafb ext2 netconsole configfs serio_raw joydev mac_hid i7core_edac edac_core lp ioatdma parport dca usbhid hid tg3
[  203.172111]
[  203.172114] Pid: 1901, comm: kworker/0:2 Not tainted 3.2.0-29-generic #46-Ubuntu Wistron Cloud Computing/P92TB2
[  203.172118] RIP: 0010:[<ffffffffa00dbe40>]  [<ffffffffa00dbe40>] write_partial_msg_pages+0x3f0/0x4b0 [libceph]
[  203.172126] RSP: 0018:ffff880617309d10  EFLAGS: 00010246
[  203.172128] RAX: 0000000000000000 RBX: ffff880612a2c030 RCX: 0000000000000010
[  203.172130] RDX: 0000000000052000 RSI: ffff880614e2a6a0 RDI: 0000000000000000
[  203.172132] RBP: ffff880617309d70 R08: ffff88061410f000 R09: 0000000000011ad0
[  203.172134] R10: 00000000000005a8 R11: 0000000000000000 R12: ffffea00182ddb40
[  203.172137] R13: 000000000002e000 R14: ffff880614e2a600 R15: 0000000000080000
[  203.172139] FS:  0000000000000000(0000) GS:ffff880627200000(0000) knlGS:0000000000000000
[  203.172142] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  203.172144] CR2: 0000000000000048 CR3: 0000000001c05000 CR4: 00000000000006f0
[  203.172146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  203.172149] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  203.172151] Process kworker/0:2 (pid: 1901, threadinfo ffff880617308000, task ffff880613014500)
[  203.172153] Stack:
[  203.172155]  ffff880613014548 0000000000000000 ffff880614e2a6a0 0008000017309d50
[  203.172158]  0000000000000000 0000000000080000 0000000000000000 ffff880612a2c030
[  203.172162]  0000160000000000 ffff880000000000 0000000000001000 ffff880612a2c308
[  203.172166] Call Trace:
[  203.172174]  [<ffffffffa00dc5ff>] try_write+0x11f/0x360 [libceph]
[  203.172182]  [<ffffffffa00ddb40>] ? try_read+0x470/0x470 [libceph]
[  203.172188]  [<ffffffffa00ddc5c>] con_work+0x11c/0x160 [libceph]
[  203.172195]  [<ffffffff810849ea>] process_one_work+0x11a/0x480
[  203.172199]  [<ffffffff81085794>] worker_thread+0x164/0x370
[  203.172203]  [<ffffffff81085630>] ? manage_workers.isra.29+0x130/0x130
[  203.172207]  [<ffffffff81089fbc>] kthread+0x8c/0xa0
[  203.172212]  [<ffffffff81664034>] kernel_thread_helper+0x4/0x10
[  203.172215]  [<ffffffff81089f30>] ? flush_kthread_worker+0xa0/0xa0
[  203.172219]  [<ffffffff81664030>] ? gs_change+0x13/0x13
[  203.172220] Code: c7 fc ff ff 0f 1f 80 00 00 00 00 49 83 be 90 00 00 00 00 0f 84 aa 00 00 00 49 63 be a0 00 00 00 49 8b 86 98 00 00 00 48 c1 e7 04 <48> 03 78 48 8b 45 c4 85 c0 4c 8b 27 8b 4f 0c 75 31 8b 93 c8 02
[  203.172245] RIP  [<ffffffffa00dbe40>] write_partial_msg_pages+0x3f0/0x4b0 [libceph]
[  203.172251]  RSP <ffff880617309d10>
[  203.172252] CR2: 0000000000000048
[  203.172436] ---[ end trace d8ab82ee4d487d2a ]---

Actions #4

Updated by Alex Elder over 11 years ago

I am able to reproduce this problem at will using stock
Linux 3.5.4 by running "fio" in a loop and then taking
one of the OSD network interfaces down. (Chris provided
this recipe in e-mail.)

It can happen in the read path or the write path. But in
both, the error occurs in a call to bio_iovec_idx() made
with a null msg->bio_iter pointer.

I have not been able to gain much new insight beyond that.
I've attempted to use debug messages to try to provide some
history of activity immediately prior to a crash, but it
is hard to do so without inundating the console with output
and therefore affecting the result (or losing information
because it can't keep up).

So unfortunately I don't have much new to report, but I
wanted to at least check in.

Actions #5

Updated by Alex Elder over 11 years ago

I have a clue to pursue this morning.

I have noticed in two crashes that the connection is not in a state in
which we should be transferring bio data. That is, in one instance I
found that the connection state included NEGOTIATING, and in another
it included CONNECTING. In those states we should only be sending data
(various connection-establishing buffers) via the out_kvec array.

So I'm going to try to find out: Why are we doing bio transfers
while in these states?
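
To spell out the invariant I have in mind, here is a rough model (my
own sketch, not the libceph state machine; the enum values and the
helper are illustrative only):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical connection states, loosely modeled on those named above. */
enum con_state {
        CON_CONNECTING,         /* TCP connect in progress          */
        CON_NEGOTIATING,        /* exchanging handshake data        */
        CON_OPEN,               /* ready for message traffic        */
};

/* Message payload (including bio data) should only move once the
 * handshake is complete; before that only out_kvec buffers go out. */
static bool may_transfer_bio_data(enum con_state state)
{
        return state == CON_OPEN;
}

int main(void)
{
        printf("NEGOTIATING: %d\n", may_transfer_bio_data(CON_NEGOTIATING));
        printf("OPEN:        %d\n", may_transfer_bio_data(CON_OPEN));
        return 0;
}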

Actions #6

Updated by Alex Elder over 11 years ago

I found another bug fix that I thought might make this go
away. This time it was this one:
3b5ede07 libceph: fix fault locking; close socket on lossy fault

But once again, I got a similar crash.

On the theory that I'm hitting multiple already-fixed problems, I'm now
going to turn my attention to building up a complete set of patches that
should be applied to 3.5.4 to create a stable version of the ceph
code. (This needs to be done anyway.) Along the way I hope that
this particular problem disappears, but if it does not, I will at
least know I'm not getting hit by any other known issues.

Actions #7

Updated by Alex Elder over 11 years ago

OK, I think I found it.

a410702 libceph: (re)initialize bio_iter on start of message receive

I'm going to finish the process of reviewing all recent commits
and setting them up for inclusion in a 3.5 stable branch.

Actions #8

Updated by Alex Elder over 11 years ago

Well, I hit a similar issue with the previous fix in place,
but this time in the write path. Looking at that fix, I noticed
it only addressed the issue in the read path, and I wondered
whether the write path would be affected.

I'll take a closer look now, to implement the equivalent
change in the write path. The same pattern of code is there,
but the write path is structured slightly differently.
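
For reference, the shape of the change I have in mind looks like this
(sketch only; the structs and function are illustrative, not the
actual libceph code):

#include <stdio.h>

/* Illustrative stand-ins for the message and its bio cursor. */
struct bio_stub { int nr_segs; };

struct msg_stub {
        struct bio_stub *bio;           /* data to transfer, may be NULL */
        struct bio_stub *bio_iter;      /* cursor; NULL until prepared   */
        int bio_seg;
};

/*
 * The fix pattern: (re)initialize the cursor every time we start (or
 * restart, e.g. after a connection fault) sending or receiving the
 * message, rather than assuming a previous attempt left it usable.
 */
static void prepare_msg_data(struct msg_stub *msg)
{
        if (msg->bio) {
                msg->bio_iter = msg->bio;
                msg->bio_seg = 0;
        }
}

int main(void)
{
        struct bio_stub bio = { .nr_segs = 4 };
        struct msg_stub msg = { .bio = &bio, .bio_iter = NULL, .bio_seg = 0 };

        prepare_msg_data(&msg);         /* first attempt                  */
        msg.bio_iter = NULL;            /* a fault/reset clears the state */
        prepare_msg_data(&msg);         /* restart: cursor is valid again */

        printf("bio_iter is %s\n", msg.bio_iter ? "set" : "NULL");
        return 0;
}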

Actions #9

Updated by Alex Elder over 11 years ago

There already was a fix available that addressed the
write case. I put it in originally as a cleanup but
it turns out it preemptively fixed this bug. :)

It was this commit: 7c7a48c libceph: move init of bio_iter
(That one depended on another commit or two as well.)

I have been testing with this commit along with Sage's
commit that addressed the read case and have not hit
the original problem. I am becoming fairly satisfied
this is the root cause of the original problem.

I'm going to supply these commits as part of a ceph
stable branch, named "linux-3.5.4-ceph". I will also
submit this series of patches so they can be included
in an upcoming stable release.

Actions #10

Updated by Christian Yung-Tai Huang over 11 years ago

Hi Alex,
Could you let us know which commits we should apply in order to address this issue? We would like to apply them to our kernel first, while we wait for them to be included in the current stable release.

Best Regards.
Chris.

Actions #11

Updated by Dan Mick over 11 years ago

I wonder if there's any chance this also affects http://tracker.newdream.net/issues/2937 ?

Actions #12

Updated by Christian Yung-Tai Huang over 11 years ago

Hi Alex,
Since kernel 3.6 was released a few days ago (https://lkml.org/lkml/2012/9/30/152), what would be the suggested kernel version to use now?

Best Regards.
Chris.

Actions #13

Updated by Alex Elder over 11 years ago

There are two commits that fixed the problem--one on the read side
and one on the write side. They may require other commits in order
to apply without too much trouble. They are:

abdaa6a8 libceph: don't use bio_iter as a flag
a4107026 libceph: (re)initialize bio_iter on start of message receive

Those commit ids are as they appear in the 3.6 released kernel.

Actions #14

Updated by Alex Elder over 11 years ago

  • Status changed from New to Resolved

This problem has been resolved. The last thing I added
describes the commits that address it, and they're now
in the mainline kernel at version 3.6.

Actions #15

Updated by Dan Mick over 11 years ago

  • Assignee set to Alex Elder