Bug #11621

MDS Reconnection causes kernel mode null pointer dereference

Added by Adam Tygart almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

This is similar to my earlier bug report; it may or may not be related.

Servers: CentOS 7.1, Ceph 0.94.1

Client: Gentoo, kernel 3.19.5, using the kernel-mode CephFS client.

We started ~800 rsyncs, copying 323 TB (~46 million files); when the crash occurred, we were down to ~100 remaining.

This has happened 4 times today. I've tried preemptively enabling dynamic debugging for ceph in the kernel, but that seems to either mitigate the issue or lengthen the time to reproduce.
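
For what it's worth, a minimal sketch of how that can be done from C, assuming debugfs is mounted at /sys/kernel/debug and the kernel was built with CONFIG_DYNAMIC_DEBUG (the usual shell equivalent is an echo into the same control file):

/* Hedged sketch: enable pr_debug() output for the ceph module. */
#include <stdio.h>

int main(void)
{
        /* The dynamic debug control file accepts "module <name> +p" to
         * turn on all pr_debug() call sites in the named module. */
        FILE *f = fopen("/sys/kernel/debug/dynamic_debug/control", "w");
        if (!f) {
                perror("fopen dynamic_debug control");
                return 1;
        }
        fputs("module ceph +p\n", f);
        return fclose(f) == 0 ? 0 : 1;
}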

Attached is the kernel stack trace as gathered from our logging host; it does get cut off when the client machine crashes entirely.


Files

kernel-panic-hero29.log (9.28 KB) - MDS reconnection null pointer dereference trace - Adam Tygart, 05/13/2015 10:49 PM
#1

Updated by Zheng Yan almost 9 years ago

This should be fixed by:

commit c0bd50e2eeddf139d8f61e709d7003210301e93a
Author: Yan, Zheng <zyan@redhat.com>
Date:   Tue Apr 7 15:51:08 2015 +0800

    ceph: fix null pointer dereference in send_mds_reconnect()

    sb->s_root can be null when umounting

    Signed-off-by: Yan, Zheng <zyan@redhat.com>

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd5585b..0a2eb32 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2892,7 +2892,8 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
        spin_unlock(&session->s_cap_lock);

        /* trim unused caps to reduce MDS's cache rejoin time */
-       shrink_dcache_parent(mdsc->fsc->sb->s_root);
+       if (mdsc->fsc->sb->s_root)
+               shrink_dcache_parent(mdsc->fsc->sb->s_root);

        ceph_con_close(&session->s_con);
        ceph_con_open(&session->s_con,

This crash happens if the MDS restarts while the FS is unmounting. Did you try unmounting the FS before the crash happened?
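
To illustrate the race the patch guards against, here is a minimal compilable sketch using simplified stand-in types rather than the real kernel definitions: the VFS clears sb->s_root while unmounting, so the asynchronous reconnect path has to re-check it before handing it to shrink_dcache_parent().

/* Hedged sketch with stand-in types, not the actual kernel code. */
#include <stdio.h>

struct dentry;                                   /* opaque stand-in */
struct super_block { struct dentry *s_root; };   /* simplified */

/* no-op stub standing in for the real kernel helper */
static void shrink_dcache_parent(struct dentry *parent)
{
        printf("trimming dcache under %p\n", (void *)parent);
}

static void reconnect_trim_dcache(struct super_block *sb)
{
        /* Unmount may already have cleared s_root; without this check
         * shrink_dcache_parent() is handed a NULL dentry and oopses. */
        if (sb->s_root)
                shrink_dcache_parent(sb->s_root);
}

int main(void)
{
        struct super_block sb = { .s_root = NULL };  /* as during umount */
        reconnect_trim_dcache(&sb);                  /* safely skipped */
        return 0;
}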

#2

Updated by Adam Tygart almost 9 years ago

Yes, I forgot that I had a mount/unmount function in a script to check ceph.dir.rbytes and ceph.dir.rfiles (via a separate mount and session) to get some idea of how far along the rsyncs were getting.
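
For reference, those recursive statistics are exposed by CephFS as virtual extended attributes, so a check along these lines might look like the following minimal sketch (the /mnt/cephfs mount point and the read_stat() helper are hypothetical stand-ins):

/* Hedged sketch: read CephFS recursive stats via xattrs. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: read one virtual xattr as a NUL-terminated string. */
static int read_stat(const char *path, const char *name,
                     char *buf, size_t size)
{
        ssize_t len = getxattr(path, name, buf, size - 1);
        if (len < 0) {
                perror(name);
                return -1;
        }
        buf[len] = '\0';
        return 0;
}

int main(void)
{
        char buf[64];

        /* /mnt/cephfs is an assumed mount point for this sketch */
        if (read_stat("/mnt/cephfs", "ceph.dir.rbytes", buf, sizeof(buf)) == 0)
                printf("recursive bytes: %s\n", buf);
        if (read_stat("/mnt/cephfs", "ceph.dir.rfiles", buf, sizeof(buf)) == 0)
                printf("recursive files: %s\n", buf);
        return 0;
}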

Sorry for the noise.

#3

Updated by Zheng Yan almost 9 years ago

  • Status changed from New to Resolved