Bug #11621

MDS Reconnection causes kernel mode null pointer dereference

Added by Adam Tygart almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

This is similar to my earlier bug report; it may or may not be related.

Servers: CentOS 7.1, Ceph 0.94.1

Client: Gentoo, kernel 3.19.5, using the kernel-mode CephFS client.

We started ~800 rsyncs, copying 323 TB (~46 million files); when the crash occurred, we were down to ~100 remaining.

This has happened 4 times today. I've tried preemptively enabling dynamic debugging for ceph in the kernel, but that seems to either mitigate the issue or lengthen the time to reproduce.
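
For what it's worth, a minimal sketch of how that can be done from C, assuming debugfs is mounted at /sys/kernel/debug and the kernel was built with CONFIG_DYNAMIC_DEBUG (the usual shell equivalent is an echo into the same control file):

/* Hedged sketch: enable pr_debug() output for the ceph module. */
#include <stdio.h>

int main(void)
{
        /* The dynamic debug control file accepts "module <name> +p" to
         * turn on all pr_debug() call sites in the named module. */
        FILE *f = fopen("/sys/kernel/debug/dynamic_debug/control", "w");
        if (!f) {
                perror("fopen dynamic_debug control");
                return 1;
        }
        fputs("module ceph +p\n", f);
        return fclose(f) == 0 ? 0 : 1;
}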

Attached is the kernel stack trace as gathered from our logging host; it does get cut off when the client machine crashes entirely.


Files

kernel-panic-hero29.log (9.28 KB) - MDS reconnection null pointer dereference trace - Adam Tygart, 05/13/2015 10:49 PM
#1

Updated by Zheng Yan almost 9 years ago

This should be fixed by:

commit c0bd50e2eeddf139d8f61e709d7003210301e93a
Author: Yan, Zheng <zyan@redhat.com>
Date:   Tue Apr 7 15:51:08 2015 +0800

    ceph: fix null pointer dereference in send_mds_reconnect()

    sb->s_root can be null when umounting

    Signed-off-by: Yan, Zheng <zyan@redhat.com>

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd5585b..0a2eb32 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2892,7 +2892,8 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
        spin_unlock(&session->s_cap_lock);

        /* trim unused caps to reduce MDS's cache rejoin time */
-       shrink_dcache_parent(mdsc->fsc->sb->s_root);
+       if (mdsc->fsc->sb->s_root)
+               shrink_dcache_parent(mdsc->fsc->sb->s_root);

        ceph_con_close(&session->s_con);
        ceph_con_open(&session->s_con,

This crash happens if the MDS restarts while the FS is unmounting. Did you try unmounting the FS before the crash happened?
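
To illustrate the race the patch guards against, here is a minimal compilable sketch using simplified stand-in types rather than the real kernel definitions: the VFS clears sb->s_root while unmounting, so the asynchronous reconnect path has to re-check it before handing it to shrink_dcache_parent().

/* Hedged sketch with stand-in types, not the actual kernel code. */
#include <stdio.h>

struct dentry;                                   /* opaque stand-in */
struct super_block { struct dentry *s_root; };   /* simplified */

/* no-op stub standing in for the real kernel helper */
static void shrink_dcache_parent(struct dentry *parent)
{
        printf("trimming dcache under %p\n", (void *)parent);
}

static void reconnect_trim_dcache(struct super_block *sb)
{
        /* Unmount may already have cleared s_root; without this check
         * shrink_dcache_parent() is handed a NULL dentry and oopses. */
        if (sb->s_root)
                shrink_dcache_parent(sb->s_root);
}

int main(void)
{
        struct super_block sb = { .s_root = NULL };  /* as during umount */
        reconnect_trim_dcache(&sb);                  /* safely skipped */
        return 0;
}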

#2

Updated by Adam Tygart almost 9 years ago

Yes, I forgot that I had a mount/unmount function in a script to check ceph.dir.rbytes and ceph.dir.rfiles (via a separate mount and session) to get some idea of how far along the rsyncs were getting.
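
For reference, those recursive statistics are exposed by CephFS as virtual extended attributes, so a check along these lines might look like the following minimal sketch (the /mnt/cephfs mount point and the read_stat() helper are hypothetical stand-ins):

/* Hedged sketch: read CephFS recursive stats via xattrs. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: read one virtual xattr as a NUL-terminated string. */
static int read_stat(const char *path, const char *name,
                     char *buf, size_t size)
{
        ssize_t len = getxattr(path, name, buf, size - 1);
        if (len < 0) {
                perror(name);
                return -1;
        }
        buf[len] = '\0';
        return 0;
}

int main(void)
{
        char buf[64];

        /* /mnt/cephfs is an assumed mount point for this sketch */
        if (read_stat("/mnt/cephfs", "ceph.dir.rbytes", buf, sizeof(buf)) == 0)
                printf("recursive bytes: %s\n", buf);
        if (read_stat("/mnt/cephfs", "ceph.dir.rfiles", buf, sizeof(buf)) == 0)
                printf("recursive files: %s\n", buf);
        return 0;
}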

Sorry for the noise.

#3

Updated by Zheng Yan almost 9 years ago

  • Status changed from New to Resolved