Bug #11621
closed
MDS Reconnection causes kernel mode null pointer dereference
Added by Adam Tygart almost 9 years ago.
Updated almost 9 years ago.
Description
Similar to my earlier bug report, may or may not be related.
Servers: CentOS 7.1 Ceph 0.94.1
Client: Gentoo, kernel version 3.19.5, kernel mode cephfs client.
~800 initial rsyncs, copying 323TB (~46 million files). When the crash occurred we were down to ~100 left.
This has happened 4 times today. I've attempted to preemptively turn on dynamic debugging for ceph in the kernel, but it seems to either mitigate the issue or lengthen the time to reproduce it.
Attached is the kernel stacktrace as gathered from our logging host; it gets cut off when the client machine crashes entirely.
should be fixed by
commit c0bd50e2eeddf139d8f61e709d7003210301e93a
Author: Yan, Zheng <zyan@redhat.com>
Date: Tue Apr 7 15:51:08 2015 +0800
ceph: fix null pointer dereference in send_mds_reconnect()
sb->s_root can be null when umounting
Signed-off-by: Yan, Zheng <zyan@redhat.com>
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd5585b..0a2eb32 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2892,7 +2892,8 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
spin_unlock(&session->s_cap_lock);
/* trim unused caps to reduce MDS's cache rejoin time */
- shrink_dcache_parent(mdsc->fsc->sb->s_root);
+ if (mdsc->fsc->sb->s_root)
+ shrink_dcache_parent(mdsc->fsc->sb->s_root);
ceph_con_close(&session->s_con);
ceph_con_open(&session->s_con,
This crash happens if the MDS restarts while the FS is umounting. Did you try umounting the fs before the crash happened?
Yes, I forgot that I had a mount/unmount function in a script that checks ceph.dir.rbytes and ceph.dir.rfiles (via a separate mount and session) to get some idea of how far along the rsyncs were getting.
Sorry for the noise.
- Status changed from New to Resolved