Bug #11621
MDS Reconnection causes kernel mode null pointer dereference
Status: Closed
Description
Similar to my earlier bug report, may or may not be related.
Servers: CentOS 7.1 Ceph 0.94.1
Client: Gentoo, kernel version 3.19.5, kernel mode cephfs client.
~800 initial rsyncs, copying 323TB (~46 million files). When the crash occurred we were down to ~100 left.
This has happened 4 times today. I've attempted to preemptively enable dynamic debugging for ceph in the kernel, but it seems to either mitigate the issue or lengthen the time to reproduce.
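For reference, a sketch of how kernel-side ceph dynamic debugging is typically toggled on a client like this (assumes CONFIG_DYNAMIC_DEBUG is built in and debugfs is mounted at /sys/kernel/debug; run as root):

```shell
# Enable all dynamic debug messages from the cephfs kernel module
echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control

# Optionally also enable the libceph messenger layer
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

# Disable again once the trace has been captured
echo 'module ceph -p' > /sys/kernel/debug/dynamic_debug/control
```

The extra logging itself can perturb timing, which would explain why enabling it appears to lengthen the time to reproduce.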
Attached is the kernel stacktrace as gathered from our logging host; it gets cut off when the client machine crashes entirely.
Updated by Zheng Yan almost 9 years ago
Should be fixed by:

commit c0bd50e2eeddf139d8f61e709d7003210301e93a
Author: Yan, Zheng <zyan@redhat.com>
Date:   Tue Apr 7 15:51:08 2015 +0800

    ceph: fix null pointer dereference in send_mds_reconnect()

    sb->s_root can be null when umounting

    Signed-off-by: Yan, Zheng <zyan@redhat.com>

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd5585b..0a2eb32 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2892,7 +2892,8 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 	spin_unlock(&session->s_cap_lock);
 
 	/* trim unused caps to reduce MDS's cache rejoin time */
-	shrink_dcache_parent(mdsc->fsc->sb->s_root);
+	if (mdsc->fsc->sb->s_root)
+		shrink_dcache_parent(mdsc->fsc->sb->s_root);
 
 	ceph_con_close(&session->s_con);
 	ceph_con_open(&session->s_con,
This crash happens if the MDS restarts while the FS is unmounting. Did you try unmounting the FS before the crash happened?
Updated by Adam Tygart almost 9 years ago
Yes. I forgot that I had a mount/unmount function in a script that checks ceph.dir.rbytes and ceph.dir.rfiles (via a separate mount and session) to get some idea of how far along the rsyncs were getting.
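For anyone following along, the recursive stats mentioned here are exposed by CephFS as virtual extended attributes on directories; a sketch of the kind of query the script was doing (the mount point is hypothetical):

```shell
# Total bytes stored under the directory tree (hypothetical mount point)
getfattr -n ceph.dir.rbytes /mnt/cephfs/data

# Total number of files under the directory tree
getfattr -n ceph.dir.rfiles /mnt/cephfs/data
```

Because each query here used a separate mount and session, the periodic unmounts raced with MDS reconnects, matching the umount window described in the fix above.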
Sorry for the noise.