Bug #22256
nfs-ganesha: crashes in free_delegrecall_context
Description
I've been working on delegation support in cephfs for ganesha. The ceph pieces were recently merged, so I rebased my ceph delegations patch on top of the latest ganesha -next branch. I'm now seeing regular crashes when running the cthon special tests against it. This is one of them:
(gdb) bt
#0  0x00007ffff5aa01f7 in raise () from /lib64/libc.so.6
#1  0x00007ffff5aa18e8 in abort () from /lib64/libc.so.6
#2  0x0000000000435bb5 in free_delegrecall_context (deleg_ctx=0x7ffef40008c0) at /home/jlayton/git/ganesha/src/FSAL_UP/fsal_up_top.c:1075
#3  0x0000000000436394 in delegrecall_completion_func (call=0x7ffef40009a8) at /home/jlayton/git/ganesha/src/FSAL_UP/fsal_up_top.c:1201
#4  0x000000000043e123 in nfs_rpc_call_process (cc=0x7ffef4000a20) at /home/jlayton/git/ganesha/src/MainNFSD/nfs_rpc_callback.c:921
#5  0x00007ffff63d0adf in svc_rqst_expire_task (wpe=0x7ffef4000a20) at /home/jlayton/git/ganesha/src/libntirpc/src/svc_rqst.c:293
#6  0x00007ffff63db89d in work_pool_thread (arg=0x7fff94000c30) at /home/jlayton/git/ganesha/src/libntirpc/src/work_pool.c:176
#7  0x00007ffff6805e25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff5b6334d in clone () from /lib64/libc.so.6
Essentially, it looks like the deleg_ctx has already been freed at this point, and the drc_clid pointer is now bogus. The code then asserts because pthread_mutex_lock returned EINVAL (probably because the mutex has been scribbled over).
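For illustration only, here is a minimal sketch of one common way to keep a context like this alive until every user is done with it: a reference count that the submitter and the completion callback each hold. This is not the actual ganesha code, and it is not the fix that went in upstream. The names `recall_ctx`, `recall_ctx_new`, `recall_ctx_get`, `recall_ctx_put` and `demo_two_owners` are all hypothetical. The `abort()` on a failed lock mirrors the behaviour described above, where a bad return from pthread_mutex_lock brings the process down.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a delegation recall context (not ganesha's struct). */
struct recall_ctx {
	pthread_mutex_t lock;	/* protects refcount */
	int refcount;		/* number of outstanding users */
	int *free_counter;	/* demo only: bumped each time the context is freed */
};

/* Allocate a context holding one reference for the caller. */
static struct recall_ctx *recall_ctx_new(int *free_counter)
{
	struct recall_ctx *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	if (pthread_mutex_init(&ctx->lock, NULL) != 0) {
		free(ctx);
		return NULL;
	}
	ctx->refcount = 1;
	ctx->free_counter = free_counter;
	return ctx;
}

/* Take an extra reference, e.g. before handing the context to a callback. */
static void recall_ctx_get(struct recall_ctx *ctx)
{
	if (pthread_mutex_lock(&ctx->lock) != 0)
		abort();	/* a failing lock here means the context is corrupt */
	ctx->refcount++;
	pthread_mutex_unlock(&ctx->lock);
}

/* Drop a reference; only the final put destroys the mutex and frees memory. */
static void recall_ctx_put(struct recall_ctx *ctx)
{
	int remaining;

	if (pthread_mutex_lock(&ctx->lock) != 0)
		abort();
	assert(ctx->refcount > 0);
	remaining = --ctx->refcount;
	pthread_mutex_unlock(&ctx->lock);

	if (remaining == 0) {
		(*ctx->free_counter)++;
		pthread_mutex_destroy(&ctx->lock);
		free(ctx);
	}
}

/* Two owners (submitter and completion callback) release the same context. */
int demo_two_owners(void)
{
	int frees = 0;
	struct recall_ctx *ctx = recall_ctx_new(&frees);

	if (ctx == NULL)
		return -1;
	recall_ctx_get(ctx);	/* reference owned by the completion callback */
	recall_ctx_put(ctx);	/* submitter is done; context must stay alive */
	recall_ctx_put(ctx);	/* callback is done; context is freed exactly once */
	return frees;
}
```

With this pattern, whichever path finishes last frees the context, so a late completion callback never touches freed memory.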
The patch to add delegation support to ceph is here:
https://review.gerrithub.io/#/c/377714/
It's fairly straightforward. I mostly notice this when running the cthon special tests against the server. It eventually crashes during one of the rename tests.
History
#1 Updated by Jeff Layton over 5 years ago
- File ganesha.conf added
Here's my ganesha.conf as well. I bisected the regression down to commit 46a5e8535f978b1e12dcb15cbdcbf6d5e757d24e (nfs_rpc_call). If I base my ceph patch on top of the commit just before this one, it works fine.
#2 Updated by Patrick Donnelly over 5 years ago
- Subject changed from crashes in free_delegrecall_context to nfs-ganesha: crashes in free_delegrecall_context
- Status changed from New to In Progress
- Assignee set to Jeff Layton
- Source set to Development
#3 Updated by Jeff Layton over 5 years ago
- Status changed from In Progress to Resolved
This was fixed by commit f332c172a2884c04a0d4e743c8858ff3e7f957a1 in ganesha (and the associated ntirpc changes).
#4 Updated by Patrick Donnelly over 4 years ago
- Category deleted (109)
- Component(FS) Client, Ganesha FSAL added