Bug #46056

closed

assertion triggered in LRU::lru_touch in ganesha+libcephfs client

Added by Jeff Layton almost 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Jeff Layton
Category:
-
Target version:
-
% Done:
0%

Regression:
No
Severity:
3 - minor
Component(FS):
Ganesha FSAL

Description

Testing a simple vstart setup with these commands; ganesha is configured with two exports of the same filesystem:

ceph nfs export create cephfs "a" $cluster_id "/cephfs" 
ceph nfs export create cephfs "a" $cluster_id "/cephfs1" 

...then, on an NFS client:

mount -o port=$NFS_PORT ip_addr:/cephfs /mnt
echo foo > /mnt/foo
umount /mnt
mount -o port=$NFS_PORT ip_addr:/cephfs1 /mnt
echo foo > /mnt/foo

...the last command hangs, because ganesha has crashed. Here's the stack trace:

(gdb) bt
#0  0x00007faabdd55a25 in raise () from /lib64/libc.so.6
#1  0x00007faabdd3e895 in abort () from /lib64/libc.so.6
#2  0x00007faab155fc0e in ceph::__ceph_assert_fail (assertion=0x7faaa4719db7 "o->lru == this", file=0x7faaa4719cb0 "/home/jlayton/git/ceph/src/include/lru.h", line=112, 
    func=0x7faaa4719dc8 "bool LRU::lru_touch(LRUObject*)") at /home/jlayton/git/ceph/src/common/assert.cc:75
#3  0x00007faab155fcc1 in ceph::__ceph_assert_fail (ctx=...) at /home/jlayton/git/ceph/src/common/assert.cc:80
#4  0x00007faaa462fefa in LRU::lru_touch (this=0x563286e2c8a8, o=0x7fa9b000aed0) at /home/jlayton/git/ceph/src/include/lru.h:112
#5  0x00007faaa45d7fc2 in Client::touch_dn (this=0x563286e2c6e0, dn=0x7fa9b000aed0) at /home/jlayton/git/ceph/src/client/Client.cc:7552
#6  0x00007faaa45ce6e0 in Client::_lookup (this=0x563286e2c6e0, dir=0x7faa08007db0, dname="foo", mask=341, target=0x7faaa44e8388, perms=...)
    at /home/jlayton/git/ceph/src/client/Client.cc:6505
#7  0x00007faaa45f3633 in Client::ll_lookupx (this=0x563286e2c6e0, parent=0x7faa08007db0, name=0x7fa9b0007980 "foo", out=0x7faaa44e84b8, stx=0x7faaa44e84c0, want=8191, flags=0, perms=...)
    at /home/jlayton/git/ceph/src/client/Client.cc:10940
#8  0x00007faaa456e970 in ceph_ll_lookup (cmount=0x563286c65700, parent=0x7faa08007db0, name=0x7fa9b0007980 "foo", out=0x7faaa44e84b8, stx=0x7faaa44e84c0, want=8191, flags=0, 
    perms=0x7fa9b000b7c0) at /home/jlayton/git/ceph/src/libcephfs.cc:1608
#9  0x00007faabe0f5da6 in ceph_fsal_lookup () from /usr/lib64/ganesha/libfsalceph.so
#10 0x00007faabe01f699 in mdc_lookup_uncached () from /lib64/libganesha_nfsd.so.3.3
#11 0x00007faabe01fb32 in mdc_lookup () from /lib64/libganesha_nfsd.so.3.3
#12 0x00007faabe010ccf in mdcache_lookup () from /lib64/libganesha_nfsd.so.3.3
#13 0x00007faabdfdeff6 in open4_ex () from /lib64/libganesha_nfsd.so.3.3
#14 0x00007faabdfe11ca in nfs4_op_open () from /lib64/libganesha_nfsd.so.3.3
#15 0x00007faabdfcefb6 in process_one_op () from /lib64/libganesha_nfsd.so.3.3
#16 0x00007faabdfd0188 in nfs4_Compound () from /lib64/libganesha_nfsd.so.3.3
#17 0x00007faabdf52826 in nfs_rpc_process_request () from /lib64/libganesha_nfsd.so.3.3
#18 0x00007faabdcf48df in svc_request () from /lib64/libntirpc.so.3.3
#19 0x00007faabdcf237a in svc_rqst_xprt_task_recv () from /lib64/libntirpc.so.3.3
#20 0x00007faabdcf2eae in svc_rqst_epoll_loop () from /lib64/libntirpc.so.3.3
#21 0x00007faabdcfd69c in work_pool_thread () from /lib64/libntirpc.so.3.3
#22 0x00007faabdeec432 in start_thread () from /lib64/libpthread.so.0
#23 0x00007faabde1a9d3 in clone () from /lib64/libc.so.6

That assertion corresponds to this:

  // touch item -- move to head of lru
  bool lru_touch(LRUObject *o) {
    if (!o->lru) {
      lru_insert_top(o);
    } else {
      ceph_assert(o->lru == this);  // <<< ASSERTION HERE
      auto list = o->lru_link.get_list();
      ceph_assert(list == &top || list == &bottom || list == &pintail);
      top.push_front(&o->lru_link);
      adjust();
    }
    return true;
  }
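
For illustration, here's a minimal, self-contained C++ sketch (not the real src/include/lru.h code) of the invariant the assertion enforces: each cached object records which LRU owns it, so touching an object through a different LRU trips the check. In the backtrace above, the Dentry was inserted into one Client's lru and then touched via another Client's lru:

// Simplified model of the LRU ownership invariant, for illustration only.
#include <cassert>
#include <list>

struct LRUObject;

struct LRU {
  std::list<LRUObject*> top;   // MRU end of the cache
  void lru_insert_top(LRUObject *o);
  bool lru_touch(LRUObject *o);
};

struct LRUObject {
  LRU *lru = nullptr;          // which LRU currently owns this object
};

void LRU::lru_insert_top(LRUObject *o) {
  o->lru = this;               // ownership is recorded on first insert
  top.push_front(o);
}

bool LRU::lru_touch(LRUObject *o) {
  if (!o->lru) {
    lru_insert_top(o);
  } else {
    assert(o->lru == this);    // fires when o belongs to a different LRU
    top.remove(o);
    top.push_front(o);
  }
  return true;
}

int main() {
  LRU client_a, client_b;      // each libcephfs Client owns its own LRU
  LRUObject dentry;
  client_a.lru_touch(&dentry); // inserted into client A's LRU
  client_b.lru_touch(&dentry); // aborts: dentry.lru != &client_b
}
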
Actions #1

Updated by Jeff Layton almost 4 years ago

Tried modifying the MulticlientSimple test to more closely follow ganesha's usage, but it didn't reproduce the issue. It may be related to the use of the low-level interfaces, though, so I'll need to try that next.

Actions #2

Updated by Patrick Donnelly almost 4 years ago

My current theory is that Ganesha is replacing one mount's Inode with the other's because the inode numbers are identical. Passing that Inode to the other mount's client then triggers the assertion.

Actions #3

Updated by Jeff Layton almost 4 years ago

  • Assignee set to Jeff Layton

Patrick suggested that ganesha may be taking an Inode pointer from one cephfs client and handing it to the other one. I suspect he's correct. I've confirmed that libcephfs gives us different Inode pointers back from the initial root lookup, so libcephfs itself seems to be doing the right thing.

Next I'll need to confirm whether that's the case.
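
One way to check that would be a quick libcephfs program along these lines (a rough, untested sketch; assumes a vstart cluster whose ceph.conf is found via the normal search path). Two independent mounts of the same filesystem return distinct Inode pointers for the root, and handing one mount's Inode to the other mount's ceph_ll_lookup() is the same shape of call as frames #7/#8 in the backtrace:

// Rough, untested sketch: two libcephfs clients mounting the same filesystem.
#include <cephfs/libcephfs.h>
#include <cassert>
#include <cstdio>

static struct ceph_mount_info *make_mount()
{
  struct ceph_mount_info *cmount = nullptr;
  ceph_create(&cmount, nullptr);         // default client id
  ceph_conf_read_file(cmount, nullptr);  // usual ceph.conf search path
  ceph_mount(cmount, "/");
  return cmount;
}

int main()
{
  struct ceph_mount_info *a = make_mount();
  struct ceph_mount_info *b = make_mount();

  struct Inode *root_a = nullptr, *root_b = nullptr;
  ceph_ll_lookup_root(a, &root_a);
  ceph_ll_lookup_root(b, &root_b);

  // Each Client keeps its own inode cache, so the pointers differ even
  // though both refer to the same root inode on disk.
  printf("root_a=%p root_b=%p\n", (void *)root_a, (void *)root_b);
  assert(root_a != root_b);

  // The suspected misuse: an Inode obtained from client 'a' being passed
  // into a lookup on client 'b'.
  struct Inode *out = nullptr;
  struct ceph_statx stx;
  ceph_ll_lookup(b, root_a, "foo", &out, &stx,
                 CEPH_STATX_BASIC_STATS, 0, ceph_mount_perms(b));
  return 0;
}
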

Actions #4

Updated by Jeff Layton almost 4 years ago

Ok, I think I may see what's going on now.

Ganesha encodes the export ID into the filehandle on the wire, but that gets stripped off before the mdcache looks the handle up. At that point it can't discriminate between the two exports and sometimes finds the wrong one, so we end up passing Inode pointers down to libcephfs that came from a completely different client.

There are several approaches we could take to fix it, but the best one is probably to create a cache of ceph_client objects and attempt to share them between different mounts of the same fs. I'm going to take a stab at a patch to do that, but it means reworking quite a bit of the FSAL_CEPH export creation code.
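
A hand-wavy sketch of the shared-client idea (hypothetical names, not actual FSAL_CEPH code): keep one libcephfs mount per filesystem and hand it to every export of that filesystem, so all exports share one inode cache:

// Hypothetical sketch: one shared libcephfs client per filesystem.
#include <cephfs/libcephfs.h>
#include <map>
#include <memory>
#include <mutex>
#include <string>

class client_cache {
public:
  // Return the existing client for this filesystem, or create and cache one.
  std::shared_ptr<struct ceph_mount_info> get(const std::string &fsname) {
    std::lock_guard<std::mutex> l(mtx_);
    if (auto it = cache_.find(fsname); it != cache_.end())
      if (auto existing = it->second.lock())
        return existing;

    struct ceph_mount_info *cmount = nullptr;
    ceph_create(&cmount, nullptr);
    ceph_conf_read_file(cmount, nullptr);
    ceph_select_filesystem(cmount, fsname.c_str());
    ceph_mount(cmount, "/");            // shared client mounts the fs root

    auto sp = std::shared_ptr<struct ceph_mount_info>(
        cmount,
        [](struct ceph_mount_info *m) { ceph_unmount(m); ceph_release(m); });
    cache_[fsname] = sp;
    return sp;
  }

private:
  std::mutex mtx_;
  std::map<std::string, std::weak_ptr<struct ceph_mount_info>> cache_;
};
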

Actions #5

Updated by Jeff Layton almost 4 years ago

I started working on this, but there is a bit of a dilemma. My approach to fixing this was to share the ceph client between multiple exports. That means mounting / and walking down to the inodes that are the root of the exports from there.

I'm not sure, though, whether that will run afoul of path-restricted client capabilities.

Ramana, I think you did some work in that area. Is that a use case we need to preserve here? Are we planning to have ganesha use creds that won't be able to mount "/"?
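
For reference, the shared-client approach would resolve each export's root roughly like this (sketch only; assumes the export path is visible from the shared mount). This walk from "/" is exactly the step a path-restricted cephx cap might forbid:

// Sketch only: resolve an export's root Inode by walking from the shared
// client's "/". A cephx cap restricted to, say, /volumes/gp/share could make
// either the mount of "/" or this walk fail, which is the concern above.
#include <cephfs/libcephfs.h>

struct Inode *export_root(struct ceph_mount_info *shared, const char *path)
{
  struct Inode *in = nullptr;
  struct ceph_statx stx;
  int r = ceph_ll_walk(shared, path, &in, &stx,
                       CEPH_STATX_BASIC_STATS, 0, ceph_mount_perms(shared));
  return r ? nullptr : in;
}
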

Actions #6

Updated by Jeff Layton almost 4 years ago

sepia-liu also proposed a patch to ganesha to fix this: https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/495911

...but I don't think it's quite what we need.

The simplest (and probably most effective) solution would be to teach ganesha not to strip the export ID from the filehandles it hands off to the mdcache. If that's not really doable, we may be able to do effectively the same thing in FSAL_CEPH, but it would be quite a bit more complicated.

Actions #7

Updated by Jeff Layton almost 4 years ago

  • Component(FS) Ganesha FSAL added
Actions #8

Updated by Jeff Layton almost 4 years ago

I think what I can do is have FSAL_CEPH embed the export ID in the handle key it uses for the cache, without embedding it in the handle that goes out onto the wire. ganesha has a lot of overloadable operations that should make that possible, so I'm trying that approach now. That still leaves us with one client per export, but the exports won't mix objects that way.
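
Roughly, the idea looks like this (hypothetical types; the names don't match the real ganesha structures): the key handed to the mdcache carries the export ID alongside the inode identity, while the wire handle keeps only the inode identity:

// Hypothetical illustration of the cache-key idea, not the actual patch.
#include <cstddef>
#include <cstdint>
#include <cstring>

// What goes out on the wire: just enough to find the inode again.
struct wire_handle {
  uint64_t snapid;
  uint64_t ino;
};

// What the mdcache uses as its index: the same identity *plus* the export,
// so two exports of one filesystem can never collide on a key.
struct cache_key {
  uint16_t    export_id;
  wire_handle fh;
};

// Analogue of a handle_to_key-style operation: build the cache key from the
// wire identity plus the export the request arrived on.
static size_t build_cache_key(const wire_handle *fh, uint16_t export_id,
                              void *buf, size_t buflen)
{
  cache_key key;
  std::memset(&key, 0, sizeof(key));   // keep padding deterministic for hashing
  key.export_id = export_id;
  key.fh = *fh;
  if (buflen < sizeof(key))
    return 0;
  std::memcpy(buf, &key, sizeof(key));
  return sizeof(key);
}
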

Actions #9

Updated by Jeff Layton almost 4 years ago

Ok, I have a patchset that fixes this for ganesha. It turns out not to be terribly invasive, but it does need more testing.

Actions #10

Updated by Ramana Raja almost 4 years ago

Jeff Layton wrote:

I started working on this, but there is a bit of a dilemma. My approach to fixing this was to share the ceph client between multiple exports. That means mounting / and walking down to the inodes that are the root of the exports from there.

I'm not sure, though, whether that will run afoul of path-restricted client capabilities.

Ramana, I think you did some work in that area. Is that a use case we need to preserve here? Are we planning to have ganesha use creds that won't be able to mount "/"?

Sorry, missed seeing this question earlier.

In manila's cephfs driver, each ganesha export is given a unique cephx auth ID with r/rw caps restricted to the subvolume path (e.g., /volumes/gp/share/) being exported.
https://github.com/openstack/manila/blob/stable/train/manila/share/drivers/cephfs/driver.py#L564
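
For context, the resulting credential is along the lines of the following (illustrative client name; the subvolume path is the example above, and this is not literally what manila runs):

ceph fs authorize cephfs client.share-1234 /volumes/gp/share rw
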

Looks like your current solution does not need ganesha's ceph client to always mount "/" of the ceph filesystem.

Actions #11

Updated by Jeff Layton almost 4 years ago

Yeah, the latest scheme just ensures that handles (which are somewhat analogous to inodes) are never shared between exports. That's a bit less efficient than it could be when clients are competing for access to the same inodes via different exports, but hopefully those cases will be rare in practice.

Actions #12

Updated by Jeff Layton almost 4 years ago

  • Status changed from New to Resolved

The main patch that fixes this in ganesha is here:

https://github.com/nfs-ganesha/nfs-ganesha/commit/e45743b4735ac4f6caad7d25ca21c62b08302625

There are a few prerequisite patches in the series however, and a few related cleanups and fixes.
