Bug #17982: fuse client failing to trim disconnected inode on unmount - CephFS - Ceph

Actions

Copy link

Bug #17982

closed

fuse client failing to trim disconnected inode on unmount

Added by John Spray over 7 years ago. Updated over 7 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Jeff Layton

Category:

Correctness/Safety

Target version:

% Done:

Source:

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

Client

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

This fuse client is failing to terminate after being unmounted. It has a disconnected inode 10000008bdc from an unlinked file, and something is going wrong with the client giving up its caps on that inode.

http://pulpito.ceph.com/teuthology-2016-11-20_11:10:02-fs-master---basic-smithi/564402/

client does try to give up caps in flush_caps_sync:
2016-11-21 07:36:02.092797 7f458e1efec0 10 client.4125 check_caps on 10000008bdc.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={1024=0,4096=0,8192=0} open={3=0} mode=100600 size=30183/0 mtime=2016-11-21 07:32:47.300939 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[10000008bdc ts 0/0 objects 0 dirty_or_tx 0] 0x7f45a6d42f00) wanted - used - issued pAsLsXsFsc revoking - flags=3
2016-11-21 07:36:02.092812 7f458e1efec0 10 client.4125  cap mds.0 issued pAsLsXsFsc implemented pAsLsXsFsc revoking -
2016-11-21 07:36:02.092817 7f458e1efec0 10 client.4125 send_cap 10000008bdc.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={1024=0,4096=0,8192=0} open={3=0} mode=100600 size=30183/0 mtime=2016-11-21 07:32:47.300939 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[10000008bdc ts 0/0 objects 0 dirty_or_tx 0] 0x7f45a6d42f00) mds.0 seq 8 sync  used - want - flush - retain p held pAsLsXsFsc revoking - dropping AsLsXsFsc
2016-11-21 07:36:02.092833 7f458e1efec0  1 -- 172.21.15.100:0/3009409066 --> 172.21.15.100:6808/1301957563 -- client_caps(update ino 10000008bdc 35806 seq 8 caps=p dirty=- wanted=- follows 0 size 30183/30183 ts 1/18446744073709551615 mtime 2016-11-21 07:32:47.300939) v10 -- 0x7f459950b400 con 0

...

2016-11-21 07:36:47.067930 7f45844fe700  1 -- 172.21.15.100:0/3009409066 <== mds.0 172.21.15.100:6808/1301957563 776955 ==== client_caps(grant ino 10000008bdc 35806 seq 9 caps=pAsLsXsFscr dirty=- wanted=Ls follows 0 size 30183/0 ts 1/18446744073709551615 mtime 2016-11-21 07:32:47.300939) v10 ==== 236+0+0 (3225150974 0 0) 0x7f459a334800 con 0x7f459777a800
2016-11-21 07:36:47.067950 7f45844fe700 10 client.4125  mds.0 seq now 316657
2016-11-21 07:36:47.067956 7f45844fe700  5 client.4125 handle_cap_grant on in 10000008bdc mds.0 seq 9 caps now pAsLsXsFscr was p
2016-11-21 07:36:47.067964 7f45844fe700 10 client.4125 update_inode_file_bits 10000008bdc.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={1024=0,4096=0,8192=0} open={3=0} mode=100600 size=30183/0 mtime=2016-11-21 07:32:47.300939 caps=-(0=p) objectset[10000008bdc ts 0/0 objects 0 dirty_or_tx 0] 0x7f45a6d42f00) - mtime 2016-11-21 07:32:47.300939
2016-11-21 07:36:47.067984 7f45844fe700 10 client.4125   grant, new caps are AsLsXsFscr
2016-11-21 07:36:47.067991 7f45844fe700 10 client.4125 unmounting: trim pass, size was 0+1
2016-11-21 07:36:47.067993 7f45844fe700 10 client.4125 unmounting: trim pass, size still 0+1

mds does see the client trying to give up caps:
2016-11-21 07:36:02.093067 7f4cdab3b700  1 -- 172.21.15.100:6808/1301957563 <== client.4125 172.21.15.100:0/3009409066 552162 ==== client_caps(update ino 10000008bdc 35806 seq 8 caps=p dirty=- wanted=- follows 0 size 30183/30183 ts 1/18446744073709551615 mtime 2016-11-21 07:32:47.300939) v10 ==== 236+0+0 (2468978433 0 0) 0x7f4cebeb1c00 con 0x7f4cead94000

Actions

Copy link

Updated by Jeff Layton over 7 years ago

Assignee set to Jeff Layton

I'll grab this for now since I was in this codepath recently. Looking at it though, I suspect that this is unrelated to the fsync changes that went in. That code doesn't really change how we wait on inodes or anything. It just requests that the mds flush the journal on the last cap release when flushing them all. Could the MDS be stuck, not giving us a reply?

These messages have a trim_cache() call between them, so I guess that this means that it didn't have any effect:

2016-11-21 07:36:47.067991 7f45844fe700 10 client.4125 unmounting: trim pass, size was 0+1 2016-11-21 07:36:47.067993 7f45844fe700 10 client.4125 unmounting: trim pass, size still 0+1

...the stuck inode seems to be 100000015df. It looks like we sent the cap release here:

2016-11-21 17:02:06.202955 7fbcace6bec0 10 client.4137 send_cap 100000015df.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={1024=0} open={3=0} mode=100600 size=0/0 mtime=2016-11-21 16:51:00.613191 caps=pAsLsXsFs(0=pAsLsXsFs) objectset[100000015df ts 0/0 objects 0 dirty_or_tx 0] 0x7fbcc5c0e500) mds.0 seq 5 sync used - want - flush - retain p held pAsLsXsFs revoking - dropping AsLsXsFs

Actions

Copy link

Updated by Jeff Layton over 7 years ago

One oddity in the check_caps code is that we always try to retain CEPH_CAP_PIN, even when unmounting. I'm going to test a patch that drops that cap when unmounting as well, and we can see whether that helps.

Actions

Copy link

Updated by John Spray over 7 years ago

Another instance
jspray-2016-11-23_14:06:32-fs-wip-jcsp-testing-20161122-distro-basic-smithi/572367

Actions

Copy link

Updated by John Spray over 7 years ago

http://pulpito.ceph.com/teuthology-2016-11-21_17:15:01-fs-master---basic-smithi/567231/

Actions

Copy link

Updated by Zheng Yan over 7 years ago

This looks like inode reference leak. I found something suspicious

int Client::ll_createx(...)
{
  unsigned caps = statx_to_mask(lflags, want);
  InodeRef in;
  Mutex::Locker lock(client_lock);

InodeRef is defined before the locker. It means inode reference is dropped after unlock the client_lock. We encountered this type of bug before https://github.com/ceph/ceph/commit/748777dd41283d686989903f69a65305448afc35

Actions

Copy link

Updated by Jeff Layton over 7 years ago

Thanks. To make sure i understand...

Since the inode is released after the mutex is dropped, then the refcount decrement is not adequately protected and is subject to races? That could definitely cause a refcount leak if so. I'll go over the client code in a bit and see whether I did that in any other places.

Actions

Copy link

Updated by Zheng Yan over 7 years ago

Jeff Layton wrote:

Thanks. To make sure i understand...

Since the inode is released after the mutex is dropped, then the refcount decrement is not adequately protected and is subject to races?

yes

Actions

Copy link