Fix #2215

ceph-fuse does not invalidate page cache

Added by Greg Farnum about 12 years ago. Updated about 11 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Right now the userspace client doesn't invalidate the page cache when it loses the cache capability on an inode. Apparently this is due to a deadlock in the way that FUSE invalidation interacts with our implementation.

Obviously we'll need to fix it sometime before we tell people the filesystem is production ready.

Associated revisions

Revision fec19121 (diff)
Added by Sam Lang over 11 years ago

client: Fix #2215 with cache inval in thread

The client currently deadlocks with kernel buffer cache invalidation
enabled, because the invalidate callback is called with the client lock
held, and the callback in turn triggers upcalls back into the userspace
process that try to take the same client lock. The fix is to invoke the
invalidate callback in a separate thread, allowing _release, _flushed, etc.
to complete and unlock the client lock, so that the invalidate callback
avoids deadlock when the upcall is made.

We construct a separate work queue (Finisher) that allows scheduling
the invalidate callbacks in a separate thread. The thread only starts
when the invalidate callback is set. If no callback is set, the cache
capability reference is decremented inline as before.

Some callers of invalidate_inode_cache (flush and update_inode_file_bits)
don't expect the cache capability to be decremented. Pass a keep_caps flag to
only decrement the capability ref in the _release case.

Also, we need to make sure the mds is aware that the client has dropped
the cache capability, so we add a call to check_caps in put_cap_ref for the
CEPH_CAP_FILE_CACHE capability.

Signed-off-by: Sam Lang <>
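
The approach described in this commit message can be illustrated with a minimal C++ sketch. This is not the actual Ceph code: FinisherSketch, ClientSketch, invalidate_locked and run_demo are hypothetical names, and the cache capability reference is reduced to a plain integer. The sketch only shows the shape of the idea: the invalidate callback is queued to a worker thread instead of being run under the client lock, the reference is dropped inline when no callback is set, and a keep_caps flag limits the decrement to the release path.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a Finisher-style work queue: a single worker
// thread runs queued callbacks, so they never run under the caller's lock.
class FinisherSketch {
public:
  FinisherSketch() : worker_([this] { run(); }) {}
  ~FinisherSketch() {  // drains any queued work, then joins the worker
    { std::lock_guard<std::mutex> l(m_); stop_ = true; }
    cv_.notify_one();
    worker_.join();
  }
  void queue(std::function<void()> fn) {
    assert(fn && "queued callback must not be empty");
    { std::lock_guard<std::mutex> l(m_); q_.push_back(std::move(fn)); }
    cv_.notify_one();
  }
private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty()) return;  // stop requested and queue drained
      auto fn = std::move(q_.front());
      q_.pop_front();
      l.unlock();              // run the callback without the queue lock
      fn();
      l.lock();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread worker_;         // declared last so it starts after the rest
};

// Hypothetical, heavily simplified client; all names are illustrative.
struct ClientSketch {
  std::mutex client_lock;
  int cache_refs = 1;                        // stands in for the cache cap ref
  int upcalls = 0;                           // counts simulated upcalls
  std::function<void()> invalidate_cb;       // empty when no callback is set
  std::unique_ptr<FinisherSketch> finisher;  // created only with a callback

  // Registering a callback is what starts the worker thread.
  void set_invalidate_cb(std::function<void()> cb) {
    std::lock_guard<std::mutex> l(client_lock);
    invalidate_cb = std::move(cb);
    finisher = std::make_unique<FinisherSketch>();
  }
  // Simulated upcall: re-enters the client and takes client_lock.
  void upcall() {
    std::lock_guard<std::mutex> l(client_lock);
    ++upcalls;
  }
  // Caller must hold client_lock. keep_caps == false models the release case.
  void invalidate_locked(bool keep_caps) {
    if (!invalidate_cb) {
      if (!keep_caps) --cache_refs;          // no callback: drop the ref inline
      return;
    }
    auto cb = invalidate_cb;                 // copy while the lock is held
    finisher->queue([this, cb, keep_caps] {
      cb();                                  // client_lock is not held here
      if (!keep_caps) {
        std::lock_guard<std::mutex> l(client_lock);
        --cache_refs;
      }
    });
  }
  void release() { std::lock_guard<std::mutex> l(client_lock); invalidate_locked(false); }
  void flush()   { std::lock_guard<std::mutex> l(client_lock); invalidate_locked(true); }
  void shutdown() { finisher.reset(); }      // drain queued work, join worker
};

// Returns {cache_refs, upcalls} after one flush and one release.
std::pair<int, int> run_demo(bool with_callback) {
  ClientSketch c;
  if (with_callback) c.set_invalidate_cb([&c] { c.upcall(); });
  c.flush();     // keep_caps: the reference is left alone
  c.release();   // the reference is dropped
  c.shutdown();
  return {c.cache_refs, c.upcalls};
}
```

Calling the callback directly inside invalidate_locked would block in upcall(), since client_lock is a non-recursive mutex already held by the caller. Queuing the callback lets release() return and unlock first.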

Revision 65c31e1b (diff)
Added by Sage Weil about 11 years ago

ceph-fuse: invalidate cache by default

Closes: #2215
Signed-off-by: Sage Weil <>

History

#1 Updated by Sam Lang over 11 years ago

  • Status changed from New to 7
  • Assignee set to Sam Lang

#2 Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (11)

#3 Updated by Sage Weil over 11 years ago

  • Target version set to v0.54b

#4 Updated by Sam Lang over 11 years ago

Running in teuthology with the multiclient test, I see a segfault in one of the fuse clients on the plana nodes. With the addition of a sleep in the thread before calling the ino_invalidate_cb, the segv is reproducible.

The problem is in the ceph-fuse.cc and fuse_ll.cc code. Teardown of the fuse session and channel was happening before we called Client::unmount, which breaks because unmount calls the invalidate callback for inodes.

The proposed fix in wip-2215 is to separate out the fuse finalize code from ceph_fuse_ll_main and call it after the Client::unmount finishes.
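
The ordering problem can be sketched in C++ as follows. This is only an illustration under simplified assumptions, not the code in wip-2215: FuseSessionSketch, MountedClientSketch and shutdown_in_order are hypothetical names, and the real session and channel are reduced to a single flag.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the FUSE session and channel state.
struct FuseSessionSketch {
  bool alive = true;
  void finalize() { alive = false; }  // models tearing the session down
};

// Hypothetical client whose unmount still invalidates cached inodes.
struct MountedClientSketch {
  std::function<void()> invalidate_cb;
  void unmount() {
    if (invalidate_cb) invalidate_cb();  // needs a live session
  }
};

// Returns true if the invalidate issued during unmount saw a live session.
bool shutdown_in_order(bool finalize_first) {
  FuseSessionSketch session;
  MountedClientSketch client;
  bool ok = true;
  client.invalidate_cb = [&] { ok = ok && session.alive; };
  if (finalize_first) {
    session.finalize();  // old order: the callback then hits a dead session
    client.unmount();
  } else {
    client.unmount();    // proposed order: unmount first
    session.finalize();  // then finalize the fuse session
  }
  return ok;
}
```

Here shutdown_in_order(true) models the failing order and returns false, while shutdown_in_order(false) models the proposed order and returns true. In the real client the failing case shows up as the segfault described above.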

#5 Updated by Sage Weil over 11 years ago

wip-2215 looks good to me!

#6 Updated by Sage Weil about 11 years ago

  • Priority changed from Normal to High

#7 Updated by Greg Farnum about 11 years ago

  • Target version deleted (v0.54b)

#8 Updated by Sage Weil about 11 years ago

  • Tracker changed from Bug to Fix

#9 Updated by Ian Colle about 11 years ago

  • Target version set to v0.59

#10 Updated by Ian Colle about 11 years ago

  • Story points set to 1.00

#11 Updated by Greg Farnum about 11 years ago

  • Target version changed from v0.59 to v0.60

#12 Updated by Greg Farnum about 11 years ago

  • Target version changed from v0.60 to v0.59

#13 Updated by Sam Lang about 11 years ago

I added the fuse_use_invalidate_cb: true option in the ceph-qa-suite to the basic and verify fs suites (in the btrfs.yamls for now). Once those pass reliably we can adjust the default in the config.

#14 Updated by Greg Farnum about 11 years ago

Which automatic tests actually run those? I'm not sure that the nightlies do so right now.

#15 Updated by Sage Weil about 11 years ago

  • Target version changed from v0.59 to v0.60

#16 Updated by Sam Lang about 11 years ago

Those tests are part of the full regression test suite.

#17 Updated by Greg Farnum about 11 years ago

  • Status changed from 7 to Resolved

Sage is turning it on by default now following weeks of testing in the nightlies!
