Project

General

Profile

Bug #23028

client: allow client to use caps that are revoked but not yet returned

Added by Jeff Layton about 1 year ago. Updated 12 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
Start date:
02/16/2018
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, Ganesha FSAL, MDS
Labels (FS):
Pull request ID:

Description

I found a problem with ceph and delegations.

NFS client holds a read delegation on a file. That gets recalled by ganesha due to the MDS revoking some caps.

The NFS DELEGRETURN compound does a GETATTR just before the DELEGRETURN. (Mostly it does this for write delegations -- presumably the client will have flushed all of its cached writes just before returning the delegation and it wants an updated cache after that activity).

Ganesha calls down into cephfs to satisfy the getattr. That results in the ceph client sending a GETATTR request to the MDS, because the caps it had have now been revoked. The MDS won't satisfy the request however until the client returns the caps held by the delegation. Deadlock. Eventually ganesha revokes the delegation and things unwedge, but that's not a good outcome.

I think we can loosen this up a little bit. Currently in->caps_issued_mask only considers caps that the MDS has most recently granted, but revoked caps are still valid up until the point where a cap return is issued. Can we allow caps_issued_mask to also consider caps in this state to be valid?

I think that might help resolve this deadlock

0001-tests-add-a-test-for-doing-a-getattr-while-caps-are-.patch View (3.86 KB) Jeff Layton, 02/20/2018 06:02 PM


Related issues

Copied to fs - Backport #23314: luminous: client: allow client to use caps that are revoked but not yet returned Resolved

History

#1 Updated by Patrick Donnelly about 1 year ago

  • Subject changed from allow client to use caps that are revoked but not yet returned to mds: allow client to use caps that are revoked but not yet returned
  • Category set to Correctness/Safety
  • Priority changed from Normal to High
  • Backport set to luminous
  • Component(FS) MDS added

Sounds reasonable Jeff. I think only a change to the MDS is required?

#2 Updated by Jeff Layton about 1 year ago

Patrick Donnelly wrote:

Sounds reasonable Jeff. I think only a change to the MDS is required?

No, actually this would be a client-side change. Only the client knows whether it has any revoked but not yet returned caps.

I still need to figure out how to actually implement it there, so I'll don my welding mask next week and have a look at the caps tracking code again.

#3 Updated by Patrick Donnelly about 1 year ago

  • Subject changed from mds: allow client to use caps that are revoked but not yet returned to client: allow client to use caps that are revoked but not yet returned

#4 Updated by Greg Farnum about 1 year ago

You’ll need to be super careful here. It sounds like the point of this check is to prevent the Client from creating new references to capabilities which are being revoked revoked by the MDS. getattr returns instantly without adding new logical references, but plenty of other things like read/write don’t and we need them not to pass...

#5 Updated by Jeff Layton about 1 year ago

Yes. We don't want to livelock here, so we may want to restrict this access to those functions that complete the use of those caps before the client->lock is dropped.

FWIW, I've written a small testcase for this, but it's not working like I had expected. It's possible I may have misunderstood the problem. I'm looking into that now.

#6 Updated by Jeff Layton about 1 year ago

I spent most of yesterday trying to write a reproducer but so far I haven't been successful. I did add some debugging to the ceph client libs however. Here's basically what I'm seeing:

I basically have one client doing tail -f on a file (and getting a delegation) and then another client appends some data to that file. Not every time, but occasionally, I see the hang that I described above. Both clients hold some subset of the caps needed to satisfy a getattr, but one of them has outstanding references to those caps (via the Delegation):

2018-02-20 14:01:23.621 7f476ffff700 10 client.24109  mds.0 seq now 164
2018-02-20 14:01:23.621 7f476ffff700  5 client.24109 handle_cap_grant on in 0x10000000000 mds.0 seq 178 caps now pAsLsXsFcb was pAsLsXsFsxcrwb
2018-02-20 14:01:23.621 7f476ffff700 10 client.24109 update_inode_file_bits 0x10000000000.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90) pAsLsXsFsxcrwb mtime 2018-02-20 14:00:39.297190
2018-02-20 14:01:23.621 7f476ffff700 25 client.24109 truncate_seq: mds 1 local 1 time_warp_seq: mds 0 local 0
2018-02-20 14:01:23.621 7f476ffff700 30 client.24109 Yay have enough caps to look at our times
2018-02-20 14:01:23.621 7f476ffff700 10 client.24109   revocation of Fsxrw
2018-02-20 14:01:23.621 7f476ffff700  0 recall_deleg: recalling 0x7f476800ee90
2018-02-20 14:01:23.621 7f476ffff700 10 client.24109 check_caps on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90) wanted pAsLsXsFscr used pAsLsXsFscr issued pAsLsXsFcb revoking Fsxrw flags=0
2018-02-20 14:01:23.621 7f476ffff700 10 client.24109  cap mds.0 issued pAsLsXsFcb implemented pAsLsXsFsxcrwb revoking Fsxrw
2018-02-20 14:01:23.621 7f476ffff700 10 client.24109 send_cap 0x10000000000.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90) mds.0 seq 178 async  used pAsLsXsFscr want pAsLsXsFscr flush - retain pAsxLsxXsxFcbl held pAsLsXsFsxcrwb revoking Fsxrw dropping -
2018-02-20 14:01:23.621 7f476ffff700 15 client.24109 auth cap, setting max_size = 0
2018-02-20 14:01:23.623 7f477c1b6700 15 caps_issued_mask: mask=pAsLsXsFs total caps=pAsLsXsFcb
2018-02-20 14:01:23.623 7f477c1b6700  3 client.24109 ll_getattrx not enough caps! mask= pAsLsXsFs
2018-02-20 14:01:23.623 7f477c1b6700  3 client.24109 _ll_getattr 0x10000000000.head
2018-02-20 14:01:23.623 7f477c1b6700 15 caps_issued_mask: mask=pAsLsXsFs total caps=pAsLsXsFcb
2018-02-20 14:01:23.623 7f477c1b6700 10 client.24109 _getattr mask pAsLsXsFs issued=0
2018-02-20 14:01:23.623 7f477c1b6700 15 inode.get on 0x7f476800ee90 0x10000000000.head now 6
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 choose_target_mds starting with req->inode 0x10000000000.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90)
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 choose_target_mds 0x10000000000.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90) is_hash=0 hash=0
2018-02-20 14:01:23.623 7f477c1b6700 10 client.24109 choose_target_mds from caps on inode 0x10000000000.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90)
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 mds is 0
2018-02-20 14:01:23.623 7f477c1b6700 10 client.24109 send_request rebuilding request 20 for mds.0
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 encode_cap_releases enter (req: 0x7f46ac0059b0, mds: 0)
2018-02-20 14:01:23.623 7f477c1b6700 25 client.24109 encode_cap_releases exit (req: 0x7f46ac0059b0, mds 0
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 send_request set sent_stamp to 2018-02-20 14:01:23.623909
2018-02-20 14:01:23.623 7f477c1b6700 10 client.24109 send_request client_request(unknown.0:20 getattr pAsLsXsFs #0x10000000000 2018-02-20 14:01:23.623823 caller_uid=4447, caller_gid=4447{10,4447,}) v4 to mds.0
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 awaiting reply|forward|kick on 0x7f477c1b1ed0

... the other client then comes in, and does a getattr (I think to revalidate the cache before opening the file for write?):

2018-02-20 14:01:23.623 7f477c1b6700 15 caps_issued_mask: mask=pAsLsXsFs total caps=pAsLsXsFcb
2018-02-20 14:01:23.623 7f477c1b6700  3 client.24109 ll_getattrx not enough caps! mask= pAsLsXsFs
2018-02-20 14:01:23.623 7f477c1b6700  3 client.24109 _ll_getattr 0x10000000000.head
2018-02-20 14:01:23.623 7f477c1b6700 15 caps_issued_mask: mask=pAsLsXsFs total caps=pAsLsXsFcb
2018-02-20 14:01:23.623 7f477c1b6700 10 client.24109 _getattr mask pAsLsXsFs issued=0
2018-02-20 14:01:23.623 7f477c1b6700 15 inode.get on 0x7f476800ee90 0x10000000000.head now 6
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 choose_target_mds starting with req->inode 0x10000000000.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90)
2018-02-20 14:01:23.623 7f477c1b6700 20 client.24109 choose_target_mds 0x10000000000.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90) is_hash=0 hash=0
2018-02-20 14:01:23.623 7f477c1b6700 10 client.24109 choose_target_mds from caps on inode 0x10000000000.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={1=1,4=1,16=1,64=1,256=1,1024=1,2048=1,4096=0,8192=0} open={1=2,2=0,3=0} mode=100664 size=212/4194304 mtime=2018-02-20 14:00:39.297190 caps=pAsLsXsFcb(0=pAsLsXsFcb) objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 0] parents=0x1.head["foo"] 0x7f476800ee90)
2018-02-20 14:01:23.621 7f68344f5700 20 client.24103 mds is 0
2018-02-20 14:01:23.621 7f68344f5700 10 client.24103 send_request rebuilding request 102 for mds.0
2018-02-20 14:01:23.621 7f68344f5700 20 client.24103 encode_cap_releases enter (req: 0x7f6758011650, mds: 0)
2018-02-20 14:01:23.621 7f68344f5700 25 client.24103 encode_cap_releases exit (req: 0x7f6758011650, mds 0
2018-02-20 14:01:23.621 7f68344f5700 20 client.24103 send_request set sent_stamp to 2018-02-20 14:01:23.621773
2018-02-20 14:01:23.621 7f68344f5700 10 client.24103 send_request client_request(unknown.0:102 getattr pAsLsXsFs #0x10000000000 2018-02-20 14:01:23.621680 caller_uid=4447, caller_gid=4447{10,4447,}) v4 to mds.0
2018-02-20 14:01:23.621 7f68344f5700 20 client.24103 awaiting reply|forward|kick on 0x7f68344f0ed0
2018-02-20 14:01:24.612 7f6837fff700 21 client.24103 tick

...and at that point, the MDS can't get the rdlock on all of the caps it needs and stalls. Eventually the delegation gets revoked and things unwedge, but that takes ~60s.

#7 Updated by Jeff Layton about 1 year ago

This testcase seems to reproduce it reliably. The trick was that I needed the client holding the delegation to open the file for write first, and then close it before requesting the delegation. That gives that client some caps associated with writes.

Now looking at potential fixes.

#8 Updated by Jeff Layton about 1 year ago

Ok, I think I have something that fixes it. Basically, I think we can allow getattr* and ll_getattr* to use "implemented" caps in addition to "issued" caps. This is a nice optimization as well and should allow us to avoid some MDS calls.

We may need to expand this to other calls in the future, but for now limiting it to the getattr codepaths seems to be enough to fix the issue I was seeing.

#9 Updated by Jeff Layton about 1 year ago

PR is up for this:

https://github.com/ceph/ceph/pull/20502

...one patch needs to be squashed before merge, but it seems to DTRT.

#10 Updated by Patrick Donnelly about 1 year ago

  • Status changed from New to Pending Backport
  • Target version set to v13.0.0
  • Source set to Development

#11 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #23314: luminous: client: allow client to use caps that are revoked but not yet returned added

#12 Updated by Nathan Cutler 12 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF