Bug #43960


MDS: incorrectly issues Fc for new opens when there is an existing writer

Added by Greg Farnum about 4 years ago. Updated almost 2 years ago.

Status:
Triaged
Priority:
Urgent
Assignee:
Category:
Performance/Resource Usage
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
pacific,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Cloned from #43748, to cover the MDS-side issue. (Note that I have changed much of the text below to correct a few details.)
In an active/standby configuration of two clients managed by file locks, the standby client causes unbuffered I/O on the active client, which is to be expected if they both ask for Frw. Additionally, and more importantly, failure of the standby client causes the active client I/O to freeze because the standby client somehow receives Fc caps it cannot release.

This can be reproduced by starting two ceph-fuse clients. Then in the first "active" client, do:

flock -x bar1 -c 'date; echo enter; while sleep 1; do date; echo foobar | tee -a bar1; done'

Then in the "standby" client:

flock -x bar1 -c 'date; echo 2 enter; sleep 30; date; echo 2 bye;'

Note: the second client should never print output because it cannot obtain the exclusive lock on bar1.

Now, send SIGKILL to the standby ceph-fuse client. This will cause I/O to halt for the first client until the MDS autocloses the "standby" ceph-fuse client session after 60s.

Here's the log from the standby client:

2020-01-21T17:44:33.248+0000 7f1d0b7fe700 20 client.4278 trim_cache size 2 max 16384
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 45 ==== client_caps(revoke ino 0x10000000001 11 seq 34 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c00a1e0 con 0x557c4d018dc0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 31
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 34 caps now pAsLsXs was pAsLsXsFr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:31.232141+0000 ctime=2020-01-21T17:44:31.232141+0000 caps=pAsLsXsFr(0=pAsLsXsFr) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXsFr ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278   revocation of Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 check_caps on 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) wanted pFscr used - issued pAsLsXs revoking Fr flags=0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278  cap mds.0 issued pAsLsXs implemented pAsLsXsFr revoking Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 send_cap 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) mds.0 seq 34 used - want pFscr flush - retain pAsxLsxXsxFsxcwbl held pAsLsXsFr revoking Fr dropping -
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 15 client.4278 auth cap, setting max_size = 0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 --> [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] -- client_caps(update ino 0x10000000001 11 seq 34 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 287/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 -- 0x7f1d1c009850 con 0x557c4d018dc0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 46 ==== client_caps(grant ino 0x10000000001 11 seq 35 caps=pAsLsXsFc dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c00a1e0 con 0x557c4d018dc0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 32
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 35 caps now pAsLsXsFc was pAsLsXs
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXs ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 size 287 -> 294
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278   grant, new caps are Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 47 ==== client_caps(revoke ino 0x10000000001 11 seq 36 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c009850 con 0x557c4d018dc0
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 33
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 36 caps now pAsLsXs was pAsLsXsFc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFc(0=pAsLsXsFc) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXsFc ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278   revocation of Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 check_caps on 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) wanted pFscr used - issued pAsLsXs revoking Fc flags=0
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278  cap mds.0 issued pAsLsXs implemented pAsLsXsFc revoking Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 send_cap 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) mds.0 seq 36 used - want pFscr flush - retain pAsxLsxXsxFsxrwbl held pAsLsXsFc revoking Fc dropping -
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 15 client.4278 auth cap, setting max_size = 0
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 --> [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] -- client_caps(update ino 0x10000000001 11 seq 36 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 -- 0x7f1d1c009dd0 con 0x557c4d018dc0
2020-01-21T17:44:33.349+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 48 ==== client_caps(grant ino 0x10000000001 11 seq 37 caps=pAsLsXsFrw dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c009dd0 con 0x557c4d018dc0
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 34
2020-01-21T17:44:33.349+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 37 caps now pAsLsXsFrw was pAsLsXs
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXs ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278   grant, new caps are Frw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 49 ==== client_caps(revoke ino 0x10000000001 11 seq 38 caps=pAsLsXsFr dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c004260 con 0x557c4d018dc0
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 35
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 38 caps now pAsLsXsFr was pAsLsXsFrw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFrw(0=pAsLsXsFrw) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXsFrw ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278   revocation of Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 check_caps on 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFr(0=pAsLsXsFr) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) wanted pFscr used - issued pAsLsXsFr revoking Fw flags=0
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278  cap mds.0 issued pAsLsXsFr implemented pAsLsXsFrw revoking Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 send_cap 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFr(0=pAsLsXsFr) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) mds.0 seq 38 used - want pFscr flush - retain pAsxLsxXsxFsxcrbl held pAsLsXsFrw revoking Fw dropping -
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 15 client.4278 auth cap, setting max_size = 0
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 --> [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] -- client_caps(update ino 0x10000000001 11 seq 38 caps=pAsLsXsFr dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 -- 0x7f1d1c009dd0 con 0x557c4d018dc0

There's a lot of back-and-forth on the caps but the highlight here to me is that "grant, new caps are Fc". The standby client first notes it wants those caps as part of opening the file just prior to setfilelock, on the MDS:

2020-01-21T17:44:25.829+0000 7f651abf0700  7 mds.0.server dispatch_client_request client_request(client.4278:7 open #0x10000000001 2020-01-21T17:44:25.829383+0000 caller_uid=1156, caller_gid=1156{1001,1156,}) v4
...
2020-01-21T17:44:25.830+0000 7f651abf0700 20 mds.0.locker  client.4278 pending pAsLsXs allowed pAsLsXs wanted pFscr
2020-01-21T17:44:25.830+0000 7f651abf0700 10 mds.0.locker eval done
2020-01-21T17:44:25.830+0000 7f651abf0700 12 mds.0.server open issued caps pAsLsXs for client.4278 on [inode 0x10000000001 [2,head] /foo/bar1 auth v96 ap=1 dirtyparent s=252 n(v0 rc2020-01-21T17:44:25.531606+0000 b252 1=1+0) (ifile excl->sync) (iversion lock) cr={4274=0-4194304@1} caps={4274=pAsLsXsFscr/pAsLsXsFsxcrwb/pFscr@9,4278=pAsLsXs/pFscr@3},l=4274(-1) | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f2eda8c800]
2020-01-21T17:44:25.830+0000 7f651abf0700 20 mds.0.bal hit_dir 0 pop is 1.99945, frag * size 1 [pop IRD:[C 2.00e+00] IWR:[C 9.31e+00] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:20.6]
2020-01-21T17:44:25.830+0000 7f651abf0700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.4278:7 open #0x10000000001 2020-01-21T17:44:25.829383+0000 caller_uid=1156, caller_gid=1156{1001,1156,}) v4
...
2020-01-21T17:44:25.833+0000 7f651abf0700  1 -- [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] <== client.4278 127.0.0.1:0/361750000 16 ==== client_request(client.4278:8 setfilelock rule 2, type 2, owner 17160986462880593306, pid 0, start 0, length 0, wait 1 #0x10000000001 2020-01-21T17:44:25.832830+0000 caller_uid=1156, caller_gid=1156{1001,1156,}) v4 ==== 152+0+0 (crc 0 0 0) 0x55f2edb8ec00 con 0x55f2edb31680

So, why on earth is the MDS issuing Fc to the standby client when the already-active client wants Fw? Perhaps it's evaluating based on what is actually granted?


Related issues 1 (1 open, 0 closed)

Copied from CephFS - Bug #43748: client: improve wanted handling so we don't request unused caps (active-standby exclusive file lock case) - Fix Under Review - Zheng Yan

Actions #1

Updated by Greg Farnum about 4 years ago

  • Copied from Bug #43748: client: improve wanted handling so we don't request unused caps (active-standby exclusive file lock case) added
Actions #2

Updated by Greg Farnum about 4 years ago

From the original ticket:
Jeff Layton wrote:

Greg pointed out some things in a face-to-face discussion the other day that led me to question whether this ought to be a client-side bug.

[...]

Why are Frw being revoked here? We aren't changing the layout of the file, so that seems a bit strange.

There may be some routine things happening that will do this periodically and briefly (I think some metadata flushes such as mtime technically require this, although we've relaxed consistency for most of them IIRC), but I agree that in this context it looks strange.

It's probably fine to hand out Fc caps to a singleton client that has the file open O_RDWR, but once the second client opens the file with O_RDWR (or O_WRONLY) Fc should be revoked and should not be handed out again until there aren't multiple writers.

ISTM that the client ought to be free to set "wanted" in its mask as it sees fit without much regard for other clients, and that it's up to the MDS to mediate "disputes" between them.

Yes, that all sounds correct to me!

Actions #3

Updated by Greg Farnum about 4 years ago

Okay, I dove into this a bit today. No final conclusions but reminding myself about how some of this works and several points of interest:
As written, that flock command involves opening the bar1 file, checking the size, and appending to the end of it every second.
On the MDS, part of that process is a lookup op which is responsible for most of the cap churn — the MDS has to revoke Fw from the "primary" client, then issues Fscr back to it as part of the lookup reply. But then for some reason (not fully explored) the inode filelock gets moved into the LOCK_SYNC_LOCK state, and that only allows Fc caps — so the primary client gets a revoke of Fsr, but the secondary client gets a grant of Fc even though it didn't start with them (they are in the wanted set)! I'm not certain if this harms anything directly (at the end of this process the secondary ends up with Fr) but it certainly doesn't help.

When you kill the secondary, it holds some cap and isn't available to acknowledge the revoke, so you have to wait through the blacklist timeout. There's not a great solution to this other than somehow blocking it from holding any caps at all...

==================

However, things are quite different if we instead set the tee operation to merely overwrite. We still get the 1-minute hang when killing the secondary process, but the cap cycling is a lot simpler and might be easier to adjust/fix.
I believe the differences are rooted in the active client not needing to get a file size, but merely issuing blind writes to offset 0. It looks like in that case, there is a small cap loop on the events but the main problem is that the secondary client wants Fscr, yet is GRANTED Frw as part of the active client's updates!
I haven't diagnosed exactly what's going on there but it looks like the CInode::get_caps_wanted() interfaces are incorrectly compressing the client wanted caps into a single mask which is then used to grant caps back out to everybody with any wanted bits.
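
To make that suspicion concrete, here is a purely hypothetical sketch (the simplified Capability struct and cap bits are mine, not the real MDS types) of how OR-ing every client's wanted mask into one value loses the per-client information needed to grant caps selectively:

```
#include <cstdint>
#include <map>

using client_t = int64_t;

// Simplified stand-ins for the real cap bit definitions.
constexpr uint32_t FILE_CACHE = 1 << 0;   // Fc
constexpr uint32_t FILE_RD    = 1 << 1;   // Fr
constexpr uint32_t FILE_WR    = 1 << 2;   // Fw

struct Capability {
  uint32_t wanted = 0;   // what this client asked for
};

// Suspected behavior: collapse everyone's wanted bits into a single mask...
uint32_t combined_wanted(const std::map<client_t, Capability>& client_caps) {
  uint32_t w = 0;
  for (const auto& [client, cap] : client_caps)
    w |= cap.wanted;
  return w;
}

// ...and then use that one mask when deciding what to grant to each client.
// A secondary that asked only for Fscr can be granted Frw simply because the
// primary's Fw is in the combined mask and the lock state allows it.
uint32_t caps_to_issue(uint32_t allowed, uint32_t combined) {
  return allowed & combined;
}
```

A per-client decision would instead intersect the allowed mask with that specific client's own wanted (or issued) set, which is what the later comments in this ticket discuss.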

Actions #4

Updated by Zheng Yan about 4 years ago

Greg Farnum wrote:

Okay, I dove into this a bit today. No final conclusions but reminding myself about how some of this works and several points of interest:
As written, that flock command involves opening the bar1 file, checking the size, and appending to the end of it every second.
On the MDS, part of that process is a lookup op which is responsible for most of the cap churn — the MDS has to revoke Fw from the "primary" client, then issues Fscr back to it as part of the lookup reply. But then for some reason (not fully explored) the inode filelock gets moved into the LOCK_SYNC_LOCK state, and that only allows Fc caps — so the primary client gets a revoke of Fsr, but the secondary client gets a grant of Fc even though it didn't start with them (they are in the wanted set)! I'm not certain if this harms anything directly (at the end of this process the secondary ends up with Fr) but it certainly doesn't help.

When you kill the secondary, it holds some cap and isn't available to acknowledge the revoke, so you have to wait through the blacklist timeout. There's not a great solution to this other than somehow blocking it from holding any caps at all...

==================

However, things are quite different if we instead set the tee operation to merely overwrite. We still get the 1-minute hang when killing the secondary process, but the cap cycling is a lot simpler and might be easier to adjust/fix.
I believe the differences are rooted in the active client not needing to get a file size, but merely issuing blind writes to offset 0. It looks like in that case, there is a small cap loop on the events but the main problem is that the secondary client wants Fscr, yet is GRANTED Frw as part of the active client's updates!
I haven't diagnosed exactly what's going on there but it looks like the CInode::get_caps_wanted() interfaces are incorrectly compressing the client wanted caps into a single mask which is then used to grant caps back out to everybody with any wanted bits.

It's because of CInode::get_caps_liked(). The MDS always issues to a client any file caps allowed by the lock state.
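
To illustrate what that means in practice (this is a simplified approximation of the logic behind CInode::get_caps_liked() and the issue computation described later in this ticket, not the actual code): the per-client issuance is roughly allowed & (liked | wanted), and because the liked mask for a regular file covers essentially all file caps, anything the lock state allows gets issued:

```
#include <cstdint>

// Simplified cap bits for illustration (not the real CEPH_CAP_* encoding).
constexpr uint32_t FILE_SHARED = 1 << 0;  // Fs
constexpr uint32_t FILE_CACHE  = 1 << 1;  // Fc
constexpr uint32_t FILE_RD     = 1 << 2;  // Fr
constexpr uint32_t FILE_WR     = 1 << 3;  // Fw
constexpr uint32_t ALL_FILE    = FILE_SHARED | FILE_CACHE | FILE_RD | FILE_WR;

// Approximation of the idea behind get_caps_liked(): for a regular file the
// MDS "likes" to hand out any file cap it can.
uint32_t get_caps_liked() { return ALL_FILE; }

// Approximation of the per-client issue computation:
// allowed (from the lock state) intersected with (liked | wanted).
uint32_t caps_to_issue(uint32_t allowed, uint32_t wanted) {
  return allowed & (get_caps_liked() | wanted);
}

// In the LOCK_SYNC_LOCK interim state only Fc is allowed, so:
//   caps_to_issue(FILE_CACHE, /*wanted=*/anything) == FILE_CACHE
// i.e. Fc is issued to every client with a cap on the inode, including a
// standby client that is not using it.
```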

Actions #5

Updated by Patrick Donnelly about 4 years ago

  • Assignee set to Greg Farnum
Actions #6

Updated by Patrick Donnelly about 4 years ago

  • Target version changed from v15.0.0 to v16.0.0
  • Backport set to octopus,nautilus
Actions #7

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport changed from octopus,nautilus to pacific,octopus,nautilus
Actions #8

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to Triaged
  • Assignee changed from Greg Farnum to Xiubo Li
  • Backport changed from pacific,octopus,nautilus to pacific,octopus
Actions #9

Updated by Xiubo Li over 2 years ago

Whenever it switches to a different lock state, the MDS will try to issue the allowed caps to the clients, including caps that are not wanted by the clients.

So when the filelock is in the LOCK_SYNC_LOCK (ifile sync->lock) interim state:

```
[LOCK_SYNC_LOCK] = { LOCK_LOCK, false, LOCK_LOCK, ANY, 0, REQ, 0, 0, 0, 0, CEPH_CAP_GCACHE,0,0,0 },
```
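
For reference, the fields of that sm_state_t entry map roughly as follows (field names are my annotation based on src/mds/locks.h and may not match the source exactly):

```
// [LOCK_SYNC_LOCK] = { LOCK_LOCK, false, LOCK_LOCK, ANY, 0, REQ, 0, 0, 0, 0,
//                      CEPH_CAP_GCACHE, 0, 0, 0 },
//
//   next               = LOCK_LOCK        // stable state being moved toward
//   loner              = false
//   replica_state      = LOCK_LOCK
//   can_read           = ANY
//   can_read_projected = 0
//   can_rdlock         = REQ
//   can_wrlock         = 0
//   can_force_wrlock   = 0
//   can_lease          = 0
//   can_xlock          = 0
//   caps               = CEPH_CAP_GCACHE  // only Fc may be issued in this state
//   loner_caps         = 0
//   xlocker_caps       = 0
//   replica_caps       = 0
```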

The "CEPH_CAP_GCACHE" is allowed and then the Fc cap will be issued to both the primary and standby clients, even the standby client has died because the MDS doesn't know yet. Then the caps to standby client will be something likes "4260=pAsLsXsFc/pAsLsXsFcr/pFscr@135":

```
2021-09-07T09:04:53.204+0800 153bfa680700 10 mds.0.cache.ino(0x10000000000) auth_unpin by 0x55f05e580b40 on [inode 0x10000000000 [2,head] /bar1 auth v1168 ap=1 DIRTYPARENT s=5077 n(v0 rc2021-09-07T09:04:52.128215+0800 b5077 1=1+0) (ifile sync->lock) (iversion lock) cr={4254=0-4194304@1} caps={4254=pAsLsXsFc/pAsLsXsFscr/pFscr@244,4260=pAsLsXsFc/pAsLsXsFcr/pFscr@135} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55f05e1cb600] now 1
```

So when the primary client wants the Fw cap and the corresponding lock state, the MDS will revoke the Fc cap from the standby client, but this gets stuck until the standby client's stale cap is forcibly revoked.

The question is why the ifile lock was switched to the LOCK_SYNC_LOCK (ifile sync->lock) interim state. That happens because after the "lookup" reply, the primary client updates its caps to the MDS, and the MDS then updates the max size, which needs to take the wrlock on the ifile lock. The stable LOCK_SYNC state has to transition to LOCK_LOCK, because the wrlock is not allowed in the stable LOCK_SYNC state.
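
A condensed sketch of that code path, using the function names that appear in the MDS debug logs later in this ticket (the types and bodies here are approximations, not the real implementation):

```
#include <cstdio>

// Hypothetical stand-ins; only enough to show the sequence of events.
enum FileLockState { LOCK_SYNC, LOCK_SYNC_LOCK, LOCK_LOCK };

struct FileLock {
  FileLockState state = LOCK_SYNC;
  bool can_wrlock() const { return state == LOCK_LOCK; }
};

// Start the SYNC -> LOCK transition; the lock sits in the interim state until
// every client has acked the revocation of the caps not allowed in LOCK_LOCK.
void simple_lock(FileLock& l) { l.state = LOCK_SYNC_LOCK; }

void issue_caps_to_all_clients() {
  // In LOCK_SYNC_LOCK only Fc is allowed, and it is issued to every client
  // holding caps on the inode, including the (possibly dead) standby client.
  std::puts("issue Fc to all clients; revoke everything else");
}

// Condensed sketch of check_inode_max_size(): growing the writable range
// needs a wrlock on ifile, which LOCK_SYNC does not allow, so the lock is
// pushed into the interim state and caps are re-issued. A dead standby never
// acks the revoke, so the primary's pending Fw grant stays blocked until the
// session times out.
void check_inode_max_size_sketch(FileLock& ifile) {
  if (!ifile.can_wrlock()) {
    simple_lock(ifile);
    issue_caps_to_all_clients();
  }
}
```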

So when the "ifile" lock is in the LOCK_SYNC_LOCK interim state, the standby client dies, it will cause the primary client to be stuck by waiting the MDS, who is waiting for the Fc is revoked from standby client, to grant the Fw cap to it.

I just ran a test that does not allow the LOCK_SYNC_LOCK interim state to issue the Fc cap to clients, and I hit a new but similar stuck Fr cap revocation in several cases in other interim states, such as the LOCK_SYNC_MIX interim state. All of these cases, where the lock state transition is unavoidable, will cause the primary client to be stuck.

Even after the standby client dies, the MDS can still issue it new caps when the filelock state changes, and if the filelock later changes again and needs to revoke some caps from the standby client, it won't get any response.

So IMO, if we want to fix this on the MDS side we may need to rework the ifile lock's transition mechanism quite a lot, which seems rather risky.

Actions #10

Updated by Xiubo Li over 2 years ago

Could we use socket keepalive in caps_tick() to detect whether the connection peer is still alive when cap revocations have been delayed for more than 5 seconds?

If we can detect that it has died, could we just force-remove the stale caps?

Does this make sense, or could it cause any potential issues?

Actions #11

Updated by Greg Farnum over 2 years ago

Xiubo Li wrote:

Whenever it switches to a different lock state, the MDS will try to issue the allowed caps to the clients, including caps that are not wanted by the clients.

So when the filelock is in the LOCK_SYNC_LOCK (ifile sync->lock) interim state:

```
[LOCK_SYNC_LOCK] = { LOCK_LOCK, false, LOCK_LOCK, ANY, 0, REQ, 0, 0, 0, 0, CEPH_CAP_GCACHE,0,0,0 },
```

The "CEPH_CAP_GCACHE" is allowed and then the Fc cap will be issued to both the primary and standby clients, even the standby client has died because the MDS doesn't know yet. Then the caps to standby client will be something likes "4260=pAsLsXsFc/pAsLsXsFcr/pFscr@135":

```
2021-09-07T09:04:53.204+0800 153bfa680700 10 mds.0.cache.ino(0x10000000000) auth_unpin by 0x55f05e580b40 on [inode 0x10000000000 [2,head] /bar1 auth v1168 ap=1 DIRTYPARENT s=5077 n(v0 rc2021-09-07T09:04:52.128215+0800 b5077 1=1+0) (ifile sync->lock) (iversion lock) cr={4254=0-4194304@1 } caps={4254=pAsLsXsFc/pAsLsXsFscr/pFscr@244,4260=pAsLsXsFc/pAsLsXsFcr/pFscr@135} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 w aiter=1 authpin=1 0x55f05e1cb600] now 1
```

So when the primary client wants the Fw cap and the corresponding lock state, the MDS will revoke the Fc cap from the standby client, but this gets stuck until the standby client's stale cap is forcibly revoked.

The question is why the ifile lock was switched to the LOCK_SYNC_LOCK (ifile sync->lock) interim state. That happens because after the "lookup" reply, the primary client updates its caps to the MDS, and the MDS then updates the max size, which needs to take the wrlock on the ifile lock. The stable LOCK_SYNC state has to transition to LOCK_LOCK, because the wrlock is not allowed in the stable LOCK_SYNC state.

So when the "ifile" lock is in the LOCK_SYNC_LOCK interim state, the standby client dies, it will cause the primary client to be stuck by waiting the MDS, who is waiting for the Fc is revoked from standby client, to grant the Fw cap to it.

I just ran a test that does not allow the LOCK_SYNC_LOCK interim state to issue the Fc cap to clients, and I hit a new but similar stuck Fr cap revocation in several cases in other interim states, such as the LOCK_SYNC_MIX interim state. All of these cases, where the lock state transition is unavoidable, will cause the primary client to be stuck.

Well, Fr on one client won't prevent another client from doing writes, it will just force them both to do sync IO to the OSDs rather than caching or buffering, which is MUCH better than a hang. If removing the Fc allowance is safe and will actually fix this, we should do so.

However, IIRC those transitional states tend to have things like Fc allowed so that we can move into them while those caps are still issued, or to avoid forcing clients to drop full caches on a simple file size extension. And so we probably need to allow Fc to be held throughout.

Can we instead just be less generous in issuing caps, and only do so if caps are part of a client's existing or wanted sets, rather than everything in allowed? (Or, be aggressive in issuing caps to loners, but conservative when there are multiple clients with caps on a file?) We probably aggressively issue allowed caps because we expect clients to do IO to files they have handles for, and it seemed like pre-issuing caps would reduce latency. But it's clearly not true in this case and I'm not sure we have any data indicating it's really better in general.
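
As a purely hypothetical sketch of that idea (names and types here are illustrative, not a proposed patch), the per-client issue computation would intersect the lock-state-allowed mask with only that client's currently issued and wanted bits rather than the broad liked mask:

```
#include <cstdint>

// Illustrative cap bits (not the real CEPH_CAP_* encoding).
constexpr uint32_t FILE_CACHE = 1 << 0;  // Fc
constexpr uint32_t FILE_RD    = 1 << 1;  // Fr
constexpr uint32_t FILE_WR    = 1 << 2;  // Fw

struct ClientCap {
  uint32_t issued = 0;   // what this client currently holds
  uint32_t wanted = 0;   // what this client has asked for
};

// Behavior described earlier in this ticket: allowed & (liked | wanted),
// where liked covers essentially all file caps.
uint32_t issue_generous(uint32_t allowed, const ClientCap& c, uint32_t liked) {
  return allowed & (liked | c.wanted);
}

// The more conservative idea: only hand out bits the client already holds or
// has explicitly asked for.
uint32_t issue_conservative(uint32_t allowed, const ClientCap& c) {
  return allowed & (c.issued | c.wanted);
}
```

Note that in this particular reproducer the standby's wanted mask is pFscr, which itself includes Fc and Fr, so (as the next comment reports) restricting issuance to wanted caps alone does not resolve the hang.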

Xiubo Li wrote:

Could we use the socket keepalive to detect the socket connection peer's aliveness in the caps_tick() if there has revoking caps delayed for more than 5 seconds ?

If we can detect that it died, so just try to force remove the stale caps ?

Will this make sense ? Or will it cause any potential issue ?

I don't think that's safe. While clients should keep a connection established if they have caps issued from an MDS, their timeout period is part of the network contract and data we share with them. If there's a network issue, the client could have a disconnected socket which takes more than 5 seconds to re-establish.

Actions #12

Updated by Xiubo Li over 2 years ago

Greg Farnum wrote:

Xiubo Li wrote:

Whenever it switches to a different lock state, the MDS will try to issue the allowed caps to the clients, including caps that are not wanted by the clients.

So when the filelock is in the LOCK_SYNC_LOCK (ifile sync->lock) interim state:

```
[LOCK_SYNC_LOCK] = { LOCK_LOCK, false, LOCK_LOCK, ANY, 0, REQ, 0, 0, 0, 0, CEPH_CAP_GCACHE,0,0,0 },
```

The "CEPH_CAP_GCACHE" is allowed and then the Fc cap will be issued to both the primary and standby clients, even the standby client has died because the MDS doesn't know yet. Then the caps to standby client will be something likes "4260=pAsLsXsFc/pAsLsXsFcr/pFscr@135":

```
2021-09-07T09:04:53.204+0800 153bfa680700 10 mds.0.cache.ino(0x10000000000) auth_unpin by 0x55f05e580b40 on [inode 0x10000000000 [2,head] /bar1 auth v1168 ap=1 DIRTYPARENT s=5077 n(v0 rc2021-09-07T09:04:52.128215+0800 b5077 1=1+0) (ifile sync->lock) (iversion lock) cr={4254=0-4194304@1} caps={4254=pAsLsXsFc/pAsLsXsFscr/pFscr@244,4260=pAsLsXsFc/pAsLsXsFcr/pFscr@135} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55f05e1cb600] now 1
```

So when the primary client wants the Fw cap and the corresponding lock state, the MDS will revoke the Fc cap from the standby client, but this gets stuck until the standby client's stale cap is forcibly revoked.

The question is why the ifile lock was switched to the LOCK_SYNC_LOCK (ifile sync->lock) interim state. That happens because after the "lookup" reply, the primary client updates its caps to the MDS, and the MDS then updates the max size, which needs to take the wrlock on the ifile lock. The stable LOCK_SYNC state has to transition to LOCK_LOCK, because the wrlock is not allowed in the stable LOCK_SYNC state.

So when the "ifile" lock is in the LOCK_SYNC_LOCK interim state, the standby client dies, it will cause the primary client to be stuck by waiting the MDS, who is waiting for the Fc is revoked from standby client, to grant the Fw cap to it.

I just ran a test that does not allow the LOCK_SYNC_LOCK interim state to issue the Fc cap to clients, and I hit a new but similar stuck Fr cap revocation in several cases in other interim states, such as the LOCK_SYNC_MIX interim state. All of these cases, where the lock state transition is unavoidable, will cause the primary client to be stuck.

Well, Fr on one client won't prevent another client from doing writes, it will just force them both to do sync IO to the OSDs rather than caching or buffering, which is MUCH better than a hang. If removing the Fc allowance is safe and will actually fix this, we should do so.

Yeah, that's right. The problem is that on the MDS side the filelock transition gets stuck in the LOCK_SYNC_LOCK interim state while handling the lookup and updating the max size range, just before the primary client sends the open request that requires the Fw caps.

I have tried issuing only the caps that are needed or wanted; the result is the same and it still gets stuck.

Please see the following logs: I just killed the "client.4230" client, and revoking its Fr caps gets stuck:

 13610 2021-10-25T11:28:03.180+0800 7f93451e7640 10 MDSContext::complete: 18C_MDL_CheckMaxSize
 13611 2021-10-25T11:28:03.180+0800 7f93451e7640 10 mds.0.locker check_inode_max_size new_ranges 1 update_size 0 on [inode 0x10000000000 [2,head] /bar1 auth v84 ap=1 DIRTYPARENT s=105 n(v0 rc2021-10-25T11:28:02.154463+0800 b105 1=1+0) (iversion lock) cr={4252=0-4194304@1} caps={4230=pFr/pFscr@48,4252=pAsLsXsFscr/pFscr@126} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55594c5dcb00]
 13612 2021-10-25T11:28:03.180+0800 7f93451e7640  7 mds.0.locker simple_lock on (ifile sync) on [inode 0x10000000000 [2,head] /bar1 auth v84 ap=1 DIRTYPARENT s=105 n(v0 rc2021-10-25T11:28:02.154463+0800 b105 1=1+0) (iversion lock) cr={4252=0-4194304@1} caps={4230=pFr/pFscr@48,4252=pAsLsXsFscr/pFscr@126} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55594c5dcb00]
 13613 2021-10-25T11:28:03.180+0800 7f93451e7640  7 mds.0.locker issue_caps allowed=pAsLsXsFc, xlocker allowed=pAsLsXsFc on [inode 0x10000000000 [2,head] /bar1 auth v84 ap=1 DIRTYPARENT s=105 n(v0 rc2021-10-25T11:28:02.154463+0800 b105 1=1+0) (ifile sync->lock) (iversion lock) cr={4252=0-4194304@1} caps={4230=pFr/pFscr@48,4252=pAsLsXsFscr/pFscr@126} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55594c5dcb00]
 13614 2021-10-25T11:28:03.180+0800 7f93451e7640 20 mds.0.locker  client.4230 pending pFr allowed pAsLsXsFc wanted pFscr
 13615 2021-10-25T11:28:03.180+0800 7f93451e7640  7 mds.0.locker     sending MClientCaps to client.4230 seq 49 new pending p was pFr
 13616 2021-10-25T11:28:03.180+0800 7f93451e7640 20 mds.0.cache.ino(0x10000000000) encode_cap_message pfile 0 pauth 0 plink 0 pxattr 0 ctime 2021-10-25T11:28:02.154463+0800
 13617 2021-10-25T11:28:03.180+0800 7f93451e7640 10 mds.0.4 send_message_client_counted client.4230 seq 48 client_caps(revoke ino 0x10000000000 3 seq 49 caps=p dirty=- wanted=pFscr follows 0 size 105/0 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11
 13618 2021-10-25T11:28:03.180+0800 7f93451e7640  1 -- v1:192.168.0.105:6814/3831691734 --> v1:192.168.0.105:0/3709597457 -- client_caps(revoke ino 0x10000000000 3 seq 49 caps=p dirty=- wanted=pFscr follows 0 size 105/0 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11 -- 0x55594c6bf200 con 0x55594c627000
 13619 2021-10-25T11:28:03.180+0800 7f93451e7640 20 mds.0.locker  client.4252 pending pAsLsXsFscr allowed pAsLsXsFc wanted pFscr
 13620 2021-10-25T11:28:03.180+0800 7f93451e7640  7 mds.0.locker    sending MClientCaps to client.4252 seq 127 new pending pFc was pAsLsXsFscr
 13621 2021-10-25T11:28:03.180+0800 7f93451e7640 20 mds.0.cache.ino(0x10000000000) encode_cap_message pfile 0 pauth 0 plink 0 pxattr 0 ctime 2021-10-25T11:28:02.154463+0800
 13622 2021-10-25T11:28:03.180+0800 7f93451e7640 10 mds.0.4 send_message_client_counted client.4252 seq 92 client_caps(revoke ino 0x10000000000 1 seq 127 caps=pFc dirty=- wanted=pFscr follows 0 size 105/4194304 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11
 13623 2021-10-25T11:28:03.180+0800 7f93451e7640  1 -- v1:192.168.0.105:6814/3831691734 --> v1:192.168.0.105:0/49034973 -- client_caps(revoke ino 0x10000000000 1 seq 127 caps=pFc dirty=- wanted=pFscr follows 0 size 105/4194304 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11-- 0x55594c6bf500 con 0x55594c625000                                                                                             

Since the "client.4230" has been dead and when "client.4252" tries to open the file and it succeeds:

 13713 2021-10-25T11:28:03.181+0800 7f934b1f3640  1 -- v1:192.168.0.105:6814/3831691734 <== client.4252 v1:192.168.0.105:0/49034973 161 ==== client_request(client.4252:55 open #0x10000000000 2021-10-25T11:28:03.181240+0800 caller_uid=1000, caller_gid=1000{10,1000,}) v5 ==== 156+0+0 (unknown 3288560875 0 0) 0x55594c4b6b00 con 0x55594c625000
 13714 2021-10-25T11:28:03.181+0800 7f934b1f3640  4 mds.0.server handle_client_request client_request(client.4252:55 open #0x10000000000 2021-10-25T11:28:03.181240+0800 caller_uid=1000, caller_gid=1000{10,1000,}) v5
 13715 2021-10-25T11:28:03.181+0800 7f934b1f3640 20 mds.0.4 get_session have 0x55594c603900 client.4252 v1:192.168.0.105:0/49034973 state open
 13716 2021-10-25T11:28:03.181+0800 7f934b1f3640 15 mds.0.server  oldest_client_tid=54
 13717 2021-10-25T11:28:03.181+0800 7f934b1f3640  7 mds.0.cache request_start request(client.4252:55 nref=2 cr=0x55594c4b6b00)
 13718 2021-10-25T11:28:03.181+0800 7f934b1f3640  7 mds.0.server dispatch_client_request client_request(client.4252:55 open #0x10000000000 2021-10-25T11:28:03.181240+0800 caller_uid=1000, caller_gid=1000{10,1000,}) v5
 13719 2021-10-25T11:28:03.181+0800 7f934b1f3640  7 mds.0.server open on #0x10000000000
 13720 2021-10-25T11:28:03.181+0800 7f934b1f3640 10 mds.0.server rdlock_path_pin_ref request(client.4252:55 nref=2 cr=0x55594c4b6b00) #0x10000000000
 13721 2021-10-25T11:28:03.181+0800 7f934b1f3640  7 mds.0.cache traverse: opening base ino 0x10000000000 snap head
 13722 2021-10-25T11:28:03.181+0800 7f934b1f3640 10 mds.0.locker try_rdlock_snap_layout request(client.4252:55 nref=3 cr=0x55594c4b6b00) [inode 0x10000000000 [2,head] /bar1 auth v84 ap=1 DIRTYPARENT s=105 n(v0 rc2021-10-25T11:28:02.154463+0800 b105 1=1+0) (ifile sync->lock) (iversion lock) cr={4252=0-4194304@1} caps={4230=pFc/pFcr/pFscr@50,4252=pAsLsXsFc/pFscr@128} | ptrwaiter=2 request=1 lock=0 caps=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x55594c5dcb00]

...

 13744 2021-10-25T11:28:03.181+0800 7f934b1f3640  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.4252:55 open #0x10000000000 2021-10-25T11:28:03.181240+0800 caller_uid=1000, caller_gid=1000{10,1000,}) v5

But the the "client.4252" will be stuck by waiting for Fw caps, which in the MDS side it's stuck in doing the filelock transition:

 4492 2021-10-25T11:28:03.181+0800 7f6f777fe640 10 client.4252 send_request client_request(unknown.0:55 open #0x10000000000 2021-10-25T11:28:03.181240+0800 caller_uid=1000, caller_gid=1000{10,1000,}) v5 to mds.0
...
 4497 2021-10-25T11:28:03.181+0800 7f6f7effd640  1 -- 192.168.0.105:0/49034973 <== mds.0 v1:192.168.0.105:6814/3831691734 151 ==== client_reply(???:55 = 0 (0) Success) v1 ==== 397+0+0 (unknown 220954709 0 0) 0x7f6f88009300 con 0x55c960c93e20                                                       
 4498 2021-10-25T11:28:03.181+0800 7f6f7effd640 20 client.4252 handle_client_reply got a reply. Safe:1 tid 55

...

 4963 2021-10-25T11:29:05.408+0800 7f6f7effd640  1 -- 192.168.0.105:0/49034973 <== mds.0 v1:192.168.0.105:6814/3831691734 156 ==== client_caps(revoke ino 0x10000000000 1 seq 131 caps=pAsXs dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 105/4194304 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11 ==== 252+0+0 (unknown 1169439206 0 0) 0x7f6f88009760 con 0x55c960c93e20
 4964 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252  mds.0 seq now 93
 4965 2021-10-25T11:29:05.408+0800 7f6f7effd640  5 client.4252 handle_cap_grant on in 0x10000000000 mds.0 seq 131 caps now pAsXs was pAsLsXsFc
 4966 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252 update_inode_file_time 0x10000000000.head(faked_ino=0 nref=5 ll_ref=19 cap_refs={4=0,1024=0,4096=0,8192=0} open={1=1,2=1} mode=100664 size=105/4194304 nlink=1 btime=2021-10-25T11:27:45.802019+0800 mtime=2021-10-25T11:28:02.154463+0800 ctime=2021-10-25T11:28:02.154463+0800 caps=pAsLsXsFc(0=pAsLsXsFc) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["bar1"] 0x7f6f80009b40) pAsLsXsFc ctime 2021-10-25T11:28:02.154463+0800 mtime 2021-10-25T11:28:02.154463+0800
 4967 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252   revocation of LsFc
 4968 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252 check_caps on 0x10000000000.head(faked_ino=0 nref=5 ll_ref=19 cap_refs={4=0,1024=0,4096=0,8192=0} open={1=1,2=1} mode=100664 size=105/4194304 nlink=1 btime=2021-10-25T11:27:45.802019+0800 mtime=2021-10-25T11:28:02.154463+0800 ctime=2021-10-25T11:28:02.154463+0800 caps=pAsXs(0=pAsXs) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["bar1"] 0x7f6f80009b40) wanted pAsxXsxFsxcrwb used - issued pAsXs revoking LsFc flags=0
 4969 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252  cap mds.0 issued pAsXs implemented pAsLsXsFc revoking LsFc
 4970 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252 completed revocation of LsFc
 4971 2021-10-25T11:29:05.408+0800 7f6f7effd640 10 client.4252 send_cap 0x10000000000.head(faked_ino=0 nref=5 ll_ref=19 cap_refs={4=0,1024=0,4096=0,8192=0} open={1=1,2=1} mode=100664 size=105/4194304 nlink=1 btime=2021-10-25T11:27:45.802019+0800 mtime=2021-10-25T11:28:02.154463+0800 ctime=2021-10-25T11:28:02.154463+0800 caps=pAsXs(0=pAsXs) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["bar1"] 0x7f6f80009b40) mds.0 seq 131 used - want pAsxXsxFsxcrwb flush - retain pAsxLxXsxFsxrwbl held pAsLsXsFc revoking LsFc dropping -
 4972 2021-10-25T11:29:05.408+0800 7f6f7effd640 15 client.4252 auth cap, requesting max_size 7
 4973 2021-10-25T11:29:05.408+0800 7f6f7effd640  1 -- 192.168.0.105:0/49034973 --> v1:192.168.0.105:6814/3831691734 -- client_caps(update ino 0x10000000000 1 seq 131 caps=pAsXs dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 105/7 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11 -- 0x7f6f8800a210 con 0x55c960c93e20
 4974 2021-10-25T11:29:05.408+0800 7f6f777fe640 10 client.4252 waiting for caps 0x10000000000.head(faked_ino=0 nref=5 ll_ref=19 cap_refs={4=0,1024=0,4096=0,8192=0} open={1=1,2=1} mode=100664 size=105/4194304 nlink=1 btime=2021-10-25T11:27:45.802019+0800 mtime=2021-10-25T11:28:02.154463+0800 ctime=2021-10-25T11:28:02.154463+0800 caps=pAsXs(0=pAsXs) objectset[0x10000000000 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1.head["bar1"] 0x7f6f80009b40) need AsFw want Fb
 4975 2021-10-25T11:29:05.408+0800 7f6f7effd640  1 -- 192.168.0.105:0/49034973 <== mds.0 v1:192.168.0.105:6814/3831691734 157 ==== client_caps(grant ino 0x10000000000 1 seq 132 caps=pAsxXsxFrw dirty=- wanted=pAsxXsxFsxcrwb follows 0 size 105/4194304 ts 1/18446744073709551615 mtime 2021-10-25T11:28:02.154463+0800) v11 ==== 252+0+0 (unknown 2243220156 0 0) 0x7f6f88009300 con 0x55c960c93e20

However, IIRC those transitional states tend to have things like Fc allowed so that we can move into them while those caps are still issued, or to avoid forcing clients to drop full caches on a simple file size extension. And so we probably need to allow Fc to be held throughout.

Yeah, that's reasonable.
In each transitional state, the MDS will always issue all of the allowed & (wanted | liked) caps; that means for files, if Fc is allowed it will always be issued to all the clients, whether or not the clients need it.

Can we instead just be less generous in issuing caps, and only do so if caps are part of a client's existing or wanted sets, rather than everything in allowed? (Or, be aggressive in issuing caps to loners, but conservative when there are multiple clients with caps on a file?) We probably aggressively issue allowed caps because we expect clients to do IO to files they have handles for, and it seemed like pre-issuing caps would reduce latency. But it's clearly not true in this case and I'm not sure we have any data indicating it's really better in general.

As the logs above and my tests show, this still won't work. I will look into whether we can improve it in some other way.

Actions #13

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)