Project

General

Profile

Bug #43748

client: improve wanted handling so we don't request unused caps (active-standby exclusive file lock case)

Added by Patrick Donnelly about 1 month ago. Updated 15 days ago.

Status:
New
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:
Crash signature:

Description

In an active/standby configuration of two clients managed by file locks, the standby client causes unbuffered I/O on the active client. Additionally, and more importantly, failure of the standby client causes the active client I/O to freeze because the standby client receives Frw caps it cannot release.

This can be reproduced by starting two ceph-fuse clients. Then in the first "active" client, do:

flock -x bar1 -c 'date; echo enter; while sleep 1; do date; echo foobar | tee -a bar1; done'

Then in the "standby" client:

flock -x bar1 -c 'date; echo 2 enter; sleep 30; date; echo 2 bye;'

Note: the second client should never print output because it cannot obtain the exclusive lock on bar1.

Now, send SIGKILL to the standby ceph-fuse client. This will cause I/O to halt for the first client until the MDS autocloses the "standby" ceph-fuse client session after 60s.

Here's the log from the standby client:

2020-01-21T17:44:33.248+0000 7f1d0b7fe700 20 client.4278 trim_cache size 2 max 16384
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 45 ==== client_caps(revoke ino 0x10000000001 11 seq 34 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c00a1e0 con 0x557c4d018dc0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 31
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 34 caps now pAsLsXs was pAsLsXsFr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:31.232141+0000 ctime=2020-01-21T17:44:31.232141+0000 caps=pAsLsXsFr(0=pAsLsXsFr) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXsFr ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278   revocation of Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 check_caps on 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) wanted pFscr used - issued pAsLsXs revoking Fr flags=0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278  cap mds.0 issued pAsLsXs implemented pAsLsXsFr revoking Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 send_cap 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) mds.0 seq 34 used - want pFscr flush - retain pAsxLsxXsxFsxcwbl held pAsLsXsFr revoking Fr dropping -
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 15 client.4278 auth cap, setting max_size = 0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 --> [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] -- client_caps(update ino 0x10000000001 11 seq 34 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 287/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 -- 0x7f1d1c009850 con 0x557c4d018dc0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 46 ==== client_caps(grant ino 0x10000000001 11 seq 35 caps=pAsLsXsFc dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c00a1e0 con 0x557c4d018dc0
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 32
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 35 caps now pAsLsXsFc was pAsLsXs
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=287/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXs ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 size 287 -> 294
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278   grant, new caps are Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 47 ==== client_caps(revoke ino 0x10000000001 11 seq 36 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c009850 con 0x557c4d018dc0
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 33
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 36 caps now pAsLsXs was pAsLsXsFc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFc(0=pAsLsXsFc) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXsFc ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278   revocation of Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 check_caps on 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) wanted pFscr used - issued pAsLsXs revoking Fc flags=0
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278  cap mds.0 issued pAsLsXs implemented pAsLsXsFc revoking Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 send_cap 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) mds.0 seq 36 used - want pFscr flush - retain pAsxLsxXsxFsxrwbl held pAsLsXsFc revoking Fc dropping -
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 15 client.4278 auth cap, setting max_size = 0
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 --> [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] -- client_caps(update ino 0x10000000001 11 seq 36 caps=pAsLsXs dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 -- 0x7f1d1c009dd0 con 0x557c4d018dc0
2020-01-21T17:44:33.349+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 48 ==== client_caps(grant ino 0x10000000001 11 seq 37 caps=pAsLsXsFrw dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c009dd0 con 0x557c4d018dc0
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 34
2020-01-21T17:44:33.349+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 37 caps now pAsLsXsFrw was pAsLsXs
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXs(0=pAsLsXs) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXs ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278   grant, new caps are Frw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 <== mds.0 v2:127.0.0.1:6854/1480111679 49 ==== client_caps(revoke ino 0x10000000001 11 seq 38 caps=pAsLsXsFr dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 ==== 252+0+0 (crc 0 0 0) 0x7f1d1c004260 con 0x557c4d018dc0
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278  mds.0 seq now 35
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 38 caps now pAsLsXsFr was pAsLsXsFrw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 update_inode_file_time 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFrw(0=pAsLsXsFrw) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) pAsLsXsFrw ctime 2020-01-21T17:44:32.297260+0000 mtime 2020-01-21T17:44:32.297260+0000
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278   revocation of Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 check_caps on 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFr(0=pAsLsXsFr) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) wanted pFscr used - issued pAsLsXsFr revoking Fw flags=0
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278  cap mds.0 issued pAsLsXsFr implemented pAsLsXsFrw revoking Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 send_cap 0x10000000001.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={1=1} mode=100640 size=294/0 nlink=1 btime=0.000000 mtime=2020-01-21T17:44:32.297260+0000 ctime=2020-01-21T17:44:32.297260+0000 caps=pAsLsXsFr(0=pAsLsXsFr) objectset[0x10000000001 ts 0/0 objects 0 dirty_or_tx 0] parents=0x10000000000.head["bar1"] 0x7f1cf4005a10) mds.0 seq 38 used - want pFscr flush - retain pAsxLsxXsxFsxcrbl held pAsLsXsFrw revoking Fw dropping -
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 15 client.4278 auth cap, setting max_size = 0
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  1 -- 127.0.0.1:0/361750000 --> [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] -- client_caps(update ino 0x10000000001 11 seq 38 caps=pAsLsXsFr dirty=- wanted=pFscr follows 0 size 294/0 ts 1/18446744073709551615 mtime 2020-01-21T17:44:32.297260+0000) v11 -- 0x7f1d1c009dd0 con 0x557c4d018dc0

There's a lot of back-and-forth on the caps but the highlight here to me is "wanted=pFscr". The standby client first notes it wants those caps as part of opening the file just prior to setfilelock, on the MDS:

2020-01-21T17:44:25.829+0000 7f651abf0700  7 mds.0.server dispatch_client_request client_request(client.4278:7 open #0x10000000001 2020-01-21T17:44:25.829383+0000 caller_uid=1156, caller_gid=1156{1001,1156,}) v4
...
2020-01-21T17:44:25.830+0000 7f651abf0700 20 mds.0.locker  client.4278 pending pAsLsXs allowed pAsLsXs wanted pFscr
2020-01-21T17:44:25.830+0000 7f651abf0700 10 mds.0.locker eval done
2020-01-21T17:44:25.830+0000 7f651abf0700 12 mds.0.server open issued caps pAsLsXs for client.4278 on [inode 0x10000000001 [2,head] /foo/bar1 auth v96 ap=1 dirtyparent s=252 n(v0 rc2020-01-21T17:44:25.531606+0000 b252 1=1+0) (ifile excl->sync) (iversion lock) cr={4274=0-4194304@1} caps={4274=pAsLsXsFscr/pAsLsXsFsxcrwb/pFscr@9,4278=pAsLsXs/pFscr@3},l=4274(-1) | ptrwaiter=0 request=1 lock=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55f2eda8c800]
2020-01-21T17:44:25.830+0000 7f651abf0700 20 mds.0.bal hit_dir 0 pop is 1.99945, frag * size 1 [pop IRD:[C 2.00e+00] IWR:[C 9.31e+00] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:20.6]
2020-01-21T17:44:25.830+0000 7f651abf0700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.4278:7 open #0x10000000001 2020-01-21T17:44:25.829383+0000 caller_uid=1156, caller_gid=1156{1001,1156,}) v4
...
2020-01-21T17:44:25.833+0000 7f651abf0700  1 -- [v2:127.0.0.1:6854/1480111679,v1:127.0.0.1:6855/1480111679] <== client.4278 127.0.0.1:0/361750000 16 ==== client_request(client.4278:8 setfilelock rule 2, type 2, owner 17160986462880593306, pid 0, start 0, length 0, wait 1 #0x10000000001 2020-01-21T17:44:25.832830+0000 caller_uid=1156, caller_gid=1156{1001,1156,}) v4 ==== 152+0+0 (crc 0 0 0) 0x55f2edb8ec00 con 0x55f2edb31680

So, I see two issues here:

- The client is saying it "wants" Fscr with no internal requests actually requiring those caps. In general, it's no problem for the client to start by asking for those caps as part of opening a file for the first time (indeed, it's an optimization!) but in this case it's hurting performance for the lifetime of the client. Question: does the kernel client behave the same way?

- The MDS is issuing caps based on "wanted" to the standby client blocking on setfilelock. I'm not sure if this is easily fixed in a way that doesn't break other things. I think ideally the MDS would ignore "wanted" for clients which do not hold the exclusive lock.


Related issues

Copied to fs - Bug #43960: MDS: incorrectly issues Fc for new opens when there is an existing writer New

History

#1 Updated by Patrick Donnelly about 1 month ago

  • Description updated (diff)

#2 Updated by Zheng Yan about 1 month ago

Now, send SIGKILL to the standby ceph-fuse client. This will cause I/O to halt for the first client until the MDS autocloses the "standby" ceph-fuse client session after 60s.

this is too brute. user should kill the standby application, instead of ceph-fuse.

- The client is saying it "wants" Fscr with no internal requests actually requiring those caps. In general, it's no problem for the client to start by asking for those caps as part of opening a file for the first time (indeed, it's an optimization!) but in this case it's hurting performance for the lifetime of the client. Question: does the kernel client behave the same way?

yes. One way to optimize this is record latest use time of caps. If a cap is never used or hasn't been used for a long time, client does not say it want this cap

- The MDS is issuing caps based on "wanted" to the standby client blocking on setfilelock. I'm not sure if this is easily fixed in a way that doesn't break other things. I think ideally the MDS would ignore "wanted" for clients which do not hold the exclusive lock.

mds does not know if filelock and read are correlated. So it can't ignore 'wanted'.

#3 Updated by Patrick Donnelly about 1 month ago

Zheng Yan wrote:

Now, send SIGKILL to the standby ceph-fuse client. This will cause I/O to halt for the first client until the MDS autocloses the "standby" ceph-fuse client session after 60s.

this is too brute. user should kill the standby application, instead of ceph-fuse.

The example use-case is an application built with active and standby instances on a part of the file system. If the machine the standby instance is on goes down, the active instance should not be disrupted.

- The client is saying it "wants" Fscr with no internal requests actually requiring those caps. In general, it's no problem for the client to start by asking for those caps as part of opening a file for the first time (indeed, it's an optimization!) but in this case it's hurting performance for the lifetime of the client. Question: does the kernel client behave the same way?

yes. One way to optimize this is record latest use time of caps. If a cap is never used or hasn't been used for a long time, client does not say it want this cap

This would be acceptable.

- The MDS is issuing caps based on "wanted" to the standby client blocking on setfilelock. I'm not sure if this is easily fixed in a way that doesn't break other things. I think ideally the MDS would ignore "wanted" for clients which do not hold the exclusive lock.

mds does not know if filelock and read are correlated. So it can't ignore 'wanted'.

Alright, makes sense.

#4 Updated by Jeff Layton 28 days ago

I see a couple of other potential fixes:

1) we could not ask for those caps on an OPEN/CREATE and just rely on the MDS to opportunistically hand those out if they are not contended.

2) we could ask for them, but clear them from the wanted set when the OPEN reply comes in. We might lose them before an actual read or write call happens, but if so that's not a huge problem.

#5 Updated by Patrick Donnelly 25 days ago

  • Subject changed from mds: synchronous I/O for two clients with file locks to client: synchronous I/O for two clients with file locks
  • Component(FS) deleted (MDS, kceph)

Changing this to just a client bug. The kernel driver issue will be tracked via: https://bugzilla.redhat.com/show_bug.cgi?id=1795309

#6 Updated by Jeff Layton 17 days ago

Greg pointed out some things in a face-to-face discussion the other day that lead me to question whether this ought to be a client-side bug.

[jlayton@tleilax ~]$ egrep '(grant|revocation)' /tmp/log | grep -v client_caps
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 34 caps now pAsLsXs was pAsLsXsFr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278   revocation of Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fr
2020-01-21T17:44:33.343+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 35 caps now pAsLsXsFc was pAsLsXs
2020-01-21T17:44:33.343+0000 7f1d0a7fc700 10 client.4278   grant, new caps are Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 36 caps now pAsLsXs was pAsLsXsFc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278   revocation of Fc
2020-01-21T17:44:33.347+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fc
2020-01-21T17:44:33.349+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 37 caps now pAsLsXsFrw was pAsLsXs
2020-01-21T17:44:33.349+0000 7f1d0a7fc700 10 client.4278   grant, new caps are Frw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700  5 client.4278 handle_cap_grant on in 0x10000000001 mds.0 seq 38 caps now pAsLsXsFr was pAsLsXsFrw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278   revocation of Fw
2020-01-21T17:44:33.350+0000 7f1d0a7fc700 10 client.4278 completed revocation of Fw

Why are Frw being revoked here? We aren't changing the layout of the file, so that seems a bit strange.

It's probably fine to hand out Fc caps to a singleton client that has the file open O_RDWR, but once the second client opens the file with O_RDWR (or O_WRONLY) Fc should be revoked and should not be handed out again until there aren't multiple writers.

ISTM that the client ought to be free to set "wanted" in its mask as it sees fit without much regard for other clients, and that it's up to the MDS to mediate "disputes" between them.

#7 Updated by Greg Farnum 17 days ago

  • Copied to Bug #43960: MDS: incorrectly issues Fc for new opens when there is an existing writer added

#8 Updated by Greg Farnum 17 days ago

  • Subject changed from client: synchronous I/O for two clients with file locks to client: improve wanted handling so we don't request unused caps (active-standby exclusive file lock case)

#9 Updated by Zheng Yan 15 days ago

most unexpected cap revokes were because of fuse API limitation. lookup and gettattr always want CEPH_STAT_CAP_INODE_ALL.

Also available in: Atom PDF