Bug #45935
closedmds: cap revoking requests didn't success when the client doing reconnection
0%
Description
The kclient node log:
53 2020-06-03T01:04:06.610312+00:00 smithi070 kernel: [ 825.023480] ceph: mds0 reconnect start 54 2020-06-03T01:04:06.658346+00:00 smithi070 kernel: [ 825.069176] ceph: mds0 reconnect success 55 2020-06-03T01:04:08.634315+00:00 smithi070 kernel: [ 827.045544] ceph: mds0 recovery completed
mds.c daemon send the revoke request:
181639 2020-06-03T01:04:08.637+0000 7fbd51f2a700 20 mds.0.locker client.4394 pending pAsLsXsFsc allowed pAsxLsXsxFrw wanted pAsxXsxFxcwb 181640 2020-06-03T01:04:08.637+0000 7fbd51f2a700 7 mds.0.locker sending MClientCaps to client.4394 seq 1 new pending pAsLsXs was pAsLsX sFsc 181641 2020-06-03T01:04:08.637+0000 7fbd51f2a700 20 mds.0.cache.ino(0x10000000392) encode_cap_message pfile 0 pauth 0 plink 0 pxattr 0 ctim e 2020-06-03T01:00:43.722299+0000 181642 2020-06-03T01:04:08.637+0000 7fbd51f2a700 10 mds.0.cache.ino(0x10000000392) including xattrs v 1 181643 2020-06-03T01:04:08.637+0000 7fbd51f2a700 10 mds.0.23 send_message_client_counted client.4394 seq 5 client_caps(revoke ino 0x1000000 0392 65 seq 1 caps=pAsLsXs dirty=- wanted=pAsxXsxFxcwb follows 0 size 104857600/213909504 ts 1/18446744073709551615 mtime 2020-06-03 T01:00:43.722299+0000 xattrs(v=1 l=4)) v11 181644 2020-06-03T01:04:08.637+0000 7fbd51f2a700 1 -- [v2:172.21.15.107:6834/313709009,v1:172.21.15.107:6835/313709009] --> v1:172.21.15.7 0:0/3065332795 -- client_caps(revoke ino 0x10000000392 65 seq 1 caps=pAsLsXs dirty=- wanted=pAsxXsxFxcwb follows 0 size 104857600/21 3909504 ts 1/18446744073709551615 mtime 2020-06-03T01:00:43.722299+0000 xattrs(v=1 l=4)) v11 -- 0x56150dcf3c00 con 0x56150c865800 181645 2020-06-03T01:04:08.637+0000 7fbd51f2a700 10 mds.0.locker eval done 181646 2020-06-03T01:04:08.637+0000 7fbd51f2a700 1 -- [v2:172.21.15.107:6834/313709009,v1:172.21.15.107:6835/313709009] --> [v2:172.21.15. 107:6802/15489,v1:172.21.15.107:6803/15489] -- osd_op(unknown.0.23:31 3.3 3:ddb4e4d7:::200.00000002:head [write 1516350~1094 in=1094 b] snapc 0=[] ondisk+write+known_if_redirected+full_force e26) v8 -- 0x56150d6172c0 con 0x56150c864400
mds.c received the cap update request from kclient, but didn't trigger cap->confirm_receipt() to remove the cap from the revoking list:
189429 2020-06-03T01:04:09.249+0000 7fbd51f2a700 1 -- [v2:172.21.15.107:6834/313709009,v1:172.21.15.107:6835/313709009] <== client.4394 v1 :172.21.15.70:0/3065332795 78 ==== client_caps(update ino 0x10000000392 65 seq 1 caps=pAsLsXs dirty=- wanted=pAsxXsxFxwb follows 0 s ize 104857600/0 mtime 2020-06-03T01:04:03.072493+0000 xattrs(v=17179869185 l=0)) v10 ==== 236+0+0 (unknown 2146284701 0 0) 0x56150d5 17300 con 0x56150c865800 189430 2020-06-03T01:04:09.249+0000 7fbd51f2a700 7 mds.0.locker handle_client_caps on 0x10000000392 tid 0 follows 0 op update flags 0x0 189431 2020-06-03T01:04:09.249+0000 7fbd51f2a700 20 mds.0.23 get_session have 0x56150c8e3a80 client.4394 v1:172.21.15.70:0/3065332795 state open 189432 2020-06-03T01:04:09.249+0000 7fbd51f2a700 10 mds.0.locker head inode [inode 0x10000000392 [2,head] /client.0/tmp/tmp/data/datafile7 1 auth v2721 dirtyparent s=104857600 n(v0 rc2020-06-03T01:04:03.072493+0000 b104857600 1=1+0)/n(v0 rc2020-06-03T01:00:43.722299+0000 b104857600 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4394=0-213909504@1} caps={4394=pAsxLsXsxFsxcrwb/pAsxX sxFxcwb@3},l=4394 | ptrwaiter=0 lock=0 caps=1 dirtyrstat=1 dirtyparent=1 dirty=1 authpin=0 0x56150dc1d000] 189433 2020-06-03T01:04:09.249+0000 7fbd51f2a700 10 mds.0.locker follows 0 retains pAsLsXs dirty - on [inode 0x10000000392 [2,head] /clien t.0/tmp/tmp/data/datafile71 auth v2721 dirtyparent s=104857600 n(v0 rc2020-06-03T01:04:03.072493+0000 b104857600 1=1+0)/n(v0 rc2020- 06-03T01:00:43.722299+0000 b104857600 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4394=0-213909504@1} caps={4 394=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@3},l=4394 | ptrwaiter=0 lock=0 caps=1 dirtyrstat=1 dirtyparent=1 dirty=1 authpin=0 0x56150dc1d000] 189434 2020-06-03T01:04:09.249+0000 7fbd51f2a700 10 mds.0.locker wanted pAsxXsxFxcwb -> pAsxXsxFxwb 189435 2020-06-03T01:04:09.249+0000 7fbd51f2a700 10 mds.0.locker _do_cap_update dirty - issued pAsxLsXsxFsxcrwb wanted pAsxXsxFxwb on [inod e 0x10000000392 [2,head] /client.0/tmp/tmp/data/datafile71 auth v2721 dirtyparent s=104857600 n(v0 rc2020-06-03T01:04:03.072493+0000 b104857600 1=1+0)/n(v0 rc2020-06-03T01:00:43.722299+0000 b104857600 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4394=0-213909504@1} caps={4394=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@3},l=4394 | ptrwaiter=0 lock=0 caps=1 dirtyrstat=1 dirtyparent=1 dir ty=1 authpin=0 0x56150dc1d000]
Then at last the revoking caps timeout:
278783 2020-06-03T01:05:11.103+0000 7fbd50727700 20 mds.0.locker caps_tick age = 62.463717 client.4394.0x10000000392 278784 2020-06-03T01:05:11.103+0000 7fbd50727700 0 log_channel(cluster) log [WRN] : client.4394 isn't responding to mclientcaps(revoke), i no 0x10000000392 pending pAsxLsXsxFsxcrwb issued pAsxLsXsxFsxcrwb, sent 62.463717 seconds ago 278785 2020-06-03T01:05:11.103+0000 7fbd50727700 20 mds.0.locker caps_tick client.4394 isn't responding to mclientcaps(revoke), ino 0x10000 000392 pending pAsxLsXsxFsxcrwb issued pAsxLsXsxFsxcrwb, sent 62.463717 seconds ago
Updated by Patrick Donnelly almost 4 years ago
- Target version set to v16.0.0
- Source set to Development
Xiubo, which teuthology test is this from?
Updated by Xiubo Li almost 4 years ago
Patrick Donnelly wrote:
Xiubo, which teuthology test is this from?
Sorry, forgot to copy it, this the second issue from your mail:
Updated by Xiubo Li almost 4 years ago
From the mds.c logs, we can see that:
client_caps(revoke ino 0x10000000392 65 seq 1 > pending pAsLsXs was pAsLsXsFsc
client_caps(grant ino 0x10000000392 65 seq 2 > pending pAsxLsXsxFrw was pAsLsXs
client_caps(grant ino 0x10000000392 65 seq 3 > pending pAsxLsXsxFsxcrwb was pAsxLsXsxFrw
client_caps(update ino 0x10000000392 65 seq 1 > caps=pAsLsXs dirty=- wanted=pAsxXsxFxwb
client_caps(update ino 0x10000000392 65 seq 3 ==> caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxwb
Before the revoke is completed, there has 2 new grant issued, this will cause the cap->confirm_receipt() won't remove the cap from the revoking_caps list.
Updated by Xiubo Li almost 4 years ago
- Status changed from In Progress to Fix Under Review
Updated by Xiubo Li almost 4 years ago
Xiubo Li wrote:
From the mds.c logs, we can see that:
client_caps(revoke ino 0x10000000392 65 seq 1 > pending pAsLsXs was pAsLsXsFsc
client_caps(grant ino 0x10000000392 65 seq 2 > pending pAsxLsXsxFrw was pAsLsXs
client_caps(grant ino 0x10000000392 65 seq 3 > pending pAsxLsXsxFsxcrwb was pAsxLsXsxFrw
client_caps(update ino 0x10000000392 65 seq 1 > caps=pAsLsXs dirty=- wanted=pAsxXsxFxwb
client_caps(update ino 0x10000000392 65 seq 3 ==> caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxwbBefore the revoke is completed, there has 2 new grant issued, this will cause the cap->confirm_receipt() won't remove the cap from the revoking_caps list.
The second grant will remove the revoking caps from cap->_revokes and update the cap->_pending, but left the cap item in the Locker::revoking_caps_by_client and Locker::revoking_caps lists. And even the revoke reply arrived, it won't get any chance to be removed from the two lists.
Updated by Xiubo Li almost 4 years ago
Xiubo Li wrote:
From the mds.c logs, we can see that:
client_caps(revoke ino 0x10000000392 65 seq 1 > pending pAsLsXs was pAsLsXsFsc
client_caps(grant ino 0x10000000392 65 seq 2 > pending pAsxLsXsxFrw was pAsLsXs
client_caps(grant ino 0x10000000392 65 seq 3 > pending pAsxLsXsxFsxcrwb was pAsxLsXsxFrw
The revoke request was issued just after the MDS replayed the pending cap flush requests from kclient, which were arrived when the MDS daemon was in reconnect state. And just after the cap flush requests replayed, the revoke request was issued.
And just after that the MDS daemon will run `mdcache->reissue_all_caps();`, and it will issue the followed cap grant requests.
Because the grant requests were issued before the update replay for the revoke request, and the Locker's _pending/_issued... member were already all updated by them.
client_caps(update ino 0x10000000392 65 seq 1 > caps=pAsLsXs dirty=- wanted=pAsxXsxFxwb
client_caps(update ino 0x10000000392 65 seq 3 ==> caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=pAsxXsxFxwbBefore the revoke is completed, there has 2 new grant issued, this will cause the cap->confirm_receipt() won't remove the cap from the revoking_caps list.
Updated by Xiubo Li almost 4 years ago
LOCK_MIX is is a transition state for muti MDSs could do read/write
at the same time, but the Fcb caps are not allowed.
When the Fc cap is issued, the lock state should be excl.
Updated by Patrick Donnelly almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to octopus,nautilus
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46190: octopus: mds: cap revoking requests didn't success when the client doing reconnection added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46191: nautilus: mds: cap revoking requests didn't success when the client doing reconnection added
Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".