Bug #61781

open

mds: couldn't successfully calculate the locker caps

Added by Xiubo Li 11 months ago. Updated 10 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 100%
Source:
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Subtasks 1 (0 open, 1 closed)

Linux kernel client - Bug #61892: [testing] qa: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) - Duplicate - Xiubo Li

Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #62197: quincy: mds: couldn't successfully calculate the locker caps - Duplicate - Xiubo Li
Copied to CephFS - Backport #62198: reef: mds: couldn't successfully calculate the locker caps - Duplicate - Xiubo Li
Copied to CephFS - Backport #62199: pacific: mds: couldn't successfully calculate the locker caps - Duplicate - Xiubo Li
Actions #1

Updated by Xiubo Li 11 months ago

After only part of the caps was revoked, the MDS couldn't correctly update the caps:

2023-06-15T06:33:23.297+0000 7fde93b16700  1 -- [v2:172.21.15.29:6832/1469037096,v1:172.21.15.29:6833/1469037096] <== client.4941 192.168.0.1:0/1731885576 4979 ==== client_caps(update ino 0x3000000025b 1 seq 18 caps=pAsLsXsFcrb dirty=- wanted=Fcr follows 0 size 5984600/12582912 ts 1/18446744073709551615 mtime 2023-06-15T06:33:20.586624+0000 ctime 2023-06-15T06:33:20.586624+0000 change_attr 136 tws 1) v12 ==== 260+0+0 (crc 0 0 0) 0x556a82154380 con 0x556a81cc5c00
2023-06-15T06:33:23.297+0000 7fde93b16700  7 mds.2.locker handle_client_caps  on 0x3000000025b tid 0 follows 0 op update flags 0x0
2023-06-15T06:33:23.297+0000 7fde93b16700 20 mds.2.12 get_session have 0x556a81c89900 client.4941 192.168.0.1:0/1731885576 state open
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker  head inode [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/pAsLsXsFxcrb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 Capability revocation is not totally finished yet on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/pAsLsXsFcrb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000], the session  (4941)
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker  follows 0 retains pAsLsXsFcrb dirty - on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker _do_cap_update dirty - issued pAsLsXsFcb wanted Fcr on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 20 mds.2.locker inode is file
2023-06-15T06:33:23.297+0000 7fde93b16700 20 mds.2.locker client has write caps; m->get_max_size=12582912; old_max=12582912
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker eval 3648 [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker eval_gather (ifile excl->lock) on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker  next state is lock issued/allows loner cb/cb xlocker /cb other /cb
2023-06-15T06:33:23.297+0000 7fde93b16700  7 mds.2.locker eval_gather finished gather on (ifile excl->lock) on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile excl->lock) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFcb/Fcr@18},l=4941 | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]

The cap update message tells the MDS that the client is still holding caps=pAsLsXsFcrb. The MDS was trying to revoke the Fxr caps, but the client released only the Fx caps and is still holding the Fr caps. The MDS then simply overrode the inode's issued caps to caps={4941=pAsLsXsFcb/Fcr@18}, which is incorrect because it no longer records Fr as issued.
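To make the arithmetic concrete, here is a minimal Python sketch that replays the cap strings from the log above as sets. It is not the MDS code: the helpers parse_caps and format_caps are hypothetical names made up for this illustration, and it assumes that the caps={client=A/B/C@seq} field prints pending/issued/wanted, with the issued part omitted when it equals pending.

```python
# Illustration only: hypothetical helpers, not the Ceph MDS implementation.
GROUPS = "ALXF"      # auth, link, xattr, file
FLAGS = "sxcrwbl"    # per-group flag letters, in the order seen in the log


def parse_caps(s):
    """Split a cap string such as 'pAsLsXsFcrb' into tokens like 'Fr'."""
    caps = set()
    group = ""
    for ch in s:
        if ch == "-":            # '-' means "no caps"
            continue
        if ch.isupper():         # an uppercase letter starts a new group
            group = ch
        else:                    # a leading 'p' (pin) has no group letter
            caps.add(group + ch)
    return caps


def format_caps(caps):
    """Render a token set back into the compact cap string form."""
    out = "p" if "p" in caps else ""
    for g in GROUPS:
        flags = "".join(f for f in FLAGS if g + f in caps)
        if flags:
            out += g + flags
    return out or "-"


# Values copied from the log lines above (client.4941, ino 0x3000000025b).
pending = parse_caps("pAsLsXsFcb")         # what the MDS wants to leave issued
issued = parse_caps("pAsLsXsFxcrb")        # issued before the client update
client_holds = parse_caps("pAsLsXsFcrb")   # caps= in the client_caps message
recorded = parse_caps("pAsLsXsFcb")        # issued caps after the MDS update

revoking = issued - pending                # caps the MDS is trying to revoke
released = issued - client_holds           # caps the client actually gave up
still_held = revoking & client_holds       # revoked but not yet released
expected = pending | still_held            # what should remain issued

print(format_caps(revoking))               # Fxr
print(format_caps(released))               # Fx
print(format_caps(still_held))             # Fr
print(format_caps(expected))               # pAsLsXsFcrb
print(format_caps(expected - recorded))    # Fr
```

Under these assumptions, the only difference between the expected and the recorded issued caps is Fr, which matches the description above.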

Actions #2

Updated by Xiubo Li 11 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 52176
Actions #3

Updated by Venky Shankar 10 months ago

Xiubo - similar failure here: /a/vshankar-2023-07-12_07:14:06-fs-wip-vshankar-testing-20230712.041849-testing-default-smithi/7334887

Could you confirm if it's the same issue as detailed in this tracker?

Actions #4

Updated by Xiubo Li 10 months ago

Sure, will do it tomorrow. Thanks.

Actions #5

Updated by Venky Shankar 10 months ago

No rush. The PRs under testing were user-space changes, so it's not related to that change. I just wanted to double-check that it's not a new bug we are running into.

Actions #6

Updated by Xiubo Li 10 months ago

This is a known issue https://tracker.ceph.com/issues/61818.

Actions #7

Updated by Venky Shankar 10 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to reef,quincy,pacific
Actions #8

Updated by Backport Bot 10 months ago

  • Copied to Backport #62197: quincy: mds: couldn't successfully calculate the locker caps added
Actions #9

Updated by Backport Bot 10 months ago

  • Copied to Backport #62198: reef: mds: couldn't successfully calculate the locker caps added
Actions #10

Updated by Backport Bot 10 months ago

  • Copied to Backport #62199: pacific: mds: couldn't successfully calculate the locker caps added
Actions #11

Updated by Backport Bot 10 months ago

  • Tags set to backport_processed