Bug #65182
mds: quiesce_inode op waiting on remote auth pins is not killed correctly during quiesce timeout/expiration
Status: Resolved
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
{
  "description": "internal op quiesce_path:mds.1:1048 fp=#0x1/volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2",
  "initiated_at": "2024-03-26T10:06:14.974850+0000",
  "age": 101818.022728012,
  "duration": 101818.025116246,
  "continuous": true,
  "type_data": {
    "result": -2147483648,
    "flag_point": "cleaned up request",
    "reqid": { "entity": { "type": "mds", "num": 1 }, "tid": 1048 },
    "op_type": "internal_op",
    "internal_op": 5384,
    "op_name": "quiesce_path",
    "events": [
      { "time": "2024-03-26T10:06:14.974850+0000", "event": "initiated" },
      { "time": "2024-03-26T10:06:14.974850+0000", "event": "throttled" },
      { "time": "2024-03-26T10:06:14.974850+0000", "event": "header_read" },
      { "time": "2024-03-26T10:06:14.974850+0000", "event": "all_read" },
      { "time": "2024-03-26T10:06:14.974850+0000", "event": "dispatched" },
      { "time": "2024-03-26T10:06:14.974869+0000", "event": "acquired locks" },
      { "time": "2024-03-26T10:06:14.974879+0000", "event": "acquired locks" },
      { "time": "2024-03-26T10:06:14.974888+0000", "event": "acquired locks" },
      { "time": "2024-03-26T10:06:14.974898+0000", "event": "acquired locks" },
      { "time": "2024-03-26T10:06:21.501232+0000", "event": "killing request" },
      { "time": "2024-03-26T10:06:21.501253+0000", "event": "cleaned up request" }
    ],
    "locks": []
  }
},
...
{
  "description": "internal op quiesce_inode:mds.1:1049 fp=#0x100008e255a fp2=#0x100008e255a",
  "initiated_at": "2024-03-26T10:06:14.974908+0000",
  "age": 101818.022670109,
  "duration": 101818.02511086701,
  "continuous": true,
  "type_data": {
    "result": -2147483648,
    "flag_point": "quiesce complete for non-auth inode",
    "reqid": { "entity": { "type": "mds", "num": 1 }, "tid": 1049 },
    "op_type": "internal_op",
    "internal_op": 5385,
    "op_name": "quiesce_inode",
    "events": [
      { "time": "2024-03-26T10:06:14.974908+0000", "event": "initiated" },
      { "time": "2024-03-26T10:06:14.974908+0000", "event": "throttled" },
      { "time": "2024-03-26T10:06:14.974908+0000", "event": "header_read" },
      { "time": "2024-03-26T10:06:14.974908+0000", "event": "all_read" },
      { "time": "2024-03-26T10:06:14.974908+0000", "event": "dispatched" },
      { "time": "2024-03-26T10:06:14.974977+0000", "event": "requesting remote authpins" },
      { "time": "2024-03-26T10:06:21.615411+0000", "event": "acquired locks" },
      { "time": "2024-03-26T10:06:21.615458+0000", "event": "quiesce complete for non-auth inode" }
    ],
    "locks": [
      {
        "object": {
          "is_auth": false,
          "auth_state": { "replicas": {} },
          "replica_state": { "authority": [ 0, -2 ], "replica_nonce": 1 },
          "auth_pins": 0,
          "is_frozen": false,
          "is_freezing": false,
          "pins": { "request": 1, "lock": 1 },
          "nref": 2
        },
        "object_string": "[inode 0x100008e255a [...2ae,head] /volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2/ rep@0.1 v1696 snaprealm=0x55b78d09f440 f(v0 m2024-03-26T10:05:13.326074+0000 10=2+8) n(v56 rc2024-03-26T10:17:04.624239+0000 b2670077140 31541=28967+2574)/n(v0 rc2024-03-26T09:40:15.892764+0000 b1027604480 138=3+135) (inest mix) (iquiesce lock x=1 by request(mds.1:1049 nref=3)) | request=1 lock=1 0x55b78d1b4580]",
        "lock": {
          "gather_set": [],
          "state": "lock",
          "type": "iquiesce",
          "is_leased": false,
          "num_rdlocks": 0,
          "num_wrlocks": 0,
          "num_xlocks": 1,
          "xlock_by": { "reqid": { "entity": { "type": "mds", "num": 1 }, "tid": 1049 } }
        },
        "flags": 4,
        "wrlock_target": -1
      }
    ]
  }
},
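The ordering problem can be read off the event timestamps alone. The Python sketch below is a hypothetical helper, not part of Ceph: it takes op entries shaped like the dump above, finds the earliest "killing request" event of any quiesce_path op, and reports quiesce_inode ops that logged "acquired locks" after that moment while still holding an xlock they own. The embedded sample is a trimmed copy of the two entries above (only the fields the check reads are kept), and the check deliberately does not try to link each quiesce_inode op to its specific quiesce_path op.

```python
import json
from datetime import datetime

# Trimmed copy of the two ops above; only the fields used below are kept.
SAMPLE = """
[
  {"type_data": {"op_name": "quiesce_path",
                 "reqid": {"entity": {"type": "mds", "num": 1}, "tid": 1048},
                 "events": [
                   {"time": "2024-03-26T10:06:14.974898+0000", "event": "acquired locks"},
                   {"time": "2024-03-26T10:06:21.501232+0000", "event": "killing request"},
                   {"time": "2024-03-26T10:06:21.501253+0000", "event": "cleaned up request"}],
                 "locks": []}},
  {"type_data": {"op_name": "quiesce_inode",
                 "reqid": {"entity": {"type": "mds", "num": 1}, "tid": 1049},
                 "events": [
                   {"time": "2024-03-26T10:06:14.974977+0000", "event": "requesting remote authpins"},
                   {"time": "2024-03-26T10:06:21.615411+0000", "event": "acquired locks"},
                   {"time": "2024-03-26T10:06:21.615458+0000", "event": "quiesce complete for non-auth inode"}],
                 "locks": [{"lock": {"type": "iquiesce", "num_xlocks": 1,
                                     "xlock_by": {"reqid": {"entity": {"type": "mds", "num": 1}, "tid": 1049}}}}]}}
]
"""

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"  # e.g. 2024-03-26T10:06:14.974850+0000


def event_time(op, name):
    """Return the timestamp of the first event called `name`, or None."""
    for ev in op["type_data"]["events"]:
        if ev["event"] == name:
            return datetime.strptime(ev["time"], TS_FORMAT)
    return None


def late_quiesce_inodes(ops):
    """Return tids of quiesce_inode ops that acquired locks after the earliest
    quiesce_path kill and still hold an xlock owned by themselves."""
    kills = [event_time(op, "killing request") for op in ops
             if op["type_data"]["op_name"] == "quiesce_path"]
    kills = [t for t in kills if t is not None]
    if not kills:
        return []
    first_kill = min(kills)
    late = []
    for op in ops:
        td = op["type_data"]
        if td["op_name"] != "quiesce_inode":
            continue
        acquired = event_time(op, "acquired locks")
        holds_xlock = any(
            lk["lock"]["num_xlocks"] > 0
            and lk["lock"]["xlock_by"]["reqid"] == td["reqid"]
            for lk in td["locks"])
        if acquired is not None and acquired > first_kill and holds_xlock:
            late.append(td["reqid"]["tid"])
    return late


if __name__ == "__main__":
    print(late_quiesce_inodes(json.loads(SAMPLE)))
```

On the sample it flags tid 1049: that op acquired its locks about 114 ms after quiesce_path mds.1:1048 was killed, and it still owns the iquiesce xlock.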
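This is an op dump from a QE test cluster. The quiesce_path op (mds.1:1048) was killed at 10:06:21.501, and about 114 ms later the quiesce_inode op (mds.1:1049), which had been waiting on remote auth pins since 10:06:14.974, received them and was allowed to proceed. MDCache::request_kill does not actually kill a request that is waiting on remote auth pins, so the op went on to complete its quiesce: the dump shows it still holding the iquiesce xlock on inode 0x100008e255a.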
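To make the failure mode concrete, here is a toy Python model of the interaction described above. It is not the MDS code: the Request class, its flags, and the request_kill_buggy, request_kill_fixed and on_remote_authpins functions are hypothetical names. The buggy variant mirrors the reported behavior (killing a request that waits on remote auth pins does nothing, so the op resumes and quiesces once the pins arrive). The fixed variant shows one plausible approach, recording the kill so the resume path drops the request; it is not necessarily what the fix in pull request 56536 does.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for an MDS internal op such as quiesce_inode."""
    name: str
    waiting_on_remote_authpins: bool = False
    killed: bool = False        # request was torn down
    kill_pending: bool = False  # kill requested while waiting on a peer
    quiesced: bool = False      # op went on to take the quiesce lock


def request_kill_buggy(req: Request) -> None:
    # Reported behavior: a request waiting on remote auth pins is left
    # alone, and nothing records that a kill was requested.
    if req.waiting_on_remote_authpins:
        return
    req.killed = True


def request_kill_fixed(req: Request) -> None:
    # One possible fix: if the request cannot be torn down right now,
    # remember the kill so it is honored as soon as the request resumes.
    if req.waiting_on_remote_authpins:
        req.kill_pending = True
        return
    req.killed = True


def on_remote_authpins(req: Request) -> None:
    """Called when the auth MDS replies with the requested auth pins."""
    req.waiting_on_remote_authpins = False
    if req.kill_pending:
        req.killed = True       # drop the op instead of dispatching it
        return
    if not req.killed:
        req.quiesced = True     # proceeds to acquire locks and quiesce


def simulate(kill) -> Request:
    # Timeline from the dump: remote authpins requested, the op is killed
    # during quiesce timeout/expiration, then the auth pins arrive.
    req = Request("quiesce_inode", waiting_on_remote_authpins=True)
    kill(req)
    on_remote_authpins(req)
    return req


if __name__ == "__main__":
    for kill in (request_kill_buggy, request_kill_fixed):
        r = simulate(kill)
        print(kill.__name__, "quiesced" if r.quiesced else "killed")
```

Deferring the kill, rather than tearing the request down immediately, avoids cancelling an in-flight peer request; the model ignores the rollback questions that a real implementation would have to handle.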
Updated by Patrick Donnelly about 1 month ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56536
Updated by Patrick Donnelly about 1 month ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 1 month ago
- Copied to Backport #65214: squid: mds: quiesce_inode op waiting on remote auth pins is not killed correctly during quiesce timeout/expiration added
Updated by Patrick Donnelly 22 days ago
- Status changed from Pending Backport to Resolved