Bug #65182

closed

mds: quiesce_inode op waiting on remote auth pins is not killed correctly during quiesce timeout/expiration

Added by Patrick Donnelly about 1 month ago. Updated 22 days ago.

Status: Resolved
Priority: Urgent
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Tags: backport_processed
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

        {
            "description": "internal op quiesce_path:mds.1:1048 fp=#0x1/volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2",
            "initiated_at": "2024-03-26T10:06:14.974850+0000",
            "age": 101818.022728012,
            "duration": 101818.025116246,
            "continuous": true,
            "type_data": {
                "result": -2147483648,
                "flag_point": "cleaned up request",
                "reqid": {
                    "entity": {
                        "type": "mds",
                        "num": 1
                    },
                    "tid": 1048
                },
                "op_type": "internal_op",
                "internal_op": 5384,
                "op_name": "quiesce_path",
                "events": [
                    {
                        "time": "2024-03-26T10:06:14.974850+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974850+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974850+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974850+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974850+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974869+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974879+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974888+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974898+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2024-03-26T10:06:21.501232+0000",
                        "event": "killing request" 
                    },
                    {
                        "time": "2024-03-26T10:06:21.501253+0000",
                        "event": "cleaned up request" 
                    }
                ],
                "locks": []
            }
        },
...
        {
            "description": "internal op quiesce_inode:mds.1:1049 fp=#0x100008e255a fp2=#0x100008e255a",
            "initiated_at": "2024-03-26T10:06:14.974908+0000",
            "age": 101818.022670109,
            "duration": 101818.02511086701,
            "continuous": true,
            "type_data": {
                "result": -2147483648,
                "flag_point": "quiesce complete for non-auth inode",
                "reqid": {
                    "entity": {
                        "type": "mds",
                        "num": 1
                    },
                    "tid": 1049
                },
                "op_type": "internal_op",
                "internal_op": 5385,
                "op_name": "quiesce_inode",
                "events": [
                    {
                        "time": "2024-03-26T10:06:14.974908+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974908+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974908+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974908+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974908+0000",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2024-03-26T10:06:14.974977+0000",
                        "event": "requesting remote authpins" 
                    },
                    {
                        "time": "2024-03-26T10:06:21.615411+0000",
                        "event": "acquired locks" 
                    },
                    {
                        "time": "2024-03-26T10:06:21.615458+0000",
                        "event": "quiesce complete for non-auth inode" 
                    }
                ],
                "locks": [
                    {
                        "object": {
                            "is_auth": false,
                            "auth_state": {
                                "replicas": {}
                            },
                            "replica_state": {
                                "authority": [
                                    0,
                                    -2
                                ],
                                "replica_nonce": 1
                            },
                            "auth_pins": 0,
                            "is_frozen": false,
                            "is_freezing": false,
                            "pins": {
                                "request": 1,
                                "lock": 1
                            },
                            "nref": 2
                        },
                        "object_string": "[inode 0x100008e255a [...2ae,head] /volumes/_nogroup/sv_new_1_def_11/0d61d4d2-d869-46f0-93a0-d9b9e74401c2/ rep@0.1 v1696 snaprealm=0x55b78d09f440 f(v0 m2024-03-26T10:05:13.326074+0000 10=2+8) n(v56 rc2024-03-26T10:17:04.624239+0000 b2670077140 31541=28967+2574)/n(v0 rc2024-03-26T09:40:15.892764+0000 b1027604480 138=3+135) (inest mix) (iquiesce lock x=1 by request(mds.1:1049 nref=3)) | request=1 lock=1 0x55b78d1b4580]",
                        "lock": {
                            "gather_set": [],
                            "state": "lock",
                            "type": "iquiesce",
                            "is_leased": false,
                            "num_rdlocks": 0,
                            "num_wrlocks": 0,
                            "num_xlocks": 1,
                            "xlock_by": {
                                "reqid": {
                                    "entity": {
                                        "type": "mds",
                                        "num": 1
                                    },
                                    "tid": 1049
                                }
                            }
                        },
                        "flags": 4,
                        "wrlock_target": -1
                    }
                ]
            }
        },

This is an op dump from a QE test cluster. The quiesce_path op (mds.1:1048) was killed at 10:06:21.501. About 0.1 s later, at 10:06:21.615, the quiesce_inode op (mds.1:1049) received its remote authpins, which allowed it to proceed. MDCache::request_kill does not actually kill a request that is waiting on remote authpins, so the op went on to complete its quiesce, and the dump shows it still holding the xlock on the iquiesce lock.
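
The ordering described above can be checked directly from the event timestamps. The following is a minimal Python sketch and is not part of the original report. The two dictionaries hold timestamps copied by hand from the dump, and the helper names (parse, events_after) are made up for illustration.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Event times copied from the op dump in this report.
quiesce_path_events = {
    "killing request": "2024-03-26T10:06:21.501232+0000",
    "cleaned up request": "2024-03-26T10:06:21.501253+0000",
}
quiesce_inode_events = {
    "requesting remote authpins": "2024-03-26T10:06:14.974977+0000",
    "acquired locks": "2024-03-26T10:06:21.615411+0000",
    "quiesce complete for non-auth inode": "2024-03-26T10:06:21.615458+0000",
}


def parse(ts):
    """Parse a timestamp in the format used by the op dump."""
    return datetime.strptime(ts, TS_FORMAT)


def events_after(cutoff, events):
    """Return the names of events stamped strictly after cutoff, in time order."""
    cutoff_time = parse(cutoff)
    later = [(parse(ts), name) for name, ts in events.items()
             if parse(ts) > cutoff_time]
    return [name for _, name in sorted(later)]


killed_at = quiesce_path_events["killing request"]
# Events the quiesce_inode op logged after quiesce_path was killed.
progressed = events_after(killed_at, quiesce_inode_events)
# Gap between the kill and the quiesce_inode op acquiring its locks.
delay = parse(quiesce_inode_events["acquired locks"]) - parse(killed_at)

print(progressed)
print(delay.total_seconds())
```

With these values, the events that follow the kill are "acquired locks" and "quiesce complete for non-auth inode", and the gap between the kill and the lock acquisition is roughly 0.114 seconds.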


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #65214: squid: mds: quiesce_inode op waiting on remote auth pins is not killed correctly during quiesce timeout/expiration (Resolved, Patrick Donnelly)

#1

Updated by Patrick Donnelly about 1 month ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56536

#2

Updated by Patrick Donnelly about 1 month ago

  • Status changed from Fix Under Review to Pending Backport

#3

Updated by Backport Bot about 1 month ago

  • Copied to Backport #65214: squid: mds: quiesce_inode op waiting on remote auth pins is not killed correctly during quiesce timeout/expiration added

#4

Updated by Backport Bot about 1 month ago

  • Tags set to backport_processed

#5

Updated by Patrick Donnelly 22 days ago

  • Status changed from Pending Backport to Resolved