Bug #49922

closed

MDS slow request lookupino #0x100 on rank 1 block forever on dispatched

Added by 玮文 胡 about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific,octopus,nautilus
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have two MDSs deployed by cephadm.

Several hours ago, we got a health warning:

[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.gpu006.ddpekw(mds.1): 1 slow requests are blocked > 30 secs

"ceph tell mds.cephfs.gpu006.ddpekw ops" shows:

{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6464550",
                    "tid": 33851
                },
                "events": [
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962356+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962369+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962428+0000",
                        "event": "dispatched" 
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

The "RETRY=1" is there because we restarted this MDS in an attempt to clear the request, but the restart did not help.
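For anyone triaging a similar dump, here is a minimal sketch of how the long-running ops could be picked out of the JSON above. The helper name `stuck_ops` and the 30-second threshold (taken from the health warning) are illustrative choices, not part of any Ceph tooling; the sample is a trimmed copy of the output shown above.

```python
import json

def stuck_ops(dump, min_age=30.0):
    """Return (reqid, flag_point, age) for every op older than min_age seconds.

    `dump` is the parsed JSON printed by `ceph tell mds.<name> ops`.
    The helper name and threshold are hypothetical, for illustration only.
    """
    return [
        (op["type_data"]["reqid"], op["type_data"]["flag_point"], op["age"])
        for op in dump.get("ops", [])
        if op.get("age", 0.0) > min_age
    ]

# Trimmed copy of the dump above (the "events" list is omitted for brevity).
sample = json.loads("""
{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request"
            }
        }
    ],
    "num_ops": 1
}
""")

for reqid, flag_point, age in stuck_ops(sample):
    print(f"{reqid} stuck at '{flag_point}' for {age:.0f}s")
```

With the dump above this reports the single request `client.6464550:33851`, still at the "dispatched" flag point.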

The warning seems to be harmless: I/O on client.6464550 and mds.cephfs.gpu006.ddpekw does not appear to be affected.

The inode #0x100 is a special inode that corresponds to MDS rank 0. However, this slow op appeared on MDS rank 1, which seems odd to me.
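To make the inode-to-rank relationship concrete, here is a rough sketch. It assumes the per-rank MDS directory inodes start at 0x100 and are numbered consecutively by rank, with at most 0x100 ranks; that layout is my reading of the constants in the Ceph source (`MDS_INO_MDSDIR_OFFSET`, `MAX_MDS`) and should be checked against the version in use. The function name `mdsdir_rank` is made up for this example.

```python
# Assumed layout (verify against the Ceph source for your version):
MAX_MDS = 0x100              # assumed upper bound on the number of ranks
MDSDIR_OFFSET = 1 * MAX_MDS  # assumed first per-rank MDS directory inode

def mdsdir_rank(ino):
    """Return the rank whose MDS directory inode is `ino`, or None.

    Hypothetical helper: it only encodes the assumed numbering above.
    """
    if MDSDIR_OFFSET <= ino < MDSDIR_OFFSET + MAX_MDS:
        return ino - MDSDIR_OFFSET
    return None

print(mdsdir_rank(0x100))  # rank that owns the inode from the slow request
print(mdsdir_rank(0x101))  # the next inode in the assumed range
```

Under these assumptions, 0x100 maps to rank 0 and 0x101 to rank 1, which is why a lookup for #0x100 sitting on rank 1 looks out of place.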

We just upgraded our cluster to 5.2.10, but I don't know whether that is relevant. "client.6464550" is an Ubuntu kernel client at version "5.4.0-67-generic".


Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #50282: pacific: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #50283: octopus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched (Resolved, Nathan Cutler)
Copied to CephFS - Backport #50284: nautilus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched (Rejected, Nathan Cutler)