Bug #49922: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched - CephFS - Ceph

Bug #49922

closed

MDS slow request lookupino #0x100 on rank 1 block forever on dispatched

Added by 玮文胡 about 3 years ago. Updated over 2 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Patrick Donnelly

Category:

Target version:

Ceph - v17.0.0

% Done:

Source:

Community (user)

Tags:

Backport:

pacific,octopus,nautilus

Regression:

Yes

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v15.2.10

ceph-qa-suite:

Component(FS):

MDS

Labels (FS):

Pull request ID:

40389

Crash signature (v1):

Crash signature (v2):

Description

We have two MDSs deployed by cephadm.

Several hours ago, we got a health warning:

[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.gpu006.ddpekw(mds.1): 1 slow requests are blocked > 30 secs

"ceph tell mds.cephfs.gpu006.ddpekw ops" shows:

{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6464550",
                    "tid": 33851
                },
                "events": [
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962356+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962369+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962428+0000",
                        "event": "dispatched" 
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}
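For anyone triaging the same warning, the output above can be filtered programmatically. The following is a minimal sketch, assuming the JSON has the same shape as the dump in this report (`ops`, `age`, `type_data.flag_point`, `type_data.reqid`). The function name `find_stuck_ops`, the threshold parameter, and the trimmed `SAMPLE` string are illustrative and not part of Ceph.

```python
import json


def find_stuck_ops(ops_json, min_age_secs=30.0, flag_point="dispatched"):
    """Return ops whose flag_point matches and whose age exceeds min_age_secs.

    ops_json is the JSON text printed by "ceph tell mds.<name> ops".
    """
    report = json.loads(ops_json)
    stuck = []
    for op in report.get("ops", []):
        type_data = op.get("type_data", {})
        if type_data.get("flag_point") != flag_point:
            continue
        if op.get("age", 0.0) <= min_age_secs:
            continue
        stuck.append(
            {
                "reqid": type_data.get("reqid"),
                "age": op["age"],
                "description": op.get("description", ""),
            }
        )
    return stuck


# Trimmed copy of the dump from this report (events omitted for brevity).
SAMPLE = """
{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request"
            }
        }
    ],
    "num_ops": 1
}
"""

if __name__ == "__main__":
    for op in find_stuck_ops(SAMPLE):
        print(op["reqid"], round(op["age"]), "s:", op["description"])
```

In practice the JSON text would come from saving the command output to a file or capturing it from a subprocess; the 30-second default simply mirrors the threshold quoted in the health warning.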

The "RETRY=1" is there because we restarted this MDS in an attempt to clear the request, but the restart did not help.

The warning appears to be harmless: I/O on client.6464550 and on mds.cephfs.gpu006.ddpekw does not seem to be affected.

Inode #0x100 is a special inode that corresponds to MDS rank 0, yet this slow op appeared on MDS rank 1, which seems odd to me.
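The relationship between the inode number and the rank can be sketched as follows. This assumes the per-rank MDS directory inodes are numbered as an offset of 0x100 plus the rank, with at most 0x100 ranks, which is my reading of the constants in the Ceph source rather than anything stated in this report. The constant and function names below are illustrative.

```python
# Assumed layout: per-rank MDS directory inodes start at 0x100 and there
# is one inode per rank, for up to 0x100 ranks. These values are an
# assumption taken from my reading of the Ceph source.
MAX_MDS = 0x100
MDS_INO_MDSDIR_OFFSET = 1 * MAX_MDS


def mdsdir_rank(ino):
    """Return the rank whose MDS directory is `ino`, or None if it is not one."""
    if MDS_INO_MDSDIR_OFFSET <= ino < MDS_INO_MDSDIR_OFFSET + MAX_MDS:
        return ino - MDS_INO_MDSDIR_OFFSET
    return None


if __name__ == "__main__":
    print(mdsdir_rank(0x100))  # the inode from the slow request
```

Under that assumption, #0x100 maps to rank 0 and rank 1 would own #0x101, which is why a lookupino for #0x100 arriving at rank 1 stands out.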

We had just upgraded our cluster to 15.2.10, but I don't know whether that is relevant. "client.6464550" is an Ubuntu kernel client running version "5.4.0-67-generic".

Related issues 3 (0 open — 3 closed)

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Loïc Dachary almost 3 years ago

Updated by 玮文胡 over 2 years ago
