Bug #49922: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched - CephFS - Ceph

Actions

Copy link

Bug #49922

closed

MDS slow request lookupino #0x100 on rank 1 block forever on dispatched

Added by 玮文胡 about 3 years ago. Updated over 2 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Patrick Donnelly

Category:

Target version:

Ceph - v17.0.0

% Done:

Source:

Community (user)

Tags:

Backport:

pacific,octopus,nautilus

Regression:

Yes

Severity:

3 - minor

Reviewed:

Affected Versions:

Ceph - v15.2.10

ceph-qa-suite:

Component(FS):

MDS

Labels (FS):

Pull request ID:

40389

Crash signature (v1):

Crash signature (v2):

Description

We have two MDSs deployed by cephadm.

Several hours ago, we got a health warning:

[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.gpu006.ddpekw(mds.1): 1 slow requests are blocked > 30 secs

"ceph tell mds.cephfs.gpu006.ddpekw ops" shows:

{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6464550",
                    "tid": 33851
                },
                "events": [
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962356+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962369+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962428+0000",
                        "event": "dispatched" 
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

The "RETRY=1" is because we tried to restart this MDS to resolve this. But apparently, it does not work.

This warning seems to be harmless, IO on client.6464550 and mds.cephfs.gpu006.ddpekw seems not affected.

The inode #0x100 is a special inode that corresponding to MDS rank 0. But this slow op appeared in MDS rank 1, which is weird to me.

We just upgraded our cluster to 5.2.10, but I don't know if this is relevant. "client.6464550" is a Ubuntu kernel client at version "5.4.0-67-generic"

Related issues 3 (0 open — 3 closed)

Actions

Copy link

Updated by Patrick Donnelly about 3 years ago

Assignee set to Jeff Layton
Target version set to v17.0.0
Source set to Community (user)

Actions

Copy link

Updated by Jeff Layton about 3 years ago

I'm unfamiliar with the MDS code so some notes as I peruse it:

Ok, so the TrackedOp entries get put on the list when tracking_start() is called, and taken off on last reference put() for the tracking op. TrackedOp::put() looks racy to me, but I assume something must protect it from running concurrently with other put() calls -- I'm just not sure what yet.

Beyond that, I'm not sure how to get more info about why this particular op is stuck from the info above. Was there anything useful in the logs from mds 1 about the op and what state it's in?

Beyond that, it's not clear why the client is requesting this particular inode number. If you're correct that 0x100 is the inode number that's being requested:

#define MAX_MDS                   0x100
    #define MDS_INO_MDSDIR_OFFSET     (1*MAX_MDS)

...then that's the mdsdir for MDS0. The client only does lookupino requests to satisfy knfsd requests (or open_by_handle_at()) or for looking up quota info.

In the latter case, the ino to look up appears to ultimately be provided by the MDS via snap notifications or traces.

Actions

Copy link

Updated by 玮文胡 about 3 years ago

Jeff Layton wrote:

Was there anything useful in the logs from mds 1 about the op and what state it's in?

I set debug_mds to 20/20 for a while and found this:

2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked

It is repeating tens of thousands of times per second. Maybe a dead loop?

If you're correct that 0x100 is the inode number that's being requested

I get 0x100 from the output of "ceph tell mds.cephfs.gpu006.ddpekw ops", it should be correct.

Thank you for your reply. If you want to know any other information, please contact me.

Actions

Copy link

Updated by 玮文胡 about 3 years ago

Logs from mds.0. Also repeating at the same frequency.

2021-03-23T05:16:57.387+0000 7f6f10838700 10 mds.0.cache handle_open_ino openino(18898 0x100 []) v1 err 0
2021-03-23T05:16:57.387+0000 7f6f10838700 10 mds.0.cache  have [inode 0x100 [...2,head] ~mds0/ auth v121235 snaprealm=0x55b87c389400 f(v0 10=0+10) n(v25786 rc2021-03-23T05:15:35.539258+0000 b821344990 6045=6032+13)/n(v0 rc2020-09-15T04:11:00.679066+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=0 0x55b87cf93000]
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache handle_open_ino openino(18898 0x100 []) v1 err 0
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache  have [inode 0x100 [...2,head] ~mds0/ auth v121235 snaprealm=0x55b87c389400 f(v0 10=0+10) n(v25786 rc2021-03-23T05:15:35.539258+0000 b821344990 6045=6032+13)/n(v0 rc2020-09-15T04:11:00.679066+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=0 0x55b87cf93000]
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache handle_open_ino openino(18898 0x100 []) v1 err 0
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache  have [inode 0x100 [...2,head] ~mds0/ auth v121235 snaprealm=0x55b87c389400 f(v0 10=0+10) n(v25786 rc2021-03-23T05:15:35.539258+0000 b821344990 6045=6032+13)/n(v0 rc2020-09-15T04:11:00.679066+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=0 0x55b87cf93000]

Actions

Copy link

Updated by Patrick Donnelly about 3 years ago

Status changed from New to In Progress
Assignee changed from Jeff Layton to Patrick Donnelly

Actions

Copy link

Updated by Patrick Donnelly about 3 years ago

Status changed from In Progress to Fix Under Review
Backport set to pacific,octopus,nautilus
Pull request ID set to 40389

Actions

Copy link

Updated by Jeff Layton about 3 years ago

Yeah, that repeated call makes it look like the client is repeatedly calling in to the MDS for that inode number. It's not clear what

There are 2 possibilities here. The MDS gave us a bogus inode number or the client came up with one on its own:

One possibility is that you're exporting cephfs via knfsd and a NFS client handed it a filehandle that made it do the lookupino here. Are you exporting your ceph mount via knfsd?

A local application could also do something similar with open_by_handle_at(). Aside from that, the only kernel code that does a lookupino is quota related. As best I can tell, any inode number we would call that against would have come from the MDS originally.

I think we need to fix the client to throw a warning when it tries to look up an inode number that's in this range, so we can figure out why it's doing this. That'd potentially give us a call stack so we can determine why the client is doing this.

Actions

Copy link

Updated by Patrick Donnelly about 3 years ago

Jeff Layton wrote:

Yeah, that repeated call makes it look like the client is repeatedly calling in to the MDS for that inode number. It's not clear what

Actually, a single inolookup for 0x100 will cause infinite ino lookups between the two MDS.

There are 2 possibilities here. The MDS gave us a bogus inode number or the client came up with one on its own:

One possibility is that you're exporting cephfs via knfsd and a NFS client handed it a filehandle that made it do the lookupino here. Are you exporting your ceph mount via knfsd?

A local application could also do something similar with open_by_handle_at(). Aside from that, the only kernel code that does a lookupino is quota related. As best I can tell, any inode number we would call that against would have come from the MDS originally.

I think we need to fix the client to throw a warning when it tries to look up an inode number that's in this range, so we can figure out why it's doing this. That'd potentially give us a call stack so we can determine why the client is doing this.

I'm also curious how this lookup from the client originated.

Actions

Copy link

Updated by 玮文胡 about 3 years ago

Jeff Layton wrote:

Are you exporting your ceph mount via knfsd?

No, I don't have anything related to NFS deployed.

Aside from that, the only kernel code that does a lookupino is quota related.

I do have some quotas set up.

Now, I rebooted the client and mds.1. The warning has gone. Thank you.

Actions

Copy link

#10

Updated by Patrick Donnelly about 3 years ago

Status changed from Fix Under Review to Pending Backport

Actions

Copy link

#11

Updated by Backport Bot about 3 years ago

Copied to Backport #50282: pacific: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched added

Actions

Copy link

#12

Updated by Backport Bot about 3 years ago

Copied to Backport #50283: octopus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched added

Actions

Copy link

#13

Updated by Backport Bot about 3 years ago

Copied to Backport #50284: nautilus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched added

Actions

Copy link

#14

Updated by Loïc Dachary almost 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Copy link

#15

Updated by 玮文胡 over 2 years ago

We are triggering the new warning: https://tracker.ceph.com/issues/53180

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #49922

MDS slow request lookupino #0x100 on rank 1 block forever on dispatched

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Loïc Dachary almost 3 years ago

Updated by 玮文胡 over 2 years ago

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #49922

MDS slow request lookupino #0x100 on rank 1 block forever on dispatched

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by 玮文 胡 about 3 years ago

Updated by 玮文 胡 about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Jeff Layton about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by 玮文 胡 about 3 years ago

Updated by Patrick Donnelly about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Backport Bot about 3 years ago

Updated by Loïc Dachary almost 3 years ago

Updated by 玮文 胡 over 2 years ago

Updated by 玮文胡 about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by 玮文胡 about 3 years ago

Updated by 玮文胡 over 2 years ago