Project

General

Profile

Bug #49922

MDS slow request lookupino #0x100 on rank 1 block forever on dispatched

Added by 玮文 胡 about 1 month ago. Updated 12 days ago.

Status:
Pending Backport
Priority:
Normal
Category:
-
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
pacific,octopus,nautilus
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have two MDSs deployed by cephadm.

Several hours ago, we got a health warning:

[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.gpu006.ddpekw(mds.1): 1 slow requests are blocked > 30 secs

"ceph tell mds.cephfs.gpu006.ddpekw ops" shows:

{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6464550",
                    "tid": 33851
                },
                "events": [
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "initiated" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962356+0000",
                        "event": "throttled" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "header_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962369+0000",
                        "event": "all_read" 
                    },
                    {
                        "time": "2021-03-22T11:16:38.962428+0000",
                        "event": "dispatched" 
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

The "RETRY=1" is because we tried to restart this MDS to resolve this. But apparently, it does not work.

This warning seems to be harmless, IO on client.6464550 and mds.cephfs.gpu006.ddpekw seems not affected.

The inode #0x100 is a special inode that corresponding to MDS rank 0. But this slow op appeared in MDS rank 1, which is weird to me.

We just upgraded our cluster to 5.2.10, but I don't know if this is relevant. "client.6464550" is a Ubuntu kernel client at version "5.4.0-67-generic"


Related issues

Copied to CephFS - Backport #50282: pacific: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched In Progress
Copied to CephFS - Backport #50283: octopus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched In Progress
Copied to CephFS - Backport #50284: nautilus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched In Progress

History

#1 Updated by Patrick Donnelly about 1 month ago

  • Assignee set to Jeff Layton
  • Target version set to v17.0.0
  • Source set to Community (user)

#2 Updated by Jeff Layton about 1 month ago

I'm unfamiliar with the MDS code so some notes as I peruse it:

Ok, so the TrackedOp entries get put on the list when tracking_start() is called, and taken off on last reference put() for the tracking op. TrackedOp::put() looks racy to me, but I assume something must protect it from running concurrently with other put() calls -- I'm just not sure what yet.

Beyond that, I'm not sure how to get more info about why this particular op is stuck from the info above. Was there anything useful in the logs from mds 1 about the op and what state it's in?

Beyond that, it's not clear why the client is requesting this particular inode number. If you're correct that 0x100 is the inode number that's being requested:

#define MAX_MDS                   0x100
#define MDS_INO_MDSDIR_OFFSET (1*MAX_MDS)

...then that's the mdsdir for MDS0. The client only does lookupino requests to satisfy knfsd requests (or open_by_handle_at()) or for looking up quota info.

In the latter case, the ino to look up appears to ultimately be provided by the MDS via snap notifications or traces.

#3 Updated by 玮文 胡 about 1 month ago

Jeff Layton wrote:

Was there anything useful in the logs from mds 1 about the op and what state it's in?

I set debug_mds to 20/20 for a while and found this:

2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache handle_open_ino_reply openinoreply(18898 0x100 0 []) v1
2021-03-22T16:53:15.529+0000 7f0211e2e700 10 mds.1.cache do_open_ino_peer 0x100 active 0,1 all 0,1 checked

It is repeating tens of thousands of times per second. Maybe a dead loop?

If you're correct that 0x100 is the inode number that's being requested

I get 0x100 from the output of "ceph tell mds.cephfs.gpu006.ddpekw ops", it should be correct.

Thank you for your reply. If you want to know any other information, please contact me.

#4 Updated by 玮文 胡 29 days ago

Logs from mds.0. Also repeating at the same frequency.

2021-03-23T05:16:57.387+0000 7f6f10838700 10 mds.0.cache handle_open_ino openino(18898 0x100 []) v1 err 0
2021-03-23T05:16:57.387+0000 7f6f10838700 10 mds.0.cache  have [inode 0x100 [...2,head] ~mds0/ auth v121235 snaprealm=0x55b87c389400 f(v0 10=0+10) n(v25786 rc2021-03-23T05:15:35.539258+0000 b821344990 6045=6032+13)/n(v0 rc2020-09-15T04:11:00.679066+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=0 0x55b87cf93000]
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache handle_open_ino openino(18898 0x100 []) v1 err 0
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache  have [inode 0x100 [...2,head] ~mds0/ auth v121235 snaprealm=0x55b87c389400 f(v0 10=0+10) n(v25786 rc2021-03-23T05:15:35.539258+0000 b821344990 6045=6032+13)/n(v0 rc2020-09-15T04:11:00.679066+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=0 0x55b87cf93000]
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache handle_open_ino openino(18898 0x100 []) v1 err 0
2021-03-23T05:16:57.391+0000 7f6f10838700 10 mds.0.cache  have [inode 0x100 [...2,head] ~mds0/ auth v121235 snaprealm=0x55b87c389400 f(v0 10=0+10) n(v25786 rc2021-03-23T05:15:35.539258+0000 b821344990 6045=6032+13)/n(v0 rc2020-09-15T04:11:00.679066+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=0 0x55b87cf93000]

#5 Updated by Patrick Donnelly 28 days ago

  • Status changed from New to In Progress
  • Assignee changed from Jeff Layton to Patrick Donnelly

#6 Updated by Patrick Donnelly 28 days ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to pacific,octopus,nautilus
  • Pull request ID set to 40389

#7 Updated by Jeff Layton 28 days ago

Yeah, that repeated call makes it look like the client is repeatedly calling in to the MDS for that inode number. It's not clear what

There are 2 possibilities here. The MDS gave us a bogus inode number or the client came up with one on its own:

One possibility is that you're exporting cephfs via knfsd and a NFS client handed it a filehandle that made it do the lookupino here. Are you exporting your ceph mount via knfsd?

A local application could also do something similar with open_by_handle_at(). Aside from that, the only kernel code that does a lookupino is quota related. As best I can tell, any inode number we would call that against would have come from the MDS originally.

I think we need to fix the client to throw a warning when it tries to look up an inode number that's in this range, so we can figure out why it's doing this. That'd potentially give us a call stack so we can determine why the client is doing this.

#8 Updated by Patrick Donnelly 28 days ago

Jeff Layton wrote:

Yeah, that repeated call makes it look like the client is repeatedly calling in to the MDS for that inode number. It's not clear what

Actually, a single inolookup for 0x100 will cause infinite ino lookups between the two MDS.

There are 2 possibilities here. The MDS gave us a bogus inode number or the client came up with one on its own:

One possibility is that you're exporting cephfs via knfsd and a NFS client handed it a filehandle that made it do the lookupino here. Are you exporting your ceph mount via knfsd?

A local application could also do something similar with open_by_handle_at(). Aside from that, the only kernel code that does a lookupino is quota related. As best I can tell, any inode number we would call that against would have come from the MDS originally.

I think we need to fix the client to throw a warning when it tries to look up an inode number that's in this range, so we can figure out why it's doing this. That'd potentially give us a call stack so we can determine why the client is doing this.

I'm also curious how this lookup from the client originated.

#9 Updated by 玮文 胡 28 days ago

Jeff Layton wrote:

Are you exporting your ceph mount via knfsd?

No, I don't have anything related to NFS deployed.

Aside from that, the only kernel code that does a lookupino is quota related.

I do have some quotas set up.

Now, I rebooted the client and mds.1. The warning has gone. Thank you.

#10 Updated by Patrick Donnelly 12 days ago

  • Status changed from Fix Under Review to Pending Backport

#11 Updated by Backport Bot 12 days ago

  • Copied to Backport #50282: pacific: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched added

#12 Updated by Backport Bot 12 days ago

  • Copied to Backport #50283: octopus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched added

#13 Updated by Backport Bot 12 days ago

  • Copied to Backport #50284: nautilus: MDS slow request lookupino #0x100 on rank 1 block forever on dispatched added

Also available in: Atom PDF