Bug #49922
closed
MDS slow request lookupino #0x100 on rank 1 block forever on dispatched
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific,octopus,nautilus
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We have two MDSs deployed by cephadm.
Several hours ago, we got a health warning:
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.gpu006.ddpekw(mds.1): 1 slow requests are blocked > 30 secs
"ceph tell mds.cephfs.gpu006.ddpekw ops" shows:
{
    "ops": [
        {
            "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
            "initiated_at": "2021-03-22T11:16:38.962352+0000",
            "age": 1871.030724191,
            "duration": 1871.030801101,
            "type_data": {
                "flag_point": "dispatched",
                "reqid": "client.6464550:33851",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6464550",
                    "tid": 33851
                },
                "events": [
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "initiated"
                    },
                    {
                        "time": "2021-03-22T11:16:38.962356+0000",
                        "event": "throttled"
                    },
                    {
                        "time": "2021-03-22T11:16:38.962352+0000",
                        "event": "header_read"
                    },
                    {
                        "time": "2021-03-22T11:16:38.962369+0000",
                        "event": "all_read"
                    },
                    {
                        "time": "2021-03-22T11:16:38.962428+0000",
                        "event": "dispatched"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}
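For anyone triaging a longer dump, the following is a minimal Python sketch that picks out requests sitting at a given flag point for longer than a threshold. It assumes the JSON layout shown above ("ops", "age", "type_data.flag_point", "type_data.reqid"); the function name find_stuck_ops and the embedded SAMPLE_DUMP (a trimmed copy of the output above) are only illustrative and not part of any Ceph tooling.

```python
import json

# Trimmed copy of the "ops" output quoted above; only the fields used below are kept.
SAMPLE_DUMP = """
{
  "ops": [
    {
      "description": "client_request(client.6464550:33851 lookupino #0x100 2021-03-22T09:57:18.273820+0000 RETRY=1 caller_uid=859600029, caller_gid=859600029{})",
      "age": 1871.030724191,
      "type_data": {
        "flag_point": "dispatched",
        "reqid": "client.6464550:33851"
      }
    }
  ],
  "num_ops": 1
}
"""


def find_stuck_ops(dump_text, min_age_secs=30.0, flag_point="dispatched"):
    """Return (reqid, age, description) for ops at flag_point that are at least min_age_secs old."""
    dump = json.loads(dump_text)
    stuck = []
    for op in dump.get("ops", []):
        type_data = op.get("type_data", {})
        # Skip ops that have moved past (or not yet reached) the flag point of interest.
        if type_data.get("flag_point") != flag_point:
            continue
        # Skip ops that are younger than the threshold.
        if op.get("age", 0.0) < min_age_secs:
            continue
        stuck.append((type_data.get("reqid"), op["age"], op.get("description", "")))
    return stuck


if __name__ == "__main__":
    for reqid, age, description in find_stuck_ops(SAMPLE_DUMP):
        print(f"{reqid} stuck for {age:.0f}s: {description}")
```

The 30-second default mirrors the "blocked > 30 secs" wording of the health warning; in practice the text passed in would be the saved output of the "ceph tell ... ops" command rather than the embedded sample.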
The "RETRY=1" is there because we restarted this MDS in an attempt to clear the request, but the restart did not help.
This warning seems to be harmless: I/O on client.6464550 and mds.cephfs.gpu006.ddpekw does not appear to be affected.
The inode #0x100 is a special inode that corresponds to MDS rank 0. But this slow op appeared on MDS rank 1, which seems odd to me.
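To make the inode-to-rank reasoning concrete, here is a small Python sketch of the mapping. It rests on an assumption about the constants in Ceph's src/mds/mdstypes.h, namely that per-rank MDS directory inodes start at offset 0x100 and that rank N owns inode 0x100 + N; the names below are chosen to resemble those macros, but the values should be checked against the Ceph version in use.

```python
# Assumed constants, modelled on src/mds/mdstypes.h; verify against your Ceph version.
MAX_MDS = 0x100
MDS_INO_MDSDIR_OFFSET = 1 * MAX_MDS


def mdsdir_owner(ino):
    """Return the rank owning MDS directory inode `ino`, or None if outside the assumed range."""
    if MDS_INO_MDSDIR_OFFSET <= ino < MDS_INO_MDSDIR_OFFSET + MAX_MDS:
        return ino - MDS_INO_MDSDIR_OFFSET
    return None


if __name__ == "__main__":
    print(mdsdir_owner(0x100))  # inode from the stuck lookupino request
    print(mdsdir_owner(0x101))  # inode that would belong to the next rank
```

Under that assumption, #0x100 maps to rank 0 and rank 1's own directory would be #0x101, which is why a lookupino for #0x100 stuck on rank 1 looks unexpected.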
We just upgraded our cluster to 5.2.10, but I don't know whether that is relevant. "client.6464550" is an Ubuntu kernel client running kernel version "5.4.0-67-generic".