Bug #19955: Too many stat ops when MDS trying to probe a large file

Added by Sandy Xu almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Backport: jewel, kraken
Regression: No
Severity: 3 - minor
ceph-qa-suite: fs
Component(FS): MDS

Description

When the MDS recovers, it may issue a huge number of `stat` ops to the OSDs while probing a large file, which in turn prevents clients from accessing that file. In our environment, the cluster complains:

2017-05-16 09:29:20.131180 7f9cc4914700 0 log_channel(cluster) log [WRN] : slow request 962.445029 seconds old, received at 2017-05-16 09:13:17.686086: client_request(client.1225271:14693 getattr pAsLsXsFs #10000012f8f 2017-05-16 09:02:36.373213 RETRY=2) currently failed to rdlock, waiting
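
For scale: assuming the default CephFS file layout of 4 MiB objects (the report does not state the actual layout), a 10 TB file is backed by roughly 2.4 million RADOS objects, so probing it object by object means millions of `stat` ops. A quick back-of-the-envelope check:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Assumption: default CephFS file layout, i.e. 4 MiB objects; the
    // actual layout of the affected file is not given in the report.
    const uint64_t file_size   = 10ULL * 1000 * 1000 * 1000 * 1000; // ~10 TB
    const uint64_t object_size = 4ULL * 1024 * 1024;                // 4 MiB
    const uint64_t objects     = (file_size + object_size - 1) / object_size;
    // Prints roughly 2.4 million -- one stat per backing object adds up fast.
    std::printf("backing objects to stat: %llu\n",
                (unsigned long long)objects);
    return 0;
}
```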

The MDS objecter log shows a large number of `stat` ops (inode 10000012f8f refers to a 10 TB file):

2017-05-16 11:29:15.129724 7f9cc2910700 10 mds.0.objecter _op_submit op 0x55664bc241c0
2017-05-16 11:29:15.129902 7f9cc2910700 10 mds.0.objecter _calc_target pgid 2.b5d43f4c acting [63,69,50,23,64,49]
2017-05-16 11:29:15.129911 7f9cc2910700 10 mds.0.objecter _op_submit oid 10000012f8f.00003b8d '@2' '@2' [stat] tid 5729647 osd.63
2017-05-16 11:29:15.129920 7f9cc2910700 5 mds.0.objecter 1 unacked, 0 uncommitted
2017-05-16 11:29:15.138341 7f9cb737e700 10 mds.0.objecter ms_dispatch 0x55664b4a0000 osd_op_reply(5729647 10000012f8f.00003b8d [stat] v0'0 uv7450 ondisk = 0) v7
2017-05-16 11:29:15.138352 7f9cb737e700 10 mds.0.objecter in handle_osd_op_reply
2017-05-16 11:29:15.138356 7f9cb737e700 7 mds.0.objecter handle_osd_op_reply 5729647 ondisk v 0'0 uv 7450 in 2.b5d43f4c attempt 0
2017-05-16 11:29:15.138359 7f9cb737e700 10 mds.0.objecter op 0 rval 0 len 16
2017-05-16 11:29:15.138365 7f9cb737e700 5 mds.0.objecter 0 unacked, 0 uncommitted
2017-05-16 11:29:15.138418 7f9cc2910700 10 mds.0.objecter _op_submit op 0x55664ec53200
2017-05-16 11:29:15.138594 7f9cc2910700 10 mds.0.objecter _calc_target pgid 2.876e8608 acting [34,1,57,30,41,2]
2017-05-16 11:29:15.138603 7f9cc2910700 10 mds.0.objecter _op_submit oid 10000012f8f.00003b8c '@2' '@2' [stat] tid 5729648 osd.34
2017-05-16 11:29:15.138613 7f9cc2910700 5 mds.0.objecter 1 unacked, 0 uncommitted
2017-05-16 11:29:15.139957 7f9cb6772700 10 mds.0.objecter ms_dispatch 0x55664b4a0000 osd_op_reply(5729648 10000012f8f.00003b8c [stat] v0'0 uv7416 ondisk = 0) v7
2017-05-16 11:29:15.139969 7f9cb6772700 10 mds.0.objecter in handle_osd_op_reply
2017-05-16 11:29:15.139974 7f9cb6772700 7 mds.0.objecter handle_osd_op_reply 5729648 ondisk v 0'0 uv 7416 in 2.876e8608 attempt 0
2017-05-16 11:29:15.139980 7f9cb6772700 10 mds.0.objecter op 0 rval 0 len 16

I suspect the MDS is busy running here:
https://github.com/ceph/ceph/blob/master/src/mds/RecoveryQueue.cc#L102-L104
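
The code referenced above hands the inode to the filer to probe for the real end of file. Conceptually, the cost comes from statting the file's backing objects one after another, which is consistent with the objecter log above, where each stat appears to complete before the next one is submitted. The sketch below is not the Ceph implementation, only a simplified, self-contained illustration of that cost; `FakeStore` and `probe_file_end` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>

// Stand-in for a RADOS stat op on one backing object of the file.  In the
// real system each call is a round trip to an OSD; here we just simulate a
// file whose data ends at `true_size` and count the calls.
struct FakeStore {
    uint64_t true_size;
    uint64_t object_size;
    uint64_t stats_issued = 0;

    bool stat_object(uint64_t index, uint64_t *bytes_in_object) {
        ++stats_issued;
        uint64_t start = index * object_size;
        if (start >= true_size)
            return false;                       // object does not exist
        uint64_t remaining = true_size - start;
        *bytes_in_object = remaining < object_size ? remaining : object_size;
        return true;
    }
};

// Simplified illustration (not the Ceph code) of an end-of-file probe:
// stat every candidate object of the file to find the last one that holds
// data.  The cost is one stat per object, so a 10 TB file with 4 MiB
// objects costs ~2.4 million stat ops.
uint64_t probe_file_end(FakeStore &store, uint64_t max_size) {
    const uint64_t nobj =
        (max_size + store.object_size - 1) / store.object_size;
    uint64_t end = 0;
    for (uint64_t i = 0; i < nobj; ++i) {
        uint64_t len = 0;
        if (store.stat_object(i, &len))
            end = i * store.object_size + len;  // highest data seen so far
    }
    return end;
}

int main() {
    FakeStore store{/*true_size=*/10ULL * 1000 * 1000 * 1000 * 1000,
                    /*object_size=*/4ULL * 1024 * 1024};
    uint64_t size = probe_file_end(store, store.true_size);
    std::printf("recovered size: %llu bytes after %llu stat ops\n",
                (unsigned long long)size,
                (unsigned long long)store.stats_issued);
    return 0;
}
```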


Related issues (2: 0 open, 2 closed)

Copied to CephFS - Backport #20148: jewel: Too many stat ops when MDS trying to probe a large file (Resolved, Nathan Cutler)
Copied to CephFS - Backport #20149: kraken: Too many stat ops when MDS trying to probe a large file (Rejected)