Bug #19306

fs: mount NFS to cephfs, and then ls a directory containing a large number of files, resulting in ls hang.

Added by geng jichao about 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The ceph_readdir function saves a lot of state in file->private_data, including last_name, which is used as the readdir offset cursor. However, NFS and CIFS cannot read a directory's contents in one pass, so they open and close the directory many times while reading it. Each reopen leaves last_name NULL, and reading starts from the beginning every time. As a result, the time complexity of readdir is O(n^2), where n is the number of files divided by max_readdir.

The NFS readdir code is nfsd_readdir() in fs/nfsd/vfs.c.

The kernel version is 4.4.0-46.
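
To make the complexity concrete, here is a minimal userspace simulation of the cost described above (not the kernel code; NUM_FILES and MAX_READDIR are made-up parameters): with the last_name cursor lost on every reopen, fetching batch k costs k + 1 readdir requests, so the total work is quadratic.

/*
 * Userspace simulation (not kernel code) of the cost of losing the
 * last_name cursor in file->private_data on every directory reopen.
 */
#include <stdio.h>

#define NUM_FILES    100000  /* entries in the directory (made up) */
#define MAX_READDIR  1024    /* entries returned per readdir request */

int main(void)
{
    long long scanned = 0;   /* total entries walked on the MDS side */
    long requests = 0;       /* readdir requests sent to the MDS */

    /* Each open/read/close cycle fetches one batch, but with the cursor
     * lost the walk restarts at entry 0, so reaching batch k costs
     * k + 1 requests and (k + 1) * MAX_READDIR entry visits. */
    for (long offset = 0; offset < NUM_FILES; offset += MAX_READDIR) {
        scanned  += offset + MAX_READDIR;      /* re-walk from 0 to batch end */
        requests += offset / MAX_READDIR + 1;  /* one request per batch walked */
    }

    printf("files=%d batch=%d requests=%ld entries scanned=%lld\n",
           NUM_FILES, MAX_READDIR, requests, scanned);
    return 0;
}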

History

#1 Updated by Nathan Cutler about 7 years ago

  • Tracker changed from Tasks to Support
  • Project changed from Stable releases to CephFS
  • Target version deleted (v10.2.7)
  • Release set to jewel
  • Affected Versions v10.2.6 added

#2 Updated by John Spray about 7 years ago

  • Tracker changed from Support to Bug
  • Project changed from CephFS to Linux kernel client
  • Subject changed from mount NFS to cephfs, and then ls a directory containing a large number of files, resulting in ls hang. to kcephfs: mount NFS to cephfs, and then ls a directory containing a large number of files, resulting in ls hang.
  • Regression set to No
  • Severity set to 3 - minor
  • Release deleted (jewel)
  • Affected Versions deleted (v10.2.6)

#3 Updated by Zheng Yan about 7 years ago

  • Project changed from Linux kernel client to CephFS
  • Subject changed from kcephfs: mount NFS to cephfs, and then ls a directory containing a large number of files, resulting in ls hang. to fs: mount NFS to cephfs, and then ls a directory containing a large number of files, resulting in ls hang.

This bug is not specific to the kernel client. Enabling directory fragments can help. The complete fix is to make the client encode the hash of the last dentry in the readdir request.
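
As a rough illustration of that idea (hypothetical names; dirent_hash and encode_readdir_pos are not the actual Ceph API), the sketch below packs a fragment id and the hash of the last dentry returned into a single 64-bit position, so a client can hand that value back to the MDS and resume statelessly after a reopen instead of relying on a cached last_name string.

/*
 * Hypothetical sketch: encode the readdir position as the hash of the
 * last dentry, rather than a last_name string cached in
 * file->private_data.  Names are illustrative only.
 */
#include <stdint.h>

/* Toy string hash standing in for Ceph's dentry-name hash. */
static uint32_t dirent_hash(const char *name)
{
    uint32_t h = 2166136261u;               /* FNV-1a */
    while (*name)
        h = (h ^ (uint8_t)*name++) * 16777619u;
    return h;
}

/* Pack fragment id and last-dentry hash into one 64-bit offset that
 * survives the file struct being destroyed and recreated. */
static uint64_t encode_readdir_pos(uint32_t frag, const char *last_name)
{
    return ((uint64_t)frag << 32) | dirent_hash(last_name);
}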

#4 Updated by geng jichao about 7 years ago

I have used the offset parameter of the ceph_dir_llseek function; it is passed to the MDS in the readdir request. If last_name is NULL, I use the offset as offset_hash in handle_client_readdir. This avoids unnecessary requests to the MDS, and performance has improved greatly, but I do not know how to fill in the cache: ceph_readdir_cache_control.index may be reset to zero, which corrupts the contents of the cache.
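
A rough sketch of the fallback being described, with hypothetical field and function names rather than the actual MDS code:

/*
 * Illustrative only: when the request carries no last_name cursor
 * (e.g. NFS reopened the directory), treat the llseek offset as a
 * hash so the walk can seek directly instead of restarting.
 */
#include <stdint.h>

struct readdir_req {
    const char *last_name;   /* cursor string, NULL after a fresh open */
    uint64_t    offset;      /* value passed down from ceph_dir_llseek */
};

static uint64_t readdir_start_hash(const struct readdir_req *req)
{
    if (req->last_name)
        return 0;            /* normal path: resume from last_name */
    /* last_name was lost: reuse the file offset as offset_hash. */
    return req->offset;
}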

#5 Updated by John Spray about 7 years ago

  • Assignee set to Jeff Layton

#6 Updated by Zheng Yan almost 7 years ago

  • Status changed from New to In Progress
  • Assignee changed from Jeff Layton to Zheng Yan

#7 Updated by Zheng Yan almost 7 years ago

  • Status changed from In Progress to Fix Under Review

#9 Updated by geng jichao almost 7 years ago

I have a question: if the file struct is destroyed, how do we ensure that cache_ctl.index is correct?
In other words, req->r_readdir_cache_idx = fi.readdir_cache_idx, then cache_ctl.index = req->r_readdir_cache_idx; but when the file struct is destroyed, fi.readdir_cache_idx is reset to zero, which corrupts the cache.

#10 Updated by Zheng Yan almost 7 years ago

geng jichao wrote:

I have a question: if the file struct is destroyed, how do we ensure that cache_ctl.index is correct?
In other words, req->r_readdir_cache_idx = fi.readdir_cache_idx, then cache_ctl.index = req->r_readdir_cache_idx; but when the file struct is destroyed, fi.readdir_cache_idx is reset to zero, which corrupts the cache.

req->r_readdir_cache_idx is -1 by default; the cache is disabled unless ceph_readdir_prepopulate() sets it to 0.
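
A simplified sketch of that guard, with toy types rather than the real kernel structures:

/*
 * Sketch: the cache index on a request starts at -1, and the readdir
 * cache is only touched when it has been set to a valid slot.
 */
#include <stdio.h>

struct readdir_request {
    int r_readdir_cache_idx;     /* -1 by default: caching disabled */
};

static void fill_readdir_cache(struct readdir_request *req)
{
    if (req->r_readdir_cache_idx < 0) {
        /* Request was built without a cache slot (e.g. after the file
         * struct was destroyed), so the dcache is left untouched. */
        printf("cache disabled for this request\n");
        return;
    }
    printf("filling cache from index %d\n", req->r_readdir_cache_idx);
}

int main(void)
{
    struct readdir_request req = { .r_readdir_cache_idx = -1 };
    fill_readdir_cache(&req);    /* prints: cache disabled for this request */
    return 0;
}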

#11 Updated by John Spray almost 7 years ago

The userspace piece (https://github.com/ceph/ceph/pull/14317) has merged.

Zheng: please resolve the ticket when the kernel part has gone upstream

#12 Updated by Zheng Yan over 6 years ago

  • Status changed from Fix Under Review to Resolved
