Bug #3935
closed
kclient: Big directory access bugs (multiple), mixed 32- and 64-bit clients
Description
I have the following directory structure in CephFS:
somedir/
  subdir1/ (35K files, 20 MB each)
I'm using Ceph Bobtail. This directory is mounted from fstab like this:
10.252.0.3:6789,10.252.0.2:6789,10.252.0.4:6789:/ /mnt/ceph ceph _netdev,snapdirname=.cc633faa563cbe671221758ad9c01de3,dirstat,norbytes,nocrc,name=admin,secret=SOMESECRET==,readdir_max_entries=8192,readdir_max_bytes=4194304 0 0
on two hosts:
The 1st is a 32-bit host running the 3.7.3 kernel with the ceph module.
The 2nd is a 64-bit host running the 3.7.3 kernel with the ceph module.
If I cd to that dir from x64, I can see the contents and can copy files from it to local fs.
If I try to create a subdir inside subdir1 and move a file from subdir1 to subdir1/subdir2, it hangs on all hosts accessing the directory. Only a complete unmount from all hosts and an MDS restart fixes it.
If I cd to that dir from the 32-bit host, I see an empty directory, although its stat shows that subdir1 contains 35K files.
Sometimes, when I try to cd to the subdir1 directory, it hangs at random. A full umount and a restart of the MDSs fixes it.
When the problem occurs, other Ceph mounts (of other subdirs) work well with no problems at all.
I have 3 nodes with 15 OSDs (5 on each), each OSD on a separate drive, with journals in RAMFS and XFS as the backing store. I also have 3 mons and 3 MDSs on 3 separate nodes. Every mon writes to an SSD. Hosts are connected using a mix of 1G and 10G links.
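The 32-bit symptom (an empty listing while stat still reports entries) can be checked with a small script. This is a minimal sketch and not part of the original report: `compare_dir_counts` is a hypothetical helper, and it assumes that on a kernel mount with `norbytes` the size reported by stat for a directory is its entry count, which should be verified on the client in question.

```python
import os


def compare_dir_counts(path):
    """Return (size reported by stat, number of entries actually listed).

    Assumption: on a CephFS kernel mount with `norbytes`, st_size of a
    directory is its number of entries. On other filesystems st_size
    means something else, so only the second value is meaningful there.
    """
    reported = os.stat(path).st_size
    listed = len(os.listdir(path))
    return reported, listed
```

Calling `compare_dir_counts("/mnt/ceph/somedir/subdir1")` on both hosts and comparing the two tuples would show whether only the listing differs between the 32-bit and 64-bit clients.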
Updated by Ivan Kudryavtsev over 11 years ago
At #3936 I'm providing some benchmarks to show that IOPS/speed is OK for my installation and that my hangs are not performance related. My installation also has 1 active MDS and 2 standby.
Updated by Ivan Kudryavtsev over 11 years ago
I made a mistake in the initial post: the number of files in the directory is 3.5K, not 35K. It's my netflow data for the last few years, so just arbitrary tar.gz files.
Updated by Zheng Yan over 11 years ago
Please set 'debug mds = 10' and upload the MDS log. To minimize the log size, please truncate the MDS log before executing any operation that causes the hang.
Updated by Ivan Kudryavtsev over 11 years ago
I will be able to reproduce this after Feb 8. I will do so if nobody reproduces it before then.
Updated by Sage Weil over 11 years ago
- Status changed from New to Need More Info
Updated by Ivan Kudryavtsev about 11 years ago
Tried again after updating to 0.56.2. Found no problems, but the environment has changed, since I removed the 32-bit kernel client entirely.
Updated by Sage Weil about 11 years ago
- Subject changed from kclient: Big directory access bugs (multiple) to kclient: Big directory access bugs (multiple), mixed 32- and 64-bit clients
- Status changed from Need More Info to 12
Updated by Greg Farnum about 11 years ago
The only way I can think of that a 32-bit client would be different is in the inode assignment; could it be running into a conflict and behaving badly?
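To illustrate the kind of conflict meant here: if a 32-bit client has to squeeze a 64-bit inode number into 32 bits, two different inodes can end up with the same local number. The sketch below folds the two halves together with XOR. The function name, the folding scheme, and the sample inode values are assumptions for illustration and have not been checked against the 3.7.3 client source.

```python
MASK32 = 0xFFFFFFFF


def fold_ino32(ino64):
    """Fold a 64-bit inode number into 32 bits by XOR-ing its halves.

    Illustrative only: assumed to resemble what a 32-bit client might
    do, not taken from the actual kernel code.
    """
    ino = (ino64 & MASK32) ^ (ino64 >> 32)
    return ino if ino != 0 else 2  # never hand out inode number 0


# Two distinct 64-bit inode numbers (hypothetical values) ...
a = 0x10000000001
b = 0x101
# ... that become indistinguishable after folding.
collision = fold_ino32(a) == fold_ino32(b)
```

Whether such a collision could actually produce an empty listing or a hang is exactly the open question in this comment; the sketch only shows that collisions are possible in principle.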
Updated by Sage Weil about 11 years ago
I was thinking the offset for readdir might be 32 bits, but I may be wrong there.
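A rough sketch of what that would mean, assuming (hypothetically) that a directory position is a 64-bit value with a fragment id in the upper half and an in-fragment offset in the lower half. If such a value passed through a 32-bit variable anywhere, the upper half would be lost and positions in different fragments would look the same. `make_pos` and `truncate32` are illustrative names, not kernel functions.

```python
MASK32 = 0xFFFFFFFF


def make_pos(frag, off):
    """Pack a fragment id and an in-fragment offset into one 64-bit position.

    Hypothetical layout, used here only to show the effect of truncation.
    """
    return ((frag & MASK32) << 32) | (off & MASK32)


def truncate32(pos):
    """Simulate storing the position in a 32-bit variable."""
    return pos & MASK32


pos = make_pos(frag=3, off=2)
truncated = truncate32(pos)

# After truncation the fragment id is gone, so this position can no
# longer be told apart from offset 2 in fragment 0.
same_as_frag0 = truncated == truncate32(make_pos(frag=0, off=2))
```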
Updated by Greg Farnum about 10 years ago
The hangs sound like generic cap and request waitlisting issues to me. The empty directory is tickling something in my brain about possible listing issues that Zheng fixed several months ago; is that a possibility or am I making it up?
Updated by Zheng Yan about 9 years ago
- Status changed from 12 to Can't reproduce