Bug #1108
Closed
Large number of files in a directory makes things grind to a halt
Description
Whilst extracting a copy of our mail directories onto a 10-node cluster (3x mds, 3x mon, 10x osd), I found that one user had 2.5 million files in their Trash folder. I left this extracting over the weekend, and when I returned on Monday it had ground to a halt, extracting perhaps 1 message per second; it had only extracted 330,000 files at that point. I managed to find the count using:
python -c "import os; print(len(os.listdir('/path/to/folder')))"
Using ls or find just took too long (presumably because they fetch file metadata as well), but even that Python statement took a long time (approx. 1 hr). I have no idea whether this is just a symptom of being a distributed file system, or if there is any way to speed it up.
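As an aside, entries can be counted without pulling per-file metadata at all. This is a hedged sketch, not something from the thread: it assumes Python 3.5+ and its `os.scandir`, which iterates raw directory entries rather than building a full list or stat()ing each file the way `ls -l`/`find` do.

```python
import os

def count_entries(path):
    """Count directory entries without stat()ing each one.

    os.scandir iterates the underlying readdir-style stream lazily,
    so memory stays flat even for millions of entries, and no
    per-file metadata lookups are issued.
    """
    total = 0
    with os.scandir(path) as it:
        for _ in it:
            total += 1
    return total
```

On a local filesystem the difference is modest, but on a network filesystem like CephFS every avoided metadata lookup is a round trip saved, which is where the one-hour listing likely went.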
Updated by Sage Weil almost 13 years ago
Enabling directory fragmentation should fix this. Add
mds bal frag = true
to your [mds] section and restart the mds. It's still off by default due to limited testing with fragmentation and clustered mds.
Updated by Greg Farnum almost 13 years ago
If that turns out to be too unstable for you and you have gobs of RAM for your MDS, you could also bump up the MDS cache size. It defaults to 100k inodes (each inode takes about 1k), so for ops on directories that are larger than that (and don't have fragmentation turned on) the cache basically flushes completely and then has to re-read off disk.
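Putting the two suggestions together, the [mds] section of ceph.conf might look like the sketch below. The 300000 figure is purely illustrative, not a recommendation from this thread; size it to your RAM.

```ini
[mds]
    ; enable directory fragmentation (off by default; limited testing)
    mds bal frag = true
    ; raise the inode cache from the 100k default; at roughly 1 KB per
    ; inode, 300k inodes needs on the order of 300 MB of RAM
    mds cache size = 300000
```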
Updated by Damien Churchill almost 13 years ago
Excellent thanks for the tips. It'll have to wait until Tuesday now for testing but I'll report back then. Going to upgrade the kernel to 2.6.39 as well for the test.
Updated by Sage Weil almost 13 years ago
- Category set to 1
- Status changed from New to 4
Did enabling mds frags help?
Updated by Damien Churchill over 12 years ago
Unfortunately I was unable to get any successful results from the test: the cluster crashed after I'd left it copying overnight, and the MDSes were stuck in up:replay. I'll give it another try when 0.32 is released and let you know!
Updated by Sage Weil over 12 years ago
- Status changed from 4 to Closed
Anything new here? Large directories aren't a part of our qa yet, but when they are this'll come up...
Updated by Damien Churchill over 12 years ago
I've just re-created the cluster I was testing this on, and given it a 50G LV to store the Ceph logs on, so all logging is turned up to full. Going to test with 3 MDSes and:
mds bal frag = true
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.