Bug #1108

closed

Large number of files in a directory makes things grind to a halt

Added by Damien Churchill almost 13 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

Whilst extracting a copy of our mail directories onto a 10-node cluster (3x mds, 3x mon, 10x osd), I found that one person had 2.5 million files in their Trash folder. I left this extracting over the weekend, and when I returned on Monday it had ground to a halt, extracting perhaps 1 message per second; at that point it had only extracted 330,000 files. I managed to find the count using:

python -c "import os; print len(os.listdir('/path/to/folder'))" 

Using ls or find just took too long (presumably because they were fetching file metadata as well), but even that Python statement took a long time (approx. 1 hour). I have no idea whether this is just a symptom of being a distributed file system or whether there is any way to speed it up.
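
For anyone counting a directory this large, here is a minimal sketch of a metadata-free count; it assumes a newer Python where os.scandir is available, streams entries instead of building the whole list in memory, and never stat()s the files. The path argument is whatever directory you pass on the command line.

# Count directory entries without stat()ing each file or building the
# full listing in memory. Assumes Python 3 (os.scandir).
import os
import sys

def count_entries(path):
    count = 0
    for _ in os.scandir(path):  # yields dirents lazily as the directory is read
        count += 1
    return count

if __name__ == "__main__":
    print(count_entries(sys.argv[1]))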

Actions #1

Updated by Sage Weil almost 13 years ago

Enabling directory fragmentation should fix this. Add

mds bal frag = true

to your [mds] section and restart the mds. It's still off by default due to limited testing of fragmentation combined with clustered mds.
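
For reference, a minimal ceph.conf sketch of where that line goes, as described above (only this one option is being changed):

[mds]
    # enable directory fragmentation (off by default at this point)
    mds bal frag = true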

Actions #2

Updated by Greg Farnum almost 13 years ago

If that turns out to be too unstable for you and you have gobs of RAM for your MDS, you could also bump up the MDS cache size. It defaults to 100k inodes (each cached inode takes about 1 KB), so for operations on directories larger than that (which don't have fragmentation turned on), the cache basically flushes completely and then has to re-read everything off disk.
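
A hedged sketch of what that tuning might look like in ceph.conf; the value of 1 million is an assumption, so size it to the RAM actually available at roughly 1 KB per cached inode:

[mds]
    # default is 100000 inodes; raise only if the MDS host has the RAM for it
    mds cache size = 1000000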

Actions #3

Updated by Damien Churchill almost 13 years ago

Excellent, thanks for the tips. Testing will have to wait until Tuesday now, but I'll report back then. I'm going to upgrade the kernel to 2.6.39 as well for the test.

Actions #4

Updated by Sage Weil almost 13 years ago

  • Category set to 1
  • Status changed from New to 4

Did enabling mds frags help?

Actions #5

Updated by Damien Churchill over 12 years ago

Unfortunately I was unable to get any successful results from the test: the cluster crashed after I'd left it copying overnight, and the mds were stuck in up:replay. I'll give it another try when 0.32 is released and let you know!

Actions #6

Updated by Sage Weil over 12 years ago

  • Status changed from 4 to Closed

Anything new here? Large directories aren't a part of our qa yet, but when they are this'll come up...

Actions #7

Updated by Damien Churchill over 12 years ago

I've just re-created the cluster I was testing this on and given it a 50G LV to store the ceph logs, so I'm running with logging turned up to full on everything. I'm going to test with 3 MDS and

mds bal frag = true
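
For context, a sketch of what such an [mds] section might contain for this kind of test run; the debug levels here are assumptions, not values taken from the actual cluster:

[mds]
    mds bal frag = true
    # verbose logging for the test run (assumed levels)
    debug mds = 20
    debug ms = 1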

Actions #8

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
