Bug #1537 — cmds 100% when copying lots of files, mds_cache_size and mds_bal_frag
Status: Closed
% Done: 0%
Description
node1: client, Linux ss2 2.6.39.3-37-default
node2: ceph v0.34 (a single osd running on btrfs and/or ext4 on raid0 over 6 disks, so basically a single /dev/sdb)
Using ffsb with over half a million files.
For any test, ffsb first writes the fileset into a directory before running the particular test, e.g., random read.
num_files=585938
min_filesize=1MB
max_filesize=1MB
size_weight 1MB 585938
This is about a 600GB fileset (585938 files × 1MB ≈ 572GiB).
The copy starts smoothly with high MB/s, then after some time cmds gets 100% busy, so the cosd hardly does any writes.
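As a quick sanity check on the fileset size implied by the ffsb parameters above (a sketch; the "600GB" figure is the reporter's rounding):

```python
# Arithmetic check of the ffsb fileset size described in the report.
num_files = 585938
file_size_mb = 1  # min_filesize == max_filesize == 1MB

total_mb = num_files * file_size_mb
total_gib = total_mb / 1024  # MB -> GB at a 1024 ratio

print(round(total_gib))  # ~572, i.e. roughly the "600GB fileset" quoted
```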
With Greg's advice, I've set the mds cache to 1 million, and also 3 million, e.g.,
mds_cache_size = 3000000
mds_bal_frag = true
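For reference, these options would normally be set in the [mds] section of ceph.conf. A sketch of how the settings tried above might look there (the layout is illustrative, not copied from the reporter's actual config file):

```ini
; Hypothetical ceph.conf fragment with the MDS settings from this report
[mds]
    mds cache size = 3000000   ; also tried 1000000; raised from the default
    mds bal frag = true        ; allow directories to be fragmented across the MDS
```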
But cmds still gets 100% busy somewhere during the file copying, possibly after 0.2-0.3 million files.
Note that when the fileset is reduced to 1/5, i.e., 120GB (~0.1 million files), the writes complete in a timely manner (or maybe cmds reached 100% just near the end).
I've tried both btrfs and ext4 (default pg 198). I've also tried increasing pg to 700; this didn't help, but that test was using the default mds_cache_size.
Many thanks :)
Updated by DongJin Lee over 12 years ago
Correct, all in the same directory.
I don't remember this symptom back in 0.29.4,
but that setup was using multiple separate nodes: node1: osd1, node2: osd2, node3: cmon/cmds, node4: client.
Thanks
Updated by Greg Farnum over 12 years ago
- Assignee deleted (Greg Farnum)
- Target version deleted (v0.36)
Updated by Sage Weil over 11 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Updated by Sage Weil over 11 years ago
- Status changed from New to Resolved
This is an optimization issue, which we'll get to!