Bug #1537
closed
cmds 100% when copying lots of files, mds_cache_size and mds_bal_frag
Description
node1: client Linux ss2 2.6.39.3-37-default
node2: ceph v0.34 (a single osd running on btrfs and/or ext4 on a 6-disk RAID0, so basically a single /dev/sdb)
Using ffsb with over half a million files.
For any test, ffsb first writes the fileset to a directory before running the particular test, e.g., random read.
num_files=585938
min_filesize=1MB
max_filesize=1MB
size_weight 1MB 585938
This is a fileset of about 600GB.
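As a quick sanity check on that size, here is a minimal sketch of the arithmetic. Whether ffsb's "1MB" means 10^6 or 2^20 bytes is an assumption either way, so both readings are shown; the helper name fileset_size_bytes is made up for this illustration and is not part of ffsb.

```python
# Rough fileset size from the ffsb settings above.
# Assumption: "1MB" may mean 10**6 bytes or 2**20 bytes, so compute both.

NUM_FILES = 585_938  # num_files from the ffsb profile


def fileset_size_bytes(num_files: int, file_size_bytes: int) -> int:
    """Total bytes written when every file has the same size."""
    return num_files * file_size_bytes


decimal_bytes = fileset_size_bytes(NUM_FILES, 10**6)  # 1MB = 1,000,000 bytes
binary_bytes = fileset_size_bytes(NUM_FILES, 2**20)   # 1MB = 1,048,576 bytes

print(f"decimal reading: {decimal_bytes / 10**9:.1f} GB")
print(f"binary reading:  {binary_bytes / 2**30:.1f} GiB")
```

Either way the total lands a little under 600GB, which matches the rough figure quoted here.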
The copy starts smoothly, with high MB/s; then after some time cmds gets 100% busy, and so cosd hardly does any writes.
On Greg's advice, I've set the mds cache size to 1 million, and also to 3 million, e.g.,
mds_cache_size = 3000000
mds_bal_frag = true
But cmds still gets 100% busy at some point during the file copying, possibly after 0.2-0.3 million files.
Note that when the fileset is reduced to 1/5, i.e., 120GB (about 0.1 million files), the writes complete in a timely manner (or maybe cmds reached 100% just near the end).
I've tried both btrfs and ext4 (default pg 198). I've also tried increasing pg to 700; this didn't help (but that test was using the default mds_cache_size).
many thanks :)