Bug #1537

cmds 100% when copying lots of files, mds_cache_size and mds_bal_frag

Added by DongJin Lee over 12 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

node1: client Linux ss2 2.6.39.3-37-default
node2: ceph v0.34 (a single osd running on btrfs and/or ext4 on a 6-disk RAID0, so basically a single /dev/sdb)

Using ffsb, with over half a million files.
For any test, ffsb first writes the fileset into a directory before running the particular workload, e.g., random read.

num_files=585938
min_filesize=1MB
max_filesize=1MB
size_weight 1MB 585938
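For context, these parameters sit inside an ffsb profile file. A rough sketch of such a profile follows; it is only a sketch: the mount point, run time, thread count, and the directive names other than the four quoted above are assumptions from memory and may differ between ffsb versions.

time=300

[filesystem0]
  location=/mnt/ceph/ffsb
  num_files=585938
  min_filesize=1MB
  max_filesize=1MB
  size_weight 1MB 585938
[end0]

[threadgroup0]
  num_threads=4
  create_weight=1
  write_size=1MB
  write_blocksize=4KB
[end0]

The create/write phase of such a profile is what produces the initial bulk copy described below.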

This works out to roughly a 600GB fileset.
The copy starts smoothly, with high MB/s, then after some time cmds becomes 100% CPU-busy and cosd can hardly do any writes.
On Greg's advice, I've set the MDS cache to 1 million, and then to 3 million, e.g.,

mds_cache_size = 3000000
mds_bal_frag = true

but cmds still becomes 100% busy at some point during the file copying, possibly after 0.2-0.3 million files.
Note that when the fileset is reduced to 1/5, i.e., 120GB (~0.1 million files), the writes complete on time (or maybe cmds reached 100% just near the end).
I've tried both btrfs and ext4 (default pg count 198). I also tried increasing the pg count to 700; this didn't help, but that test used the default mds_cache_size.
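For reference, in a ceph.conf these two options go under the [mds] section; the following is a minimal sketch showing placement only (values as tried above), not a recommended configuration:

[mds]
  ; larger inode/dentry cache (the historical default was 100000)
  mds cache size = 3000000
  ; allow large directories to be fragmented
  mds bal frag = true

Ceph accepts either spaces or underscores in option names, so mds_cache_size and mds cache size are equivalent.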

many thanks :)

Actions #1

Updated by Sage Weil over 12 years ago

Are the files all in the same directory?

Actions #2

Updated by DongJin Lee over 12 years ago

correct, all in the same directory.

I don't remember this symptom back in 0.29.4,
but that setup used multiple separate nodes: node1: osd1, node2: osd2, node3: cmon/cmds, node4: client.
Thanks

Actions #3

Updated by Greg Farnum over 12 years ago

  • Assignee deleted (Greg Farnum)
  • Target version deleted (v0.36)
Actions #4

Updated by Sage Weil over 12 years ago

  • Priority changed from High to Normal
Actions #5

Updated by Sage Weil about 12 years ago

  • Category set to 1
Actions #6

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
Actions #7

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved

This is an optimization issue, which we'll get to!
