Bug #1537

cmds 100% when copying lots of files, mds_cache_size and mds_bal_frag

Added by DongJin Lee over 12 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

node1: client Linux ss2 2.6.39.3-37-default
node2: ceph v0.34 (a single osd running on btrfs and/or ext4 on a 6-disk RAID0, so basically a single /dev/sdb)

Using ffsb, with over half a million files.
For any test, ffsb first writes the fileset into a directory before running the particular workload, e.g., random read.

num_files=585938
min_filesize=1MB
max_filesize=1MB
size_weight 1MB 585938
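For context, these parameters sit inside an ffsb profile file. A rough sketch of such a profile follows; it is only a sketch: the mount point, run time, thread count, and the directive names other than the four quoted above are assumptions from memory and may differ between ffsb versions.

time=300

[filesystem0]
  location=/mnt/ceph/ffsb
  num_files=585938
  min_filesize=1MB
  max_filesize=1MB
  size_weight 1MB 585938
[end0]

[threadgroup0]
  num_threads=4
  create_weight=1
  write_size=1MB
  write_blocksize=4KB
[end0]

The create/write phase of such a profile is what produces the initial bulk copy described below.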

This works out to roughly a 600GB fileset.
The copy starts smoothly, with high MB/s, then after some time cmds becomes 100% CPU-busy and cosd can hardly do any writes.
On Greg's advice, I've set the MDS cache to 1 million, and then to 3 million, e.g.,

mds_cache_size = 3000000
mds_bal_frag = true

but cmds still becomes 100% busy at some point during the file copying, possibly after 0.2-0.3 million files.
Note that when the fileset is reduced to 1/5, i.e., 120GB (~0.1 million files), the writes complete on time (or maybe cmds reached 100% just near the end).
I've tried both btrfs and ext4 (default pg count 198). I also tried increasing the pg count to 700; this didn't help, but that test used the default mds_cache_size.
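For reference, in a ceph.conf these two options go under the [mds] section; the following is a minimal sketch showing placement only (values as tried above), not a recommended configuration:

[mds]
  ; larger inode/dentry cache (the historical default was 100000)
  mds cache size = 3000000
  ; allow large directories to be fragmented
  mds bal frag = true

Ceph accepts either spaces or underscores in option names, so mds_cache_size and mds cache size are equivalent.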

many thanks :)

Actions #1

Updated by Sage Weil over 12 years ago

Are the files all in the same directory?

Actions #2

Updated by DongJin Lee over 12 years ago

correct, all in the same directory.

I don't remember this symptom back in 0.29.4,
but that setup used multiple separate nodes: node1: osd1, node2: osd2, node3: cmon/cmds, node4: client.
Thanks

Actions #3

Updated by Greg Farnum over 12 years ago

  • Assignee deleted (Greg Farnum)
  • Target version deleted (v0.36)
Actions #4

Updated by Sage Weil over 12 years ago

  • Priority changed from High to Normal
Actions #5

Updated by Sage Weil about 12 years ago

  • Category set to 1
Actions #6

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
Actions #7

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved

This is an optimization issue, which we'll get to!
