Bug #19240

multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build test

Added by Patrick Donnelly about 7 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is possibly not a bug but I thought I would put it here to solicit comments/assistance on what could be causing this disparity. Two graphs showing the client op throughput are on the mira I'm using to host results:

mira092.front.sepia.ceph.com:/mnt/pdonnell/vault/8x8192 20000C MDS 64x4096 Client.2/mds-throughput.png

mira092.front.sepia.ceph.com:/mnt/pdonnell/vault/16x8192 20000C MDS 64x4096 Client/mds-throughput.png

Other graphs are also in the containing directories.

For the 8 MDS case, we're seeing a maximum of ~30k aggregate client requests per second, versus a maximum of ~14k for 16 MDS. (Note that despite this apparent lack of increased scaling in op throughput, the 16 MDS test still finishes faster, so op throughput isn't telling the whole story on how successful the load distribution is.)

Some ideas for what's happening:

  • Perhaps, due to the load on the 8 MDS cluster, the clients are issuing more requests; for example, caps being revoked and needing to be reissued? (A sketch for checking this against the MDS perf counters follows below.)
  • The 16 MDS case is slower per op due to the increased work of getting locks.
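
A minimal sketch of how the first hypothesis could be checked against the MDS perf counters; the daemon names and the counter names (mds.request, mds.caps) are assumptions and may differ by release:

    #!/usr/bin/env python3
    # Hypothetical sketch: poll each MDS admin socket and print the request and
    # cap counters, to see whether the 8 MDS run serves more client requests
    # because caps are being revoked and re-acquired. The daemon names and the
    # counter names used here are assumptions.
    import json
    import subprocess
    import time

    MDS_IDS = ["a", "b", "c"]  # placeholder MDS daemon names

    def perf_dump(mds_id):
        out = subprocess.check_output(
            ["ceph", "daemon", "mds." + mds_id, "perf", "dump"])
        return json.loads(out)

    while True:
        for mds_id in MDS_IDS:
            mds = perf_dump(mds_id).get("mds", {})
            print("mds.%s request=%s caps=%s"
                  % (mds_id, mds.get("request"), mds.get("caps")))
        time.sleep(10)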

Related issues (1 closed)

Related to CephFS - Feature #19362: mds: add perf counters for each type of MDS operation (Resolved, Patrick Donnelly, 03/23/2017)

Actions #1

Updated by Patrick Donnelly about 7 years ago

  • Assignee set to Patrick Donnelly

I believe I have this one figured out. The client requests graph (mds-request.png) shows that the 8 MDS workflow is responding to ~250% more client requests than the 16 MDS workflow. (This was hard to notice because the y-axis ranges are not the same! :) This increase in work obviously indicates the 8 MDS workflow is doing something the 16 MDS workflow doesn't, so what is it?

It looks like the increased number of requests is due to inodes being removed from the cache, which is introducing churn. This is evident from the explosive growth in expired inodes by the 1-hour mark for 8 MDS (3 million expired inodes); for 16 MDS, 1.4 million inodes expire by the end of the workflow. Also, looking at the number of cached inodes loaded (mds-ino+), the 8 MDS workflow loads ~40% more inodes: ~7.8 million vs. ~5.5 million for 16 MDS.
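
As a rough sketch, the churn could also be sampled directly from the perf counters instead of read off the graphs; the mds_mem.ino+/ino- and mds.inodes_expired counter names are assumptions and may differ by release:

    #!/usr/bin/env python3
    # Hypothetical sketch: sample one MDS and report how many inodes were loaded
    # (ino+), trimmed (ino-), and expired since the previous sample. The section
    # and counter names are assumptions.
    import json
    import subprocess
    import time

    MDS_ID = "a"  # placeholder MDS daemon name

    def sample():
        out = subprocess.check_output(
            ["ceph", "daemon", "mds." + MDS_ID, "perf", "dump"])
        dump = json.loads(out)
        return {
            "ino+": dump.get("mds_mem", {}).get("ino+", 0),
            "ino-": dump.get("mds_mem", {}).get("ino-", 0),
            "expired": dump.get("mds", {}).get("inodes_expired", 0),
        }

    prev = sample()
    while True:
        time.sleep(60)
        cur = sample()
        print("loaded=%d trimmed=%d expired=%d"
              % (cur["ino+"] - prev["ino+"],
                 cur["ino-"] - prev["ino-"],
                 cur["expired"] - prev["expired"]))
        prev = cur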

Actions #2

Updated by Patrick Donnelly about 7 years ago

Note: inodes loaded is visible in mds-ino+.png in both workflow directories.

Actions #3

Updated by Patrick Donnelly about 7 years ago

  • Related to Feature #19362: mds: add perf counters for each type of MDS operation added
Actions #4

Updated by Patrick Donnelly almost 7 years ago

To close this we should confirm the hypothesis with the new op tracking from http://tracker.ceph.com/issues/19362

I'll do a run with Linode to check.
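
Once those per-op counters are available, something along these lines could break the request totals down by op type for each run; the mds_server section and the req_* counter naming are assumptions about how the counters will be exposed:

    #!/usr/bin/env python3
    # Hypothetical sketch: print a per-op request breakdown for one MDS using the
    # per-operation counters proposed in issue #19362. The section name
    # (mds_server) and the req_* counter naming are assumptions.
    import json
    import subprocess

    MDS_ID = "a"  # placeholder MDS daemon name

    out = subprocess.check_output(
        ["ceph", "daemon", "mds." + MDS_ID, "perf", "dump"])
    server = json.loads(out).get("mds_server", {})

    for name, val in sorted(server.items()):
        if name.startswith("req_") and isinstance(val, dict):
            # latency-style counters carry an avgcount (number of ops) and a sum
            print("%-30s count=%s" % (name, val.get("avgcount")))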

Actions #5

Updated by John Spray almost 7 years ago

  • Subject changed from mds: troubling op throughput scaling from 8 to 16 MDS in kernel build test to mds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build test
Actions #6

Updated by John Spray almost 7 years ago

  • Subject changed from mds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build test to multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build test
Actions #7

Updated by Patrick Donnelly over 5 years ago

  • Status changed from New to Closed

These results are likely obsolete. Closing.

Actions #8

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (90)
  • Labels (FS) multimds added