Bug #19240

multimds on linode: troubling op throughput scaling from 8 to 16 MDS in kernel build test

Added by Patrick Donnelly about 7 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is possibly not a bug, but I thought I would put it here to solicit comments/assistance on what could be causing this disparity. Two graphs showing the client op throughput are on the mira node I'm using to host results:

mira092.front.sepia.ceph.com:/mnt/pdonnell/vault/8x8192 20000C MDS 64x4096 Client.2/mds-throughput.png

mira092.front.sepia.ceph.com:/mnt/pdonnell/vault/16x8192 20000C MDS 64x4096 Client/mds-throughput.png

Other graphs are also in the containing directories.

For the 8 MDS case, we see a maximum of ~30k aggregate client requests per second, versus a maximum of ~14k for 16 MDS. (Note that despite this apparent lack of scaling in op throughput, the 16 MDS test still finishes faster, so op throughput isn't telling the whole story on how successful the load distribution is.)
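
For reference, below is a minimal sketch of how the per-MDS request rate behind these graphs could be sampled. It assumes the mds_server.handle_client_request perf counter (verify with "perf schema" on your build) and a hypothetical daemon name mds.a; it has to run on the host with that MDS's admin socket, and summing the same figure across all active ranks gives the aggregate throughput plotted above.

    #!/usr/bin/env python
    # Sketch: sample one MDS's cumulative client-request counter twice and
    # print the requests-per-second rate over the interval. "mds.a" and the
    # counter path mds_server.handle_client_request are assumptions; check
    # them with "ceph daemon mds.<name> perf schema" on your build.
    import json
    import subprocess
    import time

    MDS_NAME = "a"      # hypothetical daemon name
    INTERVAL = 5.0      # seconds between samples

    def client_requests():
        out = subprocess.check_output(
            ["ceph", "daemon", "mds.%s" % MDS_NAME, "perf", "dump"])
        return json.loads(out.decode("utf-8"))["mds_server"]["handle_client_request"]

    before = client_requests()
    time.sleep(INTERVAL)
    after = client_requests()
    print("mds.%s client req/s: %.1f" % (MDS_NAME, (after - before) / INTERVAL))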

Some ideas for what's happening:

  • Perhaps, due to load in the 8 MDS case, the clients are issuing more requests, e.g. caps being revoked and then needing to be reissued (see the session-listing sketch after this list).
  • Each op in the 16 MDS case is slower due to the increased work of acquiring locks.
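
To poke at the cap-revocation hypothesis, a sketch like the following could be used to watch per-client cap counts on one MDS (run on that MDS's host; mds.a is a hypothetical daemon name, and the num_caps/num_leases field names should be checked against your release). Clients whose cap counts repeatedly collapse and regrow between samples are likely having caps revoked and reissued under cache pressure, which would inflate the request rate.

    #!/usr/bin/env python
    # Sketch: list client sessions on one MDS and print how many caps and
    # leases each client currently holds. Field names (num_caps, num_leases)
    # are assumptions to verify against your Ceph release.
    import json
    import subprocess

    out = subprocess.check_output(["ceph", "daemon", "mds.a", "session", "ls"])
    for session in json.loads(out.decode("utf-8")):
        print("client.%s  state=%s  num_caps=%s  num_leases=%s" % (
            session.get("id"), session.get("state"),
            session.get("num_caps"), session.get("num_leases")))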

Related issues: 1 (0 open, 1 closed)

Related to CephFS - Feature #19362: mds: add perf counters for each type of MDS operation (Resolved, Patrick Donnelly, 2017-03-23)
