Feature #62668


qa: use teuthology scripts to test dozens of clients

Added by Patrick Donnelly 8 months ago. Updated 7 months ago.

Status: New
Priority: Normal
Category: Testing
Target version:
% Done: 0%
Source: Development
Tags:
Backport: reef,quincy
Reviewed:
Affected Versions:
Component(FS): qa-suite
Labels (FS): qa, task(medium)
Pull request ID:

Description

We have one small suite for integration testing of multiple clients:

https://github.com/ceph/ceph/tree/9d7c18257836dc888b4a300f3b5af9f080910986/qa/suites/fs/multiclient

and it only exercises at most 3 clients:

https://github.com/ceph/ceph/blob/9d7c18257836dc888b4a300f3b5af9f080910986/qa/suites/fs/multiclient/clusters/1-mds-3-client.yaml
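
For context, a cluster fragment of that kind is just a `roles` list mapping daemons and clients onto test nodes, roughly this shape (a sketch from memory, not a verbatim copy of the linked file):

```yaml
roles:
- [mon.a, mgr.x, mds.a, osd.0, osd.1, osd.2, osd.3]
- [mon.b, mgr.y, mds.b, osd.4, osd.5, osd.6, osd.7]
- [client.0]
- [client.1]
- [client.2]
```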

Use teuthology fragment scripting

https://docs.ceph.com/projects/teuthology/en/latest/fragment_merging.html

to create dozens of clients for testing in `fs:workload`. I (Patrick) am on the fence about whether it's a good idea to put it in `fs:workload` or keep it separate in `fs:multiclient`, but we should definitely exercise CephFS more with multiple-client workloads. The workloads we do test should be re-evaluated, as fs:workload has numerous workunits which could probably run in parallel on dozens of clients without issue. The key considerations are (a) cluster overload and (b) time to complete. This will probably require some experimental adjustments to see what is both useful and practical.
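
To illustrate what that scripting could look like: a minimal sketch of a cluster fragment whose premerge Lua script generates the client roles instead of hand-writing them. The `yaml_fragment` table and its mutability are my reading of the fragment-merging doc, and the layout (24 clients, 8 per node across 3 extra nodes) is an arbitrary example:

```yaml
# hypothetical fragment, e.g. clusters/1-mds-many-client.yaml
roles:
- [mon.a, mgr.x, mds.a, osd.0, osd.1, osd.2, osd.3]
teuthology:
  premerge: |
    -- sketch: append 3 nodes of 8 clients each (client.0 .. client.23);
    -- assumes yaml_fragment exposes this fragment's YAML as a Lua table
    for node = 0, 2 do
      local roles = {}
      for i = 0, 7 do
        roles[#roles + 1] = string.format("client.%d", node * 8 + i)
      end
      table.insert(yaml_fragment.roles, roles)
    end
```

Whether the extra clients get dedicated nodes or are packed several to a node is exactly the kind of knob the experimentation around (a) and (b) would settle.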

#1

Updated by Venky Shankar 8 months ago

Patrick Donnelly wrote:

> We have one small suite for integration testing of multiple clients:
>
> https://github.com/ceph/ceph/tree/9d7c18257836dc888b4a300f3b5af9f080910986/qa/suites/fs/multiclient
>
> and it only exercises at most 3 clients:
>
> https://github.com/ceph/ceph/blob/9d7c18257836dc888b4a300f3b5af9f080910986/qa/suites/fs/multiclient/clusters/1-mds-3-client.yaml
>
> Use teuthology fragment scripting
>
> https://docs.ceph.com/projects/teuthology/en/latest/fragment_merging.html
>
> to create dozens of clients for testing in `fs:workload`.

Fair enough. Do you think we should do the same with fs:thrash?

> I (Patrick) am on the fence about whether it's a good idea to put it in `fs:workload` or keep it separate in `fs:multiclient`, but we should definitely exercise CephFS more with multiple-client workloads.

I'd say we make it a part of fs:multiclient.

> The workloads we do test should be re-evaluated, as fs:workload has numerous workunits which could probably run in parallel on dozens of clients without issue. The key considerations are (a) cluster overload and (b) time to complete. This will probably require some experimental adjustments to see what is both useful and practical.

Agreed.

#2

Updated by Venky Shankar 8 months ago

  • Assignee set to Christopher Hoffman
  • Backport set to reef,quincy
#3

Updated by Venky Shankar 7 months ago

I was talking to Patrick regarding the best workload to stress our cap handling code, and one option that came up was to have multiple clients each clone a repo (say, the ceph repo), make a bunch of changes locally, commit those changes, and then merge them into the "super" repo (using the file protocol, of course). That should give us a good start at exercising the cap handling code a lot more than we do today.
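
A rough sketch of how that could be wired up with teuthology's `pexec` (parallel exec) task; the CephFS mount point, the bare "super" repo path, and the git identity are all invented for illustration:

```yaml
tasks:
- pexec:
    all:
      # sketch: each client clones the shared bare repo over the file
      # protocol, commits a local change, and pushes a branch back into it
      # (a prior step would have created /mnt/cephfs/super.git)
      - git clone file:///mnt/cephfs/super.git /tmp/work
      - cd /tmp/work && git checkout -b change-$(hostname)
      - cd /tmp/work && date > note-$(hostname).txt
      - cd /tmp/work && git add -A && git -c user.name=qa -c user.email=qa@example.com commit -m "change from $(hostname)"
      - cd /tmp/work && git push origin change-$(hostname)
```

Dozens of clients pushing into one bare repository on the shared mount should generate heavy contention on the repo's refs and object directories, which is the cap traffic we're after.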

Patrick, anything else you want to add?

#4

Updated by Patrick Donnelly 7 months ago

Venky Shankar wrote:

> I was talking to Patrick regarding the best workload to stress our cap handling code, and one option that came up was to have multiple clients each clone a repo (say, the ceph repo), make a bunch of changes locally, commit those changes, and then merge them into the "super" repo (using the file protocol, of course). That should give us a good start at exercising the cap handling code a lot more than we do today.

and also file locking!

> Patrick, anything else you want to add?

We should hunt for a cooperative multi-client workload, but I'm concerned one just doesn't exist. It may need to be built.
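
If it does have to be built, even something small could cover the file-locking angle first; a minimal sketch, where the mount point, file names, and iteration count are all invented:

```yaml
tasks:
- pexec:
    all:
      # sketch: every client appends to one shared file under an exclusive
      # flock; a clean run ends with (clients x iterations) intact lines
      - |
        for i in $(seq 1 100); do
          flock /mnt/cephfs/shared.lock \
            sh -c "echo $(hostname)-$i >> /mnt/cephfs/shared.log"
        done
```

Checking the final line count, and that no line is interleaved, gives a cheap cross-client correctness check on the locking.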

#5

Updated by Greg Farnum 7 months ago

We should see how much work it is to adapt Jepsen (https://jepsen.io) to filesystems; it's the only generalized distributed consistency tester I'm aware of.

#6

Updated by Christopher Hoffman 7 months ago

Started a pad to evaluate various options; please feel free to add to it:
https://pad.ceph.com/p/403z4gUayw7sXHc4xqq5

