Feature #6373: kcephfs: qa: test fscache - CephFS - Ceph

I started testing fscache in my home environment about a year ago and found that it was pretty horribly broken. David Howells has just posted a major overhaul of the underlying infrastructure, and it now performs better and seems to be more reliable.

I have pushed a branch to the kcephfs tree that's based on the testing branch: wip-ceph-fscache-iter

I think for the testing piece, we mainly want to be able to do whatever testing we would normally do with kcephfs, but with the cache enabled. The catch here is that the clients need a bit of local disk space for this. It would be nice to be able to run a kclient testsuite with an arbitrary set of options.

We may also want some specific tests for fscache. One test I've been doing for fscache is running mount + build + clean + umount in a loop on a large-ish tree (e.g. the kernel tree). I think, though, that we probably need to add an fscache-specific set of xfstests that do mount + write some specific data + unmount + mount + read and verify. I plan to look into that soon, and then we can maybe roll up a teuthology wrapper to run them.
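The mount + write + unmount + mount + read-and-verify cycle described above could look roughly like the following. This is only a sketch: the `remount` callable is a hypothetical hook standing in for whatever unmounts and remounts the filesystem with the cache enabled, and the function names and file name are made up for illustration. It is not an xfstests test.

```python
import hashlib
import os
import random
import tempfile


def write_pattern(path, size, seed):
    """Write `size` reproducible pseudo-random bytes and return their SHA-256."""
    rng = random.Random(seed)
    data = bytes(rng.getrandbits(8) for _ in range(size))
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has left the client
    return hashlib.sha256(data).hexdigest()


def read_digest(path):
    """Read the whole file back and return its SHA-256."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def roundtrip(directory, remount, size=1 << 16, seed=6373):
    """Write, remount, then read twice and compare all three digests."""
    path = os.path.join(directory, "fscache-roundtrip.bin")
    expected = write_pattern(path, size, seed)
    remount()                  # hypothetical hook: umount + mount again
    cold = read_digest(path)   # first read after the remount
    warm = read_digest(path)   # second read, possibly served from the cache
    return expected == cold == warm


with tempfile.TemporaryDirectory() as scratch:
    # No-op remount: this only exercises the write/verify logic locally.
    print(roundtrip(scratch, remount=lambda: None))  # True
```

On a real client, `directory` would be a path on the kcephfs mount and `remount` would do the actual umount and mount.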

#10

Updated by Jeff Layton over 3 years ago

One other catch. If we want to do testing with fscache, then it would be ideal if we could provision the clients with a dedicated /var/cache/fscache filesystem. cachefilesd can sometimes end up filling up the filesystem, and it's best if that fs isn't the rootfs.
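A quick pre-flight check for this could be sketched as below. It assumes that comparing device IDs is a good enough test for "not the rootfs", which will not hold for every setup (bind mounts, for example). The function name and the returned keys are illustrative only.

```python
import os
import shutil


def cache_fs_report(cache_dir="/var/cache/fscache", root="/"):
    """Report whether cache_dir sits on a different device than root, plus free space."""
    cache_dev = os.stat(cache_dir).st_dev
    root_dev = os.stat(root).st_dev
    usage = shutil.disk_usage(cache_dir)
    return {"dedicated": cache_dev != root_dev, "free_bytes": usage.free}


if os.path.isdir("/var/cache/fscache"):
    print(cache_fs_report())
```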

#11

Updated by David Galloway over 3 years ago

File fscache-smithi.yml added
File fscache-gibba.yml added

https://github.com/ceph/ceph-cm-ansible/pull/592
https://github.com/ceph/ceph-cm-ansible/pull/593

#12

Updated by Jeff Layton over 3 years ago

Patch to add arbitrary mount options to kclient:

https://github.com/ceph/ceph/pull/38407

#13

Updated by Patrick Donnelly over 3 years ago

Jeff Layton wrote:

> Patch to add arbitrary mount options to kclient:
>
> https://github.com/ceph/ceph/pull/38407

Merged but I obviously won't mark this as resolved because it doesn't test fscache :)

#14

Updated by Jeff Layton over 3 years ago

Not by itself, no. That said, the goal of this ticket is a bit unclear. What exactly should we be aiming to do with this?

Testing fscache itself really comes down to running it through existing tests with it enabled. It's sort of transparent to the fs itself when plumbed in correctly. We can roll tests that do mount/write/umount/read and verify that things are correct, but that won't catch the tricky places where cache coherency race conditions can pop up.

Do we have any provision in teuthology for running existing suites with "modifiers"? It might also be good to start enabling some of the kclient test runs with nowsync too.

#15

Updated by Patrick Donnelly over 3 years ago

Jeff Layton wrote:

> Not by itself, no. That said, the goal of this ticket is a bit unclear. What exactly should we be aiming to do with this?
>
> Testing fscache itself really comes down to running it through existing tests with it enabled. It's sort of transparent to the fs itself when plumbed in correctly. We can roll tests that do mount/write/umount/read and verify that things are correct, but that won't catch the tricky places where cache coherency race conditions can pop up.

You could run fs:workload with an override to turn fscache on for the first (k)client.

> Do we have any provision in teuthology for running existing suites with "modifiers"? It might also be good to start enabling some of the kclient test runs with nowsync too.

Yes, add an override file.

#16

Updated by Jeff Layton over 3 years ago

I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:

overrides:
  kclient:
    client.0:
      mntopts: ["fsc=client0","nowsync"]
    client.1:
      mntopts: ["fsc=client1","nowsync"]
    client.2:
      mntopts: ["fsc=client2","nowsync"]
    client.3:
      mntopts: ["fsc=client3","nowsync"]

The above adds -o fsc=client0,nowsync to the first mounted fs, etc.
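To make the mapping from the fragment to the mount command line concrete, here is a small sketch. It only mirrors the YAML above in a Python dict and joins the list with commas. It is not teuthology's actual kclient code, and the function name is made up for illustration.

```python
# Sketch only: mirrors the override fragment above as a plain dict.
OVERRIDES = {
    "kclient": {
        f"client.{i}": {"mntopts": [f"fsc=client{i}", "nowsync"]}
        for i in range(4)
    }
}


def mount_option_string(overrides, client_id):
    """Join a client's mntopts list into a single `-o` argument."""
    opts = overrides.get("kclient", {}).get(client_id, {}).get("mntopts", [])
    return "-o " + ",".join(opts) if opts else ""


print(mount_option_string(OVERRIDES, "client.0"))  # -o fsc=client0,nowsync
```

Each client gets its own fsc= tag in the fragment (client0, client1, and so on).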

We'll need to consult with the devops folks if we want to include other machine types in this testing to get an appropriate ansible.cephlab hunk.

#17

Updated by Jeff Layton over 3 years ago

Status changed from New to Resolved

#18

Updated by Patrick Donnelly over 3 years ago

Jeff Layton wrote:

> I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:

I was hoping we could have something in the upstream QA when this gets merged into the testing branch?

> [...]
>
> The above adds -o fsc=client0,nowsync to the first mounted fs, etc.
>
> We'll need to consult with the devops folks if we want to include other machine types in this testing to get an appropriate ansible.cephlab hunk.

My understanding was that the ansible.cephlab code could be just another yaml fragment.

#19

Updated by Jeff Layton over 3 years ago

Status changed from Resolved to In Progress

Patrick Donnelly wrote:

> Jeff Layton wrote:
>
> > I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:
>
> I was hoping we could have something in the upstream QA when this gets merged into the testing branch?

Unclear to me what you mean by "something" here. What do you want merged?

#20

Updated by Patrick Donnelly over 3 years ago

Jeff Layton wrote:

> Patrick Donnelly wrote:
>
> > Jeff Layton wrote:
> >
> > > I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:
> >
> > I was hoping we could have something in the upstream QA when this gets merged into the testing branch?
>
> Unclear to me what you mean by "something" here. What do you want merged?

I want to see fscache tested regularly in teuthology. So, the yaml fragments to turn on fsc and the other necessary changes (ansiblelab configs).

#21

Updated by Jeff Layton over 3 years ago

Status changed from In Progress to Need More Info

Patrick Donnelly wrote:

> I want to see fscache tested regularly in teuthology. So, the yaml fragments to turn on fsc and the other necessary changes (ansiblelab configs).

Me too. I was under the impression that we didn't have any automated testing for the kclient, and that test runs were kicked off manually (mostly by you). Is that the case? If so, how do we make sure that we add these mount options to the automated runs?

If they are being done manually, then where is the appropriate place to put yaml fragments of this nature to ensure that the people running these tests use them?

#22

Updated by Patrick Donnelly over 3 years ago

Jeff Layton wrote:

> Patrick Donnelly wrote:
>
> > I want to see fscache tested regularly in teuthology. So, the yaml fragments to turn on fsc and the other necessary changes (ansiblelab configs).
>
> Me too. I was under the impression that we didn't have any automated testing for the kclient, and that test runs were kicked off manually (mostly by you). Is that the case?

The same suite I run manually (fs) is also run regularly in the nightlies.

> If so, how do we make sure that we add these mount options to the automated runs?

We need to add the yaml fragments that test fscache. I think having client.0 always use fscache in fs:workload is sufficient.

> If they are being done manually, then where is the appropriate place to put yaml fragments of this nature to ensure that the people running these tests use them?

#23

Updated by Patrick Donnelly over 3 years ago

Target version changed from v16.0.0 to v17.0.0

#24

Updated by Jeff Layton about 3 years ago

Status changed from Need More Info to In Progress

#25

Updated by Jeff Layton about 3 years ago

The yaml frags that let you test fscache are machine-specific since the clients need to be provisioned with an extra filesystem for the cache. I have yaml files for smithi and gibba, but not for other machine types. Is there some way to ensure that this suite is only run on certain machines, or maybe only enable fscache when run on those machine types?
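The "only enable fscache on those machine types" option could be expressed as a simple conditional, sketched below. Everything here is hypothetical: the set of machine types just reflects the two yaml files mentioned above, and the function is not an existing teuthology facility.

```python
# Hypothetical: machine types for which a cache-provisioning fragment exists.
FSCACHE_MACHINE_TYPES = {"smithi", "gibba"}


def kclient_mntopts(machine_type, client_index):
    """Return mount options, adding fsc only where a cache fs is provisioned."""
    opts = ["nowsync"]
    if machine_type in FSCACHE_MACHINE_TYPES:
        opts.insert(0, f"fsc=client{client_index}")
    return opts


print(kclient_mntopts("smithi", 0))     # ['fsc=client0', 'nowsync']
print(kclient_mntopts("othertype", 0))  # ['nowsync']
```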

#26

Updated by Patrick Donnelly about 3 years ago

Jeff Layton wrote:

> The yaml frags that let you test fscache are machine-specific since the clients need to be provisioned with an extra filesystem for the cache. I have yaml files for smithi and gibba, but not for other machine types. Is there some way to ensure that this suite is only run on certain machines, or maybe only enable fscache when run on those machine types?

What's special about smithi vs. gibba? We could teach kclient.yaml to set up a file system for fscache based on the machine type but it would be good to generalize it as much as possible.

#27

Updated by Patrick Donnelly almost 3 years ago

Has duplicate Tasks #38386: qa: write kernel fscache tests added

#28

Updated by Jeff Layton over 2 years ago

Patrick Donnelly wrote:

> What's special about smithi vs. gibba? We could teach kclient.yaml to set up a file system for fscache based on the machine type but it would be good to generalize it as much as possible.

That sounds good. How do we do that?

Nothing is special about those machines, other than the fact that I happen to have override yaml fragments for them. I don't think we have a facility that says "I need a 50g local partition formatted and mounted at /var/cache/fscache", and that can make that happen universally for any machine type. The partitioning seems to be custom-rolled on a per-machine-type basis.

The simplest fix would be to just always have the clients provision a dedicated filesystem at /var/cache/fscache. That's a waste of space when it's not being used though.

#29

Updated by Jeff Layton almost 2 years ago

Assignee changed from Jeff Layton to Xiubo Li

#30

Updated by Patrick Donnelly almost 2 years ago

Target version deleted (~~v17.0.0~~)

fscache-gibba.yml (611 Bytes), David Galloway, 12/01/2020 09:31 PM
fscache-smithi.yml (620 Bytes), David Galloway, 12/01/2020 09:31 PM
