Feature #6373

kcephfs: qa: test fscache

Added by Sage Weil about 8 years ago. Updated 4 months ago.

Status: In Progress
Priority: High
Assignee:
Category: Testing
Target version:
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): kceph, qa-suite
Labels (FS): qa
Pull request ID:

fscache-gibba.yml (611 Bytes) David Galloway, 12/01/2020 09:31 PM

fscache-smithi.yml (620 Bytes) David Galloway, 12/01/2020 09:31 PM


Related issues

Related to CephFS - Bug #6770: ceph fscache: write file more than a page size to orignal file cause cachfiles bug on EOF Can't reproduce
Duplicated by CephFS - Tasks #38386: qa: write kernel fscache tests Closed

History

#1 Updated by Sage Weil about 8 years ago

  • Target version changed from v0.69 to v0.70

#2 Updated by Sage Weil almost 8 years ago

  • Target version changed from v0.70 to v0.77

#3 Updated by Greg Farnum almost 8 years ago

  • Target version deleted (v0.77)

#4 Updated by Greg Farnum over 5 years ago

  • Category set to Testing

#5 Updated by Patrick Donnelly about 2 years ago

  • Related to Bug #6770: ceph fscache: write file more than a page size to orignal file cause cachfiles bug on EOF added

#6 Updated by Patrick Donnelly about 2 years ago

  • Assignee set to Jeff Layton
  • Priority changed from Normal to High
  • Target version set to v15.0.0
  • Component(FS) qa-suite added
  • Labels (FS) qa added

#7 Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v15.0.0)

#8 Updated by Patrick Donnelly about 1 year ago

  • Target version set to v16.0.0
  • Component(FS) kceph added

#9 Updated by Jeff Layton about 1 year ago

I started testing fscache in my home environment about a year ago and found that it was pretty horribly broken. David Howells has just posted a major overhaul of the underlying infrastructure, and it now performs better and seems to be more reliable.

I have pushed a branch to the kcephfs tree that's based on the testing branch: wip-ceph-fscache-iter

I think for the testing piece, we mainly want to be able to do whatever testing we would normally do with kcephfs, but with the cache enabled. The catch here is that the clients need a bit of local disk space for this. It would be nice to be able to run a kclient testsuite with an arbitrary set of options.

We may also want some specific tests for fscache. One test I've been doing for fscache is running mount + build + clean + umount in a loop on a large-ish tree (e.g. a kernel tree). I think, though, that we probably need to add an fscache set of xfstests that do mount + write some specific data + unmount + mount + read and verify. I'll plan to look into that soon, and we can then maybe roll up a teuthology wrapper that can run them.
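The mount + write + unmount + mount + read-and-verify cycle described above could be sketched roughly as follows. This is only an illustration, not an xfstests or teuthology test: `mount_fn` and `umount_fn` are hypothetical hooks standing in for whatever actually mounts kcephfs with `fsc` enabled, and the file name is arbitrary. The default size is deliberately larger than, and not a multiple of, a 4 KiB page (cf. Bug #6770).

```python
import hashlib
import os
import tempfile
from typing import Callable


def write_then_verify(mountpoint: str,
                      mount_fn: Callable[[], None],
                      umount_fn: Callable[[], None],
                      size: int = 3 * 4096 + 123,
                      name: str = "fscache-check.bin") -> bool:
    """Write known data, remount, and check that it reads back intact.

    mount_fn/umount_fn are hypothetical hooks; a real test would mount and
    unmount the kernel client (with -o fsc) at mountpoint.
    """
    # Deterministic data where every 4-byte word differs, so misplaced
    # pages or a short read at EOF change the checksum.
    data = b"".join(i.to_bytes(4, "little") for i in range(size // 4 + 1))[:size]
    expected = hashlib.sha256(data).hexdigest()
    path = os.path.join(mountpoint, name)

    # First mount: write the data and make sure it reaches the server.
    mount_fn()
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    finally:
        umount_fn()

    # Second mount: reads may now be served from the local cache.
    mount_fn()
    try:
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
    finally:
        umount_fn()
    return actual == expected


if __name__ == "__main__":
    # Dry run against a local temp dir with no-op mount hooks.
    with tempfile.TemporaryDirectory() as d:
        print(write_then_verify(d, lambda: None, lambda: None))
```

Passing the mount steps in as callables keeps the verification logic independent of how a given harness mounts the client.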

#10 Updated by Jeff Layton about 1 year ago

One other catch. If we want to do testing with fscache, then it would be ideal if we could provision the clients with a dedicated /var/cache/fscache filesystem. cachefilesd can sometimes end up filling up the filesystem, and it's best if that fs isn't the rootfs.
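Given that concern, a test could first confirm that the cache directory really sits on its own filesystem rather than on the rootfs. The sketch below compares device IDs from `os.stat`; the helper name is hypothetical, and `/var/cache/fscache` is simply the path mentioned above. It is a heuristic: btrfs subvolumes, for example, report distinct device IDs while sharing space.

```python
import os


def cache_on_separate_fs(cache_dir: str = "/var/cache/fscache",
                         root: str = "/") -> bool:
    """Return True if cache_dir appears to be on a different filesystem than root.

    Paths on the same filesystem share st_dev, so differing device IDs
    suggest that filling cache_dir will not fill the rootfs.
    """
    return os.stat(cache_dir).st_dev != os.stat(root).st_dev


if __name__ == "__main__":
    if os.path.isdir("/var/cache/fscache"):
        print("separate fs:", cache_on_separate_fs())
    else:
        print("/var/cache/fscache does not exist")
```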

#12 Updated by Jeff Layton about 1 year ago

Patch to add arbitrary mount options to kclient:

https://github.com/ceph/ceph/pull/38407

#13 Updated by Patrick Donnelly 12 months ago

Jeff Layton wrote:

Patch to add arbitrary mount options to kclient:

https://github.com/ceph/ceph/pull/38407

Merged but I obviously won't mark this as resolved because it doesn't test fscache :)

#14 Updated by Jeff Layton 12 months ago

Not by itself, no. That said, the goal of this ticket is a bit unclear. What exactly should we be aiming to do with this?

Testing fscache itself really comes down to running it through existing tests with it enabled. It's sort of transparent to the fs itself when plumbed in correctly. We can roll tests that do mount/write/umount/read and verify that things are correct, but that won't catch the tricky places where cache coherency race conditions can pop up.

Do we have any provision in teuthology for running existing suites with "modifiers"? It might also be good to start enabling some of the kclient test runs with nowsync too.

#15 Updated by Patrick Donnelly 12 months ago

Jeff Layton wrote:

Not by itself, no. That said, the goal of this ticket is a bit unclear. What exactly should we be aiming to do with this?

Testing fscache itself really comes down to running it through existing tests with it enabled. It's sort of transparent to the fs itself when plumbed in correctly. We can roll tests that do mount/write/umount/read and verify that things are correct, but that won't catch the tricky places where cache coherency race conditions can pop up.

You could run fs:workloads with an override to turn fscache on for the first (k)client.

Do we have any provision in teuthology for running existing suites with "modifiers"? It might also be good to start enabling some of the kclient test runs with nowsync too.

Yes, add an override file.
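As a rough illustration of such an override, the sketch below builds the `overrides` / `kclient` / `client.N` / `mntopts` structure that would turn fscache on only for `client.0`, as suggested for fs:workloads, and renders it as YAML text with the standard library alone. The helper names are hypothetical, and deriving the `fsc=` tag from the client name (client.0 becomes client0) is an assumption for illustration.

```python
def fscache_override(clients=("client.0",), extra_opts=()):
    """Build a teuthology-style override dict enabling fsc on some clients.

    The fsc tag is derived from the client name (client.0 -> client0),
    so each listed client gets its own cache tag.
    """
    kclient = {}
    for name in clients:
        tag = name.replace(".", "")
        kclient[name] = {"mntopts": [f"fsc={tag}", *extra_opts]}
    return {"overrides": {"kclient": kclient}}


def to_yaml(override):
    """Render this fixed-shape override dict as YAML text (no PyYAML needed)."""
    lines = ["overrides:", "  kclient:"]
    for name, cfg in override["overrides"]["kclient"].items():
        opts = ", ".join(f'"{o}"' for o in cfg["mntopts"])
        lines.append(f"    {name}:")
        lines.append(f"      mntopts: [{opts}]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_yaml(fscache_override()))
```

Run as a script, this prints a four-line fragment whose last line is `mntopts: ["fsc=client0"]`.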

#16 Updated by Jeff Layton 12 months ago

I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:

overrides:
  kclient:
    client.0:
      mntopts: ["fsc=client0", "nowsync"]
    client.1:
      mntopts: ["fsc=client1", "nowsync"]
    client.2:
      mntopts: ["fsc=client2", "nowsync"]
    client.3:
      mntopts: ["fsc=client3", "nowsync"]

The above adds -o fsc=client0,nowsync to the mount for client.0, and likewise for the other clients.
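To make the mapping from a `mntopts` list to the mount command line concrete, here is a hypothetical helper (not the actual teuthology kclient code) that joins the list into a single `-o` argument, skipping empty entries.

```python
def mntopts_to_args(mntopts):
    """Join a mntopts list into mount(8)-style arguments.

    Hypothetical helper for illustration; an empty list yields no -o flag.
    """
    opts = [o for o in mntopts if o]
    return ["-o", ",".join(opts)] if opts else []


if __name__ == "__main__":
    print(mntopts_to_args(["fsc=client0", "nowsync"]))  # ['-o', 'fsc=client0,nowsync']
```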

We'll need to consult with the devops folks if we want to include other machine types in this testing to get an appropriate ansible.cephlab hunk.

#17 Updated by Jeff Layton 12 months ago

  • Status changed from New to Resolved

#18 Updated by Patrick Donnelly 12 months ago

Jeff Layton wrote:

I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:

I was hoping we could have something in the upstream QA when this gets merged into the testing branch?

[...]

The above adds -o fsc=client0,nowsync to the mount for client.0, and likewise for the other clients.

We'll need to consult with the devops folks if we want to include other machine types in this testing to get an appropriate ansible.cephlab hunk.

My understanding was that the ansible.cephlab code could be just another yaml fragment.

#19 Updated by Jeff Layton 12 months ago

  • Status changed from Resolved to In Progress

Patrick Donnelly wrote:

Jeff Layton wrote:

I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:

I was hoping we could have something in the upstream QA when this gets merged into the testing branch?

Unclear to me what you mean by "something" here. What do you want merged?

#20 Updated by Patrick Donnelly 12 months ago

Jeff Layton wrote:

Patrick Donnelly wrote:

Jeff Layton wrote:

I've already done that then. I guess we can close this. To test fscache, you just need to kick off the run with the following yaml fragment merged into one of the fscache-*.yml files that David attached:

I was hoping we could have something in the upstream QA when this gets merged into the testing branch?

Unclear to me what you mean by "something" here. What do you want merged?

I want to see fscache tested regularly in teuthology. So, the yaml fragments to turn on fsc and the other necessary changes (ansiblelab configs).

#21 Updated by Jeff Layton 12 months ago

  • Status changed from In Progress to Need More Info

Patrick Donnelly wrote:

I want to see fscache tested regularly in teuthology. So, the yaml fragments to turn on fsc and the other necessary changes (ansiblelab configs).

Me too. I was under the impression that we didn't have any automated testing for the kclient, and that test runs were kicked off manually (mostly by you). Is that the case? If so, how do we make sure that we add these mount options to the automated runs?

If they are being done manually, then where is the appropriate place to put yaml fragments of this nature to ensure that the people running these tests use them?

#22 Updated by Patrick Donnelly 12 months ago

Jeff Layton wrote:

Patrick Donnelly wrote:

I want to see fscache tested regularly in teuthology. So, the yaml fragments to turn on fsc and the other necessary changes (ansiblelab configs).

Me too. I was under the impression that we didn't have any automated testing for the kclient, and that test runs were kicked off manually (mostly by you). Is that the case?

The same suite I run manually (fs) is also run regularly in the nightlies.

If so, how do we make sure that we add these mount options to the automated runs?

We need to add the yaml fragments that test fscache. I think having client.0 always use fscache in fs:workload is sufficient.

If they are being done manually, then where is the appropriate place to put yaml fragments of this nature to ensure that the people running these tests use them?

#23 Updated by Patrick Donnelly 11 months ago

  • Target version changed from v16.0.0 to v17.0.0

#24 Updated by Jeff Layton 9 months ago

  • Status changed from Need More Info to In Progress

#25 Updated by Jeff Layton 9 months ago

The yaml frags that let you test fscache are machine-specific since the clients need to be provisioned with an extra filesystem for the cache. I have yaml files for smithi and gibba, but not for other machine types. Is there some way to ensure that this suite is only run on certain machines, or maybe only enable fscache when run on those machine types?
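One possible shape for "only enable fscache on those machine types" is a lookup from machine type to a provisioning fragment, falling back to a plain mount elsewhere. The mapping below knows only the two fragments attached to this ticket; the function name, and how the machine type would be obtained from a job, are assumptions.

```python
# Fragments attached to this ticket; other machine types have none yet.
FSCACHE_FRAGMENTS = {
    "smithi": "fscache-smithi.yml",
    "gibba": "fscache-gibba.yml",
}


def plan_client_mount(machine_type, client="client.0", extra_opts=("nowsync",)):
    """Return (fragment or None, mntopts) for a client on machine_type.

    fsc is only added when a provisioning fragment exists for the machine
    type, since the cache needs a dedicated local filesystem.
    """
    fragment = FSCACHE_FRAGMENTS.get(machine_type)
    opts = list(extra_opts)
    if fragment is not None:
        opts.insert(0, "fsc=" + client.replace(".", ""))
    return fragment, opts
```

Falling back to a plain mount keeps the suite runnable on any machine type, at the cost of silently skipping fscache coverage there.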

#26 Updated by Patrick Donnelly 9 months ago

Jeff Layton wrote:

The yaml frags that let you test fscache are machine-specific since the clients need to be provisioned with an extra filesystem for the cache. I have yaml files for smithi and gibba, but not for other machine types. Is there some way to ensure that this suite is only run on certain machines, or maybe only enable fscache when run on those machine types?

What's special about smithi vs. gibba? We could teach kclient.yaml to set up a file system for fscache based on the machine type but it would be good to generalize it as much as possible.

#27 Updated by Patrick Donnelly 7 months ago

  • Duplicated by Tasks #38386: qa: write kernel fscache tests added

#28 Updated by Jeff Layton 4 months ago

Patrick Donnelly wrote:

What's special about smithi vs. gibba? We could teach kclient.yaml to set up a file system for fscache based on the machine type but it would be good to generalize it as much as possible.

That sounds good. How do we do that?

Nothing is special about those machines, other than the fact that I happen to have override yaml fragments for them. I don't think we have a facility that says "I need a 50g local partition formatted and mounted at /var/cache/fscache", and that can make that happen universally for any machine type. The partitioning seems to be custom-rolled on a per-machine-type basis.

The simplest fix would be to just always have the clients provision a dedicated filesystem at /var/cache/fscache. That's a waste of space when it's not being used though.
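A machine-independent version of "format a local partition and mount it at /var/cache/fscache" could be expressed as an ordered command plan that each machine type runs against whatever spare device it has. This sketch only builds the commands and does not execute them; the device path, the choice of XFS, the helper name, and the assumption that cachefilesd runs as a systemd unit are all hypothetical.

```python
import shlex


def fscache_provision_plan(device, mountpoint="/var/cache/fscache"):
    """Build (not run) commands to format a spare device for the fscache dir."""
    return [
        ["mkfs.xfs", "-f", device],          # destroys any data on device
        ["mkdir", "-p", mountpoint],
        ["mount", "-t", "xfs", device, mountpoint],
        ["systemctl", "restart", "cachefilesd"],  # assumes a systemd unit
    ]


if __name__ == "__main__":
    # /dev/nvme0n1p5 is a made-up example device.
    for cmd in fscache_provision_plan("/dev/nvme0n1p5"):
        print(shlex.join(cmd))
```

Only the device argument differs per machine type, which is the part the per-machine fragments would still need to supply.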
