Feature #45021

open

client: new asok commands for diagnosing cap handling issues

Added by Jeff Layton about 4 years ago. Updated 8 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Introspection/Control
Target version:
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): Client
Labels (FS): task(intern), task(medium)
Pull request ID:

Description

I've been working on some ganesha+cephfs issues that have been reported, and it's quickly becoming evident to me that we lack good tools for diagnosing cap revocation issues. I've had at least a couple of cases where clients are not giving up caps, and that causes slow requests on the MDS. Connecting those dots, however, can take a lot of time and effort.

This is just a start to the discussion, but what would be nice would be to have an asok command (or something) that dumps out a delinquent client summary. For each cap grant/revoke in flight, display:

1) inode number
2) what caps are being revoked (or just dump issued + implemented and maybe other fields)
3) how long ago was the revoke message sent?
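
As a rough illustration of the summary described above, here is a minimal Python sketch. Nothing in it is an existing Ceph interface: the `RevokeInFlight` record, its field names, the `delinquent_summary` helper, and the output format are all hypothetical, and the cap strings are only sample values. It shows one line per in-flight revoke, oldest first, carrying the three items listed.

```python
from dataclasses import dataclass


@dataclass
class RevokeInFlight:
    """Hypothetical record for one cap revoke the MDS is still waiting on."""
    client_id: int
    ino: int
    issued: str        # cap string as the MDS would print it (sample values only)
    implemented: str
    sent_at: float     # time the revoke message was sent, in seconds


def delinquent_summary(revokes, now, min_age=0.0):
    """Return one text line per revoke older than min_age, oldest first."""
    rows = [r for r in revokes if now - r.sent_at >= min_age]
    rows.sort(key=lambda r: r.sent_at)
    return [
        f"client.{r.client_id} ino={r.ino:#x} issued={r.issued} "
        f"implemented={r.implemented} age={now - r.sent_at:.1f}s"
        for r in rows
    ]


if __name__ == "__main__":
    sample = [
        RevokeInFlight(4243, 0x10000000002, "pAsLsXs", "pAsLsXsFs", sent_at=95.0),
        RevokeInFlight(4242, 0x10000000001, "pAsLsXs", "pAsLsXsFsc", sent_at=40.0),
    ]
    for line in delinquent_summary(sample, now=100.0, min_age=10.0):
        print(line)
```

The `min_age` cutoff is one way to keep the output short enough to ask customers for; whether a threshold belongs in the command itself is an open design question.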

That's something we could easily ask customers for, and it shouldn't end up giving us reams of info to pick through (like debug logging does). At the same time, we could use some counterpart info on the clients. The client generally knows when a cap revoke comes in -- it'd be nice to have some insight into why the caps aren't being released. There, it would be good to see a list of:

1) inode number (and maybe type, and possibly primary dentry path)
2) when was cap revoke initially received
3) issued + implemented caps (and maybe dirty, flushing, etc)
4) which caps have outstanding references
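
To make item 4 concrete, here is a small Python sketch of how a client could work out which caps under revocation are still pinned by references. It assumes that the caps being revoked are those still implemented but no longer issued; the `Cap` flag names, the `blocked_caps` helper, and the shape of the reference counts are hypothetical and do not correspond to actual Ceph cap bits.

```python
from enum import IntFlag


class Cap(IntFlag):
    """Hypothetical cap bits, for illustration only."""
    RD = 1
    WR = 2
    CACHE = 4
    BUFFER = 8


def blocked_caps(issued, implemented, refs):
    """Return the caps being revoked that still have outstanding references.

    issued / implemented: Cap masks as the client sees them.
    refs: mapping of a single Cap bit to its current reference count.
    """
    # Assumption: a cap is "being revoked" if it is implemented but not issued.
    revoking = int(implemented) & ~int(issued)
    held = 0
    for cap, count in refs.items():
        if count > 0:
            held |= int(cap)
    return Cap(revoking & held)


if __name__ == "__main__":
    issued = Cap.RD
    implemented = Cap.RD | Cap.WR | Cap.BUFFER
    refs = {Cap.RD: 1, Cap.WR: 0, Cap.BUFFER: 2}
    print(blocked_caps(issued, implemented, refs))
```

In the sample, only the buffer cap is both under revocation and still referenced, so that is the one a report would flag as holding up the release.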

For the userland client, an asok command might make sense (do the clients get an asok?). The kclient could display this in sysfs (I do already have some draft patches for some of this in a branch somewhere).


Related issues 1 (1 open, 0 closed)

Related to CephFS - Feature #44279: client: provide asok commands to getattr an inode with desired caps | Fix Under Review | Venky Shankar

Actions #1

Updated by Greg Farnum about 4 years ago

In general: yes! This is a long-desired addition to our introspection and debugging abilities.

But some questions to make sure we've looked through things:
What do the session dump commands output, and what's missing? (I know they include some of this info around pending revokes, but I think not per-cap requests?)
What does the cache dump include, and what's missing? (It should have the MDS-side state but I think not ongoing revokes.)

In general tracking MClientCaps status is a big hole; they didn't get fitted into the OpRequest tracking infrastructure and I suspect we'd need that or something similar to satisfy this.

Actions #2

Updated by Jeff Layton about 4 years ago

Greg Farnum wrote:

> In general: yes! This is a long-desired addition to our introspection and debugging abilities.
>
> But some questions to make sure we've looked through things:
> What do the session dump commands output, and what's missing? (I know they include some of this info around pending revokes, but I think not per-cap requests?)

I didn't see a session dump command, but there is a session ls. It has some high-level counts for recall_caps and release_caps, but otherwise it doesn't give enough detail. You might be able to use it to ID problematic clients though.

> What does the cache dump include, and what's missing? (It should have the MDS-side state but I think not ongoing revokes.)

dump cache does have info about caps, but it's going to dump out every inode in the cache, which makes the signal-to-noise ratio pretty bad. On my vstart cluster, it's 56M of text from one of the 3 MDSs. On a large, long-running cluster we could have to sift through (and download) gigabytes of text.

What I think we need is a way to be able to look at a glance and see what's happening with cap activity. In particular, any caps where pending != issued != wanted?

That might indicate places where the client hasn't caught up to what the server wants (or vice versa), and those are of particular interest for debugging.
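
As a sketch of that filter, the Python below picks out only the caps whose pending, issued, and wanted values are not all the same. It assumes the cache dump has already been parsed into a list of inode entries, each with an `ino` and a `client_caps` list whose entries carry `client_id`, `pending`, `issued`, and `wanted`; those field names and the `caps_of_interest` helper are assumptions for illustration, not a confirmed description of the dump format.

```python
def caps_of_interest(inodes):
    """Return (ino, client_id, pending, issued, wanted) for mismatched caps.

    inodes: iterable of dicts, assumed to look like parsed cache-dump entries.
    A cap is reported when its three fields are not all equal.
    """
    hits = []
    for inode in inodes:
        for cap in inode.get("client_caps", []):
            pending = cap.get("pending")
            issued = cap.get("issued")
            wanted = cap.get("wanted")
            if len({pending, issued, wanted}) > 1:
                hits.append((inode.get("ino"), cap.get("client_id"),
                             pending, issued, wanted))
    return hits


if __name__ == "__main__":
    sample = [
        {"ino": 1099511627776, "client_caps": [
            {"client_id": 4242, "pending": "pAsLsXsFs",
             "issued": "pAsLsXsFsc", "wanted": "pAsLsXsFs"}]},
        {"ino": 1099511627777, "client_caps": [
            {"client_id": 4243, "pending": "pAsLsXs",
             "issued": "pAsLsXs", "wanted": "pAsLsXs"}]},
    ]
    for hit in caps_of_interest(sample):
        print(hit)
```

This mirrors the tentative condition as written; a real command might need a narrower test, for example comparing only pending against issued, if wanted routinely differs during normal operation.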

As far as what it would print, the info in the client_caps field in the dump cache output would be good, and maybe something about the lag time. Alternatively, we could just have it dump a list of inodes that are of interest and use the dump inode command to get the rest? I don't know what would be best, really.

> In general tracking MClientCaps status is a big hole; they didn't get fitted into the OpRequest tracking infrastructure and I suspect we'd need that or something similar to satisfy this.

Yeah, for the userland client in particular, that's going to be a project. The kclient has debugfs. We have some info in a "caps" file in there already, but it could be expanded (knowing about outstanding cap refs would be nice) and maybe reformatted to better highlight the ones that are undergoing changes.

Actions #3

Updated by Patrick Donnelly almost 4 years ago

  • Subject changed from new asok commands for diagnosing cap handling issues to client: new asok commands for diagnosing cap handling issues
  • Component(FS) Client added
  • Labels (FS) task(intern), task(medium) added
Actions #4

Updated by Venky Shankar over 2 years ago

  • Assignee set to Kotresh Hiremath Ravishankar
Actions #5

Updated by Venky Shankar about 1 year ago

  • Related to Feature #44279: client: provide asok commands to getattr an inode with desired caps added
Actions #6

Updated by Venky Shankar about 1 year ago

  • Category set to Introspection/Control
  • Assignee changed from Kotresh Hiremath Ravishankar to Venky Shankar
  • Target version set to v19.0.0

Kotresh, I'm taking this one and #44279.

Actions #7

Updated by Venky Shankar 8 months ago

  • Status changed from New to In Progress