Feature #45021
open
client: new asok commands for diagnosing cap handling issues
Description
I've been working on some reported ganesha+cephfs issues, and it's quickly becoming evident to me that we lack good tools for diagnosing cap revocation issues. I've had at least a couple of cases where clients were not giving up caps, which caused slow requests on the MDS. Connecting those dots, however, can take a lot of time and effort.
This is just a start to the discussion, but it would be nice to have an asok command (or something similar) that dumps out a delinquent client summary. For each cap grant/revoke in flight, display:
1) inode number
2) what caps are being revoked (or just dump issued + implemented and maybe other fields)
3) how long ago the revoke message was sent
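To make the proposed summary concrete, here is a rough sketch of what such a dump could contain. Everything in it is hypothetical: the `PendingRevoke` record, the `delinquent_summary` helper, the field names, and the `min_age` threshold are illustration only and do not correspond to any existing MDS command or data structure.

```python
from dataclasses import dataclass


@dataclass
class PendingRevoke:
    """Hypothetical record for one cap revoke that is still in flight."""
    client_id: int   # assumed: some identifier for the client session
    ino: int         # inode number
    revoking: str    # caps being revoked, kept as an opaque string here
    sent_at: float   # time the revoke message was sent, in seconds


def delinquent_summary(pending, now, min_age=0.0):
    """Return one row per revoke outstanding for at least min_age seconds,
    oldest first, so the worst offenders appear at the top."""
    rows = []
    for p in pending:
        age = now - p.sent_at
        if age >= min_age:
            rows.append({
                "client": p.client_id,
                "ino": hex(p.ino),
                "revoking": p.revoking,
                "age_sec": age,
            })
    rows.sort(key=lambda r: r["age_sec"], reverse=True)
    return rows
```

Filtering by age is a design guess: it would keep the output short enough to request from a customer, since only revokes that have been pending for a while are interesting.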
A summary like that is something we could easily ask customers for, and it shouldn't give us reams of info to pick through the way debug logging does. At the same time, we could use some counterpart info on the clients. The client generally knows when a cap revoke comes in, so it would be nice to have some insight into why the caps aren't being released. There, it would be good to see a list of:
1) inode number (and maybe type, and possibly primary dentry path)
2) when the cap revoke was initially received
3) issued + implemented caps (and maybe dirty, flushing, etc)
4) which caps have outstanding references
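As a sketch of how the client-side fields above could fit together, the snippet below builds one report entry per inode. All names are hypothetical, and the cap bits are made up for illustration rather than taken from Ceph's real encoding. It also assumes that the caps under revocation are those still implemented but no longer issued, which is a guess at the semantics and not something stated in this ticket.

```python
from dataclasses import dataclass, field

# Made-up cap bits for illustration only; not Ceph's actual cap encoding.
CAP_BITS = {"rd": 1, "wr": 2, "cache": 4, "buffer": 8}


def cap_names(mask):
    """Translate a bitmask into the list of (hypothetical) cap names."""
    return [name for name, bit in CAP_BITS.items() if mask & bit]


@dataclass
class ClientCapState:
    """Hypothetical per-inode cap state as seen by the client."""
    ino: int
    revoke_received: float   # time the revoke first arrived, in seconds
    issued: int
    implemented: int
    dirty: int = 0
    flushing: int = 0
    refs: dict = field(default_factory=dict)  # cap name -> outstanding refs


def explain_stuck_revoke(st, now):
    """Build one report entry showing why a revoke may not have completed."""
    # Assumption: caps being revoked are implemented but no longer issued.
    revoking = st.implemented & ~st.issued
    held = [n for n in cap_names(revoking) if st.refs.get(n, 0) > 0]
    return {
        "ino": hex(st.ino),
        "revoke_age_sec": now - st.revoke_received,
        "issued": cap_names(st.issued),
        "implemented": cap_names(st.implemented),
        "revoking": cap_names(revoking),
        "dirty": cap_names(st.dirty),
        "flushing": cap_names(st.flushing),
        "blocked_by_refs": held,
    }
```

The `blocked_by_refs` field is the part aimed at the "why aren't they being released" question: it narrows the revoked caps down to the ones that still have outstanding references.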
For the userland client, an asok command might make sense (do the clients get an asok?). The kclient could display this in sysfs (I do already have some draft patches for some of this in a branch somewhere).