Delay in clientreplay on quiet clusters
Because we only check for clientreplay_done at the end of _dispatch, if a request completes via a commit context like C_MDS_inode_update_finish, we don't recognise that clientreplay is done until the next time some other message comes in.
In practice, that means that on a quiet cluster of an MDS and a client, we don't make it out of clientreplay until the client sends its next cap renewal (up to 30s later).
In normal use this is mildly annoying; in testing it is especially annoying because it causes overly wide variance in the expected failover timing.
mds: advance clientreplay when replying
...not just at the end of _dispatch. Often we reply
to clients (i.e. complete a request) outside of
_dispatch, and currently in these cases we fail
to check for clientreplay completion (the check only
runs the next time a message reaches _dispatch).
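
The behaviour described above can be illustrated with a small toy model. This is only a sketch and not the actual MDS code: ToyMDS, pending_replays, check_on_reply, reply_request, maybe_finish_clientreplay and run_commit_contexts are hypothetical names made up for the illustration. It assumes that a replayed request is completed by a deferred commit context that runs outside of dispatch.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy model only: every name here is hypothetical, not Ceph's real classes.
enum class State { ClientReplay, Active };

struct ToyMDS {
  State state = State::ClientReplay;
  int pending_replays = 0;      // replayed requests not yet replied to
  bool check_on_reply = false;  // false: check in dispatch only; true: also on reply
  std::vector<std::function<void()>> commit_contexts;  // deferred completions

  // Leave clientreplay once no replayed requests remain.
  void maybe_finish_clientreplay() {
    if (state == State::ClientReplay && pending_replays == 0)
      state = State::Active;
  }

  // Complete one request. With check_on_reply set, replying is also a
  // point at which clientreplay completion is checked.
  void reply_request() {
    --pending_replays;
    if (check_on_reply)
      maybe_finish_clientreplay();
  }

  // Handle an incoming message. A replayed request is completed later by
  // a commit context, so the check at the end of dispatch cannot see it
  // finish.
  void dispatch(bool is_replayed_request) {
    if (is_replayed_request) {
      ++pending_replays;
      commit_contexts.push_back([this] { reply_request(); });
    }
    maybe_finish_clientreplay();
  }

  // Stand-in for the journal commit completing, outside of dispatch.
  void run_commit_contexts() {
    auto ready = std::move(commit_contexts);
    commit_contexts.clear();
    for (auto &fn : ready)
      fn();
  }
};
```

With check_on_reply left false, the model stays in ClientReplay after run_commit_contexts() and only becomes Active on the next dispatch() call, which corresponds to waiting for the client's next cap renewal. With check_on_reply set to true, it becomes Active as soon as the commit context replies.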