Feature #9940
client: be more robust when dealing with outstanding RADOS IO and stale caps
0%
Description
If we've given IO to the Objecter and our caps go stale, we need to do something to handle it.
Updated by John Spray over 9 years ago
While in the general case it is necessary to fence clients that have become unresponsive to the MDS, this type of "soft" op cancellation based on a timeout would be useful for sites with lower integrity requirements, so that they can avoid derailing jobs by fencing clients.
When cancelling ops on the client side, it only becomes safe for another client to issue competing ops once the OSD epoch has incremented [1]. Otherwise, if the cancelled op is still in flight to an OSD, it could arrive after another client has been issued the cap and sent an op of its own. Given that we're in "soft" land on this code path, one could argue that one only needs a grace period between op cancellation and issuing the cap to another client. However, at this stage we're dealing with the combined potential slowdown on the client side and the OSD side, so the odds of an op getting held up beyond the grace period are higher.
[1] This is how the new ENOSPC handling works: it relies on cancelled ops being followed by an OSD map with the full flag set and a higher epoch.
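To make the two options in this comment concrete, here is a rough sketch of the decision being discussed. It is only a model, not Ceph code: `CancelRecord`, `can_reissue_cap`, and the `soft_grace` parameter are hypothetical names made up for illustration. The strict rule waits for the OSD map epoch to advance past the epoch at which the ops were cancelled; the optional soft rule accepts a grace period instead.

```python
from dataclasses import dataclass


@dataclass
class CancelRecord:
    """State noted when a client's outstanding ops are cancelled (hypothetical)."""
    cancel_epoch: int    # OSD map epoch at the time of cancellation
    cancel_time: float   # time of cancellation, in seconds


def can_reissue_cap(rec, current_epoch, now, soft_grace=None):
    """Return True if the cap may be issued to another client.

    Strict rule: the OSD map epoch has advanced past the epoch at which
    the ops were cancelled.
    Soft rule (only when soft_grace is given): a grace period since the
    cancellation is accepted instead, at the risk that a delayed op
    still lands afterwards.
    """
    if current_epoch > rec.cancel_epoch:
        return True
    if soft_grace is not None and now - rec.cancel_time >= soft_grace:
        return True
    return False


rec = CancelRecord(cancel_epoch=120, cancel_time=1000.0)

# Epoch unchanged and no soft mode: not safe yet.
print(can_reissue_cap(rec, current_epoch=120, now=1005.0))                   # False
# Epoch has incremented: safe under the strict rule.
print(can_reissue_cap(rec, current_epoch=121, now=1005.0))                   # True
# Epoch unchanged, but a 30 s grace period has elapsed in soft mode.
print(can_reissue_cap(rec, current_epoch=120, now=1031.0, soft_grace=30.0))  # True
```

The soft branch is exactly where the concern above applies: a grace period bounds the wait, but it cannot rule out an op that was delayed on both the client and the OSD side for longer than `soft_grace`.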
Updated by Greg Farnum almost 8 years ago
- Category changed from 46 to Correctness/Safety