Feature #9940

uclient: be more robust when dealing with outstanding RADOS IO and stale caps

Added by Greg Farnum almost 5 years ago. Updated about 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
Start date:
10/29/2014
Due date:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
Pull request ID:

Description

If we've given IO to the Objecter and our caps go stale, we need to do something to handle it.


Related issues

Related to fs - Feature #9754: A 'fence and evict' client eviction command Resolved 10/13/2014
Related to fs - Feature #9755: Fence late clients during reconnect timeout Resolved 10/13/2014

History

#1 Updated by John Spray almost 5 years ago

While in the general case it is necessary to fence clients that have become unresponsive to the MDS, this type of "soft" op cancellation based on a timeout would be useful for sites with lower integrity requirements, so that they can avoid derailing jobs by fencing clients.

When cancelling ops client side, it only becomes safe for another client to issue competing ops once the OSD epoch has incremented [1]: otherwise, if the cancelled op is still in flight to an OSD, it could arrive after another client has been issued the cap and sent an op of its own. Given that we're in "soft" land on this code path, one could argue that we only need a grace period between op cancellation and issuing the cap to another client. At this stage, though, we're dealing with the combined potential slowdown on the client side and the OSD side, so the odds of an op getting held up beyond the grace period are higher.

1. This is how the new ENOSPC handling works: it relies on cancelled ops being followed by an OSD map with the full flag set and a higher epoch.
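
To make the two options above concrete, here is a minimal sketch of the decision of when a cap could be handed to another client after op cancellation. It is illustrative only and is not Ceph code: `CancelRecord`, `can_reissue_cap`, and the `grace_period` parameter are hypothetical names. It assumes the MDS side can see the OSD map epoch at cancellation time and the current epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CancelRecord:
    """State captured when a client's in-flight ops are cancelled (hypothetical)."""
    epoch_at_cancel: int   # OSD map epoch observed when the ops were cancelled
    cancelled_at: float    # time of cancellation, in seconds


def can_reissue_cap(record: CancelRecord, current_epoch: int, now: float,
                    grace_period: Optional[float] = None) -> bool:
    """Decide whether the cap may be issued to another client.

    Strict mode (grace_period is None): require a newer OSD map epoch, so a
    cancelled op that is still in flight cannot land after the new holder's op.
    Soft mode: additionally accept once grace_period seconds have elapsed,
    trading integrity for not having to fence the client.
    """
    if current_epoch > record.epoch_at_cancel:
        return True
    if grace_period is not None:
        return now - record.cancelled_at >= grace_period
    return False
```

In strict mode only the epoch check applies. In soft mode the grace period is a best-effort bound, so an op delayed beyond it on the client or OSD side could still arrive late.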

#2 Updated by Greg Farnum about 3 years ago

  • Category changed from 46 to Correctness/Safety
