Feature #9940

open

uclient: be more robust when dealing with outstanding RADOS IO and stale caps

Added by Greg Farnum over 9 years ago. Updated almost 8 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
Pull request ID:

Description

If we've given IO to the Objecter and our caps go stale, we need to do something to handle it.


Related issues 2 (0 open, 2 closed)

Related to CephFS - Feature #9754: A 'fence and evict' client eviction command (Resolved, 10/13/2014)
Related to CephFS - Feature #9755: Fence late clients during reconnect timeout (Resolved, 10/13/2014)
#1

Updated by John Spray over 9 years ago

While in the general case it is necessary to fence clients that have become unresponsive to the MDS, this type of "soft" op cancellation based on a timeout would be useful for sites with lower integrity requirements, so that they can avoid derailing jobs by fencing clients.
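A minimal sketch of what such timeout-based "soft" cancellation could look like on the client side. This is a toy model, not Ceph code: the class `SoftCanceller`, its methods, and the `stale_timeout` setting are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoftCanceller:
    """Toy model: cancel outstanding ops once caps have been stale too long."""

    stale_timeout: float  # hypothetical site-tunable timeout, in seconds
    pending: Dict[int, float] = field(default_factory=dict)  # op_id -> submit time
    caps_stale_since: Optional[float] = None

    def submit(self, op_id: int, now: float) -> None:
        # Record an op that has been handed off for I/O.
        self.pending[op_id] = now

    def mark_caps_stale(self, now: float) -> None:
        # Remember only the first moment the caps went stale.
        if self.caps_stale_since is None:
            self.caps_stale_since = now

    def mark_caps_renewed(self) -> None:
        self.caps_stale_since = None

    def tick(self, now: float) -> List[int]:
        """Return the ids of ops cancelled on this tick (possibly none)."""
        if self.caps_stale_since is None:
            return []
        if now - self.caps_stale_since < self.stale_timeout:
            return []
        cancelled = sorted(self.pending)
        self.pending.clear()
        return cancelled
```

A site with lower integrity requirements would pick a `stale_timeout` long enough to ride out brief MDS hiccups; cancelling locally does not by itself make it safe for another client to proceed.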

When cancelling ops client side, it only becomes safe for another client to issue competing ops once the OSD epoch has incremented [1]. Otherwise, if the cancelled op is still in flight to an OSD, it could arrive after another client has been issued the cap and sent an op of its own. Given that we're in "soft" land on this code path, one could argue that one only needs a grace period between op cancellation and issuing the cap to another client. At this stage, though, we're dealing with the combined potential slowdown on the client side and the OSD side, so the odds of an op getting held up beyond the grace period are higher.

[1] This is how the new ENOSPC handling works: it relies on cancelled ops being followed by an OSD map with the full flag set and a higher epoch.
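The two conditions discussed above (a strict epoch check, or a weaker grace period) can be sketched as a single decision function. This is an illustrative model only: `CancelRecord`, `safe_to_reissue_cap`, and the `grace_period` parameter are hypothetical and do not correspond to existing Ceph interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CancelRecord:
    cancel_epoch: int   # OSD map epoch in effect when the ops were cancelled
    cancel_time: float  # wall-clock time of the cancellation, in seconds


def safe_to_reissue_cap(
    record: CancelRecord,
    current_osd_epoch: int,
    now: float,
    grace_period: Optional[float] = None,
) -> bool:
    """Decide whether the cap may be handed to another client.

    Strict rule: the OSD epoch must have advanced past the cancel epoch.
    Soft rule (assumption, only if grace_period is given): enough time has
    passed that a still-in-flight op is unlikely, though not impossible.
    """
    if current_osd_epoch > record.cancel_epoch:
        return True
    if grace_period is not None and now - record.cancel_time >= grace_period:
        return True
    return False
```

With `grace_period=None` the function models only the strict rule; passing a value opts in to the weaker guarantee, which is exactly where a delayed op could still arrive late.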

#2

Updated by Greg Farnum almost 8 years ago

  • Category changed from 46 to Correctness/Safety