Bug #45432: fastfail of client requests for homeless session scenario - rgw - Ceph

Bug #45432

open

Added by Or Friedmann almost 4 years ago. Updated almost 2 years ago.

Status: Fix Under Review
Priority: Normal
Assignee: Or Friedmann
Target version: Ceph - v16.0.0
% Done:
Source:
Tags:
Backport: octopus pacific
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 38350
Crash signature (v1):
Crash signature (v2):

Description

[Problem] With the current radosgw behaviour, every client request makes a blocking call to an OSD to fetch the object. In a homeless session, i.e. when no OSD for that object is available to serve data, the rgw thread hangs indefinitely waiting for an OSD to become active. If several such requests arrive, all radosgw threads are exhausted, each waiting indefinitely for the OSD to come back. The result is complete service unavailability: even though many other OSDs are active and could serve valid client requests, no rgw thread is free to take incoming requests.

[Solution] There is no point in waiting indefinitely when all the OSDs for an object are down. In such scenarios the op should be cancelled so that the radosgw thread is free to take further valid incoming requests. The feature should also be controlled by a tunable in ceph.conf so that it can be enabled or disabled.
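
To illustrate the difference, here is a minimal Python sketch of the thread-exhaustion scenario. It is a toy model, not radosgw code and not the change in the pull request: `simulate_pool`, its `fastfail` flag (standing in for the proposed ceph.conf tunable) and the result labels are all hypothetical names, and it assumes requests are handled one after another, with a thread returning to the pool as soon as its request completes.

```python
def simulate_pool(requests, up_osds, threads, fastfail):
    """Toy model of an rgw thread pool (hypothetical, not Ceph code).

    requests: object names, in arrival order
    up_osds:  object name -> list of OSD ids currently able to serve it
    threads:  size of the worker pool
    fastfail: stand-in for the proposed tunable; True cancels homeless ops
    """
    free = threads
    results = []
    for obj in requests:
        if free == 0:
            # Every worker is stuck on a homeless op; nobody picks this up.
            results.append("unserved")
        elif up_osds.get(obj):
            # At least one OSD is up: served, thread returns to the pool.
            results.append("ok")
        elif fastfail:
            # No OSD for this object: cancel the op, thread stays free.
            results.append("failed-fast")
        else:
            # No OSD and no fast-fail: this thread waits forever.
            free -= 1
            results.append("hung")
    return results


# Object "a" has no OSD up; "b" and "c" are healthy. The pool has two threads.
requests = ["a", "b", "a", "c"]
up_osds = {"a": [], "b": [1], "c": [2]}

without_fastfail = simulate_pool(requests, up_osds, threads=2, fastfail=False)
# -> ["hung", "ok", "hung", "unserved"]
with_fastfail = simulate_pool(requests, up_osds, threads=2, fastfail=True)
# -> ["failed-fast", "ok", "failed-fast", "ok"]
```

In the model without fast-fail, the two requests for "a" consume both threads, so the valid request for "c" is never served even though its OSD is up. With fast-fail enabled, the homeless requests are cancelled immediately and "c" is served.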