Bug #45432

fastfail of client requests for homeless session scenario

Added by Or Friedmann 9 months ago. Updated 6 months ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Target version:
% Done: 0%
Source:
Tags:
Backport: nautilus octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

[Problem] In the current radosgw behaviour, every client request makes a blocking call to an OSD to fetch the object. In the homeless session case, i.e. when no OSD for that object is available to serve data, the rgw thread hangs indefinitely waiting for an OSD to become active. If multiple such requests arrive, all radosgw threads are exhausted, each waiting indefinitely for an OSD to come back. The result is complete service unavailability: even though many other active OSDs could serve valid client requests, no rgw thread is free to accept an incoming request.

[Solution] There is no point in waiting indefinitely when all the OSDs for an object are down. In that scenario the op should be cancelled so that the radosgw thread is free to take further valid incoming requests. This behaviour should also be a tunable in ceph.conf, so that the feature can be enabled or disabled.
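
To illustrate the difference, here is a minimal toy model in Python. It is not radosgw or librados code; ToyCluster, handle_request, free_threads_after and the fastfail flag are hypothetical names made up for this sketch. It only counts how many worker threads would stay free when requests for an object with no active OSD either park their thread or are cancelled right away.

```python
from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Hypothetical toy model: which OSDs hold each object, and which are up."""
    placement: dict  # object name -> list of OSD ids holding it
    up_osds: set     # ids of the OSDs that are currently up

    def active_osds_for(self, obj):
        return [osd for osd in self.placement.get(obj, []) if osd in self.up_osds]


def handle_request(cluster, obj, fastfail):
    """Return (outcome, detail); outcome is 'ok', 'error' or 'blocked'."""
    targets = cluster.active_osds_for(obj)
    if targets:
        return "ok", f"{obj} served by osd.{targets[0]}"
    if fastfail:
        # Cancel the op so the worker thread is released immediately.
        return "error", f"no active OSD for {obj}"
    # Without fast-fail the thread would wait indefinitely; modelled as 'blocked'.
    return "blocked", None


def free_threads_after(cluster, requests, pool_size, fastfail):
    """Count worker threads still free after the requests have been taken in order."""
    parked = 0
    for obj in requests:
        if parked >= pool_size:
            break  # every thread is parked, nothing can accept this request
        outcome, _ = handle_request(cluster, obj, fastfail)
        if outcome == "blocked":
            parked += 1
    return pool_size - parked


# Object "b" lives only on OSDs 2 and 3, which are both down.
cluster = ToyCluster(placement={"a": [0, 1], "b": [2, 3]}, up_osds={0, 1})
requests = ["b", "b", "a"]
print(free_threads_after(cluster, requests, pool_size=2, fastfail=False))  # 0
print(free_threads_after(cluster, requests, pool_size=2, fastfail=True))   # 2
```

With the flag off, the two requests for "b" park both threads and the valid request for "a" is never served. With it on, both fail quickly and the pool stays available.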

History

#1 Updated by Matthew Oliver 8 months ago

OK, I think I've managed to recreate the problem in my vstart env. Now time to poke around :)

#2 Updated by Or Friedmann 8 months ago

  • Assignee set to Or Friedmann

#3 Updated by Or Friedmann 6 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID changed from 34365 to 35458
