Bug #45432
fastfail of client requests for homeless session scenario
Description
[Problem] In the current radosgw behaviour, every client request makes a blocking call to the OSD to fetch the object. In a homeless session, i.e. when no OSD for that object is available to serve data, the rgw thread hangs indefinitely waiting for an OSD to become active. If many such requests arrive, all the radosgw threads are exhausted, each waiting indefinitely for the OSD to come back. The result is complete service unavailability: even though many other OSDs are active and could serve valid client requests, no rgw thread is free to take an incoming request.
[Solution] There is no point in waiting indefinitely when all the OSDs for an object are down. In such scenarios the op should be cancelled so that the radosgw thread is free to take further valid requests. This behaviour should also be configurable from ceph.conf so that the feature can be enabled or disabled.
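To make the failure mode concrete, here is a minimal toy model of the problem and the proposed behaviour. It is only a sketch and not radosgw code: `FakeCluster`, `HomelessObjectError`, `serve`, and the `fail_fast` flag are hypothetical names invented for illustration, and a small Python thread pool stands in for the rgw thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class HomelessObjectError(Exception):
    """Hypothetical error: no OSD is available to serve the object."""


class FakeCluster:
    """Toy stand-in for a cluster; not the librados API."""

    def __init__(self, osd_up, fail_fast):
        self.osd_up = osd_up                # object name -> is any OSD up?
        self.fail_fast = fail_fast          # hypothetical on/off tunable
        self.osd_back = threading.Event()   # set when the down OSDs return

    def read(self, name):
        if self.osd_up[name]:
            return f"data:{name}"
        if self.fail_fast:
            # Proposed behaviour: cancel the op instead of waiting.
            raise HomelessObjectError(name)
        # Current behaviour: block until an OSD for the object comes back.
        self.osd_back.wait()
        return f"data:{name}"


def serve(cluster, names, workers=2, deadline=0.5):
    """Run requests on a fixed-size pool and report their state at the deadline."""
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = [pool.submit(cluster.read, n) for n in names]
    done, _ = wait(futures, timeout=deadline)
    results = []
    for f in futures:
        if f not in done:
            results.append("stuck")          # still blocked or still queued
        elif f.exception() is not None:
            results.append("failed fast")
        else:
            results.append(f.result())
    cluster.osd_back.set()                   # release blocked workers so the demo exits
    pool.shutdown(wait=True)
    return results


if __name__ == "__main__":
    osd_up = {"lost1": False, "lost2": False, "ok": True}
    names = ["lost1", "lost2", "ok"]
    print(serve(FakeCluster(osd_up, fail_fast=False), names))
    print(serve(FakeCluster(osd_up, fail_fast=True), names))
```

With two workers and blocking reads, the two requests for homeless objects occupy both threads, so the valid request for `ok` is never picked up before the deadline. With `fail_fast` enabled, the homeless requests return an error immediately and the valid request is served.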
Updated by Matthew Oliver almost 4 years ago
OK, I think I've managed to recreate the problem in my vstart env. Now time to poke around :)
Updated by Or Friedmann almost 4 years ago
- Status changed from New to Fix Under Review
- Pull request ID changed from 34365 to 35458
Updated by Casey Bodley over 2 years ago
- Backport changed from nautilus octopus to octopus pacific
- Pull request ID changed from 35458 to 38350