Bug #45432
open
fastfail of client requests for homeless session scenario
0%
Description
[Problem] In the current radosgw behaviour, every client request makes a blocking call to the OSD to fetch the object. In the homeless session case, i.e. when no OSD for that object is available to serve data, the rgw thread hangs indefinitely waiting for an OSD to become active. If several such requests arrive, all radosgw threads are exhausted, each waiting indefinitely for the OSD to come back. The result is complete service unavailability: even though many other OSDs are active and could serve valid client requests, no rgw thread is free to take an incoming request.
[Solution] There is no point in waiting indefinitely when all the OSDs for an object are down. In that scenario the op should be cancelled so that the radosgw thread is free to take further valid requests. This behaviour should also be a tunable in ceph.conf, so that the feature can be enabled or disabled.
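The proposed behaviour can be illustrated with a small, self-contained sketch. This is not Ceph code: `submit_read`, `HomelessOpError`, `PENDING` and the `fastfail_homeless` flag are hypothetical names made up for illustration, and the choice of `ETIMEDOUT` as the error code is an assumption. The sketch only models the decision: if none of the OSDs for an object is up, either fail the op immediately (feature enabled) or leave it waiting (current behaviour).

```python
import errno

# Hypothetical sentinel: the op stays queued, which in a real gateway
# means the calling thread would block until an OSD comes back.
PENDING = object()


class HomelessOpError(Exception):
    """Hypothetical error raised when no active OSD can serve the op."""

    def __init__(self, obj):
        super().__init__(f"no active OSD for object {obj!r}")
        self.errno = errno.ETIMEDOUT  # assumed error code, for illustration


def submit_read(obj, acting_osds, up_osds, fastfail_homeless=True):
    """Model of one read op.

    acting_osds: OSD ids responsible for the object.
    up_osds: set of OSD ids that are currently up.
    fastfail_homeless: stands in for the proposed ceph.conf tunable.
    """
    targets = [osd for osd in acting_osds if osd in up_osds]
    if targets:
        # At least one OSD can serve the object: the op proceeds normally.
        return f"read {obj} from osd.{targets[0]}"
    if fastfail_homeless:
        # Homeless op: cancel right away so the worker thread is released.
        raise HomelessOpError(obj)
    # Feature disabled: the op keeps waiting, as it does today.
    return PENDING
```

With the flag enabled, a request for an object whose OSDs are all down returns an error to the client at once, while requests for objects on healthy OSDs are unaffected. With the flag disabled, the current waiting behaviour is kept.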