Bug #42925

Radosgw threads hang indefinitely when a PG goes inactive

Added by Biswajeet Patra about 2 months ago. Updated about 1 month ago.

Status:
Triaged
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

In the current radosgw behaviour, every read request makes a blocking call to the OSD to fetch the object. When the object belongs to a PG that has gone inactive for any reason (either all OSDs for the object are down, or the OSD count is below min_size), the radosgw thread hangs indefinitely, waiting for the PG to become active. With multiple such requests, all radosgw threads are soon exhausted, and rgw can no longer serve any client requests, including those targeting active PGs. The result is complete service unavailability.
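The exhaustion described above can be sketched with a small simulation. This is only an illustration using the Python standard library; it makes no Ceph or librados calls. The names `read_object` and `serve`, the pool size, and the timeout values are all hypothetical, and a `threading.Event` stands in for a PG that is either active (set) or inactive (unset).

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def read_object(pg_active, op_timeout=None):
    """Simulated blocking read (hypothetical helper, not an RGW or librados API).

    Blocks until the PG's event is set. With op_timeout=None it waits forever,
    which models the behaviour reported in this issue.
    """
    if pg_active.wait(timeout=op_timeout):
        return "data"
    raise TimeoutError("PG did not become active in time")


def serve(num_threads, op_timeout, client_wait=0.5):
    """Fill the worker pool with reads on an inactive PG, then issue one read
    on an active PG. Returns True if the healthy read is served within
    client_wait seconds, False if it is starved."""
    inactive_pg = threading.Event()   # never set while clients are waiting
    active_pg = threading.Event()
    active_pg.set()

    pool = ThreadPoolExecutor(max_workers=num_threads)
    try:
        # Every worker thread picks up a read that targets the inactive PG.
        for _ in range(num_threads):
            pool.submit(read_object, inactive_pg, op_timeout)
        # This read targets a healthy PG but must wait for a free worker.
        healthy = pool.submit(read_object, active_pg, op_timeout)
        try:
            return healthy.result(timeout=client_wait) == "data"
        except FutureTimeout:
            return False
    finally:
        inactive_pg.set()             # release stuck workers so the demo can exit
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print("no op timeout, healthy read served:", serve(4, op_timeout=None))
    print("0.1s op timeout, healthy read served:", serve(4, op_timeout=0.1))
```

In this toy model, bounding the wait frees the workers so the read on the active PG gets through. Whether and how an equivalent bound can be applied inside radosgw is a separate question.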

P.S.: We hit this issue on the luminous version, although it could be reproduced on the master branch as well.

History

#1 Updated by Casey Bodley about 2 months ago

  • Status changed from New to Triaged

#2 Updated by Casey Bodley about 1 month ago

yeah, this needs some higher-level discussion and was raised on the ceph devel mailing list. radosgw calls into librados for osd requests, and librados will block indefinitely until a request can be satisfied. changing radosgw to time out on these requests would be complicated, but i agree that it's worth thinking about
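One possible reason a timeout is not a simple change (my own illustration, not something stated in the comment above): giving up on the caller's side does not stop the underlying blocking call. The sketch below uses only the Python standard library and makes no librados calls. The function name `caller_side_timeout` and the 0.1 s wait are hypothetical, and a `threading.Event` that is never set stands in for a request that cannot be satisfied.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def caller_side_timeout(wait_s=0.1):
    """Run one blocking operation in a worker thread and stop waiting for it
    after wait_s seconds. Returns (timed_out, still_running, cancelled)."""
    never_satisfied = threading.Event()      # stand-in for an inactive PG
    pool = ThreadPoolExecutor(max_workers=1)
    op = pool.submit(never_satisfied.wait)   # blocks like an indefinite request

    try:
        op.result(timeout=wait_s)
        timed_out = False
    except FutureTimeout:
        timed_out = True                     # the caller has stopped waiting

    still_running = op.running()             # the worker is still blocked
    cancelled = op.cancel()                  # a running future cannot be cancelled

    never_satisfied.set()                    # unblock the worker so the demo exits
    pool.shutdown(wait=True)
    return timed_out, still_running, cancelled


if __name__ == "__main__":
    print(caller_side_timeout())
```

In this model the caller regains control, but the worker stays occupied until the operation itself returns. A real fix would presumably also need a way to abandon or cancel the pending request.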
