Bug #42925: Radosgw threads hangs indefinitely when a PG goes inactive - rgw - Ceph

Bug #42925

open

Radosgw threads hangs indefinitely when a PG goes inactive

Added by Biswajeet Patra over 4 years ago. Updated over 4 years ago.

Status: Triaged
Priority: Normal
Severity: 3 - minor
Affected Versions: v0.52a

Description

With the current radosgw behaviour, every read request makes a blocking call to the OSD to fetch the object. If the object belongs to a PG that has gone inactive for any reason (for example, all OSDs holding the object are down, or the number of available OSDs is below min_size), the radosgw thread hangs indefinitely, waiting for the PG to become active. As more such requests arrive, all radosgw threads are soon exhausted, and rgw can no longer serve any client requests, including those that target active PGs. The result is complete service unavailability.
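The starvation pattern described above can be illustrated with a small, self-contained simulation. This is only a sketch and does not use radosgw or librados: `read_object`, `starved_by_inactive_pg`, the pool size, and the wait time are all hypothetical names and values, and a `threading.Event` stands in for "the PG is active".

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def read_object(pg_active: threading.Event) -> str:
    # Stand-in for a blocking OSD read: returns only once the PG is active.
    pg_active.wait()
    return "ok"


def starved_by_inactive_pg(pool_size: int = 2, wait_s: float = 0.2) -> bool:
    """Return True if a read for an active PG is starved by blocked reads."""
    inactive = threading.Event()   # stays unset while we probe: PG is inactive
    active = threading.Event()
    active.set()                   # this PG is healthy

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Occupy every worker with a read that targets the inactive PG.
        for _ in range(pool_size):
            pool.submit(read_object, inactive)

        # This read targets an active PG, but no worker is free to run it.
        healthy = pool.submit(read_object, active)
        try:
            healthy.result(timeout=wait_s)
            starved = False
        except FutureTimeout:
            starved = True
        finally:
            # Let the PG "recover" so the blocked workers finish and the
            # simulation terminates.
            inactive.set()

        healthy.result(timeout=5)  # completes once workers are released
    return starved


if __name__ == "__main__":
    print("healthy request starved:", starved_by_inactive_pg())
```

In this model the request for the active PG is never rejected. It simply waits in the queue behind workers that cannot finish, which matches the symptom reported here.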

P.S.: We hit this issue on the Luminous release, but it is also reproducible on the master branch.


Updated by Casey Bodley over 4 years ago

Status changed from New to Triaged


Updated by Casey Bodley over 4 years ago

yeah, this needs some higher-level discussion and was raised on the ceph devel mailing list. radosgw calls into librados for osd requests, and librados will block indefinitely until a request can be satisfied. changing radosgw to time out on these requests would be complicated, but i agree that it's worth thinking about
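For discussion purposes, here is a minimal sketch of what a bounded wait on a backend request could look like. It is not radosgw or librados code: `PendingRead`, `handle_get`, the 503 response, and the timeout value are hypothetical, and the in-flight operation is modelled with a plain `threading.Event`.

```python
import threading
from typing import Optional, Tuple


class PendingRead:
    """Hypothetical handle for an in-flight OSD read (not a librados type)."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self.data: Optional[bytes] = None

    def complete(self, data: bytes) -> None:
        # Called by the backend when the read is finally satisfied.
        self.data = data
        self._done.set()

    def wait(self, timeout_s: float) -> bool:
        # True if the read completed within timeout_s, False otherwise.
        return self._done.wait(timeout_s)


def handle_get(op: PendingRead, timeout_s: float) -> Tuple[int, bytes]:
    # Wait for a bounded time instead of blocking the request thread forever.
    if op.wait(timeout_s):
        return 200, op.data or b""
    # Give the thread back and report a temporary failure to the client.
    return 503, b"backend request timed out"


if __name__ == "__main__":
    op = PendingRead()
    print(handle_get(op, 0.05))   # read not satisfied yet
    op.complete(b"object-bytes")
    print(handle_get(op, 0.05))   # read satisfied
```

The sketch only frees the waiting thread. It does not cancel or clean up the operation that is still pending, and a real change would have to decide how to handle that.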
