Bug #48115
Status: Closed
rgw stops responding correctly, error log full of messages: ceph s3:get_obj Scheduling request failed with -2218
Description
rgw stops responding correctly, error log full of messages: ceph s3:get_obj Scheduling request failed with -2218.
After restarting rgw things go back to normal.
Using package debian: 15.2.5-1~bpo10+1
Example log entry:
2020-11-03T10:32:33.799+0000 7f129a267700 1 ====== starting new request req=0x7f11d90c6680 =====
2020-11-03T10:32:33.799+0000 7f129a267700 0 req 5594040 0s s3:get_obj Scheduling request failed with -2218
ERRORHANDLER: err_no=-2218 new_err_no=-2218
2020-11-03T10:32:33.799+0000 7f129a267700 1 op
2020-11-03T10:32:33.799+0000 7f129a267700 1 ====== req done req=0x7f11d90c6680 op status=0 http_status=503 latency=0s ======
2020-11-03T10:32:33.799+0000 7f129a267700 1 beast: 0x7f11d90c6680: XXX.XXX.XXX.XXX - - [2020-11-03T10:32:33.799466+0000] "HEAD /xxxxxxxxxxx HTTP/1.1" 503 0 - "aws-sdk-go/1.25.43 (go1.14.2; linux; amd64)" -
The issue impacted all URLs and all connected hosts. The other rgw instances running as part of the cluster were operating normally.
I'm not sure what else is needed to debug this issue. Please advise.
Some of the OSDs were unstable (briefly offline) around the time of this issue. I'm not sure whether that is related.
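For reference, the standard Ceph CLI commands for checking whether OSD instability has left placement groups inactive (these are generic diagnostics, not specific to this bug):

```shell
# Overall cluster health; inactive or unclean PGs are reported here
ceph health detail

# List PGs stuck in the inactive state
ceph pg dump_stuck inactive
```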
Updated by Or Friedmann over 3 years ago
Hi,
2218 is ERR_RATE_LIMITED. By default each rgw can handle up to 1000 concurrent requests, configured by the rgw_max_concurrent_requests parameter.
RGW keeps requests open until they complete, so if the OSDs do not respond, the requests sit waiting to complete (so it's related to the inactive PGs).
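As a sketch, the limit can be inspected and raised through the monitor config database (the option name rgw_max_concurrent_requests is from the comment above; the value 2048 is only an illustrative choice, not a recommendation):

```shell
# Show the current per-RGW concurrent request cap
ceph config get client.rgw rgw_max_concurrent_requests

# Raise the cap for all rgw daemons (example value; restart is not required,
# but note that a higher cap only helps if the OSDs can actually absorb the load)
ceph config set client.rgw rgw_max_concurrent_requests 2048
```

Raising the cap masks the symptom; if the root cause is slow or flapping OSDs leaving PGs inactive, the queued requests will still stall until the PGs recover.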