Bug #48115
Status: Closed
rgw stops responding correctly, error log full of messages: ceph s3:get_obj Scheduling request failed with -2218
Description
rgw stops responding correctly, error log full of messages: ceph s3:get_obj Scheduling request failed with -2218.
After restarting rgw things go back to normal.
Using package debian: 15.2.5-1~bpo10+1
Example log entry:
2020-11-03T10:32:33.799+0000 7f129a267700 1 ====== starting new request req=0x7f11d90c6680 =====
2020-11-03T10:32:33.799+0000 7f129a267700 0 req 5594040 0s s3:get_obj Scheduling request failed with -2218
ERRORHANDLER: err_no=-2218 new_err_no=-2218
2020-11-03T10:32:33.799+0000 7f129a267700 1 op
2020-11-03T10:32:33.799+0000 7f129a267700 1 ====== req done req=0x7f11d90c6680 op status=0 http_status=503 latency=0s ======
2020-11-03T10:32:33.799+0000 7f129a267700 1 beast: 0x7f11d90c6680: XXX.XXX.XXX.XXX - - [2020-11-03T10:32:33.799466+0000] "HEAD /xxxxxxxxxxx HTTP/1.1" 503 0 - "aws-sdk-go/1.25.43 (go1.14.2; linux; amd64)" -
The issue impacted all URLs and all connected hosts. The other rgw instances running as part of the cluster were operating normally.
I'm not sure what else is needed to debug this issue. Please advise.
Some of the OSDs were unstable (briefly offline) around the time of this issue. I'm not sure whether that is related.
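For reference, the standard Ceph CLI commands for checking whether OSD instability has left placement groups inactive (these are generic diagnostics, not specific to this bug):

```shell
# Overall cluster health; inactive or unclean PGs are reported here
ceph health detail

# List PGs stuck in the inactive state
ceph pg dump_stuck inactive
```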
Updated by Or Friedmann over 3 years ago
Hi,
2218 is ERR_RATE_LIMITED. By default each rgw can handle up to 1000 concurrent requests, configured by the rgw_max_concurrent_requests parameter.
RGW keeps requests open until they complete, so if the OSDs do not respond, the requests sit waiting to complete (so it's related to the inactive PGs).
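As a sketch, the limit can be inspected and raised through the monitor config database (the option name rgw_max_concurrent_requests is from the comment above; the value 2048 is only an illustrative choice, not a recommendation):

```shell
# Show the current per-RGW concurrent request cap
ceph config get client.rgw rgw_max_concurrent_requests

# Raise the cap for all rgw daemons (example value; restart is not required,
# but note that a higher cap only helps if the OSDs can actually absorb the load)
ceph config set client.rgw rgw_max_concurrent_requests 2048
```

Raising the cap masks the symptom; if the root cause is slow or flapping OSDs leaving PGs inactive, the queued requests will still stall until the PGs recover.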