Bug #58671

closed

Frontend socket leak that leads to OOM when connections are reset

Added by Yixin Jin about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Target version:
% Done: 100%
Source:
Tags: ssl beast backport_processed
Backport: pacific quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This issue is related to the following type of traffic:
  1. Client uses non-persistent connection (Connection: close).
  2. Client uses SO_LINGER option, which causes it to send RST when the socket is closed. The server side gets ECONNRESET.
  3. Client does HTTPS requests.
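
The client behavior in item 2 can be reproduced in isolation. The following is a minimal sketch in Python, not taken from the reporter's setup. It assumes a POSIX platform where struct linger is two C ints. It enables SO_LINGER with a zero timeout so that close() sends RST instead of FIN, then shows the accepting side getting ECONNRESET. It uses plain TCP on loopback and leaves TLS out; the helper names are hypothetical.

```python
import errno
import select
import socket
import struct


def reset_on_close(sock):
    # l_onoff=1, l_linger=0: close() aborts the connection with RST.
    # Assumes a POSIX struct linger made of two C ints.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def observe_reset():
    """Return the errno the server side sees after the client resets."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()
    try:
        reset_on_close(client)
        client.close()  # sends RST rather than FIN
        # Wait until the kernel reports the socket as readable (error pending).
        select.select([server_side], [], [], 5.0)
        try:
            server_side.recv(1024)
            return None
        except OSError as exc:
            return exc.errno
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print(errno.errorcode.get(observe_reset()))
```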

As a result, sockets that got ECONNRESET still go on to call async_shutdown() in the request coroutine, because the error isn't propagated all the way up to the coroutine. After handle_connection() returns, the coroutine assumes everything is OK and calls async_shutdown() to shut down the SSL stream. However, that call never returns: epoll delivers no further events on the socket once it has delivered the one for ECONNRESET, or once sendmsg() has returned ECONNRESET. The call is therefore stuck, and the socket and its coroutine are leaked. Each coroutine consumes a 512KB stack, so the leaked coroutines eventually exhaust memory and the process crashes with an OOM.
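
The stuck call can be modeled outside of the frontend. The sketch below uses Python's asyncio as a stand-in for the request coroutine; it is an illustration of the failure mode and of two generic mitigations (skipping the graceful shutdown when the connection already failed, and bounding the shutdown with a timeout). It is not a description of what the actual fix does, and all names in it are hypothetical.

```python
import asyncio


async def stuck_shutdown():
    # Stand-in for an SSL shutdown that waits on a socket which will
    # never report another event: this await never completes on its own.
    await asyncio.Event().wait()


async def close_connection(conn_error, shutdown, timeout=0.1):
    """Tear down a connection and report how it ended."""
    if conn_error is not None:
        # The peer already reset the connection, so there is nothing
        # to negotiate; skip the graceful shutdown entirely.
        return "skipped"
    try:
        # Bound the shutdown so a silent socket cannot pin the coroutine.
        await asyncio.wait_for(shutdown(), timeout)
        return "clean"
    except asyncio.TimeoutError:
        return "timed out"


async def main():
    reset = await close_connection(ConnectionResetError(), stuck_shutdown)
    silent = await close_connection(None, stuck_shutdown)
    return reset, silent


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without either guard, every connection that ends this way would leave one coroutine suspended forever, which is the leak described above.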


Related issues 2 (0 open, 2 closed)

Copied to rgw - Backport #58769: pacific: Frontend socket leak that leads to OOM when connections are reset (Resolved, Casey Bodley)
Copied to rgw - Backport #58770: quincy: Frontend socket leak that leads to OOM when connections are reset (Resolved, Casey Bodley)
#1

Updated by Casey Bodley about 1 year ago

  • Priority changed from Normal to High
  • Tags set to ssl beast
  • Backport set to pacific quincy
#2

Updated by Yixin Jin about 1 year ago

The problem is much easier to reproduce when HTTPS traffic is slowed down with tc qdisc and the S3 client (s3cmd, for example) is run under a timer that kills it if the download has not finished after 20 seconds.
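
The timer part of that reproduction could look roughly like the following Python sketch. It only covers running a client under a deadline; the traffic shaping with tc qdisc is not shown. The function name is hypothetical, and the bucket and object in the commented usage line are placeholders.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd and kill it if it is still running after `seconds`.

    Returns True if the command had to be killed, False otherwise.
    """
    try:
        subprocess.run(cmd, timeout=seconds, check=False)
        return False
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return True


if __name__ == "__main__":
    # Placeholder for a slow download, e.g. (hypothetical bucket/object):
    #   run_with_deadline(["s3cmd", "get", "s3://bucket/object", "/tmp/object"], 20)
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_deadline(slow, 0.3))
```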

#3

Updated by Casey Bodley about 1 year ago

  • Status changed from New to Fix Under Review
  • Assignee set to Casey Bodley
  • Pull request ID set to 50059
#4

Updated by J. Eric Ivancich about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Backport Bot about 1 year ago

  • Copied to Backport #58769: pacific: Frontend socket leak that leads to OOM when connections are reset added
#6

Updated by Backport Bot about 1 year ago

  • Copied to Backport #58770: quincy: Frontend socket leak that leads to OOM when connections are reset added
#7

Updated by Backport Bot about 1 year ago

  • Tags changed from ssl beast to ssl beast backport_processed
#8

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v18.0.0
  • % Done changed from 0 to 100