Fix #20330

closed

msg : tcp backlog parameter increase

Added by red ref almost 7 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On a fairly large cluster, during a cold restart, I noticed some "ListenOverflows" and "ListenDrops" in the TCP metrics from /proc/net/netstat. At the same time, OSDs were failing to join MONs or peers (according to the logs). Digging into it, I tried increasing the "net.core.somaxconn" kernel parameter via sysctl (as suggested by some tutorials). Unfortunately, this did not solve the issue: according to "ss -ln", nothing changed on the socket side.
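For anyone who wants to check the same counters, here is a minimal sketch in Python. It assumes the usual /proc/net/netstat layout, where each line of counter names is followed by a line of values with the same prefix. The function names and the sample values are made up for illustration and are not taken from the affected cluster.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stats = {}
    # The file alternates a header line of names with a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            continue  # skip pairs that do not line up
        stats[prefix] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


def listen_counters(text):
    """Return only the two listen-queue counters mentioned in this report."""
    tcp_ext = parse_netstat(text).get("TcpExt", {})
    return {name: tcp_ext.get(name, 0) for name in ("ListenOverflows", "ListenDrops")}


# Illustrative sample only; real files contain many more counters.
SAMPLE = """\
TcpExt: SyncookiesSent ListenOverflows ListenDrops
TcpExt: 0 42 57
IpExt: InNoRoutes InOctets
IpExt: 0 123456
"""

print(listen_counters(SAMPLE))  # {'ListenOverflows': 42, 'ListenDrops': 57}
```

On a live host the same function can be fed the contents of /proc/net/netstat; counters that keep growing during a restart suggest that accept queues are overflowing.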

Looking at the Linux documentation, I rebuilt my packages after changing the hardcoded "128" values in:
- https://github.com/ceph/ceph/blob/master/src/msg/simple/Accepter.cc : line 214
- https://github.com/ceph/ceph/blob/master/src/msg/async/PosixStack.cc : line 325
- https://github.com/ceph/ceph/blob/master/src/msg/async/rdma/RDMAServerSocketImpl.cc : line 59 (I do not use RDMA)

This time the values reported by "ss -ln" did change, and cluster restarts became smoother.

I think these values should be raised to something larger (> 1024), or even to the maximum a stock kernel accepts (65535), since the effective value is capped by "somaxconn" anyway.

Another possibility would be to make it configurable.
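To illustrate why raising "somaxconn" alone had no effect, here is a rough Python sketch (not Ceph code). On Linux, the backlog passed to listen() is silently truncated to "net.core.somaxconn", so the effective queue length is the smaller of the two values. The helper names and the configurable `backlog` parameter are hypothetical; only the default of 128 comes from this report.

```python
import socket

DEFAULT_BACKLOG = 128  # the hardcoded value mentioned in this report


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return the kernel cap on listen backlogs, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # non-Linux host or unreadable file


def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually applies: min(requested, somaxconn)."""
    if somaxconn is None:
        return requested
    return min(requested, somaxconn)


def open_listener(backlog=DEFAULT_BACKLOG, host="127.0.0.1", port=0):
    """Open a TCP listener with a caller-chosen (configurable) backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)  # the kernel truncates this to somaxconn
    return sock


# Raising only somaxconn changes nothing while the application asks for 128:
print(effective_backlog(128, 65535))    # 128
# Raising only the application value is still limited by somaxconn:
print(effective_backlog(65535, 4096))   # 4096
```

Both knobs have to be raised together, which matches what "ss -ln" showed before and after the rebuild.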

#1

Updated by Kefu Chai almost 7 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Haomai Wang
#2

Updated by Sage Weil almost 7 years ago

  • Status changed from Fix Under Review to Resolved
#3

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)