Documentation #6142

Ceph needs more than 32k pids

Added by Niklas Goerke about 6 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
08/28/2013
Due date:
% Done:

0%

Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

I rather painfully discovered that one of my hosts with 45 OSDs on it spawned 1.4 million threads when started into a recovering cluster.
About 33k of those threads are persistent, which is more than the default 32k PIDs a Linux box provides.

In my opinion the documentation should contain a note that the number of PIDs should be increased:

    # sysctl -w kernel.pid_max=4194303

or, to make it persistent, put

    kernel.pid_max = 4194303

into /etc/sysctl.conf.

(4194303 is the maximum possible value.)
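To see how close a host already is to the limit, the current task count can be compared against kernel.pid_max. A minimal sketch (Linux-only; `ps -eLf` prints one line per thread):

```shell
# Compare the number of kernel tasks (processes plus their threads)
# against the current kernel.pid_max limit on a Linux host.
pid_max=$(cat /proc/sys/kernel/pid_max)
threads=$(ps -eLf | wc -l)   # one line per thread, plus one header line
echo "tasks: ${threads} of pid_max: ${pid_max}"
```

If the task count approaches pid_max, new OSD threads will fail to spawn and daemons may not start.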

Associated revisions

Revision 7948e13b (diff)
Added by John Wilkins about 5 years ago

doc: Added sysctl max thread count discussion.

Fixes: #6142

Signed-off-by: John Wilkins <>

History

#1 Updated by Sage Weil about 5 years ago

  • Assignee set to John Wilkins
  • Priority changed from Low to High

John, not sure where this should go in the doc structure...

#2 Updated by Warren Wang about 5 years ago

This is a critical change for denser hardware and more threads allocated per OSD. Can we get a message into ceph-deploy as well? Perhaps upon the addition of OSDs over a certain number? Open to suggestions.

#3 Updated by David Moreau Simard about 5 years ago

FWIW there might be a bug to extract out of this. Adding this just for cross-reference: http://lists.openstack.org/pipermail/openstack-operators/2014-August/005015.html

#4 Updated by John Wilkins about 5 years ago

  • Status changed from New to In Progress

#5 Updated by John Wilkins about 5 years ago

  • Assignee changed from John Wilkins to Alfredo Deza

Added commentary in Hardware section and in troubleshooting.

http://ceph.com/docs/master/start/hardware-recommendations/#additional-considerations
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#an-osd-won-t-start

Alfredo,

There is a note here suggesting that ceph-deploy notify the user if the number of OSDs per node exceeds some threshold n, i.e., to suggest increasing the maximum thread count.

#6 Updated by Alfredo Deza about 5 years ago

Adding a warning if deploying more than N OSDs into a single host sounds entirely reasonable to me and easy to add to ceph-deploy.

What would that number be though? Is anything greater than 20 OK?

#7 Updated by Warren Wang about 5 years ago

Greater than 20 is a safe number; I have not yet seen this issue on a host with 24 OSDs.
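The check being discussed could look something like the sketch below. The function name, message, and threshold handling are hypothetical (not the actual ceph-deploy code); the per-OSD thread figure is derived from the report above (~33k persistent threads for 45 OSDs, i.e. roughly 700+ per OSD).

```shell
# Hypothetical sketch of the proposed ceph-deploy warning: flag hosts
# that receive more than a threshold number of OSDs, since the default
# pid_max of 32768 can be exhausted at ~700+ persistent threads per OSD.
OSD_WARN_THRESHOLD=20

warn_if_many_osds() {
    # $1 = number of OSDs being deployed to a single host
    if [ "$1" -gt "$OSD_WARN_THRESHOLD" ]; then
        echo "WARNING: $1 OSDs on one host; consider raising kernel.pid_max"
    fi
}

warn_if_many_osds 24   # prints the warning
warn_if_many_osds 12   # prints nothing
```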

#9 Updated by Alfredo Deza about 5 years ago

  • Status changed from In Progress to Resolved

merged commit 73fdc7b into ceph:master
