Documentation #6142

Ceph needs more than 32k pids

Added by Niklas Goerke almost 7 years ago. Updated over 5 years ago.



I rather painfully discovered that one of my hosts with 45 OSDs spawned 1.4 million threads when starting it into a recovering cluster.
About 33k of those threads are persistent, which is more than the default 32k pids a Linux box provides.

In my opinion the documentation should contain a note that the number of pids should be increased:

To change at runtime:

sysctl -w kernel.pid_max=4194303

or persistently, by adding the following to /etc/sysctl.conf:

kernel.pid_max = 4194303

(4194303 is the maximum possible value.)
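To see whether a host is at risk before raising the limit, you can compare the kernel's current pid limit against the number of threads actually in use. A minimal check (reading from /proc, which should work on any modern Linux):

```shell
# Current maximum number of pids (and therefore threads) the kernel allows.
# The default on many kernels is 32768.
cat /proc/sys/kernel/pid_max

# Threads currently in use across all processes; if this approaches
# pid_max during recovery, OSDs will fail to spawn new threads.
ps -eLf --no-headers | wc -l
```

If the second number is anywhere near the first on a dense OSD host, raise kernel.pid_max as described above.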

Associated revisions

Revision 7948e13b (diff)
Added by John Wilkins almost 6 years ago

doc: Added sysctl max thread count discussion.

Fixes: #6142

Signed-off-by: John Wilkins <>


#1 Updated by Sage Weil almost 6 years ago

  • Assignee set to John Wilkins
  • Priority changed from Low to High

John, not sure where this should go in the doc structure...

#2 Updated by Warren Wang almost 6 years ago

This is a critical change for denser hardware and more threads allocated per OSD. Can we get a message into ceph-deploy as well? Perhaps upon the addition of OSDs over a certain number? Open to suggestions.

#3 Updated by David Moreau Simard almost 6 years ago

FWIW there might be a bug to extract out of this. Adding this just for cross-reference:

#4 Updated by John Wilkins almost 6 years ago

  • Status changed from New to In Progress

#5 Updated by John Wilkins almost 6 years ago

  • Assignee changed from John Wilkins to Alfredo Deza

Added commentary in Hardware section and in troubleshooting.


There is a note here suggesting that ceph-deploy notify the user if the number of OSDs per node exceeds some threshold n, i.e., suggest increasing the maximum thread count.

#6 Updated by Alfredo Deza almost 6 years ago

Adding a warning if deploying more than N OSDs into a single host sounds entirely reasonable to me and easy to add to ceph-deploy.

What would that number be though? Is anything greater than 20 OK?

#7 Updated by Warren Wang almost 6 years ago

Greater than 20 is a safe threshold. I have not yet seen this issue on a host with 24 OSDs.
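The check being discussed could be as simple as comparing the host's OSD count against a threshold and printing a warning. A hypothetical sketch (the threshold of 20 comes from this thread; the variable names are illustrative, not ceph-deploy's actual code):

```shell
#!/bin/sh
# Hypothetical warning check: warn when deploying many OSDs to one host,
# since each OSD can spawn hundreds of threads during recovery.
OSD_COUNT="${1:-0}"       # number of OSDs on this host (passed as argument)
OSD_WARN_THRESHOLD=20     # per the discussion above: warn above ~20 OSDs

if [ "$OSD_COUNT" -gt "$OSD_WARN_THRESHOLD" ]; then
    echo "WARNING: $OSD_COUNT OSDs on one host may exceed the default" \
         "kernel.pid_max of 32768 during recovery." \
         "Consider setting kernel.pid_max = 4194303 in /etc/sysctl.conf."
fi
```

Run as, e.g., `./osd_warn.sh 45` to see the warning; with 12 OSDs it prints nothing.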

#9 Updated by Alfredo Deza over 5 years ago

  • Status changed from In Progress to Resolved

merged commit 73fdc7b into ceph:master
