Documentation #6142
Ceph needs more than 32k PIDs
Description
I discovered, rather painfully, that one of my hosts with 45 OSDs spawned 1.4 million threads when it was started into a recovering cluster.
About 33k of those threads are persistent, which is more than the 32k PIDs a Linux box provides by default.
In my opinion, the documentation should note that the number of available PIDs should be increased:
# sysctl -w kernel.pid_max=4194303

or, persistently, put

kernel.pid_max = 4194303

into /etc/sysctl.conf
(4194303 is the maximum possible value)
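A quick way to see how close a host already is to the limit is sketched below; it only reads from /proc, so no root is needed, and the 80% warning threshold is an arbitrary choice for illustration:

```shell
#!/bin/sh
# Current kernel limit on PIDs/threads (often 32768 by default)
pid_max=$(cat /proc/sys/kernel/pid_max)

# Count threads currently in use across the whole host
threads=$(ps -eLf | tail -n +2 | wc -l)

echo "threads in use: $threads / kernel.pid_max: $pid_max"

# Warn when usage crosses 80% of the limit (threshold is arbitrary)
if [ "$threads" -gt $(( pid_max * 80 / 100 )) ]; then
    echo "WARNING: over 80% of kernel.pid_max in use; raise it via sysctl" >&2
fi
```

On a dense OSD host heading into recovery, running this before and after starting the daemons makes the thread explosion visible immediately.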
Associated revisions
doc: Added sysctl max thread count discussion.
Fixes: #6142
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
History
#1 Updated by Sage Weil over 9 years ago
- Assignee set to John Wilkins
- Priority changed from Low to High
John, not sure where this should go in the doc structure...
#2 Updated by Warren Wang over 9 years ago
This is a critical change for denser hardware and more threads allocated per OSD. Can we get a message into ceph-deploy as well? Perhaps upon the addition of OSDs over a certain number? Open to suggestions.
#3 Updated by David Moreau Simard over 9 years ago
FWIW there might be a bug to extract out of this. Adding this just for cross-reference: http://lists.openstack.org/pipermail/openstack-operators/2014-August/005015.html
#4 Updated by John Wilkins over 9 years ago
- Status changed from New to In Progress
#5 Updated by John Wilkins over 9 years ago
- Assignee changed from John Wilkins to Alfredo Deza
Added commentary in Hardware section and in troubleshooting.
http://ceph.com/docs/master/start/hardware-recommendations/#additional-considerations
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#an-osd-won-t-start
Alfredo,
There is a note here suggesting that ceph-deploy notify the user if the number of OSDs per node exceeds a certain threshold, i.e., suggest increasing the maximum thread count.
#6 Updated by Alfredo Deza over 9 years ago
Adding a warning if deploying more than N OSDs into a single host sounds entirely reasonable to me and easy to add to ceph-deploy.
What would that number be though? Is anything greater than 20 OK?
#7 Updated by Warren Wang over 9 years ago
Greater than 20 is a safe threshold. That said, I have not yet seen this issue on a host with 24 OSDs.
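The warning discussed above could be sketched roughly as follows; this is a minimal illustration, not ceph-deploy's actual code, and the function name and message text are hypothetical, with the threshold of 20 taken from the comments in this thread:

```shell
#!/bin/sh
# Threshold taken from the discussion above; anything beyond it gets a warning
OSD_WARN_THRESHOLD=20

# Hypothetical helper: warn when too many OSDs land on a single host
warn_if_many_osds() {
    count="$1"
    if [ "$count" -gt "$OSD_WARN_THRESHOLD" ]; then
        echo "WARNING: $count OSDs on one host; consider raising kernel.pid_max" >&2
    fi
}

warn_if_many_osds 24   # emits a warning
warn_if_many_osds 10   # stays silent
```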
#8 Updated by Alfredo Deza over 9 years ago
#9 Updated by Alfredo Deza over 9 years ago
- Status changed from In Progress to Resolved
merged commit 73fdc7b into ceph:master