Bug #21929

Default kernel.pid_max is easily exceeded during recovery on high OSD-count system

Added by David Disseldorp almost 2 years ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
Start date: 10/25/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous, jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

On Linux kernels built with CONFIG_BASE_FULL, the maximum number of process / thread IDs (kernel.pid_max) defaults to 32768. Nodes with high OSD counts can easily hit this default limit during recovery.

The Ceph documentation already recommends raising this limit (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#an-osd-won-t-start), but it would be better to apply the change automatically via sysctl.d.
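
To gauge how close a node is to the limit described above, something like the following Python sketch could be run on an OSD host. It is an illustration only and not part of the fix for this issue: the helper names (read_pid_max, count_threads, pid_headroom, report) are hypothetical, and it assumes a Linux host with procfs mounted at /proc.

```python
from pathlib import Path

# Standard procfs location of the kernel.pid_max sysctl on Linux.
PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")


def read_pid_max(path=PID_MAX_PATH):
    """Return the current kernel.pid_max value as an integer."""
    return int(Path(path).read_text().strip())


def count_threads(proc_root="/proc"):
    """Count threads system-wide by summing the entries in /proc/<pid>/task."""
    total = 0
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            total += sum(1 for _ in (entry / "task").iterdir())
        except OSError:
            # The process exited, or is not readable, while we were scanning.
            continue
    return total


def pid_headroom(threads_in_use, pid_max):
    """Return the unused fraction of the PID/TID space, clamped at 0.0."""
    return max(0.0, 1.0 - threads_in_use / pid_max)


def report():
    """Print current thread usage relative to kernel.pid_max (Linux only)."""
    pid_max = read_pid_max()
    threads = count_threads()
    print(f"kernel.pid_max={pid_max} threads={threads} "
          f"headroom={pid_headroom(threads, pid_max):.1%}")
```

Calling report() during normal operation and again during recovery would show how much the thread count grows relative to the limit. Each thread consumes an ID from the same space as processes, which is why the count walks the per-process task directories rather than only the top-level PIDs.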


Related issues

Copied to devops - Backport #22194: luminous: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system Resolved
Copied to devops - Backport #22195: jewel: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system Rejected

History

#2 Updated by Nathan Cutler almost 2 years ago

  • Status changed from New to Need Review
  • Backport set to luminous

#3 Updated by Nathan Cutler almost 2 years ago

  • Backport changed from luminous to luminous, jewel

#4 Updated by Nathan Cutler almost 2 years ago

Note: related to https://github.com/ceph/ceph/pull/17894 which was backported to luminous by https://github.com/ceph/ceph/pull/18540

#5 Updated by Kefu Chai almost 2 years ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #22194: luminous: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system added

#7 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #22195: jewel: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system added

#8 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved
