Bug #21929

closed

Default kernel.pid_max is easily exceeded during recovery on high OSD-count system

Added by David Disseldorp over 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%

Backport:
luminous, jewel
Regression:
No
Severity:
3 - minor

Description

On Linux kernels built with CONFIG_BASE_FULL, kernel.pid_max defaults to 32768, which caps the number of process and thread IDs available on the system. This default limit is easily hit during recovery on nodes with high OSD counts.
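
To see how close a node is to the limit, something like the following rough sketch could be used. It assumes a Linux host with /proc mounted and takes the task count from the fourth field of /proc/loadavg (running/total scheduling entities). The helper names `parse_task_count` and `pid_headroom` are made up for illustration, and the result is only an approximate indicator, since it compares a count of live tasks against the highest allowed ID.

```python
from pathlib import Path


def parse_task_count(loadavg_text: str) -> int:
    """Return the total number of tasks (processes plus threads).

    The fourth field of /proc/loadavg has the form "running/total",
    for example "1/80".
    """
    fourth_field = loadavg_text.split()[3]
    return int(fourth_field.split("/")[1])


def pid_headroom(pid_max: int, tasks: int) -> float:
    """Return the fraction of the ID space not used by live tasks."""
    return 1.0 - tasks / pid_max


if __name__ == "__main__":
    # Both paths are standard procfs entries on Linux.
    pid_max = int(Path("/proc/sys/kernel/pid_max").read_text())
    tasks = parse_task_count(Path("/proc/loadavg").read_text())
    print(f"pid_max={pid_max} tasks={tasks} "
          f"headroom={pid_headroom(pid_max, tasks):.1%}")
```

Running this during recovery on an OSD-dense node should show the headroom shrinking as the OSD daemons spawn threads.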

Raising this limit is already recommended in the Ceph documentation (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#an-osd-won-t-start), but it would be better to set it automatically via sysctl.d.
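
As an illustration of the sysctl.d approach, here is a minimal sketch that renders such a drop-in file. The function name `render_sysctl_fragment`, the example file name, and the chosen value are assumptions for illustration, not necessarily what the fix ships. The only fixed fact used is that 64-bit kernels accept kernel.pid_max values up to 4194304 (4 * 1024 * 1024).

```python
# Upper bound for kernel.pid_max on 64-bit Linux kernels.
PID_MAX_LIMIT_64BIT = 4 * 1024 * 1024  # 4194304


def render_sysctl_fragment(pid_max: int = PID_MAX_LIMIT_64BIT) -> str:
    """Return the text of a sysctl.d drop-in that raises kernel.pid_max.

    The default value is an assumption; pick whatever suits the node.
    """
    if not 0 < pid_max <= PID_MAX_LIMIT_64BIT:
        raise ValueError(f"pid_max out of range: {pid_max}")
    return (
        "# Raise the PID limit for nodes with many OSDs\n"
        f"kernel.pid_max = {pid_max}\n"
    )


if __name__ == "__main__":
    # Hypothetical target path: /etc/sysctl.d/90-ceph-osd.conf
    print(render_sysctl_fragment(), end="")
```

Once a file with this content is placed under /etc/sysctl.d/, the setting is applied at boot, and `sysctl --system` reloads it without a reboot.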


Related issues 2 (0 open, 2 closed)

Copied to devops - Backport #22194: luminous: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system (Resolved, Nathan Cutler)
Copied to devops - Backport #22195: jewel: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system (Rejected)
#2

Updated by Nathan Cutler over 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to luminous
#3

Updated by Nathan Cutler over 6 years ago

  • Backport changed from luminous to luminous, jewel
#4

Updated by Nathan Cutler over 6 years ago

Note: related to https://github.com/ceph/ceph/pull/17894, which was backported to luminous by https://github.com/ceph/ceph/pull/18540

#5

Updated by Kefu Chai over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
#6

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22194: luminous: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system added
#7

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #22195: jewel: Default kernel.pid_max is easily exceeded during recovery on high OSD-count system added
#8

Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved