Bug #40692

Ceph daemons failing to start when large unix groups exist

Added by David Turner over 4 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While tracking down the error in [1], I found where it is raised in the code [2] and looked into the getgrnam_r function. The problem is well described in [3]: large unix groups can easily exceed the buffer passed to getgrnam_r, and the call then fails with ERANGE (errno 34). On most of my Ceph hosts the ceph group is listed above these large groups in /etc/group, but on one host it ended up below them and this error appeared. The recommendation in [3] is to check for this error and retry the call with a larger buffer. As a workaround, I moved the ceph line in /etc/group above the large group lines, and the daemons then started successfully.

Additional note: I discovered this while running 12.2.12, but this code is unchanged in master.

[1] unable to look up group 'ceph': (34) Numerical result out of range
[2] https://github.com/ceph/ceph/blob/8e8db703172fc9bccd96b7de344d6a7d761b7862/src/global/global_init.cc#L246-L261
[3] https://tomlee.co/2012/10/problems-with-large-linux-unix-groups-and-getgrgid_r-getgrnam_r/
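The retry recommended in [3] could look roughly like the sketch below. It is not the code in global_init.cc and not a proposed patch: the helper name lookup_group_gid, its parameters, the 1024-byte starting size and the 16 MiB cap are all assumptions made for illustration. It calls getgrnam_r and, whenever the call reports ERANGE, doubles the buffer and tries again.

```cpp
#include <cerrno>
#include <cstddef>
#include <grp.h>
#include <string>
#include <sys/types.h>
#include <vector>

// Hypothetical helper (not part of Ceph): look up a group's gid by name,
// growing the scratch buffer whenever getgrnam_r reports ERANGE.
// Returns 0 and sets *gid on success, ENOENT if the group was not found,
// or the error number reported by getgrnam_r otherwise.
int lookup_group_gid(const std::string& name, gid_t* gid,
                     std::size_t initial_size = 1024,              // assumed starting size
                     std::size_t max_size = 16 * 1024 * 1024)      // assumed upper bound
{
  if (initial_size == 0) {
    initial_size = 1;  // doubling a zero-sized buffer would never grow it
  }
  std::vector<char> buf(initial_size);
  struct group grp;
  struct group* result = nullptr;

  for (;;) {
    int r = getgrnam_r(name.c_str(), &grp, buf.data(), buf.size(), &result);
    if (r == 0) {
      if (result == nullptr) {
        return ENOENT;  // call succeeded but no matching entry exists
      }
      *gid = result->gr_gid;
      return 0;
    }
    if (r == EINTR) {
      continue;  // interrupted by a signal: retry with the same buffer
    }
    if (r != ERANGE) {
      return r;  // some other failure: report it unchanged
    }
    if (buf.size() >= max_size) {
      return ERANGE;  // give up rather than grow without bound
    }
    buf.resize(buf.size() * 2);  // buffer too small for an entry: double it
  }
}
```

A caller would use it as `gid_t g; int r = lookup_group_gid("ceph", &g);` and treat a non-zero r as the lookup failure that is currently reported in [1]. The cap is only there so that a corrupt or pathological group database cannot make the loop allocate indefinitely.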

History

#1 Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to RADOS
