Bug #40692
Ceph daemons failing to start when large unix groups exist
Description
While tracking down the error in [1], I traced it to the code in [2] and looked into the getgrnam_r function it calls. The problem is well outlined in [3]: getgrnam_r stores its result in a caller-supplied buffer, and large unix groups can easily exceed that buffer, so the call fails with ERANGE (the "(34) Numerical result out of range" in [1]). Most of my Ceph hosts have the ceph group listed above these large groups in /etc/group, but on one of them the ceph group ended up below them, and that is where the error showed up. The recommendation in [3] is to check for this error and retry the call with a larger buffer. As a workaround, I moved the ceph line in /etc/group above the large group lines, and the daemons then started successfully.
Additional note: I discovered this while running 12.2.12, but the code in [2] is unchanged in master.
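For illustration, the retry approach recommended in [3] could look roughly like the sketch below. This is not Ceph's code and not a proposed patch: the helper name lookup_group_gid, the 1024-byte fallback size, and the 16 MiB cap are all assumptions made for the example. It only shows the general pattern of growing the buffer and calling getgrnam_r again when it returns ERANGE.

```c
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Ceph's code): look up a group by name and
 * grow the scratch buffer whenever getgrnam_r() reports ERANGE.
 * Returns 0 and sets *gid_out on success, ENOENT if the lookup ran
 * but found no such group, or another errno value on failure. */
static int lookup_group_gid(const char *name, gid_t *gid_out)
{
    const size_t max_len = 16 * 1024 * 1024;   /* arbitrary cap (assumption) */
    long hint = sysconf(_SC_GETGR_R_SIZE_MAX); /* only a hint; may be -1 */
    size_t len = hint > 0 ? (size_t)hint : 1024;
    char *buf = NULL;
    int rc;

    for (;;) {
        char *grown = realloc(buf, len);
        if (grown == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = grown;

        struct group grp;
        struct group *result = NULL;
        rc = getgrnam_r(name, &grp, buf, len, &result);
        if (rc == 0) {
            if (result != NULL)
                *gid_out = result->gr_gid;
            else
                rc = ENOENT;   /* call succeeded, but no matching entry */
            break;
        }
        if (rc != ERANGE || len >= max_len)
            break;             /* a real error, or the buffer cap was hit */
        len *= 2;              /* entry did not fit: retry with more room */
    }

    free(buf);
    return rc;
}
```

A caller would treat a return value of 0 as success and read the gid from the output parameter; any other value can be passed to strerror for a message like the one in [1]. The cap is there only so that a malformed group database cannot make the loop allocate without bound.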
[1] unable to look up group 'ceph': (34) Numerical result out of range
[2] https://github.com/ceph/ceph/blob/8e8db703172fc9bccd96b7de344d6a7d761b7862/src/global/global_init.cc#L246-L261
[3] https://tomlee.co/2012/10/problems-with-large-linux-unix-groups-and-getgrgid_r-getgrnam_r/
History
#1 Updated by Greg Farnum over 4 years ago
- Project changed from Ceph to RADOS