Bug #57187

open

/var/run/ceph cannot be created due to lack of permission

Added by Zhongzhou Cai over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I install the Ceph Ubuntu (focal) image (v16.2.10) via Rook (v1.9.9) on a GKE cluster, the daemon pods (mon, mgr, osd) crash-loop because the liveness probe fails: `Liveness probe failed: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory`. The admin socket is placed under `/var/run/ceph` by default, but that run directory doesn't exist. Here are the logs from the ceph-mon pod:

```
warning: unable to create /var/run/ceph: (13) Permission denied
debug 2022-08-05T00:38:06.472+0000 7f0960c2c540 -1 asok(0x56213ef7e000) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-mon.a.asok': (2) No such file or directory
```
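To tell the two symptoms in the log apart (directory missing vs. directory present but not writable), a small check like the following can be run inside the pod. This is only a sketch: `diagnose_socket_dir` is a hypothetical helper, not part of Ceph or Rook, and the socket path is the one from the log above.

```python
import os


def diagnose_socket_dir(sock_path: str) -> str:
    """Classify why binding a UNIX socket at sock_path could fail.

    Hypothetical helper for illustration; not a Ceph or Rook API.
    """
    run_dir = os.path.dirname(sock_path)
    if not os.path.isdir(run_dir):
        # Matches the "(2) No such file or directory" bind failure.
        return "missing"
    if not os.access(run_dir, os.W_OK | os.X_OK):
        # The directory exists, but this user cannot create entries in it.
        return "not-writable"
    return "ok"


if __name__ == "__main__":
    print(diagnose_socket_dir("/var/run/ceph/ceph-mon.a.asok"))
```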

The warning comes from the run-dir creation here: https://github.com/ceph/ceph/blob/45fa1a083152e41a408d15505f594ec5f1b4fe17/src/global/global_init.cc#L380-L391. It turns out that we drop privileges before we create `/var/run/ceph`, which explains why creating the run directory fails: privileges are dropped by calling setuid earlier in the same file (https://github.com/ceph/ceph/blob/45fa1a083152e41a408d15505f594ec5f1b4fe17/src/global/global_init.cc#L316-L325), so by the time the run directory is created the process is no longer root.

Is there a reason why we drop privileges before creating the run directory? If the run directory cannot be created, the admin socket cannot be bound, which can prevent the service from starting.
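To make the ordering question concrete, here is a minimal sketch of the sequence I would have expected: create the run directory and hand it to the target user while still privileged, and only then drop privileges. It is written in Python for brevity and is not Ceph's implementation. The function names and the `0o770` mode are my own assumptions, and handling of supplementary groups is left out.

```python
import os


def prepare_run_dir(run_dir: str, uid: int, gid: int, mode: int = 0o770) -> None:
    """Create run_dir and give it to uid:gid.

    Intended to run while the process is still privileged.
    Hypothetical helper; the mode is an assumed value.
    """
    os.makedirs(run_dir, mode=mode, exist_ok=True)
    os.chown(run_dir, uid, gid)


def drop_privileges(uid: int, gid: int) -> None:
    """Switch to the unprivileged user.

    The group is changed first, because setgid is no longer
    permitted once setuid has given up root.
    Supplementary groups are not handled in this sketch.
    """
    os.setgid(gid)
    os.setuid(uid)


def init_daemon(run_dir: str, uid: int, gid: int) -> None:
    """Privileged setup first, privilege drop second."""
    prepare_run_dir(run_dir, uid, gid)
    drop_privileges(uid, gid)
```

With the two steps swapped, `prepare_run_dir` would run as the unprivileged user and fail with `EACCES` on a root-owned `/var/run`, which is the `(13) Permission denied` warning in the log above.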
