Bug #47745: cephadm: adopt {prometheus,grafana,alertmanager} fails with "RuntimeError: uid/gid not found" - Orchestrator - Ceph

Bug #47745

closed

cephadm: adopt {prometheus,grafana,alertmanager} fails with "RuntimeError: uid/gid not found"

Added by Tim Serong over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Tim Serong
Category: cephadm/monitoring
Target version:
% Done:
Source:
Tags:
Backport: octopus
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 37542
Crash signature (v1):
Crash signature (v2):

Description

This is essentially the same problem as https://tracker.ceph.com/issues/46398, but occurs when adopting prometheus/grafana/alertmanager, as opposed to when initially deploying those services:

# cephadm --image 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 adopt --style=legacy --name prometheus.$(hostname)
INFO:cephadm:Pulling container image 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0...
INFO:cephadm:Non-zero exit code 1 from /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 -e NODE_NAME=admin --entrypoint stat 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 -c %u %g /var/lib/ceph
INFO:cephadm:stat:stderr stat: cannot stat '/var/lib/ceph': No such file or directory
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 5758, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1213, in _default_image
    return func()
  File "/usr/sbin/cephadm", line 3648, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/usr/sbin/cephadm", line 3874, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/usr/sbin/cephadm", line 1332, in copy_files
    (uid, gid) = extract_uid_gid()
  File "/usr/sbin/cephadm", line 1864, in extract_uid_gid
    raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found

The problem here is that command_adopt_prometheus() correctly calls extract_uid_gid_monitoring(daemon_type), but if the uid and gid it returns are 0 (i.e. the root user), the subsequent call to copy_files() goes wrong. copy_files() checks "if not uid or not gid", which matches because 0 is falsy in Python, so it falls back to extract_uid_gid(). That function tries to stat /var/lib/ceph, which doesn't exist inside the monitoring container images, hence the "uid/gid not found" error. The fix is to change the check to "if uid is None or gid is None".
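
To illustrate the difference between the two checks, here is a minimal standalone sketch. It is not the actual cephadm code: fallback_uid_gid(), resolve_buggy() and resolve_fixed() are hypothetical names, and the fallback simply raises to mimic extract_uid_gid() failing to stat /var/lib/ceph inside a monitoring image.

```python
from typing import Optional, Tuple


def fallback_uid_gid() -> Tuple[int, int]:
    # Hypothetical stand-in for extract_uid_gid(): assume the stat of
    # /var/lib/ceph fails, as it does inside the monitoring images.
    raise RuntimeError('uid/gid not found')


def resolve_buggy(uid: Optional[int] = None,
                  gid: Optional[int] = None) -> Tuple[int, int]:
    # Truthiness check: 0 (root) is falsy, so a valid uid/gid of 0
    # is treated as "not supplied" and the fallback runs.
    if not uid or not gid:
        uid, gid = fallback_uid_gid()
    return uid, gid


def resolve_fixed(uid: Optional[int] = None,
                  gid: Optional[int] = None) -> Tuple[int, int]:
    # Identity check: only a genuinely missing value (None) triggers
    # the fallback; 0 is passed through unchanged.
    if uid is None or gid is None:
        uid, gid = fallback_uid_gid()
    return uid, gid
```

With uid=0 and gid=0, resolve_buggy() raises the RuntimeError while resolve_fixed() returns (0, 0). For non-zero ids both behave the same, which is presumably why the problem only shows up with images that run as root.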

Related issues 1 (0 open — 1 closed)