Bug #47745

cephadm: adopt {prometheus,grafana,alertmanager} fails with "RuntimeError: uid/gid not found"

Added by Tim Serong 2 months ago. Updated about 2 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: cephadm/monitoring
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

This is essentially the same problem as https://tracker.ceph.com/issues/46398, but occurs when adopting prometheus/grafana/alertmanager, as opposed to when initially deploying those services:

# cephadm --image 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 adopt --style=legacy --name prometheus.$(hostname)
INFO:cephadm:Pulling container image 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0...
INFO:cephadm:Non-zero exit code 1 from /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 -e NODE_NAME=admin --entrypoint stat 192.168.121.1:5000/caasp/v4.5/prometheus-server:2.18.0 -c %u %g /var/lib/ceph
INFO:cephadm:stat:stderr stat: cannot stat '/var/lib/ceph': No such file or directory
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 5758, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1213, in _default_image
    return func()
  File "/usr/sbin/cephadm", line 3648, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/usr/sbin/cephadm", line 3874, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/usr/sbin/cephadm", line 1332, in copy_files
    (uid, gid) = extract_uid_gid()
  File "/usr/sbin/cephadm", line 1864, in extract_uid_gid
    raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found

The problem is that command_adopt_prometheus() correctly calls extract_uid_gid_monitoring(daemon_type), but when the uid and gid it returns are 0 (i.e. the root user), the subsequent call to copy_files() tests "if not uid or not gid". That test is true for 0, so copy_files() treats the values as missing and falls back to extract_uid_gid(), which tries to stat /var/lib/ceph. That path doesn't exist inside the monitoring container images, so the stat fails and extract_uid_gid() raises "RuntimeError: uid/gid not found". The fix is to change the test to "if uid is None or gid is None", so that only genuinely unset values trigger the fallback.
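The difference between the two checks can be illustrated with a minimal, self-contained sketch. This is not the cephadm source: resolve_buggy(), resolve_fixed() and fallback_uid_gid() are hypothetical names introduced only for illustration, and the fallback simply raises the same error seen in the traceback instead of running stat in a container.

```python
from typing import Optional, Tuple


def fallback_uid_gid() -> Tuple[int, int]:
    # Hypothetical stand-in for extract_uid_gid(): in the monitoring images
    # /var/lib/ceph does not exist, so the lookup fails.
    raise RuntimeError('uid/gid not found')


def resolve_buggy(uid: Optional[int] = None,
                  gid: Optional[int] = None) -> Tuple[int, int]:
    # Truthiness test: 0 is falsy, so root's uid/gid look "missing".
    if not uid or not gid:
        uid, gid = fallback_uid_gid()
    return uid, gid


def resolve_fixed(uid: Optional[int] = None,
                  gid: Optional[int] = None) -> Tuple[int, int]:
    # Identity test: only values that were never supplied trigger the fallback.
    if uid is None or gid is None:
        uid, gid = fallback_uid_gid()
    return uid, gid


if __name__ == '__main__':
    print(resolve_fixed(0, 0))      # root's uid/gid are passed through
    try:
        resolve_buggy(0, 0)         # root's uid/gid wrongly trigger the fallback
    except RuntimeError as e:
        print('buggy check failed:', e)
```

With non-zero ids both variants behave the same, which is presumably why the problem only shows up with images that run as root.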


Related issues

Related to Orchestrator - Bug #46398: cephadm: can't use custom prometheus image (Resolved)

History

#1 Updated by Tim Serong 2 months ago

  • Related to Bug #46398: cephadm: can't use custom prometheus image added

#2 Updated by Tim Serong 2 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 37542

#3 Updated by Kefu Chai about 2 months ago

  • Status changed from Fix Under Review to Pending Backport
