Bug #45120

closed

cephadm: adopt prometheus doesn't work

Added by Dimitri Savineau about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The cephadm adopt prometheus command has two major problems:

1/ The etc/prometheus directory under the destination daemon directory isn't created before cephadm tries to copy the prometheus configuration from /etc/prometheus into it, so the copy fails with FileNotFoundError:

# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout disabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout inactive
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4282, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3064, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1088, in copy_files
    shutil.copyfile(src_file, dst_file)
  File "/usr/lib64/python3.6/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
DEBUG:cephadm:Releasing lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
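
A minimal sketch of one way to avoid this first failure: create the destination directory before copying the file into it. The helper name copy_file_into_dir and its signature are hypothetical, chosen for illustration; this is not the actual cephadm copy_files implementation or the fix that was merged.

```python
import os
import shutil
from typing import Optional


def copy_file_into_dir(src_file: str, dst_dir: str,
                       uid: Optional[int] = None,
                       gid: Optional[int] = None) -> str:
    """Copy src_file into dst_dir, creating dst_dir first if needed.

    Hypothetical helper for illustration only (not cephadm code).
    """
    # Create the destination directory and any missing parents;
    # exist_ok=True means an already existing directory is not an error.
    os.makedirs(dst_dir, exist_ok=True)

    # Copy to a full file path inside the directory, not to the directory itself.
    dst_file = os.path.join(dst_dir, os.path.basename(src_file))
    shutil.copyfile(src_file, dst_file)

    # Changing ownership needs privileges, so only do it when requested.
    if uid is not None and gid is not None:
        os.chown(dst_dir, uid, gid)
        os.chown(dst_file, uid, gid)
    return dst_file
```

With the paths from the log above, the call would look like copy_file_into_dir('/etc/prometheus/prometheus.yml', '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus', uid=65534, gid=65534).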

2/ cephadm copies the prometheus data from /var/lib/prometheus/metrics, but prometheus never creates a metrics directory inside its data directory, so the data copy fails too:

# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout enabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout active
INFO:cephadm:Stopping old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl stop prometheus
INFO:cephadm:Disabling old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl disable prometheus
DEBUG:cephadm:systemctl:stderr Removed /etc/systemd/system/multi-user.target.wants/prometheus.service.
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:chown 65534:65534 '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:copy directory '//var/lib/prometheus/metrics' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/data'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4283, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3071, in command_adopt_prometheus
    copy_tree([data_src], data_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1064, in copy_tree
    shutil.copytree(src_dir, dst_dir) # dirs_exist_ok needs python 3.8
  File "/usr/lib64/python3.6/shutil.py", line 315, in copytree
    names = os.listdir(src)
FileNotFoundError: [Errno 2] No such file or directory: '//var/lib/prometheus/metrics'
DEBUG:cephadm:Releasing lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock

The prometheus data is stored directly in /var/lib/prometheus:

# ls -lh /var/lib/prometheus/
total 20K
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZM7AT2DS1NTTQT5V6PB4TE
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZT8DFWV9CA8HPVEC5M3WQV
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:37 01E627X85KN1MXJH247EX33AAJ
-rw-r--r--. 1 nobody nobody   0 Apr 15 16:14 lock
-rw-r--r--. 1 nobody nobody 20K Apr 16 19:41 queries.active
drwxr-xr-x. 3 nobody nobody  95 Apr 16 19:41 wal

/metrics is only an HTTP endpoint served by prometheus, not a directory on disk.
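
A minimal sketch of the second point, assuming the data should be copied from the prometheus data directory itself rather than from a metrics subdirectory. The helper name copy_data_dir is hypothetical and this is not the actual cephadm copy_tree code. It relies on plain shutil.copytree without dirs_exist_ok, so it also works on Python 3.6, but the destination must not exist yet.

```python
import os
import shutil


def copy_data_dir(src_dir: str, dst_dir: str) -> None:
    """Copy the whole tree under src_dir to dst_dir.

    Hypothetical helper for illustration only (not cephadm code).
    """
    # Fail early with a clear message if the source directory is missing,
    # as happens for /var/lib/prometheus/metrics.
    if not os.path.isdir(src_dir):
        raise FileNotFoundError("data directory not found: " + src_dir)

    # Without dirs_exist_ok (Python 3.8+), copytree creates dst_dir itself
    # and raises FileExistsError if dst_dir already exists.
    shutil.copytree(src_dir, dst_dir)
```

With the paths from this report, the source would be '/var/lib/prometheus' and the destination '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/data'.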
