Bug #45120
cephadm: adopt prometheus doesn't work
Status: closed
% Done: 0%
Regression: No
Severity: 2 - major
Description
The cephadm adopt prometheus command has two major problems:
1/ The etc/prometheus directory inside the destination directory isn't created before cephadm tries to copy the Prometheus configuration from /etc/prometheus, so the copy fails:
# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout disabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout inactive
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4282, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3064, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1088, in copy_files
    shutil.copyfile(src_file, dst_file)
  File "/usr/lib64/python3.6/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
DEBUG:cephadm:Releasing lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
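The failure in the traceback above comes from opening a destination path whose parent directories do not exist. A minimal sketch of one way to avoid it is to create the destination directory before copying. The helper name copy_config_file is hypothetical and is not cephadm's actual copy_files function; ownership handling (the uid/gid arguments seen in the traceback) is left out.

```python
import os
import shutil


def copy_config_file(src_file, dst_dir):
    """Copy src_file into dst_dir, creating dst_dir first if needed.

    Hypothetical helper for illustration only; not part of cephadm.
    """
    # Create the destination directory and any missing parents.
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(dst_dir, exist_ok=True)

    # Copy to <dst_dir>/<basename of src_file> rather than to dst_dir itself.
    dst_file = os.path.join(dst_dir, os.path.basename(src_file))
    shutil.copyfile(src_file, dst_file)
    return dst_file
```

With this approach the copy lands at etc/prometheus/prometheus.yml under the daemon directory even when etc/prometheus was not there beforehand.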
2/ There is no metrics directory in the Prometheus data directory. Prometheus never creates one, so the data copy fails too:
# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout enabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout active
INFO:cephadm:Stopping old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl stop prometheus
INFO:cephadm:Disabling old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl disable prometheus
DEBUG:cephadm:systemctl:stderr Removed /etc/systemd/system/multi-user.target.wants/prometheus.service.
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:chown 65534:65534 '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:copy directory '//var/lib/prometheus/metrics' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/data'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4283, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3071, in command_adopt_prometheus
    copy_tree([data_src], data_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1064, in copy_tree
    shutil.copytree(src_dir, dst_dir)  # dirs_exist_ok needs python 3.8
  File "/usr/lib64/python3.6/shutil.py", line 315, in copytree
    names = os.listdir(src)
FileNotFoundError: [Errno 2] No such file or directory: '//var/lib/prometheus/metrics'
DEBUG:cephadm:Releasing lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
The Prometheus data is stored directly in /var/lib/prometheus:
# ls -lh /var/lib/prometheus/
total 20K
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZM7AT2DS1NTTQT5V6PB4TE
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZT8DFWV9CA8HPVEC5M3WQV
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:37 01E627X85KN1MXJH247EX33AAJ
-rw-r--r--. 1 nobody nobody   0 Apr 15 16:14 lock
-rw-r--r--. 1 nobody nobody 20K Apr 16 19:41 queries.active
drwxr-xr-x. 3 nobody nobody  95 Apr 16 19:41 wal
/metrics is only an HTTP endpoint exposed by Prometheus, not a directory on disk.
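Since the data lives directly in /var/lib/prometheus, the adopt step would need to copy that directory itself rather than a metrics subdirectory. The sketch below shows one possible approach that also works on Python 3.6, where shutil.copytree has no dirs_exist_ok option. The helper name copy_data_dir is hypothetical and is not cephadm's copy_tree; chown handling is omitted, and it assumes the destination does not already contain subdirectories with the same names.

```python
import os
import shutil


def copy_data_dir(src_dir, dst_dir):
    """Copy the contents of src_dir into dst_dir.

    Hypothetical helper for illustration only; not part of cephadm.
    """
    # Fail early with a clear message if the source path is wrong,
    # for example a non-existent 'metrics' subdirectory.
    if not os.path.isdir(src_dir):
        raise FileNotFoundError("data directory not found: %s" % src_dir)

    # The destination may already exist, so create it idempotently and
    # copy entry by entry instead of calling copytree on the top level.
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isdir(src):
            # Block directories and wal/ are copied recursively.
            shutil.copytree(src, dst)
        else:
            # Plain files such as 'lock' and 'queries.active'.
            shutil.copy2(src, dst)
```

A caller would pass /var/lib/prometheus as src_dir and the daemon's data directory as dst_dir.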