Bug #45120
closed
cephadm: adopt prometheus doesn't work
Description
The cephadm adopt command for prometheus has two major problems:
1/ The etc/prometheus directory under the destination (daemon) directory isn't created before cephadm tries to copy the prometheus configuration from /etc/prometheus:
# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout disabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout inactive
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4282, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3064, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1088, in copy_files
    shutil.copyfile(src_file, dst_file)
  File "/usr/lib64/python3.6/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
DEBUG:cephadm:Releasing lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
2/ cephadm expects a metrics directory inside the prometheus data directory (/var/lib/prometheus/metrics). Prometheus itself doesn't create such a directory, so the data copy fails too:
# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout enabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout active
INFO:cephadm:Stopping old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl stop prometheus
INFO:cephadm:Disabling old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl disable prometheus
DEBUG:cephadm:systemctl:stderr Removed /etc/systemd/system/multi-user.target.wants/prometheus.service.
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:chown 65534:65534 '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:copy directory '//var/lib/prometheus/metrics' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/data'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4283, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3071, in command_adopt_prometheus
    copy_tree([data_src], data_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1064, in copy_tree
    shutil.copytree(src_dir, dst_dir)  # dirs_exist_ok needs python 3.8
  File "/usr/lib64/python3.6/shutil.py", line 315, in copytree
    names = os.listdir(src)
FileNotFoundError: [Errno 2] No such file or directory: '//var/lib/prometheus/metrics'
DEBUG:cephadm:Releasing lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
The prometheus data are stored in /var/lib/prometheus.
# ls -lh /var/lib/prometheus/
total 20K
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZM7AT2DS1NTTQT5V6PB4TE
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZT8DFWV9CA8HPVEC5M3WQV
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:37 01E627X85KN1MXJH247EX33AAJ
-rw-r--r--. 1 nobody nobody   0 Apr 15 16:14 lock
-rw-r--r--. 1 nobody nobody 20K Apr 16 19:41 queries.active
drwxr-xr-x. 3 nobody nobody  95 Apr 16 19:41 wal
/metrics is only an endpoint of the prometheus REST API, not a directory on disk.
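The failure in 1/ comes down to `shutil.copyfile` being handed a destination whose parent directory does not exist yet. A minimal sketch of the ordering that avoids it is shown below. `copy_file_into_dir` is a hypothetical helper written for illustration, not cephadm's `copy_files`, and the paths in the demo are temporary stand-ins for /etc/prometheus and the daemon directory.

```python
import os
import shutil
import tempfile


def copy_file_into_dir(src_file, dst_dir, uid=None, gid=None):
    """Copy src_file into dst_dir, creating dst_dir first if it is missing.

    Hypothetical helper for illustration only; not cephadm's copy_files().
    """
    # Create the full destination path (e.g. .../etc/prometheus) up front,
    # so that open(dst, 'wb') inside shutil.copyfile cannot fail with ENOENT.
    os.makedirs(dst_dir, mode=0o755, exist_ok=True)
    dst_file = os.path.join(dst_dir, os.path.basename(src_file))
    shutil.copyfile(src_file, dst_file)
    # Changing ownership needs privileges, so it is optional in this sketch.
    if uid is not None and gid is not None:
        os.chown(dst_file, uid, gid)
    return dst_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for /etc/prometheus/prometheus.yml
        src = os.path.join(root, "prometheus.yml")
        with open(src, "w") as f:
            f.write("global: {}\n")
        # Stand-in for <daemon dir>/etc/prometheus, which does not exist yet
        dst_dir = os.path.join(root, "prometheus.cephaio-1", "etc", "prometheus")
        print(copy_file_into_dir(src, dst_dir))
```

Whether the directories created this way also need their ownership set to the container's uid/gid (65534:65534 in the logs above) is not covered by this sketch.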
Updated by Michael Fritch about 4 years ago
This path appears to be platform and/or config dependent, based on the '--storage.tsdb.path' argument.
Debian uses a default of /var/lib/prometheus/metrics2/, whereas SUSE uses a default of /var/lib/prometheus/metrics/ ...
On SUSE, I have the following dir structure:
# ls -lh /var/lib/prometheus/
total 12K
drwxr-x--- 2 prometheus prometheus 4.0K Apr 17 23:39 alertmanager
drwxr-x--- 9 prometheus prometheus 4.0K Apr 17 23:01 metrics
drwxr-xr-x 2 prometheus prometheus 4.0K Apr 14 06:15 node-exporter
# ls -lh /var/lib/prometheus/metrics/
total 28K
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 14 22:21 01E5X5NTJSX34VTR5HXNE5DXZG
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 15 17:00 01E5Z5NJK2CA1G288M6B8AQB70
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 16 11:00 01E613FKKWFKY4SW20HY51RYFQ
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 05:00 01E6318X5609WKPBDTG8XCVH9H
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 23:00 01E64Z1VX9XTZZW9J3Z0XJ58V2
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 23:00 01E64Z30ZZHGWN0VRYDK2K8ZE6
-rw-r--r-- 1 prometheus prometheus    0 Apr 14 19:16 lock
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 17:46 wal
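Since the data location differs between hosts, one possible approach is to probe a list of candidate directories instead of hard-coding one. The sketch below is only an illustration: `find_tsdb_dir` and `CANDIDATE_DATA_DIRS` are hypothetical names, the candidate list is taken from the paths mentioned in this ticket, and treating the presence of a `wal` subdirectory as the marker of a data directory is an assumption drawn from the two listings above, not a documented rule. It is not the fix that was merged for this bug.

```python
import os
import tempfile

# Candidate locations mentioned in this ticket, most specific first so that
# /var/lib/prometheus is only picked when it holds the data directly.
CANDIDATE_DATA_DIRS = [
    "/var/lib/prometheus/metrics2",  # Debian default, per the comment above
    "/var/lib/prometheus/metrics",   # SUSE default, per the comment above
    "/var/lib/prometheus",           # layout seen on the reporter's host
]


def find_tsdb_dir(candidates, root="/"):
    """Return the first candidate under root that contains a 'wal' directory.

    Assumption: a 'wal' subdirectory marks a prometheus data directory,
    as both listings in this ticket suggest. Returns None if nothing matches.
    """
    for path in candidates:
        full = os.path.join(root, path.lstrip("/"))
        if os.path.isdir(os.path.join(full, "wal")):
            return full
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Fake a SUSE-like layout under a temporary root.
        os.makedirs(os.path.join(root, "var/lib/prometheus/metrics/wal"))
        os.makedirs(os.path.join(root, "var/lib/prometheus/node-exporter"))
        print(find_tsdb_dir(CANDIDATE_DATA_DIRS, root=root))
```

A host that sets a custom '--storage.tsdb.path' would not be covered by any fixed candidate list, so reading that argument from the running service's configuration would still be needed in that case.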
Updated by Dimitri Savineau about 4 years ago
So we should have something to address issue 2/ since the prometheus data directory depends on the OS.
I'll update the patch to address 1/ only, leaving out the data directory, which should be handled differently.
Updated by Sebastian Wagner about 4 years ago
- Status changed from New to Fix Under Review
Updated by Sebastian Wagner almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sebastian Wagner almost 4 years ago
- Status changed from Pending Backport to Resolved
- Target version set to v15.2.2