Bug #45120

closed

cephadm: adopt prometheus doesn't work

Added by Dimitri Savineau about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The cephadm adopt prometheus command has two major problems:

1/ The etc/prometheus directory isn't created in the destination directory before cephadm tries to copy the prometheus configuration from /etc/prometheus, so the copy fails:

# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout disabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout inactive
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4282, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3064, in command_adopt_prometheus
    copy_files([config_src], config_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1088, in copy_files
    shutil.copyfile(src_file, dst_file)
  File "/usr/lib64/python3.6/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus'
DEBUG:cephadm:Releasing lock 140215559093384 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 140215559093384 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
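
The traceback above shows shutil.copyfile failing because the destination directory does not exist yet. A minimal sketch of one way to avoid that, assuming the destination should simply be created on demand. This is not the actual cephadm patch, and `copy_file_into_dir` is a hypothetical helper name:

```python
import os
import shutil


def copy_file_into_dir(src_file, dst_dir, uid=None, gid=None):
    """Copy src_file into dst_dir, creating dst_dir first if it is missing."""
    # Create the full destination path (e.g. .../etc/prometheus) before copying.
    os.makedirs(dst_dir, mode=0o755, exist_ok=True)
    dst_file = os.path.join(dst_dir, os.path.basename(src_file))
    shutil.copyfile(src_file, dst_file)
    # Ownership is only changed when both ids are given (needs privileges).
    if uid is not None and gid is not None:
        os.chown(dst_file, uid, gid)
    return dst_file
```

Because the helper builds the full destination file name itself, shutil.copyfile is never handed a directory path as its target.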

2/ There's no metrics directory in the prometheus data directory. Prometheus doesn't create one at all, so the data copy fails too:

# cephadm --verbose adopt --cluster ceph --skip-pull --style legacy --name prometheus.cephaio-1
DEBUG:cephadm:Acquiring lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 acquired on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:['/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=prom/prometheus:latest', '-e', 'NODE_NAME=cephaio-1', '--entrypoint', 'stat', 'prom/prometheus:latest', '-c', '%u %g', '/etc/prometheus']
DEBUG:cephadm:Running command: /bin/podman run --rm --net=host -e CONTAINER_IMAGE=prom/prometheus:latest -e NODE_NAME=cephaio-1 --entrypoint stat prom/prometheus:latest -c %u %g /etc/prometheus
DEBUG:cephadm:stat:stdout 65534 65534
DEBUG:cephadm:Running command: systemctl is-enabled prometheus
DEBUG:cephadm:systemctl:stdout enabled
DEBUG:cephadm:Running command: systemctl is-active prometheus
DEBUG:cephadm:systemctl:stdout active
INFO:cephadm:Stopping old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl stop prometheus
INFO:cephadm:Disabling old systemd unit prometheus...
DEBUG:cephadm:Running command: systemctl disable prometheus
DEBUG:cephadm:systemctl:stderr Removed /etc/systemd/system/multi-user.target.wants/prometheus.service.
DEBUG:cephadm:copy file '//etc/prometheus/prometheus.yml' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:chown 65534:65534 '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/etc/prometheus/prometheus.yml'
DEBUG:cephadm:copy directory '//var/lib/prometheus/metrics' -> '/var/lib/ceph/35ed6b2b-7a8b-4f78-82cb-83ba3630098c/prometheus.cephaio-1/data'
Traceback (most recent call last):
  File "/sbin/cephadm", line 4283, in <module>
    r = args.func()
  File "/sbin/cephadm", line 972, in _default_image
    return func()
  File "/sbin/cephadm", line 2918, in command_adopt
    command_adopt_prometheus(daemon_id, fsid)
  File "/sbin/cephadm", line 3071, in command_adopt_prometheus
    copy_tree([data_src], data_dst, uid=uid, gid=gid)
  File "/sbin/cephadm", line 1064, in copy_tree
    shutil.copytree(src_dir, dst_dir) # dirs_exist_ok needs python 3.8
  File "/usr/lib64/python3.6/shutil.py", line 315, in copytree
    names = os.listdir(src)
FileNotFoundError: [Errno 2] No such file or directory: '//var/lib/prometheus/metrics'
DEBUG:cephadm:Releasing lock 139769986456712 on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock
DEBUG:cephadm:Lock 139769986456712 released on /run/cephadm/35ed6b2b-7a8b-4f78-82cb-83ba3630098c.lock

The prometheus data is stored directly in /var/lib/prometheus:

# ls -lh /var/lib/prometheus/
total 20K
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZM7AT2DS1NTTQT5V6PB4TE
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:36 01E5ZT8DFWV9CA8HPVEC5M3WQV
drwxr-xr-x. 3 nobody nobody  68 Apr 16 19:37 01E627X85KN1MXJH247EX33AAJ
-rw-r--r--. 1 nobody nobody   0 Apr 15 16:14 lock
-rw-r--r--. 1 nobody nobody 20K Apr 16 19:41 queries.active
drwxr-xr-x. 3 nobody nobody  95 Apr 16 19:41 wal

/metrics is only an endpoint of the REST API, not a directory on disk.
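
Since the listing above shows a wal directory directly under /var/lib/prometheus, one possible way to avoid hard-coding the metrics subdirectory is to probe a list of candidate paths. A rough sketch only: `find_tsdb_dir` is a hypothetical helper, and treating "contains a wal subdirectory" as the marker of a data directory is an assumption based on that listing, not something cephadm does:

```python
import os


def find_tsdb_dir(candidates):
    """Return the first candidate that looks like a prometheus data directory."""
    for path in candidates:
        # Heuristic (assumption): a data directory contains a 'wal' subdirectory.
        if os.path.isdir(os.path.join(path, "wal")):
            return path
    # Nothing matched; the caller decides whether to skip the data copy or fail.
    return None
```

A caller could pass the more specific path first, for example `find_tsdb_dir(['/var/lib/prometheus/metrics', '/var/lib/prometheus'])`, and skip the data copy when the result is None.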

Actions #1

Updated by Michael Fritch about 4 years ago

This path appears to be platform and/or config dependent, based on the '--storage.tsdb.path' argument.

Debian uses a default of /var/lib/prometheus/metrics2/, whereas SUSE uses a default of /var/lib/prometheus/metrics/ ...


On SUSE, I have the following dir structure:

# ls -lh /var/lib/prometheus/
total 12K
drwxr-x--- 2 prometheus prometheus 4.0K Apr 17 23:39 alertmanager
drwxr-x--- 9 prometheus prometheus 4.0K Apr 17 23:01 metrics
drwxr-xr-x 2 prometheus prometheus 4.0K Apr 14 06:15 node-exporter

# ls -lh /var/lib/prometheus/metrics/
total 28K
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 14 22:21 01E5X5NTJSX34VTR5HXNE5DXZG
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 15 17:00 01E5Z5NJK2CA1G288M6B8AQB70
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 16 11:00 01E613FKKWFKY4SW20HY51RYFQ
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 05:00 01E6318X5609WKPBDTG8XCVH9H
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 23:00 01E64Z1VX9XTZZW9J3Z0XJ58V2
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 23:00 01E64Z30ZZHGWN0VRYDK2K8ZE6
-rw-r--r-- 1 prometheus prometheus    0 Apr 14 19:16 lock
drwxr-xr-x 3 prometheus prometheus 4.0K Apr 17 17:46 wal
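
Because the data location follows '--storage.tsdb.path', the path could in principle be read from the command line of the legacy service rather than assumed. A small sketch under stated assumptions: `tsdb_path_from_args` is a hypothetical helper, obtaining the argument list (for example from the systemd unit) is left out, and the fallback of "data/" is taken to be the upstream default for this flag:

```python
def tsdb_path_from_args(argv, default="data/"):
    """Return the --storage.tsdb.path value from a prometheus argument list."""
    flag = "--storage.tsdb.path"
    for i, arg in enumerate(argv):
        # Form 1: --storage.tsdb.path=/some/dir
        if arg.startswith(flag + "="):
            return arg[len(flag) + 1:]
        # Form 2: --storage.tsdb.path /some/dir
        if arg == flag and i + 1 < len(argv):
            return argv[i + 1]
    # Flag not given: fall back to the assumed default.
    return default
```

With the SUSE layout shown above, an argument list containing `--storage.tsdb.path=/var/lib/prometheus/metrics` would yield `/var/lib/prometheus/metrics`.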
Actions #2

Updated by Dimitri Savineau about 4 years ago

So we need something separate to address issue 2/, since the prometheus data directory depends on the OS.

I'll update the patch to address 1/ without the data directory. The data directory should be handled differently.

Actions #3

Updated by Michael Fritch about 4 years ago

  • Pull request ID set to 34600
Actions #4

Updated by Sebastian Wagner about 4 years ago

  • Status changed from New to Fix Under Review
Actions #5

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v15.2.2