Bug #48870

cephadm: Several services in error status after upgrade to 15.2.8: unrecognized arguments: --filter-for-batch

Added by Gunther Heinrich about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I initiated the upgrade from Ceph 15.2.5 to the latest version (Ubuntu 20.04.1, Podman 2.1.1). At first everything was working fine, until `ceph log last cephadm` reported several errors, all similar to the following one:

2021-01-13T12:03:17.545094+0000 mgr.iz-ceph-v1-mon-03.ncnoal (mgr.19061289) 534 : cephadm [ERR] cephadm exited with an error code: 1, 
stderr:/usr/bin/podman:stderr usage: ceph-volume inventory [-h] [--format {plain,json,json-pretty}] [path]/usr/bin/podman:stderr ceph-volume inventory: error: unrecognized arguments: --filter-for-batch
Traceback (most recent call last):
  File "<stdin>", line 6112, in <module>
  File "<stdin>", line 1299, in _infer_fsid
  File "<stdin>", line 1382, in _infer_image
  File "<stdin>", line 3612, in command_ceph_volume
  File "<stdin>", line 1061, in call_throws
RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.5 -e NODE_NAME=iz-ceph-v1-mon-03 -v /var/run/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/run/ceph:z -v /var/log/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c:/var/log/ceph:z -v /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm docker.io/ceph/ceph:v15.2.5 inventory --format=json --filter-for-batch
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1012, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1156, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:/usr/bin/podman:stderr usage: ceph-volume inventory [-h] [--format {plain,json,json-pretty}] [path]/usr/bin/podman:stderr ceph-volume inventory: error: unrecognized arguments: --filter-for-batch
Traceback (most recent call last):
  File "<stdin>", line 6112, in <module>
  File "<stdin>", line 1299, in _infer_fsid
  File "<stdin>", line 1382, in _infer_image
  File "<stdin>", line 3612, in command_ceph_volume
  File "<stdin>", line 1061, in call_throws

Nevertheless, the upgrade seemed to finish, and `ceph orch ps` reported all services as running v15.2.8.

Because of the errors during the upgrade I suspected a small inconsistency between Podman and Ubuntu, so I updated all systems via apt to the latest versions and rebooted them, which updated Podman to version 2.2.1.

Since then, `ceph orch ps` reports several services in an error status:

alertmanager.iz-ceph-v1-mon-01       iz-ceph-v1-mon-01  error          4m ago     6M   0.21.0   docker.io/prom/alertmanager:latest   c876f5897d7b  9a771c886b09
crash.iz-ceph-v1-mon-01              iz-ceph-v1-mon-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  93d9a3267a2d
crash.iz-ceph-v1-mon-02              iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  574787725a46
crash.iz-ceph-v1-mon-03              iz-ceph-v1-mon-03  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  7f94569f5195
crash.iz-ceph-v1-osd-01              iz-ceph-v1-osd-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5182be56dc6b
crash.iz-ceph-v1-osd-02              iz-ceph-v1-osd-02  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  9e23c565fc64
crash.iz-ceph-v1-osd-03              iz-ceph-v1-osd-03  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  12ea957024f8
grafana.iz-ceph-v1-mon-01            iz-ceph-v1-mon-01  error          4m ago     6M   6.6.2    docker.io/ceph/ceph-grafana:latest   87a51ecf0b1c  43c6d192a4dd
mds.cephfs.iz-ceph-v1-mon-01.vjehnz  iz-ceph-v1-mon-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5ad9c3fae67d
mds.cephfs.iz-ceph-v1-mon-02.cbdqzm  iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5e732eb43b59
mds.cephfs.iz-ceph-v1-mon-03.zpxfkg  iz-ceph-v1-mon-03  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  71ed07a47be2
mgr.iz-ceph-v1-mon-01.elswai         iz-ceph-v1-mon-01  error          4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  567b649f7825
mgr.iz-ceph-v1-mon-02.foqmfa         iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  5b2014ba51da
mgr.iz-ceph-v1-mon-03.ncnoal         iz-ceph-v1-mon-03  error          4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  115effface96
mon.iz-ceph-v1-mon-01                iz-ceph-v1-mon-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  7fc683227cee
mon.iz-ceph-v1-mon-02                iz-ceph-v1-mon-02  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  01951e4b5a5e
mon.iz-ceph-v1-mon-03                iz-ceph-v1-mon-03  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  ed2e540108f2
node-exporter.iz-ceph-v1-mon-01      iz-ceph-v1-mon-01  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  f8d1e263f5bd
node-exporter.iz-ceph-v1-mon-02      iz-ceph-v1-mon-02  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  deb31c0e96fe
node-exporter.iz-ceph-v1-mon-03      iz-ceph-v1-mon-03  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  803ad1ead652
node-exporter.iz-ceph-v1-osd-01      iz-ceph-v1-osd-01  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  31a86adea0da
node-exporter.iz-ceph-v1-osd-02      iz-ceph-v1-osd-02  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  98aba976cb23
node-exporter.iz-ceph-v1-osd-03      iz-ceph-v1-osd-03  error          4m ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  121aed124f3b
osd.0                                iz-ceph-v1-osd-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  19b8ae70354c
osd.1                                iz-ceph-v1-osd-01  running (16h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  2569a941b756
osd.2                                iz-ceph-v1-osd-02  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  4e0a96f76663
osd.3                                iz-ceph-v1-osd-02  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  7256225f6fd7
osd.4                                iz-ceph-v1-osd-03  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  f86ecc759cee
osd.5                                iz-ceph-v1-osd-03  running (17h)  4m ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  8da57f4da01f
prometheus.iz-ceph-v1-mon-01         iz-ceph-v1-mon-01  error          4m ago     6M   2.19.1   docker.io/prom/prometheus:latest     396dc3b4e717  2a0a8f4c368c

Here are some journalctl entries from the boot process on the first monitor node, which indicate that the containers could not be started properly:

Jan 13 14:29:04 iz-ceph-v1-mon-01 bash[1547]: 127.0.0.1 - - [13/Jan/2021:13:29:04] "GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.19.1" 
Jan 13 14:29:05 iz-ceph-v1-mon-01 systemd[1]: Started libpod-conmon-7fc683227cee875c70df5a5bb56d85b42311e37f4faa67d0b9ccf0b870a18794.scope.
Jan 13 14:29:05 iz-ceph-v1-mon-01 systemd[1]: run-runc-7fc683227cee875c70df5a5bb56d85b42311e37f4faa67d0b9ccf0b870a18794-runc.VTU3WA.mount: Succeeded.
Jan 13 14:29:05 iz-ceph-v1-mon-01 podman[3236]: 2021-01-13 14:29:05.518057669 +0100 CET m=+0.199315429 container exec 7fc683227cee875c70df5a5bb56d85b42311e37f4faa67d0b9ccf0b870a18794 (image=docker.io/ceph/ceph:v15.2.8, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-01, org.label-schema.vendor=CentOS, RELEASE=HEAD, CEPH_POINT_RELEASE=-15.2.8, GIT_BRANCH=HEAD, GIT_CLEAN=True, org.label-schema.build-date=20201204, org.label-schema.license=GPLv2, org.label-schema.schema-version=1.0, GIT_COMMIT=4289d4d722f77bb5ddad5e5f98141a2a1c21a48f, GIT_REPO=https://github.com/ceph/ceph-container.git, ceph=True, maintainer=Dimitri Savineau <dsavinea@redhat.com>, org.label-schema.name=CentOS Base Image)
Jan 13 14:29:05 iz-ceph-v1-mon-01 systemd[1]: libpod-conmon-7fc683227cee875c70df5a5bb56d85b42311e37f4faa67d0b9ccf0b870a18794.scope: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[1]: Started libpod-conmon-43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f.scope.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[1]: run-runc-43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f-runc.CQ54pQ.mount: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[3042]: run-runc-43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f-runc.CQ54pQ.mount: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[2074]: run-runc-43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f-runc.CQ54pQ.mount: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 podman[3318]: 2021-01-13 14:29:06.21613319 +0100 CET m=+0.117582684 container exec 43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f (image=docker.io/ceph/ceph-grafana:latest, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-grafana.iz-ceph-v1-mon-01, io.buildah.version=1.14.2, org.label-schema.name=CentOS Base Image, org.label-schema.schema-version=1.0, org.opencontainers.image.vendor=CentOS, description=Ceph Grafana Container, org.label-schema.vendor=CentOS, summary=Grafana Container configured for Ceph mgr/dashboard integration, org.label-schema.license=GPLv2,  org.label-schema.vendor=CentOS, summary=Grafana Container configured for Ceph mgr/dashboard integration, org.label-schema.license=GPLv2, org.label-schema.build-date=20200114, org.opencontainers.image.created=2020-01-14 00:00:00-08:00, org.opencontainers.image.title=CentOS Base Image, maintainer=Paul Cuzner <pcuzner@redhat.com>, org.opencontainers.image.licenses=GPL-2.0-only)
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[1]: libpod-conmon-43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f.scope: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[1]: Started libpod-conmon-2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda.scope.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[1]: run-runc-2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda-runc.AzgeLg.mount: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[2074]: run-runc-2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda-runc.AzgeLg.mount: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[3042]: run-runc-2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda-runc.AzgeLg.mount: Succeeded.
Jan 13 14:29:06 iz-ceph-v1-mon-01 podman[3403]: 2021-01-13 14:29:06.557505103 +0100 CET m=+0.142413228 container exec 2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda (image=docker.io/prom/prometheus:latest, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-prometheus.iz-ceph-v1-mon-01, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jan 13 14:29:06 iz-ceph-v1-mon-01 systemd[1]: libpod-conmon-2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda.scope: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[1]: Started libpod-conmon-9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7.scope.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[1]: run-runc-9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7-runc.ae5tiF.mount: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[3042]: run-runc-9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7-runc.ae5tiF.mount: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 podman[3513]: 2021-01-13 14:29:07.188433234 +0100 CET m=+0.181371954 container exec 9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7 (image=docker.io/prom/alertmanager:latest, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-alertmanager.iz-ceph-v1-mon-01, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[1]: libpod-conmon-9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7.scope: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[1]: Started libpod-conmon-f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037.scope.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[1]: run-runc-f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037-runc.WjQpx7.mount: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[2074]: run-runc-f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037-runc.WjQpx7.mount: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[3042]: run-runc-f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037-runc.WjQpx7.mount: Succeeded.
Jan 13 14:29:07 iz-ceph-v1-mon-01 podman[3599]: 2021-01-13 14:29:07.526560736 +0100 CET m=+0.144490800 container exec f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037 (image=docker.io/prom/node-exporter:latest, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-01, maintainer=The Prometheus Authors <prometheus-developers@googlegroups.com>)
Jan 13 14:29:07 iz-ceph-v1-mon-01 systemd[1]: libpod-conmon-f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037.scope: Succeeded.

[...]

Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: start operation timed out. Terminating.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: start operation timed out. Terminating.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: start operation timed out. Terminating.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: start operation timed out. Terminating.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: start operation timed out. Terminating.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Failed with result 'timeout'.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph node-exporter.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Failed with result 'timeout'.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph grafana.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Failed with result 'timeout'.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph prometheus.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Failed with result 'timeout'.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph alertmanager.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Failed with result 'timeout'.
Jan 13 14:29:51 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph mgr.iz-ceph-v1-mon-01.elswai for 68317c90-b44f-11ea-a0c4-d1443a31407c.

[...]

Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 1.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 1.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 1.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Scheduled restart job, restart counter is at 1.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 1.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph alertmanager.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1238 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1272 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1606 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph alertmanager.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph grafana.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1237 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1279 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1608 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph grafana.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph mgr.iz-ceph-v1-mon-01.elswai for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Found left-over process 1234 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Found left-over process 1547 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Found left-over process 1734 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph mgr.iz-ceph-v1-mon-01.elswai for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph node-exporter.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1222 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1223 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1587 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph node-exporter.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph prometheus.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1240 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1265 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1599 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph prometheus.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:01 iz-ceph-v1-mon-01 podman[4951]: Error: cannot remove container 3775ce5793baab123898cce1c5ebd979b934ef41d443300b3ee2cf535e034e24 as it is running - running or paused containers cannot be removed without force: container state improper
Jan 13 14:30:01 iz-ceph-v1-mon-01 podman[4949]: Error: cannot remove container 9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7 as it is running - running or paused containers cannot be removed without force: container state improper
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1238 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1272 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1606 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Found left-over process 1234 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Found left-over process 1547 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@mgr.iz-ceph-v1-mon-01.elswai.service: Found left-over process 1734 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 podman[4950]: Error: cannot remove container 43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f as it is running - running or paused containers cannot be removed without force: container state improper
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1237 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1279 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1608 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 podman[4952]: Error: cannot remove container f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037 as it is running - running or paused containers cannot be removed without force: container state improper
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1222 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1223 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1587 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 podman[4953]: Error: cannot remove container 2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda as it is running - running or paused containers cannot be removed without force: container state improper
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1240 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1265 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1599 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:01 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:02 iz-ceph-v1-mon-01 bash[5072]: Error: error creating container storage: the container name "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-alertmanager.iz-ceph-v1-mon-01" is already in use by "9a771c886b0903e3db045d41d21e7c0df9c126ac3fdd797cf8b60142d4f899c7". You have to remove that container to be able to reuse that name.: that name is already in use
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Control process exited, code=exited, status=125/n/a
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Failed with result 'exit-code'.
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph alertmanager.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:02 iz-ceph-v1-mon-01 bash[5078]: Error: error creating container storage: the container name "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-grafana.iz-ceph-v1-mon-01" is already in use by "43c6d192a4dd2f15b2ccf5f13696f9d1801c48b79b3751259275133bf68f052f". You have to remove that container to be able to reuse that name.: that name is already in use
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Control process exited, code=exited, status=125/n/a
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Failed with result 'exit-code'.
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph grafana.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:02 iz-ceph-v1-mon-01 bash[5084]: Error: error creating container storage: the container name "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-01" is already in use by "f8d1e263f5bda2a15fa6b609b11a0b8a8dfb811ea7880db86d4e9a5ba4c62037". You have to remove that container to be able to reuse that name.: that name is already in use
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Control process exited, code=exited, status=125/n/a
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Failed with result 'exit-code'.
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph node-exporter.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:02 iz-ceph-v1-mon-01 bash[5099]: Error: error creating container storage: the container name "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-prometheus.iz-ceph-v1-mon-01" is already in use by "2a0a8f4c368ce4ee486c0a281ccc80c1df1101486c621a30c8ec2c9ea7e39bda". You have to remove that container to be able to reuse that name.: that name is already in use
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Control process exited, code=exited, status=125/n/a
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Failed with result 'exit-code'.
Jan 13 14:30:02 iz-ceph-v1-mon-01 systemd[1]: Failed to start Ceph prometheus.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.

[...]

Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 2.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 2.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph alertmanager.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1238 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1272 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@alertmanager.iz-ceph-v1-mon-01.service: Found left-over process 1606 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph alertmanager.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph grafana.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1237 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1279 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@grafana.iz-ceph-v1-mon-01.service: Found left-over process 1608 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph grafana.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:12 iz-ceph-v1-mon-01 podman[5077]: 2021-01-13 14:30:12.455597737 +0100 CET m=+10.678191759 container stop 3775ce5793baab123898cce1c5ebd979b934ef41d443300b3ee2cf535e034e24 (image=docker.io/ceph/ceph:v15.2.8, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-01.elswai, org.label-schema.ve>
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 2.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Scheduled restart job, restart counter is at 2.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph node-exporter.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1222 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1223 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-01.service: Found left-over process 1587 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Starting Ceph node-exporter.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: Stopped Ceph prometheus.iz-ceph-v1-mon-01 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1240 (bash) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1265 (podman) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@prometheus.iz-ceph-v1-mon-01.service: Found left-over process 1599 (conmon) in control group while starting unit. Ignoring.
Jan 13 14:30:12 iz-ceph-v1-mon-01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.

[...]


Related issues

Related to Orchestrator - Bug #48694: ceph-volume: unrecognized arguments: --filter-for-batch Resolved
Related to Orchestrator - Bug #49013: cephadm: Service definition causes some container startups to fail Resolved

History

#1 Updated by Sebastian Wagner about 3 years ago

  • Project changed from Ceph to Orchestrator

#2 Updated by Sebastian Wagner about 3 years ago

  • Related to Bug #48694: ceph-volume: unrecognized arguments: --filter-for-batch added

#3 Updated by Sebastian Wagner about 3 years ago

  • Subject changed from cephadm: Several services in error status after upgrade to 15.2.8 due to failed container start operations to cephadm: Several services in error status after upgrade to 15.2.8: unrecognized arguments: --filter-for-batch

#4 Updated by Sebastian Wagner about 3 years ago

  • Priority changed from Normal to Urgent

#5 Updated by Sebastian Wagner about 3 years ago

ok: everything was added in https://github.com/ceph/ceph/pull/37520

  • the --filter-for-batch implementation in ceph-volume
  • the --filter-for-batch usage in cephadm

but the error message contains:

usage: ceph-volume inventory [-h] [--format {plain,json,json-pretty}] [path]

which is part of https://docs.ceph.com/en/latest/releases/octopus/#changelog

ok, this is an internal cephadm upgrade problem:

step 1: user calls `ceph orch upgrade`
step 2: mgr/cephadm calls `ceph config set mgr.x container_image <new-container>`
step 3: mgr gets redeployed
step 4: mgr failover
step 5: mgr/cephadm calls _refresh_host_devices
step 6: _refresh_host_devices calls `ceph config get osd.1 container_image`, but this returns the old image
step 7: _refresh_host_devices calls ceph-volume ... --filter-for-batch

and this breaks the upgrade, as [global] container_image still points to the old image, but the code is already running on the new image
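
A minimal sketch of a guard for step 7 (hypothetical helper names, not the actual mgr/cephadm fix): only pass --filter-for-batch when the image that will actually run ceph-volume is new enough to ship the flag.

import subprocess

# Illustration only: read the Ceph version out of the image that ceph-volume
# will run in, and gate the flag on it (assumed here to be >= 15.2.8).
def image_ceph_version(image: str) -> tuple:
    out = subprocess.run(
        ["podman", "run", "--rm", "--entrypoint", "ceph", image, "--version"],
        capture_output=True, text=True,
    ).stdout
    # e.g. "ceph version 15.2.5 (2c93eff...) octopus (stable)"
    return tuple(int(x) for x in out.split()[2].split(".")[:3])

def inventory_args(image: str) -> list:
    args = ["inventory", "--format=json"]
    if image_ceph_version(image) >= (15, 2, 8):
        args.append("--filter-for-batch")
    return args

print(inventory_args("docker.io/ceph/ceph:v15.2.5"))  # old image: flag omitted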

#7 Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Sebastian Wagner

#8 Updated by Sebastian Wagner about 3 years ago

Ok, I see two distinct issues here:

1. --filter-for-batch which is addressed in https://github.com/ceph/ceph/pull/38926
2. several daemons are not coming back up due to "Found left-over process 1606 (conmon) in control group while starting unit"

for the second issue, as a workaround, please restart the host.
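
If rebooting the host is not possible, the "name is already in use" / "cannot be removed without force" errors above suggest that stopping the unit and force-removing the leftover container by name may also clear the state. A hypothetical sketch, not an official cephadm procedure (daemon name and fsid taken from the logs above):

import subprocess

FSID = "68317c90-b44f-11ea-a0c4-d1443a31407c"  # cluster fsid from the logs

def restart_daemon(daemon: str) -> None:
    """Stop the systemd unit, force-remove the stale container, restart."""
    unit = f"ceph-{FSID}@{daemon}.service"
    name = f"ceph-{FSID}-{daemon}"
    subprocess.run(["systemctl", "stop", unit], check=False)
    # --force removes even running/paused containers ("container state improper")
    subprocess.run(["podman", "rm", "--force", name], check=False)
    subprocess.run(["systemctl", "start", unit], check=True)

restart_daemon("alertmanager.iz-ceph-v1-mon-01")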

#9 Updated by Sebastian Wagner about 3 years ago

  • Priority changed from Urgent to Normal

#10 Updated by Gunther Heinrich about 3 years ago

I restarted all hosts, but unfortunately that didn't solve the problem:


alertmanager.iz-ceph-v1-mon-01       iz-ceph-v1-mon-01  error          36s ago    7M   0.21.0   docker.io/prom/alertmanager:latest   c876f5897d7b  6cf2571fc92a
crash.iz-ceph-v1-mon-01              iz-ceph-v1-mon-01  running (19m)  36s ago    7M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  4537f1ff97c1
crash.iz-ceph-v1-mon-02              iz-ceph-v1-mon-02  running (18m)  40s ago    7M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  b409fd7a374b
crash.iz-ceph-v1-mon-03              iz-ceph-v1-mon-03  running (17m)  40s ago    7M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  9fc17a3ec4f3
crash.iz-ceph-v1-osd-01              iz-ceph-v1-osd-01  running (15m)  9s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  39d9bf41bb7e
crash.iz-ceph-v1-osd-02              iz-ceph-v1-osd-02  running (14m)  9s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  58589908b68e
crash.iz-ceph-v1-osd-03              iz-ceph-v1-osd-03  running (12m)  8s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  b81bb6acd673
grafana.iz-ceph-v1-mon-01            iz-ceph-v1-mon-01  error          36s ago    7M   6.6.2    docker.io/ceph/ceph-grafana:latest   87a51ecf0b1c  625ff3e2633f
mds.cephfs.iz-ceph-v1-mon-01.vjehnz  iz-ceph-v1-mon-01  running (19m)  36s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  db1e1c821e1d
mds.cephfs.iz-ceph-v1-mon-02.cbdqzm  iz-ceph-v1-mon-02  running (18m)  40s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  8bdea20df510
mds.cephfs.iz-ceph-v1-mon-03.zpxfkg  iz-ceph-v1-mon-03  running (17m)  40s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  f96147b8aaee
mgr.iz-ceph-v1-mon-01.elswai         iz-ceph-v1-mon-01  error          36s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  53d23c82303a
mgr.iz-ceph-v1-mon-02.foqmfa         iz-ceph-v1-mon-02  running (18m)  40s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  3c03159e79b9
mgr.iz-ceph-v1-mon-03.ncnoal         iz-ceph-v1-mon-03  error          40s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  c7d745c7cacc
mon.iz-ceph-v1-mon-01                iz-ceph-v1-mon-01  running (19m)  36s ago    7M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  2f93bf5fcddb
mon.iz-ceph-v1-mon-02                iz-ceph-v1-mon-02  running (18m)  40s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  c5a891d63f6a
mon.iz-ceph-v1-mon-03                iz-ceph-v1-mon-03  running (17m)  40s ago    6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  3dc86765e867
node-exporter.iz-ceph-v1-mon-01      iz-ceph-v1-mon-01  error          36s ago    7M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  fbd216acfdf2
node-exporter.iz-ceph-v1-mon-02      iz-ceph-v1-mon-02  error          40s ago    7M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  2e1cdf4c3564
node-exporter.iz-ceph-v1-mon-03      iz-ceph-v1-mon-03  error          40s ago    7M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  b156aeb256fa
node-exporter.iz-ceph-v1-osd-01      iz-ceph-v1-osd-01  error          9s ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  98c3eea27b58
node-exporter.iz-ceph-v1-osd-02      iz-ceph-v1-osd-02  error          9s ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  64292a5661c6
node-exporter.iz-ceph-v1-osd-03      iz-ceph-v1-osd-03  error          8s ago     6M   1.0.1    docker.io/prom/node-exporter:latest  0e0218889c33  3ac07baa7d87
osd.0                                iz-ceph-v1-osd-01  running (15m)  9s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  ab44fe2809d1
osd.1                                iz-ceph-v1-osd-01  running (15m)  9s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  51b06ee0e70a
osd.2                                iz-ceph-v1-osd-02  running (14m)  9s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  18fecaea7f46
osd.3                                iz-ceph-v1-osd-02  running (14m)  9s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  9243e4ea8ea5
osd.4                                iz-ceph-v1-osd-03  running (12m)  8s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  c48ab6d881dc
osd.5                                iz-ceph-v1-osd-03  running (12m)  8s ago     6M   15.2.8   docker.io/ceph/ceph:v15.2.8          5553b0cb212c  fa157d28af7a
prometheus.iz-ceph-v1-mon-01         iz-ceph-v1-mon-01  error          36s ago    7M   2.19.1   docker.io/prom/prometheus:latest     396dc3b4e717  70aab14fb0c0

In the journalctl logs I found these entries (this time taken from the second mon), which seem to indicate that during startup some (old?) containers can no longer be found and thus cannot be evicted. Here are the log entries, with some later entries added for "2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca":

Jan 18 07:17:30 iz-ceph-v1-mon-02 podman[602]: Error: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-crash.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 podman[606]: Error: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 podman[605]: Error: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 podman[604]: Error: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 podman[603]: Error: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mds.cephfs.iz-ceph-v1-mon-02.cbdqzm found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1154]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1167]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-crash.iz-ceph-v1-mon-02" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-crash.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1163]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-02" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1166]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mds.cephfs.iz-ceph-v1-mon-02.cbdqzm" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mds.cephfs.iz-ceph-v1-mon-02.cbdqzm found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1206]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1278]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-crash.iz-ceph-v1-mon-02" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-crash.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1283]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-02" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-02 found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 bash[1282]: Error: failed to evict container: "": failed to find container "ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mds.cephfs.iz-ceph-v1-mon-02.cbdqzm" in state: no container with name or ID ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mds.cephfs.iz-ceph-v1-mon-02.cbdqzm found: no such container
Jan 18 07:17:30 iz-ceph-v1-mon-02 podman[1159]: 2021-01-18 07:17:30.704563369 +0100 CET m=+0.523688984 container create 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca (image=docker.io/prom/node-exporter:latest, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02, maintai[...]
Jan 18 07:17:31 iz-ceph-v1-mon-02 podman[1313]: 2021-01-18 07:17:31.113839747 +0100 CET m=+0.518400545 container create 3c03159e79b9be1257c3642b7c9a6c42202cd1b657ec54bd1cf1d1cc2340aaa9 (image=docker.io/ceph/ceph:v15.2.8, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mgr.iz-ceph-v1-mon-02.foqmfa, maintainer=Dimitri[...]
Jan 18 07:17:31 iz-ceph-v1-mon-02 podman[1368]: 2021-01-18 07:17:31.15929247 +0100 CET m=+0.484149434 container create b409fd7a374be01ffac8f70d8446a9272fac2e3b1e4adec84812da5ce1c66e19 (image=docker.io/ceph/ceph:v15.2.8, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-crash.iz-ceph-v1-mon-02, ceph=True, GIT_BRANCH=HE[...]
Jan 18 07:17:31 iz-ceph-v1-mon-02 podman[1376]: 2021-01-18 07:17:31.285364304 +0100 CET m=+0.599962022 container create 8bdea20df5102420d7ab226ea893d078955d602cb020cb27e7b2466cbda078cd (image=docker.io/ceph/ceph:v15.2.8, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mds.cephfs.iz-ceph-v1-mon-02.cbdqzm, org.label-s[...]
Jan 18 07:17:31 iz-ceph-v1-mon-02 podman[1375]: 2021-01-18 07:17:31.389936576 +0100 CET m=+0.656427562 container create c5a891d63f6a0198f7c2e7d8dd0d322971e12d8351985408c956512791be8ef9 (image=docker.io/ceph/ceph:v15.2.8, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-mon.iz-ceph-v1-mon-02, GIT_BRANCH=HEAD, CEPH_POI[...]

[...]

Jan 18 07:19:35 iz-ceph-v1-mon-02 systemd[1]: Started libpod-conmon-2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca.scope.
Jan 18 07:19:35 iz-ceph-v1-mon-02 systemd[1769]: run-runc-2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca-runc.PSbC7o.mount: Succeeded.
Jan 18 07:19:35 iz-ceph-v1-mon-02 systemd[1]: run-runc-2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca-runc.PSbC7o.mount: Succeeded.
Jan 18 07:19:35 iz-ceph-v1-mon-02 podman[2082]: 2021-01-18 07:19:35.991805881 +0100 CET m=+0.185363292 container exec 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca (image=docker.io/prom/node-exporter:latest, name=ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02, maintaine>
Jan 18 07:19:36 iz-ceph-v1-mon-02 systemd[1]: libpod-conmon-2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca.scope: Succeeded.

[...]

Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: Stopped Ceph node-exporter.iz-ceph-v1-mon-02 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-02.service: Found left-over process 1152 (bash) in control group while starting unit. Ignoring.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-02.service: Found left-over process 1159 (podman) in control group while starting unit. Ignoring.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-mon-02.service: Found left-over process 1474 (conmon) in control group while starting unit. Ignoring.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 18 07:19:40 iz-ceph-v1-mon-02 systemd[1]: Starting Ceph node-exporter.iz-ceph-v1-mon-02 for 68317c90-b44f-11ea-a0c4-d1443a31407c...
Jan 18 07:19:40 iz-ceph-v1-mon-02 podman[2572]: Error: cannot remove container 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca as it is running - running or paused containers cannot be removed without force: container state improper

And here are some entries from htop:


   1152 root       20   0  6972  3272  3048 S  0.0  0.2  0:00.00 /bin/bash /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/node-exporter.iz-ceph-v1-mon-02/unit.run
   1164 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.02 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1165 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.05 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1168 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.00 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1179 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.00 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1202 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.00 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1204 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.01 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1255 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.00 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1385 root       20   0 1313M 50764 28108 S  0.0  2.5  0:00.04 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-mon-02 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-mon-02 -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/roo
   1477 root       20   0 80516  2008  1736 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca -u 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/2e1cd
   1474 root       20   0 80516  2008  1736 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca -u 2e1cdf4c3564967faa8750836a71d41f309c597a0a27491638d70b8034cf88ca -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/2e1cd
   1490 root       20   0 80516  2012  1740 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c 8bdea20df5102420d7ab226ea893d078955d602cb020cb27e7b2466cbda078cd -u 8bdea20df5102420d7ab226ea893d078955d602cb020cb27e7b2466cbda078cd -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/8bdea
   1488 root       20   0 80516  2012  1740 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c 8bdea20df5102420d7ab226ea893d078955d602cb020cb27e7b2466cbda078cd -u 8bdea20df5102420d7ab226ea893d078955d602cb020cb27e7b2466cbda078cd -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/8bdea
   1498 root       20   0 80516  2012  1736 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c c5a891d63f6a0198f7c2e7d8dd0d322971e12d8351985408c956512791be8ef9 -u c5a891d63f6a0198f7c2e7d8dd0d322971e12d8351985408c956512791be8ef9 -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/c5a89
   1508 root       20   0 80516  1964  1692 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c b409fd7a374be01ffac8f70d8446a9272fac2e3b1e4adec84812da5ce1c66e19 -u b409fd7a374be01ffac8f70d8446a9272fac2e3b1e4adec84812da5ce1c66e19 -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/b409f
   1506 root       20   0 80516  1964  1692 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c b409fd7a374be01ffac8f70d8446a9272fac2e3b1e4adec84812da5ce1c66e19 -u b409fd7a374be01ffac8f70d8446a9272fac2e3b1e4adec84812da5ce1c66e19 -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/b409f
   1556 root       20   0 80516  1964  1688 S  0.0  0.1  0:00.00 /usr/libexec/podman/conmon --api-version 1 -c 3c03159e79b9be1257c3642b7c9a6c42202cd1b657ec54bd1cf1d1cc2340aaa9 -u 3c03159e79b9be1257c3642b7c9a6c42202cd1b657ec54bd1cf1d1cc2340aaa9 -r /usr/sbin/runc -b /var/lib/containers/storage/overlay-containers/3c031

#11 Updated by Gunther Heinrich about 3 years ago

It seems that Ubuntu/Podman is not the cause of this issue, because I ran into the same errors after upgrading a running test cluster from 15.2.5 to 15.2.8 without updating Ubuntu or Podman. Moreover, the Orchestrator reported no issues during that upgrade process.

#12 Updated by Gunther Heinrich about 3 years ago

I did some analysis of the node-exporter on OSD node 3 to see what might be happening.

To me it looks as if the container simply fails to start, although I couldn't find out why. Could this be a result of the old containers returned by the mgr's refresh?

Nevertheless, because of this error several other issues surfaced, which resulted in the following restart loop:
  1. The Ceph service / container tries to start
  2. After 120 seconds the start operation times out; the run command exits and Podman logs exit code 125 (the command fails for any other reason)
  3. Restart attempt 1
  4. The container that failed to start is still present because it was not stopped properly; podman rm fails with error code 2 (one of the specified containers is paused or running)
  5. Restart attempt 2
  6. The failed container is still present, error code 2
  7. Restart attempt 3
  8. The failed container is still present, error code 2
  9. Restart attempt 4
  10. The failed container is still present, error code 2
  11. Restart attempt 5
  12. systemd logs that the start request was repeated too quickly

And although the container ran into a timeout, it is indeed present in podman ps and reported as up and running:

CONTAINER ID  IMAGE                                COMMAND               CREATED            STATUS                PORTS   NAMES
f89479f68523  docker.io/prom/node-exporter:latest  --no-collector.ti...  44 minutes ago     Up 44 minutes ago             ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-osd-03

So the automatic removal of a container - at least in a failure state - doesn't seem to work properly.
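
For the record, a minimal manual-recovery sketch (assuming the stale container is the only thing blocking the unit, which I haven't verified; names are taken from the logs above):

# show the leftover container that a plain "podman rm" refuses to delete
podman ps --all --filter name=node-exporter
# force-remove it; without -f, podman rm exits with code 2 here
podman rm -f ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-osd-03
# clear the "start request repeated too quickly" state and retry the unit
systemctl reset-failed ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service
systemctl start ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service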

Here is the first start attempt, where it seems that the node-exporter container is indeed running and logging messages:

Jan 22 10:32:19 iz-ceph-v1-osd-03 systemd[1]: Started libcontainer container f89479f685232f97599225337be16ace05f31519cf16c133841748fd48a7bcd9.
[...]
Jan 22 10:32:19 iz-ceph-v1-osd-03 podman[992]: 2021-01-22 10:32:19.427747009 +0100 CET m=+1.594842743 container init f89479f685232f97599225337be16ace05f31519cf16c133841748fd48a7bcd9
Jan 22 10:32:19 iz-ceph-v1-osd-03 podman[992]: 2021-01-22 10:32:19.515085874 +0100 CET m=+1.682181638 container start f89479f685232f97599225337be16ace05f31519cf16c133841748fd48a7bcd9
Jan 22 10:32:19 iz-ceph-v1-osd-03 podman[992]: 2021-01-22 10:32:19.516035647 +0100 CET m=+1.683131421 container attach f89479f685232f97599225337be16ace05f31519cf16c133841748fd48a7bcd9
[...]
Jan 22 10:32:21 iz-ceph-v1-osd-03 bash[992]: level=info ts=2021-01-22T09:32:21.250Z caller=node_exporter.go:177 msg="Starting node_exporter" version="(version=1.0.1, branch=HEAD, revision=3715be6ae899f2a9b9dbfd9c39f3e09a7bd>
Jan 22 10:32:21 iz-ceph-v1-osd-03 bash[992]: level=info ts=2021-01-22T09:32:21.251Z caller=node_exporter.go:178 msg="Build context" build_context="(go=go1.14.4, user=root@1f76dbbcfa55, date=20200616-12:44:12)" 
Jan 22 10:32:21 iz-ceph-v1-osd-03 bash[992]: level=info ts=2021-01-22T09:32:21.271Z caller=node_exporter.go:105 msg="Enabled collectors" 
[...]

Two minutes later the timeout message is logged, which triggers the restart loop outlined above:

Jan 22 10:34:16 iz-ceph-v1-osd-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service: start operation timed out. Terminating.
Jan 22 10:34:16 iz-ceph-v1-osd-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service: Failed with result 'timeout'.
Jan 22 10:34:16 iz-ceph-v1-osd-03 systemd[1]: Failed to start Ceph node-exporter.iz-ceph-v1-osd-03 for 68317c90-b44f-11ea-a0c4-d1443a31407c.
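
If the 120-second limit is just systemd's default start timeout (an assumption on my part; cephadm might set its own TimeoutStartSec), a drop-in for the instance unit should at least widen the window as a stopgap:

# sketch of a stopgap, not a fix for the underlying startup problem
sudo mkdir -p /etc/systemd/system/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service.d
printf '[Service]\nTimeoutStartSec=300\n' | sudo tee /etc/systemd/system/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service.d/timeout.conf
sudo systemctl daemon-reload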

This is the status of the container service after the timeout (sudo systemctl status ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service):

ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service - Ceph node-exporter.iz-ceph-v1-osd-03 for 68317c90-b44f-11ea-a0c4-d1443a31407c
     Loaded: loaded (/etc/systemd/system/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Fri 2021-01-22 10:34:47 CET; 4s ago
    Process: 2355 ExecStartPre=/usr/bin/podman rm ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-osd-03 (code=exited, status=2)
    Process: 2380 ExecStartPre=/bin/rm -f //run/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service-pid //run/ceph-68317c [...] (code=exited, status=0/SUCCESS)
    Process: 2381 ExecStart=/bin/bash /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/node-exporter.iz-ceph-v1-osd-03/unit.run (code=exited, status=125)
    Process: 2408 ExecStopPost=/bin/bash /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/node-exporter.iz-ceph-v1-osd-03/unit.poststop (code=exited, status=0/SUCCESS)
    Process: 2409 ExecStopPost=/bin/rm -f //run/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service-pid //run/ceph-68317c [...] (code=exited, status=0/SUCCESS)
      Tasks: 10 (limit: 2282)
     Memory: 33.2M
     CGroup: /system.slice/system-ceph\x2d68317c90\x2db44f\x2d11ea\x2da0c4\x2dd1443a31407c.slice/ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service
             ├─ 989 /bin/bash /var/lib/ceph/68317c90-b44f-11ea-a0c4-d1443a31407c/node-exporter.iz-ceph-v1-osd-03/unit.run
             ├─ 992 /usr/bin/podman run --rm --net=host --user 65534 --name ceph-68317c90-b44f-11ea-a0c4-d1443a31407c-node-exporter.iz-ceph-v1-osd-03 -e CONTAINER_IMAGE=prom/node-exporter -e NODE_NAME=iz-ceph-v1-osd-03 -v />
             └─1237 /usr/libexec/podman/conmon --api-version 1 -c f89479f685232f97599225337be16ace05f31519cf16c133841748fd48a7bcd9 -u f89479f685232f97599225337be16ace05f31519cf16c133841748fd48a7bcd9 -r /usr/sbin/runc -b /va>

Jan 22 10:34:47 iz-ceph-v1-osd-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service: Failed with result 'exit-code'.
Jan 22 10:34:47 iz-ceph-v1-osd-03 systemd[1]: Failed to start Ceph node-exporter.iz-ceph-v1-osd-03 for 68317c90-b44f-11ea-a0c4-d1443a31407c.

And this is the status at the fifth restart attempt:

Jan 22 10:35:08 iz-ceph-v1-osd-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service: Start request repeated too quickly.
Jan 22 10:35:08 iz-ceph-v1-osd-03 systemd[1]: ceph-68317c90-b44f-11ea-a0c4-d1443a31407c@node-exporter.iz-ceph-v1-osd-03.service: Failed with result 'exit-code'.
Jan 22 10:35:08 iz-ceph-v1-osd-03 systemd[1]: Failed to start Ceph node-exporter.iz-ceph-v1-osd-03 for 68317c90-b44f-11ea-a0c4-d1443a31407c.

Should these problems be put into different issues/tickets?

#13 Updated by Gunther Heinrich about 3 years ago

I think I found the underlying cause of the container startup problems; it is unrelated to the unrecognized arguments. I will open a new bug report for it.

#14 Updated by Sebastian Wagner about 3 years ago

  • Related to Bug #49013: cephadm: Service definition causes some container startups to fail added

#15 Updated by Sebastian Wagner about 3 years ago

  • Pull request ID set to 38927

#16 Updated by Sebastian Wagner about 3 years ago

  • Status changed from In Progress to Resolved

#17 Updated by Neha Ojha about 3 years ago

Sebastian, I am seeing something similar in pacific upgrade tests; do you want me to create a new tracker issue?

rados/cephadm/upgrade/{1-start 2-repo_digest/defaut 3-start-upgrade 4-wait distro$/{ubuntu_18.04} fixed-2 mon_election/classic}

2021-02-23T01:45:31.520 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]: debug 2021-02-23T01:45:31.098+0000 7fb6565e0700 -1 log_channel(cephadm) log [ERR] : cephadm exited with an error code: 1, stderr:/usr/bin/docker: usage: ceph-volume inventory [-h] [--format {plain,json,json-pretty}] [path]
2021-02-23T01:45:31.520 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]: /usr/bin/docker: ceph-volume inventory: error: unrecognized arguments: --filter-for-batch
2021-02-23T01:45:31.520 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]: Traceback (most recent call last):
2021-02-23T01:45:31.521 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]:   File "<stdin>", line 7713, in <module>
2021-02-23T01:45:31.521 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]:   File "<stdin>", line 7702, in main
2021-02-23T01:45:31.521 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]:   File "<stdin>", line 1617, in _infer_fsid
2021-02-23T01:45:31.521 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]:   File "<stdin>", line 1701, in _infer_image
2021-02-23T01:45:31.521 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]:   File "<stdin>", line 4308, in command_ceph_volume
2021-02-23T01:45:31.522 INFO:journalctl@ceph.mgr.x.smithi155.stdout:Feb 23 01:45:31 smithi155 bash[21259]:   File "<stdin>", line 1380, in call_throws
/a/nojha-2021-02-22_23:52:36-rados-pacific-distro-basic-smithi/5904592
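
One way to confirm the flag mismatch directly is to ask ceph-volume in each image for its usage text (a diagnostic sketch; the run above used docker, substitute the binary accordingly):

# v15.2.5's usage text has no --filter-for-batch, which matches the traceback above
podman run --rm --entrypoint /usr/sbin/ceph-volume docker.io/ceph/ceph:v15.2.5 inventory --help
# the target image should list the flag (assumption: it landed in 15.2.8's ceph-volume)
podman run --rm --entrypoint /usr/sbin/ceph-volume docker.io/ceph/ceph:v15.2.8 inventory --help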
