Bug #45909 (closed)

already existing cluster deployed: cephadm bootstrap failure

Added by Deepika Upadhyay almost 4 years ago. Updated over 2 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
cephadm
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

$ sudo ./cephadm --verbose bootstrap --mon-ip 192.168.250.1 --allow-overwrite
DEBUG:cephadm:firewalld service ceph-mon is enabled in current zone
DEBUG:cephadm:Running command: /usr/bin/firewall-cmd --reload
DEBUG:cephadm:/usr/bin/firewall-cmd:stdout success
INFO:cephadm:Waiting for mon to start...
INFO:cephadm:Waiting for mon...
DEBUG:cephadm:Running command: /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=x1cabon -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /tmp/ceph-tmp34r2vq7j:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpquibmmqw:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
DEBUG:cephadm:/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
INFO:cephadm:Non-zero exit code 13 from /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=x1cabon -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /tmp/ceph-tmp34r2vq7j:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpquibmmqw:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
INFO:cephadm:/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
INFO:cephadm:mon not available, waiting (1/10)...
DEBUG:cephadm:Running command: /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=x1cabon -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /tmp/ceph-tmp34r2vq7j:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpquibmmqw:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
DEBUG:cephadm:/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
INFO:cephadm:Non-zero exit code 13 from /usr/bin/podman run --rm --net=host --ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=x1cabon -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /tmp/ceph-tmp34r2vq7j:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmpquibmmqw:/etc/ceph/ceph.conf:z --entrypoint /usr/bin/ceph docker.io/ceph/ceph:v15 status
INFO:cephadm:/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
INFO:cephadm:mon not available, waiting (2/10)...
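
A quick check for leftovers from earlier bootstrap attempts (the root cause identified later in this thread) is to list every cluster cephadm still knows about; a minimal sketch, assuming jq is installed:

# each distinct fsid below is a separate (possibly dead) cluster on this host
sudo cephadm ls | jq -r '.[].fsid' | sort -u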

Related issues 2 (0 open, 2 closed)

Related to Orchestrator - Bug #45961: cephadm: high load and slow disk make "cephadm bootstrap" fail (Resolved; assignee: Michael Fritch)

Related to Orchestrator - Bug #44825: cephadm: bootstrap is not idempotent (Rejected)

Actions #1

Updated by Deepika Upadhyay almost 4 years ago

  • Category set to cephadm
Actions #2

Updated by Sebastian Wagner almost 4 years ago

looks as if cephadm wasn't able to start the mons. can you attach

cephadm ls
# the logs for the failed MON, or something like this:
cephadm ls | jq -r '.[].name' | xargs -n 1 cephadm logs --name
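
As a usage sketch (assuming the daemon names shown in the reply below), each name jq emits becomes one invocation, e.g.:

cephadm logs --name mon.x1cabon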
Actions #3

Updated by Deepika Upadhyay almost 4 years ago

Sebastian Wagner wrote:

looks as if cephadm wasn't able to start the mons. can you attach

[...]

sure!

[
    {
        "style": "cephadm:v1",
        "name": "mon.x1cabon",
        "fsid": "993f1eee-a731-11ea-a2f1-24ee9af1caac",
        "enabled": true,
        "state": "running",
        "container_id": "5c1e7e528e1aa6450255c9b045f0e9a70e390b0552f9a7e8e4acf619f44bbcbd",
        "container_image_name": "docker.io/ceph/daemon-base:latest-master-devel",
        "container_image_id": "79566b1db52fa6ab71e3436643a0f40c0d24c19b04eaebe2d322adbaa94c7c9a",
        "version": "16.0.0-1574-gacefe44a49",
        "started": "2020-06-05T14:41:34.296961",
        "created": "2020-06-05T13:36:54.635434",
        "deployed": "2020-06-05T13:36:53.859428",
        "configured": "2020-06-05T13:36:54.635434",
        "systemd_unit": "ceph-993f1eee-a731-11ea-a2f1-24ee9af1caac@mon.x1cabon" 
    },
    {
        "style": "cephadm:v1",
        "name": "mon.x1cabon",
        "fsid": "736f997c-a732-11ea-8432-24ee9af1caac",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "docker.io/ceph/daemon-base:latest-master-devel",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": "2020-06-05T13:43:00.385223",
        "deployed": "2020-06-05T13:42:59.632210",
        "configured": "2020-06-05T13:43:00.385223",
        "systemd_unit": "ceph-736f997c-a732-11ea-8432-24ee9af1caac@mon.x1cabon" 
    },
    {
        "style": "cephadm:v1",
        "name": "osd.0",
        "fsid": "efd7940e-396e-4b8a-90bf-cf5d3733230e",
        "enabled": false,
        "state": "stopped",
        "container_id": null,
        "container_image_name": null,
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": null,
        "deployed": null,
        "configured": null,
        "systemd_unit": "ceph-efd7940e-396e-4b8a-90bf-cf5d3733230e@osd.0" 
    },
    {
        "style": "cephadm:v1",
        "name": "mon.x1cabon",
        "fsid": "e5eb1470-a739-11ea-aac3-24ee9af1caac",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "docker.io/ceph/ceph:v15",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": "2020-06-05T14:37:16.401314",
        "deployed": "2020-06-05T14:37:15.688308",
        "configured": "2020-06-05T14:37:16.402314",
        "systemd_unit": "ceph-e5eb1470-a739-11ea-aac3-24ee9af1caac@mon.x1cabon" 
    },
    {
        "style": "cephadm:v1",
        "name": "mon.x1cabon",
        "fsid": "5f171e26-a739-11ea-8e1e-24ee9af1caac",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "docker.io/ceph/daemon-base:latest-master-devel",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": "2020-06-05T14:32:40.396867",
        "deployed": "2020-06-05T14:32:39.666861",
        "configured": "2020-06-05T14:32:40.396867",
        "systemd_unit": "ceph-5f171e26-a739-11ea-8e1e-24ee9af1caac@mon.x1cabon" 
    },
    {
        "style": "cephadm:v1",
        "name": "mon.x1cabon",
        "fsid": "fbe42472-7d47-11ea-83bd-24ee9af1caac",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "docker.io/ceph/ceph:v15",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": "2020-04-13T05:31:21.825336",
        "deployed": "2020-04-13T05:31:20.784326",
        "configured": "2020-04-13T05:31:21.826335",
        "systemd_unit": "ceph-fbe42472-7d47-11ea-83bd-24ee9af1caac@mon.x1cabon" 
    },
    {
        "style": "cephadm:v1",
        "name": "mon.x1cabon",
        "fsid": "a236e5be-a73a-11ea-bed2-24ee9af1caac",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "docker.io/ceph/ceph:v15",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": "2020-06-05T14:41:45.463678",
        "deployed": "2020-06-05T14:41:44.729671",
        "configured": "2020-06-05T14:41:45.463678",
        "systemd_unit": "ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon" 
    }
]
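
The listing above holds seven daemon records under seven distinct fsids, and only one mon is actually in state "running" (under fsid 993f1eee-..., not the a236e5be-... cluster being bootstrapped), which is likely why the fresh cluster's `ceph status` gets permission denied. A compact way to see this, assuming jq:

# one line per daemon record: cluster fsid, daemon name, state
cephadm ls | jq -r '.[] | "\(.fsid) \(.name) \(.state)"' | sort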

Actions #4

Updated by Sebastian Wagner almost 4 years ago

hm. you already have plenty of clusters running on your machine. is this on purpose? If yes, I think that is to blame. What's the output of systemctl status?

Actions #5

Updated by Deepika Upadhyay almost 4 years ago

Sebastian Wagner wrote:

hm. you already have plenty of clusters running on your machine. is this on purpose? If yes, I think that is to blame. What's the output of systemctl status?

strangely, I restarted my laptop and it seems like a lot of ceph-mons got init'ed on startup alone. I killed them with `sudo kill -9` but they are getting reinitialised somehow...

root        2152  0.1  0.3 1208316 59060 ?       Sl   14:13   0:00 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-736f997c-a732-11ea-8432-24ee9af1caac-mon.x1cabon -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=x1cabon -v /var/run/ceph/736f997c-a732-11ea-8432-24ee9af1caac:/var/run/ceph:z -v /var/log/ceph/736f997c-a732-11ea-8432-24ee9af1caac:/var/log/ceph:z -v /var/lib/ceph/736f997c-a732-11ea-8432-24ee9af1caac/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/736f997c-a732-11ea-8432-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /var/lib/ceph/736f997c-a732-11ea-8432-24ee9af1caac/mon.x1cabon/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev --entrypoint /usr/bin/ceph-mon docker.io/ceph/daemon-base:latest-master-devel -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
167         2365  0.3  0.2 480360 35068 ?        Ssl  14:13   0:00 /usr/bin/ceph-mon -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
root        5817  2.7  0.3 1208316 59460 ?       Sl   14:14   0:00 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac-mon.x1cabon -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=x1cabon -v /var/run/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac:/var/run/ceph:z -v /var/log/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac:/var/log/ceph:z -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev --entrypoint /usr/bin/ceph-mon docker.io/ceph/ceph:v15 -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
root        5819  2.7  0.3 1208572 58688 ?       Sl   14:14   0:00 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-5f171e26-a739-11ea-8e1e-24ee9af1caac-mon.x1cabon -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=x1cabon -v /var/run/ceph/5f171e26-a739-11ea-8e1e-24ee9af1caac:/var/run/ceph:z -v /var/log/ceph/5f171e26-a739-11ea-8e1e-24ee9af1caac:/var/log/ceph:z -v /var/lib/ceph/5f171e26-a739-11ea-8e1e-24ee9af1caac/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/5f171e26-a739-11ea-8e1e-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /var/lib/ceph/5f171e26-a739-11ea-8e1e-24ee9af1caac/mon.x1cabon/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev --entrypoint /usr/bin/ceph-mon docker.io/ceph/daemon-base:latest-master-devel -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
root        5967  2.5  0.3 1208316 59076 ?       Sl   14:14   0:00 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-e5eb1470-a739-11ea-aac3-24ee9af1caac-mon.x1cabon -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=x1cabon -v /var/run/ceph/e5eb1470-a739-11ea-aac3-24ee9af1caac:/var/run/ceph:z -v /var/log/ceph/e5eb1470-a739-11ea-aac3-24ee9af1caac:/var/log/ceph:z -v /var/lib/ceph/e5eb1470-a739-11ea-aac3-24ee9af1caac/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/e5eb1470-a739-11ea-aac3-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /var/lib/ceph/e5eb1470-a739-11ea-aac3-24ee9af1caac/mon.x1cabon/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev --entrypoint /usr/bin/ceph-mon docker.io/ceph/ceph:v15 -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
root        5972  3.5  0.3 1208060 58656 ?       Sl   14:14   0:00 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-993f1eee-a731-11ea-a2f1-24ee9af1caac-mon.x1cabon -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=x1cabon -v /var/run/ceph/993f1eee-a731-11ea-a2f1-24ee9af1caac:/var/run/ceph:z -v /var/log/ceph/993f1eee-a731-11ea-a2f1-24ee9af1caac:/var/log/ceph:z -v /var/lib/ceph/993f1eee-a731-11ea-a2f1-24ee9af1caac/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/993f1eee-a731-11ea-a2f1-24ee9af1caac/mon.x1cabon:/var/lib/ceph/mon/ceph-x1cabon:z -v /var/lib/ceph/993f1eee-a731-11ea-a2f1-24ee9af1caac/mon.x1cabon/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev --entrypoint /usr/bin/ceph-mon docker.io/ceph/daemon-base:latest-master-devel -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
167         6036  0.7  0.2 385084 32784 ?        Ssl  14:14   0:00 /usr/bin/ceph-mon -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
167         6063  1.2  0.1 384980 32504 ?        Ssl  14:14   0:00 /usr/bin/ceph-mon -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
167         6092  1.0  0.2 384976 32668 ?        Ssl  14:14   0:00 /usr/bin/ceph-mon -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
167         6167  1.7  0.2 385084 33072 ?        Ssl  14:14   0:00 /usr/bin/ceph-mon -n mon.x1cabon -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true

do you think this is expected behaviour?
and here's `systemctl status` for the failed mon:

● ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon.service - Ceph mon.x1cabon for a236e5be-a73a-11ea-bed2-24ee9af1caac
   Loaded: loaded (/etc/systemd/system/ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-06-08 14:15:19 IST; 3min 59s ago
  Process: 6464 ExecStartPre=/usr/bin/podman rm ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac-mon.x1cabon (code=exited, status=1/FAILURE)
  Process: 6575 ExecStart=/bin/bash /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon/unit.run (code=exited, status=1/FAILURE)
  Process: 7209 ExecStopPost=/bin/bash /var/lib/ceph/a236e5be-a73a-11ea-bed2-24ee9af1caac/mon.x1cabon/unit.poststop (code=exited, status=0/SUCCES>
 Main PID: 6575 (code=exited, status=1/FAILURE)

Jun 08 14:15:19 x1cabon systemd[1]: ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon.service: Scheduled restart job, restart counter is at 5.
Jun 08 14:15:19 x1cabon systemd[1]: Stopped Ceph mon.x1cabon for a236e5be-a73a-11ea-bed2-24ee9af1caac.
Jun 08 14:15:19 x1cabon systemd[1]: ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon.service: Start request repeated too quickly.
Jun 08 14:15:19 x1cabon systemd[1]: ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon.service: Failed with result 'exit-code'.
Jun 08 14:15:19 x1cabon systemd[1]: Failed to start Ceph mon.x1cabon for a236e5be-a73a-11ea-bed2-24ee9af1caac.
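
The "Start request repeated too quickly" line means the unit hit systemd's start rate limit after the five scheduled restarts. To retry by hand, the usual systemd sequence applies (a sketch, using the unit name from the status output above):

# clear the failed state and start-rate counter, then start the unit again
sudo systemctl reset-failed ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon.service
sudo systemctl start ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac@mon.x1cabon.service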

Actions #6

Updated by Sebastian Wagner almost 4 years ago

Deepika Upadhyay wrote:

Sebastian Wagner wrote:

hm. you already have plenty of clusters running on your machine. is this on purpose? If yes, I think that is to blame. What's the output of systemctl status?

strangely, I restarted my laptop and it seems like a lot of ceph-mons got init'ed on startup alone. I killed them with `sudo kill -9` but they are getting reinitialised somehow...

Sure, they're getting restarted by systemd immediately.

[...]

do you think this is expected behaviour?
and here's `systemctl status` for the failed mon:

[...]

I think you want to do something like:

# remove every cluster cephadm knows about (assumes jq)
for fsid in $(cephadm ls | jq -r '.[].fsid' | sort -u); do
  sudo cephadm rm-cluster --fsid "$fsid" --force
done
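
This also explains why `sudo kill -9` didn't stick: the per-cluster systemd units simply respawn the containers. To quiet a stale cluster's daemons before (or instead of) removing it, stopping that cluster's systemd target works; a sketch, using one fsid from the listing above:

# stop every daemon belonging to one stale cluster
sudo systemctl stop ceph-a236e5be-a73a-11ea-bed2-24ee9af1caac.target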
Actions #7

Updated by Deepika Upadhyay almost 4 years ago

Sebastian Wagner wrote:

Deepika Upadhyay wrote:

Sebastian Wagner wrote:

hm. you already have plenty of clusters running on your machine. is this on purpose? If yes, I think that is to blame. What's the output of systemctl status?

strangely, I restarted my laptop and it seems like a lot of ceph-mons got init'ed on startup alone. I killed them with `sudo kill -9` but they are getting reinitialised somehow...

Sure, they're getting restarted by systemd immediately.

[...]

do you think this is expected behaviour?
and here's `systemctl status` for the failed mon:

[...]

I think you want to do something like:

[...]

aah, solves the problem. Thanks Sebastian!

Actions #9

Updated by Deepika Upadhyay almost 4 years ago

  • Status changed from New to Resolved
Actions #10

Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #45961: cephadm: high load and slow disk make "cephadm bootstrap" fail added
Actions #11

Updated by Deepika Upadhyay over 3 years ago

Hey Sebastian,

I still see this failure when there is an already existing cluster deployed: verified on Ubuntu 16.04 (senta02) and Fedora 32. Removing all the previous clusters solved the issue.

Running command: systemctl start ceph-cb07b196-0d40-11eb-b4da-0025900809c6.target
Running command: systemctl daemon-reload
Running command: systemctl stop ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02
Running command: systemctl reset-failed ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02
systemctl:stderr Failed to reset failed state of unit ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02.service: Unit ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02.service is not loaded.
Running command: systemctl enable ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02
systemctl:stderr Created symlink /etc/systemd/system/ceph-cb07b196-0d40-11eb-b4da-0025900809c6.target.wants/ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02.service → /etc/systemd/system/ceph-cb07b196-0d40-11eb-b4da-0025900809c6@.service.
Running command: systemctl start ceph-cb07b196-0d40-11eb-b4da-0025900809c6@mon.senta02
Running command: systemctl is-enabled firewalld.service
systemctl:stdout enabled
Running command: systemctl is-active firewalld.service
systemctl:stdout active
firewalld ready
Running command: /usr/bin/firewall-cmd --permanent --query-service ceph-mon
/usr/bin/firewall-cmd:stdout yes
firewalld service ceph-mon is enabled in current zone
Running command: /usr/bin/firewall-cmd --reload
/usr/bin/firewall-cmd:stdout success
Waiting for mon to start...
Waiting for mon...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (1/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (2/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (3/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (4/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (5/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (6/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (7/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (8/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (9/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
mon not available, waiting (10/10)...
Running command: /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Non-zero exit code 13 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=docker.io/ceph/daemon-base:latest-master-devel -e NODE_NAME=senta02 -v /var/lib/ceph/cb07b196-0d40-11eb-b4da-0025900809c6/mon.senta02:/var/lib/ceph/mon/ceph-senta02:z -v /tmp/ceph-tmp17aitrf6:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp1ws9biyf:/etc/ceph/ceph.conf:z docker.io/ceph/daemon-base:latest-master-devel status
/usr/bin/ceph:stderr [errno 13] RADOS permission denied (error connecting to the cluster)
Traceback (most recent call last):
  File "./cephadm", line 6088, in <module>
    r = args.func()
  File "./cephadm", line 1385, in _default_image
    return func()
  File "./cephadm", line 3020, in command_bootstrap
    is_available('mon', is_mon_available)
  File "./cephadm", line 1110, in is_available
    % (what, retry))
__main__.Error: mon not available after 10 tries
Releasing lock 139624145427144 on /run/cephadm/cb07b196-0d40-11eb-b4da-0025900809c6.lock

Actions #12

Updated by Deepika Upadhyay over 3 years ago

  • Status changed from Resolved to New
Actions #13

Updated by Sebastian Wagner over 3 years ago

  • Subject changed from cephadm bootstrap failure to already existing cluster deployed: cephadm bootstrap failure
Actions #14

Updated by Sebastian Wagner almost 3 years ago

  • Related to Bug #44825: cephadm: bootstrap is not idempotent added
Actions #15

Updated by Sebastian Wagner over 2 years ago

  • Status changed from New to Duplicate