Bug #51258

closed

cephadm bootstrap: applying host specs suddenly removes the admin keyring from bootstrap host

Added by Francesco Pantano almost 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags: ux
Backport:
Regression: Yes
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There's a job in OpenStack that tests the latest Pacific bits for both the Ceph containers and cephadm.
Using [1] and [2], the job, which is supposed to deploy a new cluster, fails with [3]:

```
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)", "stderr_lines": ["Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)

[root@standalone ~]# ls /etc/ceph/
[root@standalone ~]#
```


The problem I found is that /etc/ceph is empty, so this kind of failure is expected.
However, the bootstrap command [4] generates both the conf and the keyrings. I can see them being generated, but after some time they are gone.

In addition, you won't be able to interact with the Ceph cluster and `cephadm shell` returns something like:

```

[vagrant@standalone ~]$ sudo cephadm shell
Inferring fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Inferring config /var/lib/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19/mon.standalone.localdomain/config
Using recent ceph image quay.ceph.io/ceph-ci/daemon@sha256:ec271e81d73b6687ad2e097e6b8784066a8f092f2ce4c2cbc2ec2095ff0d8d27
[ceph: root@standalone /]# ceph -s
2021-06-17T09:54:00.382+0000 7f59def71700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-17T09:54:00.382+0000 7f59def71700 -1 AuthRegistry(0x7f59d805ed00) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-06-17T09:54:00.382+0000 7f59def71700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-17T09:54:00.382+0000 7f59def71700 -1 AuthRegistry(0x7f59def6fea0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-06-17T09:54:00.383+0000 7f59dcd0d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-06-17T09:54:00.383+0000 7f59def71700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
[errno 13] RADOS permission denied (error connecting to the cluster)

[ceph: root@standalone /]# ls /etc/ceph/
ceph.conf  rbdmap
```

where ceph.conf is:
```

# minimal ceph.conf for 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
[global]
        fsid = 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
        mon_host = [v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0]
[mon.standalone.localdomain]
public network = 192.168.24.0/24

```
This means you have the ceph config file because it's taken from /var/lib/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19/mon.standalone.localdomain/config,
but there's no keyring.
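
To pin down when the files vanish, a small poller could be left running on the bootstrap host while the deployment proceeds. The following is only a sketch: the function names `missing_files` and `watch` and the constant `EXPECTED` are made up for illustration, and the two file names are taken from the output above (`ceph.conf` and the first entry of the keyring search path).

```python
import time
from pathlib import Path

# Files expected in /etc/ceph after bootstrap. The names come from the
# output in this report; adjust them if your setup differs.
EXPECTED = ("ceph.conf", "ceph.client.admin.keyring")


def missing_files(conf_dir, expected=EXPECTED):
    """Return the expected file names that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in expected if not (base / name).is_file()]


def watch(conf_dir="/etc/ceph", interval=5.0, max_checks=None):
    """Poll conf_dir and print a timestamped line whenever the set of
    missing files changes. Runs forever unless max_checks is given."""
    previous = None
    checks = 0
    while max_checks is None or checks < max_checks:
        current = missing_files(conf_dir)
        if current != previous:
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(f"{stamp} missing from {conf_dir}: {current or 'nothing'}")
            previous = current
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
    return previous
```

Calling `watch()` right after the bootstrap command returns should print one line at start and another at the moment the keyring (or the conf) disappears, which can then be correlated with the cephadm log.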

Note that the bootstrap command (which runs before any OSD is applied) works properly: you can
interact with the cluster using `cephadm shell` or any other client, and /etc/ceph contains both ceph.conf and
the keyring. Something happens, however, when the OSDs are applied using a spec like:

```
---

addr: 192.168.24.1
hostname: standalone.localdomain
labels:
- osd
- mgr
- mon
service_type: host
---
placement:
  hosts:
  - standalone.localdomain
service_id: mon
service_name: mon
service_type: mon
---
placement:
  hosts:
  - standalone.localdomain
service_id: mgr
service_name: mgr
service_type: mgr
---
data_devices:
  paths:
  - /dev/ceph_vg/ceph_lv_data
placement:
  hosts:
  - standalone.localdomain
service_id: default_drive_group
service_name: osd.default_drive_group
service_type: osd

```
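
One thing that may be worth ruling out, as an assumption rather than a confirmed cause: the cephadm documentation describes a special `_admin` host label, and hosts carrying it get ceph.conf and the client.admin keyring maintained in /etc/ceph. The host entry in the spec above lists only `osd`, `mgr` and `mon`. Whether that label handling is present in the build under test is not established here. A minimal sketch of such a check, with the spec entries written out by hand as Python dicts and a hypothetical helper name `hosts_without_label`:

```python
# Label that, per the cephadm documentation, marks hosts whose /etc/ceph
# receives ceph.conf and the client.admin keyring. That it applies to the
# build in this report is an assumption.
ADMIN_LABEL = "_admin"


def hosts_without_label(specs, label=ADMIN_LABEL):
    """Return hostnames of `service_type: host` specs that lack `label`."""
    return [
        spec.get("hostname")
        for spec in specs
        if spec.get("service_type") == "host"
        and label not in (spec.get("labels") or [])
    ]


# The host and mon entries from the spec above, transcribed as dicts.
specs = [
    {
        "service_type": "host",
        "hostname": "standalone.localdomain",
        "addr": "192.168.24.1",
        "labels": ["osd", "mgr", "mon"],
    },
    {
        "service_type": "mon",
        "service_id": "mon",
        "placement": {"hosts": ["standalone.localdomain"]},
    },
]

print(hosts_without_label(specs))  # ['standalone.localdomain']
```

If the label does turn out to matter, comparing the host's labels before and after the spec is applied would show whether applying the spec changed them.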

[1] container: quay.ceph.io/ceph-ci/daemon:latest-pacific-devel
[2] cephadm version(s):
a. https://cbs.centos.org/koji/buildinfo?buildID=33232
b. https://cbs.centos.org/koji/buildinfo?buildID=33140
[3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_50e/778915/40/check/tripleo-ci-centos-8-scenario004-standalone/50ea241/logs/undercloud/home/zuul/standalone-ansible-d4qzhga2/cephadm/cephadm_command.log
[4] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/bootstrap.yaml#L57-L58


Related issues 1 (0 open, 1 closed)

Related to Orchestrator - Bug #51277: cephadm bootstrap: unable to set up admin label (Can't reproduce)
