Bug #51258
Updated by Sebastian Wagner almost 3 years ago
There's a job in OpenStack which is able to test the latest pacific bits for both ceph containers and cephadm. Using [1] and [2], the job, which is supposed to deploy a new cluster, fails with [3]:

```
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)", "stderr_lines": ["Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
```

```
[root@standalone ~]# ls /etc/ceph/
[root@standalone ~]#
```

The problem I found is that /etc/ceph is empty, so that kind of failure is expected. However, the bootstrap command [4] generates both the conf and the keyrings, and I can see them being generated, but after some time they're gone.

In addition, you won't be able to interact with the Ceph cluster, and `cephadm shell` returns something like:

```
[vagrant@standalone ~]$ sudo cephadm shell
Inferring fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Inferring config /var/lib/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19/mon.standalone.localdomain/config
Using recent ceph image quay.ceph.io/ceph-ci/daemon@sha256:ec271e81d73b6687ad2e097e6b8784066a8f092f2ce4c2cbc2ec2095ff0d8d27
[ceph: root@standalone /]# ceph -s
2021-06-17T09:54:00.382+0000 7f59def71700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-17T09:54:00.382+0000 7f59def71700 -1 AuthRegistry(0x7f59d805ed00) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-06-17T09:54:00.382+0000 7f59def71700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-17T09:54:00.382+0000 7f59def71700 -1 AuthRegistry(0x7f59def6fea0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-06-17T09:54:00.383+0000 7f59dcd0d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-06-17T09:54:00.383+0000 7f59def71700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
[errno 13] RADOS permission denied (error connecting to the cluster)
```

```
[ceph: root@standalone /]# ls /etc/ceph/
ceph.conf  rbdmap
```

where ceph.conf is:

```
# minimal ceph.conf for 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
[global]
fsid = 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
mon_host = [v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0]
[mon.standalone.localdomain]
public network = 192.168.24.0/24
```

This means the ceph config file is available inside the shell because it's taken from /var/lib/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19/mon.standalone.localdomain/config, but there's no keyring.

Note that the bootstrap command (which runs before any OSD is applied) works properly: at that point you can interact with the cluster using `cephadm shell` or any other client, and /etc/ceph contains both ceph.conf and the keyring. Something happens when the OSDs are applied using a spec like:

```
---
addr: 192.168.24.1
hostname: standalone.localdomain
labels:
- osd
- mgr
- mon
service_type: host
---
placement:
  hosts:
  - standalone.localdomain
service_id: mon
service_name: mon
service_type: mon
---
placement:
  hosts:
  - standalone.localdomain
service_id: mgr
service_name: mgr
service_type: mgr
---
data_devices:
  paths:
  - /dev/ceph_vg/ceph_lv_data
placement:
  hosts:
  - standalone.localdomain
service_id: default_drive_group
service_name: osd.default_drive_group
service_type: osd
```

[1] container: quay.ceph.io/ceph-ci/daemon:latest-pacific-devel
[2] cephadm version(s):
  a. https://cbs.centos.org/koji/buildinfo?buildID=33232
  b. https://cbs.centos.org/koji/buildinfo?buildID=33140
[3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_50e/778915/40/check/tripleo-ci-centos-8-scenario004-standalone/50ea241/logs/undercloud/home/zuul/standalone-ansible-d4qzhga2/cephadm/cephadm_command.log
[4] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/bootstrap.yaml#L57-L58
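
To narrow down when the files vanish, a check along these lines could be run on the host right after bootstrap and again after the OSD spec is applied. This is only a sketch, not part of the job: the function name `missing_ceph_files` is hypothetical, and the expected file names are an assumption (ceph.conf plus the first keyring path that `ceph -s` reports looking for).

```python
import os

# Assumed file names: the config plus the admin keyring that
# `ceph -s` tries first in the output quoted in this report.
EXPECTED = ("ceph.conf", "ceph.client.admin.keyring")


def missing_ceph_files(directory="/etc/ceph", expected=EXPECTED):
    """Return the expected file names that are not regular files in `directory`."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    missing = missing_ceph_files()
    if missing:
        print("missing from /etc/ceph:", ", ".join(missing))
    else:
        print("ceph.conf and admin keyring are present")
```

Running it between each step of the deployment should show which step leaves /etc/ceph without the keyring.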