Bug #51258

Updated by Sebastian Wagner almost 3 years ago

A job in OpenStack tests the latest Pacific bits for both the Ceph containers and cephadm.
Using [1] and [2], the job, which is supposed to deploy a new cluster, fails with [3]:

```
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)", "stderr_lines": ["Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)

[root@standalone ~]# ls /etc/ceph/
[root@standalone ~]#
```
The problem I found is that /etc/ceph is empty, so this kind of failure is expected.
However, the bootstrap command [4] generates both the conf file and the keyrings. I can see them being generated, but after some time they are gone.
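
To narrow down when the files vanish, one option is to poll the directory during the deployment and log every change with a timestamp, then correlate those timestamps with the cephadm and orchestrator logs. The following is only a minimal sketch under that assumption; the function names, the default path, and the polling interval are illustrative and are not part of cephadm or tripleo-ansible.

```python
import os
import time
from datetime import datetime, timezone


def snapshot(directory):
    """Return the set of file names currently in `directory` (empty if it is missing)."""
    try:
        return set(os.listdir(directory))
    except FileNotFoundError:
        return set()


def diff_snapshots(before, after):
    """Return (added, removed) file names between two snapshots, each sorted."""
    return sorted(after - before), sorted(before - after)


def watch(directory="/etc/ceph", interval=1.0, iterations=None):
    """Poll `directory`; print a timestamped line whenever files appear or vanish.

    `iterations=None` polls forever; pass a number to stop after that many polls.
    """
    previous = snapshot(directory)
    count = 0
    while iterations is None or count < iterations:
        time.sleep(interval)
        current = snapshot(directory)
        added, removed = diff_snapshots(previous, current)
        if added or removed:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} added={added} removed={removed}")
        previous = current
        count += 1
```

Calling `watch()` as root on the host right after the bootstrap step, and leaving it running while the specs are applied, should print a `removed=[...]` line at the moment the conf file and keyring disappear.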

 In addition, you won't be able to interact with the Ceph cluster and `cephadm shell` returns something like: 

```
[vagrant@standalone ~]$ sudo cephadm shell
Inferring fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Inferring config /var/lib/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19/mon.standalone.localdomain/config
Using recent ceph image quay.ceph.io/ceph-ci/daemon@sha256:ec271e81d73b6687ad2e097e6b8784066a8f092f2ce4c2cbc2ec2095ff0d8d27
[ceph: root@standalone /]# ceph -s
2021-06-17T09:54:00.382+0000 7f59def71700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-17T09:54:00.382+0000 7f59def71700 -1 AuthRegistry(0x7f59d805ed00) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-06-17T09:54:00.382+0000 7f59def71700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-06-17T09:54:00.382+0000 7f59def71700 -1 AuthRegistry(0x7f59def6fea0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-06-17T09:54:00.383+0000 7f59dcd0d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-06-17T09:54:00.383+0000 7f59def71700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
[errno 13] RADOS permission denied (error connecting to the cluster)
```

```
[ceph: root@standalone /]# ls /etc/ceph/
ceph.conf    rbdmap
```

 where ceph.conf is: 
```
# minimal ceph.conf for 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
[global]
        fsid = 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
        mon_host = [v2:192.168.24.1:3300/0,v1:192.168.24.1:6789/0]
[mon.standalone.localdomain]
public network = 192.168.24.0/24
```
 This means you have the ceph config file because it's taken from /var/lib/ceph/4b5c8c0a-ff60-454b-a1b4-9747aa737d19/mon.standalone.localdomain/config, 
 but there's no keyring. 
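
The keyring lookup that fails in the log can be reproduced outside the client with a small check. This is a sketch only: the candidate list is copied from the "unable to find a keyring on ..." message above, while the helper name and its `exists` parameter are hypothetical and do not reflect how the Ceph client implements the search.

```python
import os

# Search list copied from the "unable to find a keyring on ..." log line.
KEYRING_CANDIDATES = [
    "/etc/ceph/ceph.client.admin.keyring",
    "/etc/ceph/ceph.keyring",
    "/etc/ceph/keyring",
    "/etc/ceph/keyring.bin",
]


def first_existing(paths, exists=os.path.exists):
    """Return the first path for which `exists` is true, or None if none match.

    `exists` is injectable so the check can be exercised without touching disk.
    """
    for path in paths:
        if exists(path):
            return path
    return None
```

Run in the same environment as the failing client (the host, or inside `cephadm shell`), `first_existing(KEYRING_CANDIDATES)` returning `None` matches the state described here: a config file is present but no keyring is.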

Note that the bootstrap command (which runs before any OSD is applied) works properly: you can
interact with the cluster using `cephadm shell` or any other client, and /etc/ceph contains both ceph.conf and
the keyring. Something happens, however, when the OSDs are applied using a spec like:

```
---
addr: 192.168.24.1
hostname: standalone.localdomain
labels:
- osd
- mgr
- mon
service_type: host
---
placement:
  hosts:
  - standalone.localdomain
service_id: mon
service_name: mon
service_type: mon
---
placement:
  hosts:
  - standalone.localdomain
service_id: mgr
service_name: mgr
service_type: mgr
---
data_devices:
  paths:
  - /dev/ceph_vg/ceph_lv_data
placement:
  hosts:
  - standalone.localdomain
service_id: default_drive_group
service_name: osd.default_drive_group
service_type: osd
```

 [1] container: quay.ceph.io/ceph-ci/daemon:latest-pacific-devel 
 [2] cephadm version(s):  
     a. https://cbs.centos.org/koji/buildinfo?buildID=33232 
     b. https://cbs.centos.org/koji/buildinfo?buildID=33140 
 [3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_50e/778915/40/check/tripleo-ci-centos-8-scenario004-standalone/50ea241/logs/undercloud/home/zuul/standalone-ansible-d4qzhga2/cephadm/cephadm_command.log 
 [4] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/bootstrap.yaml#L57-L58
