Bug #49870

Updated by Sebastian Wagner about 3 years ago

When using cephadm-15.2.9-2.el8.noarch with the command below to apply the attached spec: 

 <pre> 
 /usr/sbin/cephadm --image undercloud.ctlplane.mydomain.tld:8787/ceph-ci/daemon:v5.0.7-stable-5.0-octopus-centos-8-x86_64 bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/ceph.client.admin.keyring --output-config /etc/ceph/ceph.conf --fsid 91c9b592-5317-4d92-97c1-6a9f0e9460cc --config /home/ceph-admin/bootstrap_ceph.conf --skip-monitoring-stack --skip-dashboard --mon-ip 172.16.11.239 

 The output is the following, ending in a failure caused by invalid JSON: 

 Verifying podman|docker is present... 
 Verifying lvm2 is present... 
 Verifying time synchronization is in place... 
 Unit chronyd.service is enabled and running 
 Repeating the final host check... 
 podman|docker (/bin/podman) is present 
 systemctl is present 
 lvcreate is present 
 Unit chronyd.service is enabled and running 
 Host looks OK 
 Cluster fsid: 91c9b592-5317-4d92-97c1-6a9f0e9460cc 
 Verifying IP 172.16.11.239 port 3300 ... 
 Verifying IP 172.16.11.239 port 6789 ... 
 Mon IP 172.16.11.239 is in CIDR network 172.16.11.0/24 
 Pulling container image undercloud.ctlplane.mydomain.tld:8787/ceph-ci/daemon:v5.0.7-stable-5.0-octopus-centos-8-x86_64... 
 Extracting ceph user uid/gid from container image... 
 Creating initial keys... 
 Creating initial monmap... 
 Creating mon... 
 Waiting for mon to start... 
 Waiting for mon... 
 mon is available 
 Assimilating anything we can from ceph.conf... 
 Generating new minimal ceph.conf... 
 Restarting the monitor... 
 Setting mon public_network... 
 Creating mgr... 
 Verifying port 9283 ... 
 Wrote keyring to /etc/ceph/ceph.client.admin.keyring 
 Wrote config to /etc/ceph/ceph.conf 
 Waiting for mgr to start... 
 Waiting for mgr... 
 mgr not available, waiting (1/10)... 
 mgr not available, waiting (2/10)... 
 mgr not available, waiting (3/10)... 
 mgr is available 
 Enabling cephadm module... 
 Traceback (most recent call last): 
   File "/usr/sbin/cephadm", line 6151, in <module> 
     r = args.func() 
   File "/usr/sbin/cephadm", line 1410, in _default_image 
     return func() 
   File "/usr/sbin/cephadm", line 3145, in command_bootstrap 
     wait_for_mgr_restart() 
   File "/usr/sbin/cephadm", line 3124, in wait_for_mgr_restart 
     j = json.loads(out) 
   File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads 
     return _default_decoder.decode(s) 
   File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode 
     obj, end = self.raw_decode(s, idx=_w(s, 0).end()) 
   File "/usr/lib64/python3.6/json/decoder.py", line 355, in raw_decode 
     obj, end = self.scan_once(s, idx) 
 json.decoder.JSONDecodeError: Unterminated string starting at: line 3217 column 25 (char 114687) 
 </pre> 

 The failure occurs here in the code: 

  https://github.com/ceph/ceph/blob/octopus/src/cephadm/cephadm#L3123 

 It looks like in the following: 

         out = cli(['mgr', 'dump']) 
         j = json.loads(out) 

 out contains invalid JSON, so json.loads(out) throws an exception. 
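 For illustration only (the payload below is made up, not an actual mgr dump), json.loads raises JSONDecodeError as soon as its input is cut off mid-string, which matches the "Unterminated string starting at" error in the traceback above: 

```python
import json

# A hypothetical truncated "mgr dump"-style payload: the string value
# is cut off mid-way, as could happen if the mgr restarts while the
# command output is being produced.
truncated = '{\n  "epoch": 7,\n  "active_name": "host1'

try:
    json.loads(truncated)
except json.JSONDecodeError as e:
    print(e.msg)  # Unterminated string starting at
```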

 I can reproduce the failure condition as follows. While the spec is applying, run this on the same host: 

  sudo cephadm shell -- ceph mgr dump    | jq . 

 There will be times when jq complains about invalid JSON. 

 I conjecture it's because the MGR is restarting. Perhaps a try/except with a retry is needed in case invalid JSON is returned because the mgr is not available? 
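 A minimal sketch of such a retry, assuming a hypothetical wrapper around cephadm's internal cli() helper (the function name, attempt count, and delay below are illustrative, not part of cephadm): 

```python
import json
import time

def cli_json_with_retry(cli, cmd, attempts=5, delay=5):
    """Call cli(cmd) and parse its output as JSON, retrying on parse errors.

    cli is assumed to be cephadm's helper that runs a ceph command and
    returns its stdout as a string. If the mgr is mid-restart and emits
    truncated JSON, wait `delay` seconds and try again, up to `attempts`
    times, then re-raise the last decode error.
    """
    last_err = None
    for _ in range(attempts):
        out = cli(cmd)
        try:
            return json.loads(out)
        except json.JSONDecodeError as e:
            last_err = e
            time.sleep(delay)
    raise last_err
```

 The existing call site would then become `j = cli_json_with_retry(cli, ['mgr', 'dump'])` instead of parsing the output directly. 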
