Bug #49870

closed

When 'ceph mgr dump' returns invalid JSON during the middle of spec application the spec application fails

Added by John Fulton about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: orchestrator
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When using cephadm-15.2.9-2.el8.noarch with the command below to apply the attached spec:

/usr/sbin/cephadm --image undercloud.ctlplane.mydomain.tld:8787/ceph-ci/daemon:v5.0.7-stable-5.0-octopus-centos-8-x86_64 bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/ceph.client.admin.keyring --output-config /etc/ceph/ceph.conf --fsid 91c9b592-5317-4d92-97c1-6a9f0e9460cc --config /home/ceph-admin/bootstrap_ceph.conf --skip-monitoring-stack --skip-dashboard --mon-ip 172.16.11.239

The output is as follows, ending in a failure caused by invalid JSON:

Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 91c9b592-5317-4d92-97c1-6a9f0e9460cc
Verifying IP 172.16.11.239 port 3300 ...
Verifying IP 172.16.11.239 port 6789 ...
Mon IP 172.16.11.239 is in CIDR network 172.16.11.0/24
Pulling container image undercloud.ctlplane.mydomain.tld:8787/ceph-ci/daemon:v5.0.7-stable-5.0-octopus-centos-8-x86_64...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr is available
Enabling cephadm module...
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 6151, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1410, in _default_image
    return func()
  File "/usr/sbin/cephadm", line 3145, in command_bootstrap
    wait_for_mgr_restart()
  File "/usr/sbin/cephadm", line 3124, in wait_for_mgr_restart
    j = json.loads(out)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 3217 column 25 (char 114687)

The failing call is at this location in the code:

https://github.com/ceph/ceph/blob/octopus/src/cephadm/cephadm#L3123

It looks like, in the following code:

out = cli(['mgr', 'dump'])
j = json.loads(out)

out contains invalid JSON, so json.loads(out) throws an exception.

I can reproduce the failure condition as follows. While the spec is being applied, run the following on the same host:

sudo cephadm shell -- ceph mgr dump | jq .

At times, jq complains about invalid JSON.
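
The manual jq check could be automated to see how often the output is invalid. This is only a rough sketch: the command list mirrors the one in this report, while run_mgr_dump and count_invalid are hypothetical helper names that are not part of cephadm, and the sample count and interval are arbitrary.

```python
import json
import subprocess
import time

# Same command as used manually in this report.
CMD = ["sudo", "cephadm", "shell", "--", "ceph", "mgr", "dump"]


def run_mgr_dump():
    """Run 'ceph mgr dump' once and return its stdout as text (hypothetical helper)."""
    result = subprocess.run(CMD, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


def count_invalid(fetch=run_mgr_dump, samples=30, interval=1.0):
    """Poll fetch() 'samples' times and count outputs that are not valid JSON."""
    invalid = 0
    for _ in range(samples):
        try:
            json.loads(fetch())
        except json.JSONDecodeError:
            invalid += 1
        time.sleep(interval)
    return invalid
```

Calling count_invalid() while the spec is being applied should return a non-zero count if the condition described here is hit during the sampling window.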

I conjecture it's because the MGR is restarting. Perhaps a try/except with a retry is needed in case invalid JSON is returned because the mgr is not available?
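
A retry along these lines might look like the sketch below. It is not the actual cephadm fix: load_json_with_retry is a hypothetical helper, fetch stands in for something like lambda: cli(['mgr', 'dump']), and the retry count and delay are assumed values.

```python
import json
import time


def load_json_with_retry(fetch, retries=10, delay=1.0):
    """Call fetch() and parse its output as JSON, retrying on invalid JSON.

    fetch is a hypothetical stand-in for a call such as
    lambda: cli(['mgr', 'dump']) inside wait_for_mgr_restart.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        out = fetch()
        try:
            return json.loads(out)
        except json.JSONDecodeError as e:
            # Output was invalid (possibly because the mgr is restarting); try again.
            last_error = e
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError(
        "no valid JSON after %d attempts" % retries
    ) from last_error
```

Bounding the number of attempts keeps a permanently broken mgr from hanging bootstrap forever, and chaining the last JSONDecodeError preserves the original parse error for debugging.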


Files

ceph_spec.yaml (963 Bytes) John Fulton, 03/17/2021 04:35 PM