Bug #44313
ceph-volume prepare is not idempotent and may get called twice
Status:
Resolved
Priority:
High
Assignee:
-
Category:
cephadm
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Description
The symptom is a failure like this:
2020-02-26T14:09:05.739+0000 7f08a7654700 0 [cephadm] [ERROR] [root] cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr --> RuntimeError: skipping vg_nvme/lv_4, it is already prepared
Traceback (most recent call last):
  File "<stdin>", line 3553, in <module>
  File "<stdin>", line 689, in _infer_fsid
  File "<stdin>", line 2235, in command_ceph_volume
  File "<stdin>", line 515, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --privileged --group-add=disk -e CONTAINER_IMAGE=quay.io/ceph-ci/ceph:6b7a962b487812ecf30bdcadbe10a9ed7aecef6a -e NODE_NAME=smithi177 -v /var/run/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c:/var/run/ceph:z -v /var/log/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c:/var/log/ceph:z -v /var/lib/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmpeo999psn:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpby37boi9:/var/lib/ceph/bootstrap-osd/ceph.keyring:z --entrypoint /usr/sbin/ceph-volume quay.io/ceph-ci/ceph:6b7a962b487812ecf30bdcadbe10a9ed7aecef6a lvm prepare --bluestore --data vg_nvme/lv_4 --no-systemd
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1372, in _run_cephadm
    code, '\n'.join(err)))
RuntimeError: cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr --> RuntimeError: skipping vg_nvme/lv_4, it is already prepared
Traceback (most recent call last):
  File "<stdin>", line 3553, in <module>
  File "<stdin>", line 689, in _infer_fsid
  File "<stdin>", line 2235, in command_ceph_volume
  File "<stdin>", line 515, in call_throws
The problem is that the CLI command is sent and processed twice, due to a TCP failure/retry between the `ceph` CLI tool and the mgr. In this case, we see:
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
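The audit log above shows the same `orch osd create` command dispatched twice (mgr.14131 entries 196 and 197, less than a second apart). Since retries can always happen at the transport layer, the fix direction implied by the title is to make prepare idempotent: treat "already prepared" as success rather than an error. A minimal sketch of that guard (hypothetical names, not the actual cephadm/ceph-volume code):

```python
# Hypothetical sketch of an idempotent prepare, NOT the real cephadm code.
def prepare_osd(device: str, prepared: set) -> str:
    """Prepare a device for use as an OSD, tolerating repeated calls.

    `prepared` stands in for whatever state query ceph-volume would use
    (e.g. inspecting LVM tags) to detect an already-prepared LV.
    """
    if device in prepared:
        # Idempotent path: a retried/duplicated command becomes a no-op
        # instead of raising "skipping vg_nvme/lv_4, it is already prepared".
        return "already prepared, skipping"
    prepared.add(device)  # stand-in for the actual prepare work
    return "prepared"
```

With this shape, the second dispatch of the duplicated command returns success instead of failing the whole orchestrator operation.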
/a/sage-2020-02-26_08:10:43-rados-wip-sage2-testing-2020-02-25-2110-distro-basic-smithi/4804047