Bug #44313
Status: Closed
ceph-volume prepare is not idempotent and may get called twice
Description
The symptom is a failure like this:
2020-02-26T14:09:05.739+0000 7f08a7654700 0 [cephadm] [ERROR] [root] cephadm exited with an error code: 1, stderr:
INFO:cephadm:/bin/podman:stderr --> RuntimeError: skipping vg_nvme/lv_4, it is already prepared
Traceback (most recent call last):
  File "<stdin>", line 3553, in <module>
  File "<stdin>", line 689, in _infer_fsid
  File "<stdin>", line 2235, in command_ceph_volume
  File "<stdin>", line 515, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --privileged --group-add=disk -e CONTAINER_IMAGE=quay.io/ceph-ci/ceph:6b7a962b487812ecf30bdcadbe10a9ed7aecef6a -e NODE_NAME=smithi177 -v /var/run/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c:/var/run/ceph:z -v /var/log/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c:/var/log/ceph:z -v /var/lib/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmpeo999psn:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpby37boi9:/var/lib/ceph/bootstrap-osd/ceph.keyring:z --entrypoint /usr/sbin/ceph-volume quay.io/ceph-ci/ceph:6b7a962b487812ecf30bdcadbe10a9ed7aecef6a lvm prepare --bluestore --data vg_nvme/lv_4 --no-systemd
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1372, in _run_cephadm
    code, '\n'.join(err)))
RuntimeError: cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr --> RuntimeError: skipping vg_nvme/lv_4, it is already prepared
Traceback (most recent call last):
  File "<stdin>", line 3553, in <module>
  File "<stdin>", line 689, in _infer_fsid
  File "<stdin>", line 2235, in command_ceph_volume
  File "<stdin>", line 515, in call_throws
The problem is that the CLI command is sent and processed twice, due to a TCP failure/retry between the ceph CLI tool and the mgr. In this case, we see:
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
/a/sage-2020-02-26_08:10:43-rados-wip-sage2-testing-2020-02-25-2110-distro-basic-smithi/4804047
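The duplicate dispatch above is the classic at-least-once delivery problem: if the reply to a command is lost and the client retries, a non-idempotent handler executes the command twice. A minimal, generic illustration (not Ceph code; all names are made up for the sketch):

```python
# Generic illustration: a lost reply plus a naive client retry makes the
# server-side handler run the same command twice.
import itertools

class FlakyChannel:
    """Delivers every request, but drops the reply to the first one."""
    def __init__(self, handler):
        self.handler = handler
        self.calls = itertools.count()

    def request(self, cmd):
        result = self.handler(cmd)            # server side always executes
        if next(self.calls) == 0:
            raise TimeoutError("reply lost")  # client never sees the ack
        return result

def client(channel, cmd):
    while True:
        try:
            return channel.request(cmd)
        except TimeoutError:
            continue  # naive retry: the command is dispatched again

executed = []
chan = FlakyChannel(lambda cmd: executed.append(cmd))
client(chan, "orch osd create smithi177:vg_nvme/lv_4")
```

After the call, `executed` holds the command twice, mirroring the two audit entries (196 and 197) for the same client in the log above.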
Updated by Sage Weil about 4 years ago
One possible fix would be to make ceph-volume itself idempotent, so that calling prepare on an already-prepared device succeeds.
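A rough sketch of that first option, assuming the usual ceph-volume convention of marking prepared devices with LVM tags (the tag name, `LV` stand-in class, and return convention here are illustrative, not the real ceph-volume code):

```python
# Hypothetical sketch: `prepare` becomes a no-op success when the LV
# already carries the tags that ceph-volume sets on a prepared device.

class LV:
    """Stand-in for a ceph-volume logical volume and its tag dict."""
    def __init__(self, name, tags=None):
        self.name = name
        self.tags = tags or {}

def prepare(lv, do_prepare):
    # A prepared LV carries tags such as 'ceph.osd_id'; treating their
    # presence as "already done" turns a duplicate call into a success
    # instead of the "skipping ..., it is already prepared" error.
    if 'ceph.osd_id' in lv.tags:
        return 0
    do_prepare(lv)
    lv.tags['ceph.osd_id'] = 'pending'  # simplified; real code stores the OSD id
    return 0

lv = LV('vg_nvme/lv_4')
prepared = []
first = prepare(lv, prepared.append)
second = prepare(lv, prepared.append)  # duplicate call: no-op success
```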
An alternative workaround would be to make cephadm look for 'RuntimeError: skipping vg_nvme/lv_4, it is already prepared' in the error output and proceed...
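That workaround could look roughly like this on the caller's side (`call_cephadm` and the tuple convention are hypothetical stand-ins, not cephadm's actual helpers):

```python
# Sketch of the error-matching workaround: treat the "already prepared"
# failure from a duplicate invocation as success.
import re

ALREADY_PREPARED = re.compile(
    r"RuntimeError: skipping \S+, it is already prepared")

def run_prepare(call_cephadm, args):
    """call_cephadm(args) is assumed to return (exit_code, stdout, stderr)."""
    code, out, err = call_cephadm(args)
    if code != 0 and ALREADY_PREPARED.search(err):
        return 0, out, err  # duplicate invocation: swallow the error
    return code, out, err
```

Matching on a single device's error string is the fragile part: once a drive group spans several devices, a partial failure can mix "already prepared" errors with genuine ones.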
Updated by Sage Weil about 4 years ago
A similar failure, this time a duplicate 'daemon mon add':
/a/sage-2020-02-26_08:10:43-rados-wip-sage2-testing-2020-02-25-2110-distro-basic-smithi/4804139
Updated by Sebastian Wagner about 4 years ago
Your QA run really had an excessive amount of duplicates:
Ah, each mon logs the command.
2020-02-26T14:06:14.858 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:06:14 smithi198 bash[7136]: audit 2020-02-26T14:06:10.057474+0000 mgr.y (mgr.14131) 108 : audit [DBG] from='client.14292 v1:172.21.15.198:0/3261589327' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi198:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:06:14.858 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:06:14 smithi086 bash[7084]: audit 2020-02-26T14:06:10.057474+0000 mgr.y (mgr.14131) 108 : audit [DBG] from='client.14292 v1:172.21.15.198:0/3261589327' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi198:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:06:14.859 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:06:14 smithi177 bash[7060]: audit 2020-02-26T14:06:10.057474+0000 mgr.y (mgr.14131) 108 : audit [DBG] from='client.14292 v1:172.21.15.198:0/3261589327' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi198:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:07:35.375 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:07:35 smithi086 bash[7084]: audit 2020-02-26T14:07:34.343735+0000 mgr.y (mgr.14131) 154 : audit [DBG] from='client.14400 v1:172.21.15.86:0/3100800274' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi086:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:07:35.378 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:07:35 smithi177 bash[7060]: audit 2020-02-26T14:07:34.343735+0000 mgr.y (mgr.14131) 154 : audit [DBG] from='client.14400 v1:172.21.15.86:0/3100800274' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi086:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:07:35.380 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:07:35 smithi198 bash[7136]: audit 2020-02-26T14:07:34.343735+0000 mgr.y (mgr.14131) 154 : audit [DBG] from='client.14400 v1:172.21.15.86:0/3100800274' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi086:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.721 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:08:55 smithi086 bash[7084]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.721 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:08:55 smithi086 bash[7084]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.722 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:08:55 smithi177 bash[7060]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.722 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:08:55 smithi177 bash[7060]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
Updated by Sebastian Wagner about 4 years ago
Sage Weil wrote:
An alternative workaround would be to make cephadm look for 'RuntimeError: skipping vg_nvme/lv_4, it is already prepared' in the error output and proceed...
I don't see how that would be doable for non-trivial drive groups.
Updated by Sebastian Wagner about 4 years ago
- Has duplicate Bug #44312: ceph-volume prepare is not idempotent and may get called twice added
Updated by Sage Weil about 4 years ago
PR #33755 fixed this for ceph-volume prepare on a single device, which is what teuthology exercises.
Updated by Sebastian Wagner about 4 years ago
- Related to Bug #44824: cephadm: adding osd device is not idempotent added
Updated by Sebastian Wagner about 4 years ago
- Category set to cephadm
- Status changed from New to Resolved
- Pull request ID set to 33755
This particular issue was resolved.