Bug #44313 (closed): ceph-volume prepare is not idempotent and may get called twice

Added by Sage Weil about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: 33755
Crash signature (v1): -
Crash signature (v2): -

Description

The symptom is a failure like this:

2020-02-26T14:09:05.739+0000 7f08a7654700  0 [cephadm] [ERROR] [root] cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr -->  RuntimeError: skipping vg_nvme/lv_4, it is already prepared
Traceback (most recent call last):
  File "<stdin>", line 3553, in <module>
  File "<stdin>", line 689, in _infer_fsid
  File "<stdin>", line 2235, in command_ceph_volume
  File "<stdin>", line 515, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --privileged --group-add=disk -e CONTAINER_IMAGE=quay.io/ceph-ci/ceph:6b7a962b487812ecf30bdcadbe10a9ed7aecef6a -e NODE_NAME=smithi177 -v /var/run/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c:/var/run/ceph:z -v /var/log/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c:/var/log/ceph:z -v /var/lib/ceph/7ad78048-58a0-11ea-9a17-001a4aab830c/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmpeo999psn:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpby37boi9:/var/lib/ceph/bootstrap-osd/ceph.keyring:z --entrypoint /usr/sbin/ceph-volume quay.io/ceph-ci/ceph:6b7a962b487812ecf30bdcadbe10a9ed7aecef6a lvm prepare --bluestore --data vg_nvme/lv_4 --no-systemd
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1372, in _run_cephadm
    code, '\n'.join(err)))
RuntimeError: cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr -->  RuntimeError: skipping vg_nvme/lv_4, it is already prepared
Traceback (most recent call last):
  File "<stdin>", line 3553, in <module>
  File "<stdin>", line 689, in _infer_fsid
  File "<stdin>", line 2235, in command_ceph_volume
  File "<stdin>", line 515, in call_throws

The problem is that the CLI command is sent and processed twice due to a TCP failure/retry between the ceph CLI tool and the mgr. In this case, we see:

2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch

/a/sage-2020-02-26_08:10:43-rados-wip-sage2-testing-2020-02-25-2110-distro-basic-smithi/4804047
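
For illustration only, the failure mode reduces to a client that retries a non-idempotent command when the reply does not arrive in time. The sketch below is a hypothetical, simplified model (send_command, conn and payload are made-up names and do not correspond to the actual ceph CLI or mgr code); it only shows why a transport-level retry turns one "orch osd create" into two dispatches on the mgr side.

import socket

def send_command(conn, payload, timeout=5.0, retries=1):
    """Send a command and wait for a reply, retrying on timeout."""
    conn.settimeout(timeout)
    for _ in range(retries + 1):
        try:
            conn.sendall(payload)    # the server may already have dispatched this
            return conn.recv(4096)   # reply lost or late -> timeout -> retry
        except socket.timeout:
            # Resending re-dispatches the same, non-idempotent command.
            continue
    raise RuntimeError("command failed after retries")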

Related issues 2 (0 open, 2 closed)

Related to Orchestrator - Bug #44824: cephadm: adding osd device is not idempotent (Resolved)

Has duplicate Orchestrator - Bug #44312: ceph-volume prepare is not idempotent and may get called twice (Duplicate)

Actions #1

Updated by Sage Weil about 4 years ago

One possible fix would be to make ceph-volume itself idempotent, so that calling prepare on an already-prepared device succeeds.

An alternative workaround would be to make cephadm look for 'RuntimeError: skipping vg_nvme/lv_4, it is already prepared' in the error output and proceed...
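
To make the second option concrete, here is a rough sketch of the string-matching workaround (prepare_osd and ALREADY_PREPARED are hypothetical names, not actual cephadm code). It runs the same ceph-volume invocation that appears in the failure log, minus the container wrapping, and treats the "already prepared" error as success:

import subprocess

ALREADY_PREPARED = "it is already prepared"

def prepare_osd(device):
    """Run ceph-volume lvm prepare, tolerating an already-prepared device."""
    proc = subprocess.run(
        ["ceph-volume", "lvm", "prepare", "--bluestore",
         "--data", device, "--no-systemd"],
        capture_output=True, text=True,
    )
    if proc.returncode == 0:
        return
    if ALREADY_PREPARED in proc.stderr:
        # Prepared by the earlier, duplicated call: treat the repeat as a no-op.
        return
    raise RuntimeError("ceph-volume prepare failed: " + proc.stderr)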

Actions #2

Updated by Sage Weil about 4 years ago

Similar failure, this time a 'daemon mon add' duplicate:

/a/sage-2020-02-26_08:10:43-rados-wip-sage2-testing-2020-02-25-2110-distro-basic-smithi/4804139

Actions #3

Updated by Sebastian Wagner about 4 years ago

Your QA run really had an excessive number of duplicates:

Ah, each mon logs the command.

2020-02-26T14:06:14.858 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:06:14 smithi198 bash[7136]: audit 2020-02-26T14:06:10.057474+0000 mgr.y (mgr.14131) 108 : audit [DBG] from='client.14292 v1:172.21.15.198:0/3261589327' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi198:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:06:14.858 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:06:14 smithi086 bash[7084]: audit 2020-02-26T14:06:10.057474+0000 mgr.y (mgr.14131) 108 : audit [DBG] from='client.14292 v1:172.21.15.198:0/3261589327' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi198:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:06:14.859 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:06:14 smithi177 bash[7060]: audit 2020-02-26T14:06:10.057474+0000 mgr.y (mgr.14131) 108 : audit [DBG] from='client.14292 v1:172.21.15.198:0/3261589327' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi198:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:07:35.375 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:07:35 smithi086 bash[7084]: audit 2020-02-26T14:07:34.343735+0000 mgr.y (mgr.14131) 154 : audit [DBG] from='client.14400 v1:172.21.15.86:0/3100800274' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi086:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:07:35.378 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:07:35 smithi177 bash[7060]: audit 2020-02-26T14:07:34.343735+0000 mgr.y (mgr.14131) 154 : audit [DBG] from='client.14400 v1:172.21.15.86:0/3100800274' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi086:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:07:35.380 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:07:35 smithi198 bash[7136]: audit 2020-02-26T14:07:34.343735+0000 mgr.y (mgr.14131) 154 : audit [DBG] from='client.14400 v1:172.21.15.86:0/3100800274' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi086:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.721 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:08:55 smithi086 bash[7084]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.721 INFO:ceph.mon.b.smithi086.stdout:Feb 26 14:08:55 smithi086 bash[7084]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.722 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:08:55 smithi177 bash[7060]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.722 INFO:ceph.mon.c.smithi177.stdout:Feb 26 14:08:55 smithi177 bash[7060]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:49.921103+0000 mgr.y (mgr.14131) 196 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
2020-02-26T14:08:55.723 INFO:ceph.mon.a.smithi198.stdout:Feb 26 14:08:55 smithi198 bash[7136]: audit 2020-02-26T14:08:50.735133+0000 mgr.y (mgr.14131) 197 : audit [DBG] from='client.14508 v1:172.21.15.177:0/4072385491' entity='client.admin' cmd=[{"prefix": "orch osd create", "svc_arg": "smithi177:vg_nvme/lv_4", "target": ["mon-mgr", ""]}]: dispatch
Actions #4

Updated by Sebastian Wagner about 4 years ago

Sage Weil wrote:

An alternative workaround would be to make cephadm look for 'RuntimeError: skipping vg_nvme/lv_4, it is already prepared' in the error output and proceed...

I don't know how that would be doable for non-trivial drive groups.

Actions #5

Updated by Sebastian Wagner about 4 years ago

  • Has duplicate Bug #44312: ceph-volume prepare is not idempotent and may get called twice added
Actions #6

Updated by Sage Weil about 4 years ago

PR 33755 fixed this for ceph-volume prepare on a single device, which is what teuthology does.
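
For the single-device case, the general shape of such a guard could look like the sketch below (is_prepared and prepare_once are made-up names, and this is not a description of the actual change in PR 33755; it assumes 'ceph-volume lvm list <device> --format json' is available): check whether ceph-volume already reports the device as an OSD before calling prepare.

import json
import subprocess

def is_prepared(device):
    """Return True if ceph-volume already reports an OSD on this device/LV."""
    proc = subprocess.run(
        ["ceph-volume", "lvm", "list", device, "--format", "json"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return False
    return bool(json.loads(proc.stdout or "{}"))

def prepare_once(device):
    if is_prepared(device):
        return  # the duplicated second call becomes a no-op
    subprocess.check_call(
        ["ceph-volume", "lvm", "prepare", "--bluestore",
         "--data", device, "--no-systemd"])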

Actions #7

Updated by Sebastian Wagner about 4 years ago

  • Related to Bug #44824: cephadm: adding osd device is not idempotent added
Actions #8

Updated by Sebastian Wagner about 4 years ago

  • Category set to cephadm
  • Status changed from New to Resolved
  • Pull request ID set to 33755

This particular issue was resolved.

