Bug #45327

cephadm: Orch daemon add is not idempotent

Added by Sebastian Wagner over 1 year ago. Updated 8 months ago.

Status:
Closed
Priority:
High
Category:
cephadm
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

audit [DBG] from='client.14180 v1:172.21.15.139:0/1866757471' entity='client.admin' cmd=[{"prefix": "orch daemon add", "daemon_type": "mon", "placement": "smithi139:[v1:172.21.15.139:6789]=c", "target": ["mon-mgr", ""]}]: dispatch
audit [INF] from='mgr.14141 v1:172.21.15.131:0/1812629321' entity='mgr.y' cmd=[{"prefix": "auth get", "entity": "mon."}]: dispatch
audit [DBG] from='mgr.14141 v1:172.21.15.131:0/1812629321' entity='mgr.y' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
cephadm [INF] Deploying daemon mon.c on smithi139
audit [DBG] from='mgr.14141 v1:172.21.15.131:0/1812629321' entity='mgr.y' cmd=[{"prefix": "config get", "who": "mon.c", "key": "container_image"}]: dispatch
audit [DBG] from='client.14180 v1:172.21.15.139:0/1866757471' entity='client.admin' cmd=[{"prefix": "orch daemon add", "daemon_type": "mon", "placement": "smithi139:[v1:172.21.15.139:6789]=c", "target": ["mon-mgr", ""]}]: dispatch

Remote method threw exception: Traceback (most recent call last):
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1153, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 110, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 308, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 72, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 63, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 658, in _daemon_add_misc
    completion = self.add_mon(spec)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 1542, in inner
    completion = self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 1614, in _oremote
    return mgr.remote(o, meth, *args, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1515, in remote
    args, kwargs)
RuntimeError: Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2740, in add_mon
    return self._add_daemon('mon', spec, self._create_mon)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2652, in _add_daemon
    create_func, config_func)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2667, in _create_daemons
    spec.service_id, name)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1228, in get_unique_name
    raise orchestrator.OrchestratorValidationError('name %s already in use', forcename)
orchestrator._interface.OrchestratorValidationError: ('name %s already in use', 'c')

http://pulpito.ceph.com/teuthology-2020-04-27_03:30:02-rados-octopus-distro-basic-smithi/4988571/


Related issues

Related to Orchestrator - Bug #44824: cephadm: adding osd device is not idempotent New
Related to Orchestrator - Bug #52742: octopus: orchestrator._interface.OrchestratorValidationError: name mon.c already in use Won't Fix
Duplicated by Orchestrator - Bug #45296: cephadm: daemon add mon failure: orchestrator._interface.OrchestratorValidationError: ('name %s already in use', 'b') Duplicate
Duplicated by Orchestrator - Bug #46854: Error ENOENT: name mon.smithi074 already in use seen on octopus Duplicate
Duplicated by Orchestrator - Bug #47709: orchestrator._interface.OrchestratorValidationError: name mon.c already in use Duplicate

History

#1 Updated by Sebastian Wagner over 1 year ago

  • Subject changed from Orch daemon add is not idempotent to cephadm: Orch daemon add is not idempotent

#2 Updated by Sebastian Wagner over 1 year ago

  • Priority changed from Normal to High

#3 Updated by Sebastian Wagner over 1 year ago

  • Duplicated by Bug #45296: cephadm: daemon add mon failure: orchestrator._interface.OrchestratorValidationError: ('name %s already in use', 'b') added

#4 Updated by Sebastian Wagner over 1 year ago

OK:

ceph orch apply ...

is already idempotent.

On the other hand,

ceph orch daemon add ...

is supposed to be raw and direct, without any magic behind it. I for one would prefer to have Teuthology invoke `apply` instead of `daemon add`.
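A hypothetical sketch (not the real cephadm code; `MiniOrchestrator` and its methods are invented for illustration) of the two semantics contrasted here: `apply` is declarative and therefore idempotent, while `daemon add` is imperative and a retry is an error.

```python
# Hypothetical illustration -- not cephadm's actual implementation.
class MiniOrchestrator:
    def __init__(self):
        self.daemons = {}  # daemon name -> spec (actual state)
        self.specs = {}    # service name -> spec (desired state)

    def daemon_add(self, name, spec):
        """Imperative: create exactly one daemon; retrying is an error."""
        if name in self.daemons:
            raise ValueError(f'name {name} already in use')
        self.daemons[name] = spec

    def apply(self, service, spec):
        """Declarative: record desired state; re-applying is a no-op."""
        self.specs[service] = spec  # a reconciliation loop converges later


orch = MiniOrchestrator()
orch.apply('mon', {'count': 3})
orch.apply('mon', {'count': 3})   # idempotent: same desired state, no error
orch.daemon_add('mon.c', {})      # first call succeeds
try:
    orch.daemon_add('mon.c', {})  # a retry fails, as seen in this bug
except ValueError as e:
    print(e)                      # name mon.c already in use
```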

#5 Updated by Joshua Schmid over 1 year ago

I'd argue that we should only maintain one command for service creation, and I would recommend going with `apply`.

We should adapt the teuthology codepath to use/support `apply` as well.

#6 Updated by Neha Ojha over 1 year ago

  • Backport set to octopus

/a/yuriw-2020-05-02_20:02:46-rados-wip-yuri6-testing-2020-04-30-2259-octopus-distro-basic-smithi/5016611/

#7 Updated by Sebastian Wagner over 1 year ago

  • Source set to Q/A

#8 Updated by Sebastian Wagner over 1 year ago

  • Related to Bug #44824: cephadm: adding osd device is not idempotent added

#9 Updated by Sebastian Wagner over 1 year ago

`daemon add` is too low-level. If we want the commands we call to be idempotent, we have to stop calling it from cephadm.py.
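For reference, a simplified sketch of the name-uniqueness check that makes a repeated `daemon add` fail, reconstructed from the tracebacks in this ticket (the real `get_unique_name` lives in `/usr/share/ceph/mgr/cephadm/module.py` and is more involved):

```python
# Simplified reconstruction for illustration -- not the real cephadm code.
import uuid


class OrchestratorValidationError(Exception):
    pass


def get_unique_name(daemon_type, existing, forcename=None):
    """Pick a daemon name. A forced name that already exists is rejected,
    which is exactly why a second `daemon add` with the same placement
    (e.g. smithi139:...=c) cannot succeed."""
    if forcename:
        if forcename in existing:
            raise OrchestratorValidationError(
                f'name {daemon_type}.{forcename} already in use')
        return forcename
    # no forced name: generate a fresh random suffix instead
    return uuid.uuid4().hex[:6]
```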

#10 Updated by Sebastian Wagner over 1 year ago

https://pulpito.ceph.com/swagner-2020-08-03_12:11:23-rados:cephadm-wip-swagner-testing-2020-08-03-1038-distro-basic-smithi/5284050/

> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:4502cc1b194810b439c691dc8aeabee06dcc5c6b shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 2523a68e-d589-11ea-a070-001a4aab830c -- ceph orch daemon add mon smithi193:172.21.15.193=smithi193
cephadm 2020-08-03T13:02:24.253615+0000 mgr.smithi089.ppezhp (mgr.14169) 63 : cephadm [INF] Deploying daemon mgr.smithi193.xqzdep on smithi193
cluster 2020-08-03T13:02:24.597060+0000 mgr.smithi089.ppezhp (mgr.14169) 64 : cluster [DBG] pgmap v34: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
audit 2020-08-03T13:02:25.255138+0000 mon.smithi089 (mon.0) 278 : audit [INF] from='client.? 172.21.15.89:0/2356875247' entity='client.admin' cmd='[{"prefix": "osd crush tunables", "profile": "default"}]': finished
cluster 2020-08-03T13:02:25.255187+0000 mon.smithi089 (mon.0) 279 : cluster [DBG] osdmap e7: 0 total, 0 up, 0 in
audit 2020-08-03T13:02:25.817810+0000 mon.smithi089 (mon.0) 282 : audit [DBG] from='mgr.14169 172.21.15.89:0/385454869' entity='mgr.smithi089.ppezhp' cmd=[{"prefix": "config get", "who": "mon", "key": "public_network"}]: dispatch
audit 2020-08-03T13:02:25.818730+0000 mon.smithi089 (mon.0) 283 : audit [INF] from='mgr.14169 172.21.15.89:0/385454869' entity='mgr.smithi089.ppezhp' cmd=[{"prefix": "auth get", "entity": "mon."}]: dispatch
audit 2020-08-03T13:02:25.819219+0000 mon.smithi089 (mon.0) 284 : audit [DBG] from='mgr.14169 172.21.15.89:0/385454869' entity='mgr.smithi089.ppezhp' cmd=[{"prefix": "config get", "who": "mon", "key": "public_network"}]: dispatch
audit 2020-08-03T13:02:25.819717+0000 mon.smithi089 (mon.0) 285 : audit [DBG] from='mgr.14169 172.21.15.89:0/385454869' entity='mgr.smithi089.ppezhp' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
audit 2020-08-03T13:02:25.820440+0000 mon.smithi089 (mon.0) 286 : audit [DBG] from='mgr.14169 172.21.15.89:0/385454869' entity='mgr.smithi089.ppezhp' cmd=[{"prefix": "config get", "who": "mon.smithi193", "key": "container_image"}]: dispatch
cephadm 2020-08-03T13:02:25.820087+0000 mgr.smithi089.ppezhp (mgr.14169) 65 : cephadm [INF] Deploying daemon mon.smithi193 on smithi193
cluster 2020-08-03T13:02:26.597340+0000 mgr.smithi089.ppezhp (mgr.14169) 66 : cluster [DBG] pgmap v36: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
Error EINVAL: name mon.smithi193 already in use

#11 Updated by Sebastian Wagner over 1 year ago

  • Duplicated by Bug #46854: Error ENOENT: name mon.smithi074 already in use seen on octopus added

#12 Updated by Sebastian Wagner about 1 year ago

  • Duplicated by Bug #47709: orchestrator._interface.OrchestratorValidationError: name mon.c already in use added

#13 Updated by Sebastian Wagner 12 months ago

OK:

We must not call `daemon add` in Teuthology. It's just too low-level and not meant to be idempotent.

It would be great if someone wants to take this.

#14 Updated by Juan Miguel Olmo Martínez 12 months ago

  • Assignee set to Juan Miguel Olmo Martínez

#15 Updated by Deepika Upadhyay 9 months ago

2021-02-28T12:07:16.867 INFO:journalctl@ceph.mgr.y.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11619]:     forcename=name)
2021-02-28T12:07:16.867 INFO:journalctl@ceph.mgr.y.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11619]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 535, in get_unique_name
2021-02-28T12:07:16.867 INFO:journalctl@ceph.mgr.y.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11619]:     f'name {daemon_type}.{forcename} already in use')
2021-02-28T12:07:16.867 INFO:journalctl@ceph.mgr.y.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11619]: orchestrator._interface.OrchestratorValidationError: name mon.c already in use
2021-02-28T12:07:16.868 INFO:journalctl@ceph.mgr.y.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11619]: debug 2021-02-28T12:07:16.470+0000 7f10ff1a3700 -1 mgr.server reply reply (22) Invalid argument name mon.c already in use
2021-02-28T12:07:17.178 INFO:journalctl@ceph.mon.b.smithi063.stdout:Feb 28 12:07:16 smithi063 bash[11262]: audit 2021-02-28T12:07:16.458970+0000 mon.a (mon.0) 172 : audit [DBG] from='mgr.14138 v1:172.21.15.14:0/505964003' entity='mgr.y' cmd=[{"prefix": "config get", "who": "mon", "key": "container_image"}]: dispatch
2021-02-28T12:07:17.178 INFO:journalctl@ceph.mon.b.smithi063.stdout:Feb 28 12:07:16 smithi063 bash[11262]: audit 2021-02-28T12:07:16.461861+0000 mon.a (mon.0) 173 : audit [INF] from='mgr.14138 v1:172.21.15.14:0/505964003' entity='mgr.y'
2021-02-28T12:07:17.363 INFO:journalctl@ceph.mon.a.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11359]: audit 2021-02-28T12:07:16.458970+0000 mon.a (mon.0) 172 : audit [DBG] from='mgr.14138 v1:172.21.15.14:0/505964003' entity='mgr.y' cmd=[{"prefix": "config get", "who": "mon", "key": "container_image"}]: dispatch
2021-02-28T12:07:17.364 INFO:journalctl@ceph.mon.a.smithi014.stdout:Feb 28 12:07:16 smithi014 bash[11359]: audit 2021-02-28T12:07:16.461861+0000 mon.a (mon.0) 173 : audit [INF] from='mgr.14138 v1:172.21.15.14:0/505964003' entity='mgr.y'
2021-02-28T12:07:17.492 DEBUG:teuthology.orchestra.run:got remote process result: 22
2021-02-28T12:07:17.493 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):

/ceph/teuthology-archive/teuthology-2021-02-24_03:30:04-rados-octopus-distro-basic-smithi/5910301/teuthology.log

#16 Updated by Sage Weil 8 months ago

  • Status changed from New to Fix Under Review
  • Backport changed from octopus to pacific
  • Pull request ID set to 40314

We can't rely on `orch apply mon ...` in octopus, at least not with the current scheduler, since it can't handle multiple daemons of the same type on the same host.

#17 Updated by Deepika Upadhyay 8 months ago

  description: rados/thrash-old-clients/{0-size-min-size-overrides/3-size-2-min-size
    1-install/luminous backoff/peering_and_degraded ceph clusters/{openstack three-plus-one}
    d-balancer/off distro$/{ubuntu_18.04} msgr-failures/osd-delay rados thrashers/default
    thrashosds-health workloads/test_rbd_api}

2021-04-01T16:44:18.342 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]: Traceback (most recent call last):
2021-04-01T16:44:18.343 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 294, in _finalize
2021-04-01T16:44:18.343 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:     next_result = self._on_complete(self._value)
2021-04-01T16:44:18.343 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 107, in <lambda>
2021-04-01T16:44:18.344 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:     return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
2021-04-01T16:44:18.344 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1917, in add_mon
2021-04-01T16:44:18.344 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:     return self._add_daemon('mon', spec, self.mon_service.prepare_create)
2021-04-01T16:44:18.344 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1860, in _add_daemon
2021-04-01T16:44:18.344 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:     create_func, config_func)
2021-04-01T16:44:18.345 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 1880, in _create_daemons
2021-04-01T16:44:18.345 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:     forcename=name)
2021-04-01T16:44:18.345 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 535, in get_unique_name
2021-04-01T16:44:18.345 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]:     f'name {daemon_type}.{forcename} already in use')
2021-04-01T16:44:18.346 INFO:journalctl@ceph.mgr.y.smithi033.stdout:Apr 01 16:44:17 smithi033 bash[12352]: orchestrator._interface.OrchestratorValidationError: name mon.b already in use

/ceph/teuthology-archive/yuriw-2021-04-01_15:23:17-rados-wip-yuri-testing-2021-03-31-1516-octopus-distro-basic-smithi/6015105/teuthology.log

#18 Updated by Sebastian Wagner 8 months ago

  • Status changed from Fix Under Review to Closed

#19 Updated by Sebastian Wagner 2 months ago

  • Related to Bug #52742: octopus: orchestrator._interface.OrchestratorValidationError: name mon.c already in use added
