Bug #45973

closed

Adopted MDS daemons are removed by the orchestrator because they're orphans

Added by Tim Serong almost 4 years ago. Updated about 3 years ago.

Status:
Rejected
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The docs say that when converting to cephadm, one needs to redeploy MDS daemons. However, it is possible to adopt them (cephadm adopt [...] --name mds.myhost seems to work just fine). The problem is that shortly after being adopted, the cephadm orchestrator decides that the MDS is an orphan (there's no service spec), and goes and removes the daemon.

If the correct procedure is always to redeploy, and never to adopt an MDS, then cephadm adopt should presumably be changed to refuse to adopt MDSes (the same is possibly true for RGW, but I haven't verified this).

If, on the other hand, it's permitted to adopt an MDS, then I guess a service spec needs to be created for it automatically?

What's the right thing to do here?


Related issues 1 (1 open, 0 closed)

Related to Orchestrator - Bug #46561: cephadm: monitoring services adoption doesn't honor the container image (New)

Actions #1

Updated by Sebastian Wagner almost 4 years ago

Hm. Isn't this a big flaw in adopt, not just for MDS?

We might need to apply something like this before adopting any daemons

service_type: mds
service_id: XXX
unmanaged: true

And run something like

service_type: mds
service_id: XXX
unmanaged: false
placement: ...

after the adoption is done.
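
The two-step sequence above could be sketched roughly as follows. This is only an illustration: the helper name mds_spec, the example service_id "myfs", and the host-list form of the placement are assumptions, not anything cephadm adopt does itself. Each rendered spec would be written to a file and applied with ceph orch apply -i <file>.

```python
def mds_spec(service_id, unmanaged, hosts=None):
    """Render a minimal MDS service spec as YAML text.

    Hypothetical helper for illustration; hosts is an assumed
    placement (a plain list of host names).
    """
    lines = [
        "service_type: mds",
        f"service_id: {service_id}",
        f"unmanaged: {'true' if unmanaged else 'false'}",
    ]
    if hosts:
        lines.append("placement:")
        lines.append("  hosts:")
        lines.extend(f"    - {host}" for host in hosts)
    return "\n".join(lines) + "\n"


# Step 1: before adoption, register the service as unmanaged so the
# orchestrator does not act on its daemons.
before_adopt = mds_spec("myfs", unmanaged=True)

# Step 2: after adoption, hand the service over to the orchestrator
# with an explicit placement.
after_adopt = mds_spec("myfs", unmanaged=False, hosts=["myhost"])
```

Whether an unmanaged spec is enough to stop the orphan cleanup is exactly what this issue leaves open, so treat the sketch as a starting point for testing rather than a verified procedure.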

Actions #2

Updated by Tim Serong almost 4 years ago

Sebastian Wagner wrote:

Hm. Isn't this a big flaw in adopt, not just for MDS?

Not in practice so far. The docs say to adopt MON, MGR and OSD, and to redeploy everything else. The cephadm orchestrator doesn't care if MON, MGR and OSD don't have service specs (https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1889), so it doesn't remove them as orphans.

That said, the comment on https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1890 claims that MON and MGR specs should always exist, and the fact that they don't after an adopt may mean that this all only works by accident. In which case, yes, this probably needs some attention.
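
To make the behavior described above concrete, here is a simplified model of the orphan check. It is not the code in module.py: the function name, the tuple layout, and the exempt set are assumptions drawn only from this comment (MON, MGR and OSD are skipped; anything else without a matching service spec is treated as an orphan).

```python
# Daemon types that are left alone even without a service spec,
# per the description in this thread (assumption, not copied from cephadm).
EXEMPT_TYPES = {"mon", "mgr", "osd"}


def find_orphans(daemons, spec_names):
    """Return the names of daemons that would be considered orphans.

    daemons: iterable of (daemon_name, daemon_type, service_name) tuples
    spec_names: set of service names that have a service spec
    """
    orphans = []
    for daemon_name, daemon_type, service_name in daemons:
        if daemon_type in EXEMPT_TYPES:
            continue  # never removed for lacking a spec
        if service_name not in spec_names:
            orphans.append(daemon_name)
    return orphans


# A freshly converted cluster: nothing has a service spec yet.
adopted = [
    ("mon.myhost", "mon", "mon"),
    ("osd.0", "osd", "osd"),
    ("mds.myhost", "mds", "mds.myfs"),
]
orphans_without_spec = find_orphans(adopted, set())
orphans_with_spec = find_orphans(adopted, {"mds.myfs"})
```

In this model the adopted MDS is the only daemon flagged when no specs exist, and it stops being flagged once a spec for its service is present.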

We might need to apply something like this before adopting any daemons
[...]

How do we apply service specs before adoption? The orchestrator can't be enabled until after MONs and MGRs are adopted...

Actions #3

Updated by Sebastian Wagner almost 4 years ago

It's not an accident that this is working. OTOH, this behavior needs improvement. Let me think about the chicken-and-egg problem a bit.

Actions #4

Updated by Sebastian Wagner almost 4 years ago

  • Priority changed from Normal to High
Actions #5

Updated by Tim Serong almost 4 years ago

We have the same problem with adopted prometheus instances (I adopted one, it was working fine for a few minutes, then the orchestrator went and removed it).

Actions #6

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Sebastian Wagner
  • Pull request ID set to 35669
Actions #7

Updated by Sebastian Wagner over 3 years ago

  • Status changed from Fix Under Review to New
Actions #8

Updated by Sebastian Wagner over 3 years ago

  • Assignee deleted (Sebastian Wagner)
Actions #9

Updated by Sebastian Wagner over 3 years ago

  • Pull request ID deleted (35669)

still open

Actions #10

Updated by Sebastian Wagner about 3 years ago

  • Priority changed from High to Low

prio=low. It's probably easier to simply redeploy the MDS upstream and leave adoption to a downstream-specific solution.

Actions #11

Updated by Sebastian Wagner about 3 years ago

  • Related to Bug #46561: cephadm: monitoring services adoption doesn't honor the container image added
Actions #12

Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to Rejected

fixed by both downstreams
