Bug #45973
closed
Adopted MDS daemons are removed by the orchestrator because they're orphans
Description
The docs say that when converting to cephadm, one needs to redeploy MDS daemons. However, it is possible to adopt them (cephadm adopt [...] --name mds.myhost seems to work just fine). The problem is that shortly after the MDS is adopted, the cephadm orchestrator decides that it is an orphan (there's no service spec), and goes and removes the daemon.
If the correct procedure is always to redeploy, and never to adopt an MDS, then cephadm adopt should presumably be changed to refuse to adopt MDSes (the same is possibly true for RGW, but I haven't verified this).
If, on the other hand, it's permitted to adopt an MDS, then I guess a service spec needs to be created for it automatically?
What's the right thing to do here?
Updated by Sebastian Wagner almost 4 years ago
Hm. Isn't this a big flaw in adopt, not just for MDS?
We might need to apply something like this before adopting any daemons
    service_type: mds
    service_id: XXX
    unmanaged: true
And run something like
    service_type: mds
    service_id: XXX
    unmanaged: false
    placement: ...
after the adoption is done.
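To make the two-step idea above concrete, here is a minimal Python sketch that renders the "before" and "after" specs as YAML text. It is only an illustration of the proposal, not part of cephadm: render_spec is a hypothetical helper, "myfs" stands in for the XXX service id, and the placement value is a placeholder assumption, since the comment leaves the placement unspecified.

```python
def render_spec(service_type, service_id, unmanaged, placement=None):
    """Render a minimal service spec as YAML text (flat scalar values only)."""
    lines = [
        f"service_type: {service_type}",
        f"service_id: {service_id}",
        f"unmanaged: {str(unmanaged).lower()}",
    ]
    if placement:
        lines.append("placement:")
        lines.extend(f"  {key}: {value}" for key, value in placement.items())
    return "\n".join(lines) + "\n"


# Hypothetical values: "myfs" and the placement are placeholders, not from the thread.
before_adoption = render_spec("mds", "myfs", unmanaged=True)
after_adoption = render_spec("mds", "myfs", unmanaged=False, placement={"count": 1})

print(before_adoption)
print(after_adoption)
```

The first spec would be applied before adopting the daemon so the orchestrator leaves it alone; the second would hand the service back to the orchestrator once adoption is done.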
Updated by Tim Serong almost 4 years ago
Sebastian Wagner wrote:
Hm. Isn't this a big flaw in adopt, not just for MDS?
Not in practice so far. The docs say to adopt MON, MGR and OSD, and to redeploy everything else. The cephadm orchestrator doesn't care if MON, MGR and OSD don't have service specs (https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1889), so doesn't remove them as orphans.
That said, the comment on https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/module.py#L1890 claims that MON and MGR specs should always exist, and the fact that they don't after an adopt may mean that this all only works by accident. In which case, yes, this probably needs some attention.
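The behaviour described above can be sketched roughly as follows. This is a simplified model written for illustration, not the actual code in module.py: the Daemon class, find_orphans, and the example service names are all hypothetical. The only thing taken from the thread is the rule that MON, MGR and OSD daemons are left alone even without a service spec.

```python
from dataclasses import dataclass

# Daemon types that, per the thread, are not removed when they lack a service spec.
SPEC_OPTIONAL_TYPES = {"mon", "mgr", "osd"}


@dataclass(frozen=True)
class Daemon:
    daemon_type: str   # e.g. "mds"
    service_name: str  # e.g. "mds.myfs" (hypothetical naming)
    hostname: str


def find_orphans(daemons, spec_names):
    """Return daemons whose service has no spec and whose type is not exempt."""
    return [
        d for d in daemons
        if d.daemon_type not in SPEC_OPTIONAL_TYPES
        and d.service_name not in spec_names
    ]


# A freshly adopted cluster with no service specs at all.
adopted = [
    Daemon("mon", "mon", "myhost"),
    Daemon("osd", "osd", "myhost"),
    Daemon("mds", "mds.myfs", "myhost"),
    Daemon("prometheus", "prometheus", "myhost"),
]

for daemon in find_orphans(adopted, spec_names=set()):
    print(f"would remove {daemon.daemon_type} on {daemon.hostname}")
```

In this model the adopted MDS stops being flagged as soon as a spec with a matching service name exists, which is why the missing service spec is the crux of the problem.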
We might need to apply something like this before adopting any daemons
[...]
How do we apply service specs before adoption? The orchestrator can't be enabled until after MONs and MGRs are adopted...
Updated by Sebastian Wagner almost 4 years ago
It's not an accident that this is working. OTOH, this behavior needs improvement. Let me think about the chicken-and-egg problem a bit.
Updated by Sebastian Wagner almost 4 years ago
- Priority changed from Normal to High
Updated by Tim Serong almost 4 years ago
We have the same problem with adopted prometheus instances (I adopted one, it was working fine for a few minutes, then the orchestrator went and removed it).
Updated by Sebastian Wagner almost 4 years ago
- Status changed from New to Fix Under Review
- Assignee set to Sebastian Wagner
- Pull request ID set to 35669
Updated by Sebastian Wagner over 3 years ago
- Status changed from Fix Under Review to New
Updated by Sebastian Wagner over 3 years ago
- Assignee deleted (Sebastian Wagner)
Updated by Sebastian Wagner about 3 years ago
- Priority changed from High to Low
prio=low. probably easier to simply redeploy MDS for upstream and find a typical downstream solution for downstream.
Updated by Sebastian Wagner about 3 years ago
- Related to Bug #46561: cephadm: monitoring services adoption doesn't honor the container image added
Updated by Sebastian Wagner about 3 years ago
- Status changed from New to Rejected
fixed by both downstreams