Bug #63886

open

Module 'cephadm' has failed: 'cephadm'

Added by Vadym Kukharenko 5 months ago. Updated 5 months ago.

Status: New
Priority: Normal
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source: Support
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/octopus-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello.
I converted my Ceph cluster from ceph-ansible to cephadm.
After the migration to cephadm orchestration I saw that my container was at version 15.2.13, so I tried to upgrade to 15.2.17 with this command:

ceph orch upgrade start --image quay.io/ceph/ceph:v15.2.17

The upgrade did nothing with the current mgr service, so I tried to redeploy the mgr manually with the following command:

ceph orch daemon redeploy mgr.ceph03 --image quay.io/ceph/ceph:v15.2.17

After these steps Ceph went red:

    health: HEALTH_ERR
            Module 'cephadm' has failed: 'cephadm'

The only additional error I found in the logs was:

mgr.ceph01 [ERR] Unhandled exception from module 'cephadm' while running on mgr.ceph01: 'cephadm'

Could somebody help me resolve this issue?
PS: If I disable the cephadm module, Ceph becomes green.

root@ceph01:~# ceph orch ps
NAME                                                                      HOST    STATUS          REFRESHED  AGE  VERSION    IMAGE NAME                                IMAGE ID      CONTAINER ID
cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c  ceph01  stopped         74s ago    -    <unknown>  <unknown>                                 <unknown>     <unknown>
cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c  ceph02  stopped         74s ago    -    <unknown>  <unknown>                                 <unknown>     <unknown>
cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c  ceph03  stopped         75s ago    -    <unknown>  <unknown>                                 <unknown>     <unknown>
cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c  ceph04  stopped         75s ago    -    <unknown>  <unknown>                                 <unknown>     <unknown>
cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c  ceph05  stopped         74s ago    -    <unknown>  <unknown>                                 <unknown>     <unknown>
Actions #1

Updated by Vadym Kukharenko 5 months ago

Here is the stack trace from ceph health:

2023-12-22T05:41:08.157670+0000 mgr.ceph03.bgxubk [ERR] _Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 294, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 115, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1350, in describe_service
    hosts=[dd.hostname]
  File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 435, in __init__
    assert service_type in ServiceSpec.KNOWN_SERVICE_TYPES, service_type
AssertionError: cephadm
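
The assertion in the traceback fires with service_type equal to 'cephadm', which suggests the cephadm.<hash> entries shown by ceph orch ps are being picked up as daemons of a type that ServiceSpec does not know. A minimal sketch for spotting such entries, assuming NAME and HOST are the first two columns as in the listing above; the function name list_cephadm_entries is hypothetical and not part of Ceph:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): read `ceph orch ps` output on
# stdin and print "HOST NAME" for every entry whose name starts with
# "cephadm.". Assumes NAME is column 1 and HOST is column 2.
list_cephadm_entries() {
    awk '$1 ~ /^cephadm\./ { print $2, $1 }'
}

# Usage on a live cluster (not executed here):
#   ceph orch ps | list_cephadm_entries
```

An empty result would mean no such entries are currently reported.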

Actions #2

Updated by Vadym Kukharenko 5 months ago

I fixed this issue. The main problem was that, after converting to cephadm, I had applied osd_spec but had not yet converted all OSDs from unmanaged to managed services. So I removed the OSD service:

ceph orch rm osd.osd_spec

Then clean the cephadm.f46dc95b01feeedb28941a48e2f1d0abb51139ca828de11150ea7122a8e3549c files out of the /var/lib/ceph/<fsid>/ directory on each host, fail over the mgr, and watch the cephadm log:

ceph mgr fail; ceph -W cephadm

That's it.
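
The per-host cleanup step could be sketched roughly as below. This is only an illustration under assumptions: the function name clean_cephadm_files is hypothetical, it only looks at regular files named cephadm.* directly inside the directory you pass, and it lists them by default so the result can be checked before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): list, or with --delete remove,
# regular files named cephadm.* directly inside the given directory.
# Other files and daemon subdirectories are left untouched.
clean_cephadm_files() {
    dir=$1
    mode=${2:-list}
    for f in "$dir"/cephadm.*; do
        [ -f "$f" ] || continue      # skip an unmatched glob and non-files
        if [ "$mode" = "--delete" ]; then
            rm -f -- "$f"
        else
            printf '%s\n' "$f"
        fi
    done
}

# Usage on each host (not executed here; substitute your cluster fsid):
#   clean_cephadm_files /var/lib/ceph/<fsid>            # dry run, list only
#   clean_cephadm_files /var/lib/ceph/<fsid> --delete   # actually remove
```

Listing first keeps the destructive step explicit, which seems safer when the same command is repeated on every host.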
