Bug #52454

closed

mgr/cephadm: orch maintenance enter command failed

Added by Nizamudeen A over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a cluster with 3 hosts and 3 mons. When I ran the orch host maintenance enter command against a host that runs only a mon, it failed with this error:

Aug 30 11:13:54 ceph-node-00.cephlab.com ceph-mgr[13294]: log_channel(audit) log [DBG] : from='client.14199 -' entity='client.admin' cmd=[{"prefix": "orch host maintenance enter", "hostname": "ceph-node-02.cephlab.com", "target": ["mon-mgr", ""]}]: dispatch
Aug 30 11:13:54 ceph-node-00.cephlab.com ceph-mgr[13294]: [cephadm INFO asyncio] poll took 35226.431 ms: 1 events
Aug 30 11:13:54 ceph-node-00.cephlab.com ceph-mgr[13294]: log_channel(cephadm) log [INF] : poll took 35226.431 ms: 1 events
Aug 30 11:13:54 ceph-node-00.cephlab.com ceph-mgr[13294]: log_channel(cluster) log [DBG] : pgmap v121: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
Aug 30 11:13:54 ceph-node-00.cephlab.com conmon[13289]: 2021-08-30T11:13:54.561+0000 7eff25323700 -1 mgr.server reply reply (22) Invalid argument Failed to place ceph-node-02.cephlab.com into maintenance for cluster 261bbac6-0982-11ec-9a2b-5254007ec620
Aug 30 11:13:54 ceph-node-00.cephlab.com conmon[13289]: 
Aug 30 11:13:54 ceph-node-00.cephlab.com ceph-mgr[13294]: mgr.server reply reply (22) Invalid argument Failed to place ceph-node-02.cephlab.com into maintenance for cluster 261bbac6-0982-11ec-9a2b-5254007ec620
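
When triaging logs like the ones above, it can help to pull the dispatched command out of the audit line so the failing call is unambiguous. The following is a minimal sketch, not part of Ceph: the helper name parse_audit_cmd and the regular expression are illustrative assumptions based only on the line format shown in this report.

```python
import json
import re
from typing import Optional

# Assumption: the audit line carries a JSON array after "cmd=" and ends
# with ": dispatch", as in the log excerpt in this report.
CMD_RE = re.compile(r"cmd=(\[.*\]): dispatch")


def parse_audit_cmd(line: str) -> Optional[dict]:
    """Return the first dispatched command found in an audit line, or None."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    commands = json.loads(match.group(1))
    return commands[0] if commands else None


# The audit line from the report, split across string literals for readability.
line = (
    "Aug 30 11:13:54 ceph-node-00.cephlab.com ceph-mgr[13294]: "
    "log_channel(audit) log [DBG] : from='client.14199 -' entity='client.admin' "
    'cmd=[{"prefix": "orch host maintenance enter", '
    '"hostname": "ceph-node-02.cephlab.com", "target": ["mon-mgr", ""]}]: dispatch'
)

cmd = parse_audit_cmd(line)
print(cmd["prefix"], cmd["hostname"])
```

For the line above this yields the prefix orch host maintenance enter and the hostname ceph-node-02.cephlab.com, which confirms that the request targeted ceph-node-02 while the mgr on ceph-node-00 handled it.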

A similar error occurred when I took a host out of maintenance (I had added the host in maintenance mode).

Output of the maintenance exit command:

Sep 06 15:39:22 ceph-node-00.cephlab.com ceph-mgr[13555]: log_channel(audit) log [DBG] : from='client.14178 -' entity='client.admin' cmd=[{"prefix": "orch host maintenance exit", "hostname": "ceph-node-01.cephlab.com", "target": ["mon-mgr", ""]}]: dispatch
Sep 06 15:39:22 ceph-node-00.cephlab.com ceph-mgr[13555]: [cephadm INFO asyncio] poll took 20869.058 ms: 1 events
Sep 06 15:39:22 ceph-node-00.cephlab.com ceph-mgr[13555]: log_channel(cephadm) log [INF] : poll took 20869.058 ms: 1 events
Sep 06 15:39:22 ceph-node-00.cephlab.com conmon[13550]: 2021-09-06T15:39:22.765+0000 7fc952d5f700 -1 mgr.server reply reply (22) Invalid argument Failed to exit maintenance state for host ceph-node-01.cephlab.com, cluster 3e76f40a-0f27-11ec-b612-525400d48f70
Sep 06 15:39:22 ceph-node-00.cephlab.com conmon[13550]:
Sep 06 15:39:22 ceph-node-00.cephlab.com ceph-mgr[13555]: mgr.server reply reply (22) Invalid argument Failed to exit maintenance state for host ceph-node-01.cephlab.com, cluster 3e76f40a-0f27-11ec-b612-525400d48f70
Sep 06 15:39:23 ceph-node-00.cephlab.com ceph-mgr[13555]: log_channel(cluster) log [DBG] : pgmap v204: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail

From the dashboard:

Sep 06 15:37:04 ceph-node-00.cephlab.com ceph-mgr[13555]: [dashboard ERROR exception] Dashboard Exception
                                                          Traceback (most recent call last):
                                                            File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 89, in handle_orchestrator_error
                                                              yield
                                                            File "/lib64/python3.6/contextlib.py", line 52, in inner
                                                              return func(*args, **kwds)
                                                            File "/usr/share/ceph/mgr/dashboard/controllers/host.py", line 431, in set
                                                              orch.hosts.exit_maintenance(hostname)
                                                            File "/usr/share/ceph/mgr/dashboard/services/orchestrator.py", line 38, in inner
                                                              raise_if_exception(completion)
                                                            File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 224, in raise_if_exception
                                                              raise e
                                                          orchestrator._interface.OrchestratorError: Failed to exit maintenance state for host ceph-node-01.cephlab.com, cluster 3e76f40a-0f27-11ec-b612-525400d48f70

                                                          During handling of the above exception, another exception occurred:

                                                          Traceback (most recent call last):
                                                            File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 46, in dashboard_exception_handler
                                                              return handler(*args, **kwargs)
                                                            File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
                                                              return self.callable(*self.args, **self.kwargs)
                                                            File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 717, in inner
                                                              ret = func(*args, **kwargs)
                                                            File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 949, in wrapper
                                                              return func(*vpath, **params)
                                                            File "/usr/share/ceph/mgr/dashboard/controllers/orchestrator.py", line 33, in _inner
                                                              return method(self, *args, **kwargs)
                                                            File "/lib64/python3.6/contextlib.py", line 52, in inner
                                                              return func(*args, **kwds)
                                                            File "/lib64/python3.6/contextlib.py", line 99, in __exit__
                                                              self.gen.throw(type, value, traceback)
                                                            File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 91, in handle_orchestrator_error
                                                              raise DashboardException(e, component=component)
                                                          dashboard.exceptions.DashboardException: Failed to exit maintenance state for host ceph-node-01.cephlab.com, cluster 3e76f40a-0f27-11ec-b612-525400d48f70
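
The two tracebacks above are one failure, not two: the orchestrator error is caught and re-raised as a dashboard error, and Python reports both because the second exception is raised while the first is being handled. The sketch below reproduces that chaining with stand-in classes. The class and function names mirror those in the traceback, but the bodies are simplified assumptions and not the actual dashboard code.

```python
from contextlib import contextmanager


class OrchestratorError(Exception):
    """Stand-in for orchestrator._interface.OrchestratorError."""


class DashboardException(Exception):
    """Stand-in for dashboard.exceptions.DashboardException."""


@contextmanager
def handle_orchestrator_error(component: str):
    # Simplified assumption of the wrapper named in the traceback.
    try:
        yield
    except OrchestratorError as e:
        # Raising inside the except block links the original error to the
        # new one via __context__, which produces the "During handling of
        # the above exception, another exception occurred" message.
        raise DashboardException(f"{component}: {e}")


def exit_maintenance(hostname: str) -> None:
    # Hypothetical stand-in that always fails, like the call in the report.
    raise OrchestratorError(
        f"Failed to exit maintenance state for host {hostname}"
    )


try:
    with handle_orchestrator_error("host"):
        exit_maintenance("ceph-node-01.cephlab.com")
except DashboardException as exc:
    caught = exc

print(type(caught.__context__).__name__)
```

The printed name is the original exception type, so the root cause to investigate is the orchestrator failure; the dashboard exception only wraps it.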

The mon goes down, and I have also noticed other services on that host going into an error state (not sure if it's related).

Actions #1

Updated by Nizamudeen A over 2 years ago

  • Description updated (diff)
Actions #2

Updated by Nizamudeen A over 2 years ago

  • Description updated (diff)
Actions #3

Updated by Nizamudeen A over 2 years ago

  • Description updated (diff)
Actions #4

Updated by Nizamudeen A over 2 years ago

  • Description updated (diff)
Actions #5

Updated by Adam King over 2 years ago

  • Status changed from New to Resolved
  • Pull request ID set to 43275