Bug #58921
mgr crashing with dashboard module enabled in 16.2.9
Description
Hi,
We noticed some issues with the orchestrator. We added new hosts with new drives that weren't automatically detected by the orchestrator. Checking the mgr logs, I noticed it was crashing with the dashboard module enabled (maybe the path has an extra backslash in the code):
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]: Internal Server Error
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]: Traceback (most recent call last):
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:   File "/lib/python3.6/site-packages/cherrypy/lib/static.py", line 58, in serve_file
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:     st = os.stat(path)
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]: FileNotFoundError: [Errno 2] No such file or directory: '/usr/share/ceph/mgr/dashboard/frontend/dist/en-US/prometheus_receiver'
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]: During handling of the above exception, another exception occurred:
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]: Traceback (most recent call last):
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:   File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:     return handler(*args, **kwargs)
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:   File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:     return self.callable(*self.args, **self.kwargs)
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:   File "/usr/share/ceph/mgr/dashboard/controllers/home.py", line 135, in __call__
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:     return serve_file(full_path)
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:   File "/lib/python3.6/site-packages/cherrypy/lib/static.py", line 65, in serve_file
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]:     raise cherrypy.NotFound()
Feb 01 10:01:08 ds-ceph01-madrid bash[2829574]: cherrypy._cperror.NotFound: (404, "The path '/prometheus_receiver' was not found.")
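For reference, the failure mode in the traceback can be reproduced in isolation: the request path is joined onto the frontend dist directory and handed to serve_file, and os.stat() on a missing file raises FileNotFoundError before CherryPy converts it into a 404. A minimal sketch of a lookup that checks existence first (hypothetical helper for illustration, not the actual Ceph dashboard code):

```python
import os

def resolve_static(base_dir, request_path):
    """Hypothetical sketch of the static-file lookup the traceback goes
    through: join the request path onto the frontend dist directory and
    return a 404 instead of letting os.stat() raise FileNotFoundError."""
    base = os.path.abspath(base_dir)
    full_path = os.path.normpath(os.path.join(base, request_path.lstrip("/")))
    if not full_path.startswith(base + os.sep):
        return (404, None)   # path escapes the dist directory
    if not os.path.isfile(full_path):
        return (404, None)   # e.g. '/prometheus_receiver' from the log above
    return (200, full_path)
```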
After disabling the dashboard module, the new drives were detected and the new OSD containers (Docker) were deployed.
However, I've now noticed another orch issue, even with the dashboard disabled:
- I have a failed drive (osd.92).
- The drive was marked down and out, and the rebalancing completed fine.
- After the rebalancing completed, I tried to purge the OSD with "ceph orch osd rm osd.92 --force" in order to ask for a replacement.
- The purge does nothing:
ceph orch osd rm status
OSD  HOST    STATE    PGS  REPLACE  FORCE  ZAP    DRAIN STARTED AT
92   node10  started  0    False    True   False
- The OSD daemons are not refreshed:
ceph orch ps --daemon_type osd --daemon_id 92
NAME    HOST    PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION    IMAGE ID
osd.92  node10         error   3d ago     4w   -        4096M    <unknown>  <unknown>
- I don't see any other errors in the mgr logs, even with debug level 20 enabled.
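For context, the removal flow I was following is roughly the standard orchestrator replacement sequence sketched below (commands as documented for cephadm clusters; whether the mgr failover actually unwedges the queue in this situation is an assumption on my part, offered as a workaround rather than a fix):

```shell
# Queue the failed OSD for removal, keeping its id for the replacement disk.
ceph orch osd rm 92 --replace --force

# Watch the removal queue (the status output shown above comes from here).
ceph orch osd rm status

# Ask the orchestrator to refresh its cached daemon inventory instead of
# waiting for the periodic refresh (the "REFRESHED 3d ago" above suggests
# the cache is stale).
ceph orch ps --refresh --daemon_type osd

# If the queue never progresses, failing over to a standby mgr restarts
# the cephadm serve loop.
ceph mgr fail
```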