Bug #23300: ceph-mgr returns internal error
Status: Closed
Description
Hello,
After some weeks of running a new Ceph cluster, we get the following answer from the mgr:
black3.place6:~# curl http://[2a0a:e5c0:2:1:20d:b9ff:fe48:3bb8]:9283/metrics
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
<title>500 Internal Server Error</title>
<style type="text/css">
#powered_by {
margin-top: 20px;
border-top: 2px solid black;
font-style: italic;
}
#traceback {
color: red;
}
</style>
</head>
<body>
<h2>500 Internal Server Error</h2>
<p>The server encountered an unexpected condition which prevented it from fulfilling the request.</p>
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
response.body = self.handler()
File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
self.body = self.oldhandler(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
return self.callable(*self.args, **self.kwargs)
File "/usr/lib/ceph/mgr/prometheus/module.py", line 414, in metrics
metrics = global_instance().collect()
File "/usr/lib/ceph/mgr/prometheus/module.py", line 351, in collect
self.get_metadata_and_osd_status()
File "/usr/lib/ceph/mgr/prometheus/module.py", line 310, in get_metadata_and_osd_status
dev_class['class'],
KeyError: 'class'
<div id="powered_by">
<span>
Powered by <a href="http://www.cherrypy.org">CherryPy 3.5.0</a>
</span>
</div>
</body>
</html>
Changing / starting another mgr does not fix this problem.
Using 12.2.4-1~bpo90+1 on Devuan ASCII.
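The traceback shows the prometheus module indexing each OSD's CRUSH metadata with a bare `dev_class['class']`, which raises `KeyError` as soon as one OSD has no device class set. A minimal sketch of the failing pattern and a defensive `.get()` lookup (the `crush_devices` structure and function names here are illustrative, not the module's real internals):

```python
# Simplified stand-in for the per-OSD CRUSH metadata the prometheus module
# iterates over; the last entry has no device class, mirroring this cluster.
crush_devices = [
    {"id": 0, "class": "hdd-small"},
    {"id": 2, "class": "hdd-big"},
    {"id": 4},  # no device class -> dev_class['class'] raises KeyError
]

def collect_classes_broken(devices):
    # Mirrors the failing pattern in get_metadata_and_osd_status: a bare
    # dict subscript that assumes every OSD carries a 'class' key.
    return [dev_class["class"] for dev_class in devices]

def collect_classes_fixed(devices):
    # Defensive lookup: fall back to an empty label instead of crashing.
    return [dev_class.get("class", "") for dev_class in devices]

try:
    collect_classes_broken(crush_devices)
except KeyError as exc:
    print("broken path raised KeyError:", exc)

print(collect_classes_fixed(crush_devices))  # ['hdd-small', 'hdd-big', '']
```

This is why the 500 error appears only after the cluster gains classless OSDs: the metrics endpoint works until the first request that walks over one of them.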
Files
Updated by Nico Schottelius about 6 years ago
Fun fact: it ran fine until we introduced new crush rules and changed the crush rule for a pool:
ceph osd crush rule create-replicated hdd-small default host hdd-small
ceph osd crush rule create-replicated hdd-big default host hdd-big
ceph osd pool set hdd crush_rule hdd-big
Updated by Nico Schottelius about 6 years ago
Found it! We had several OSDs without a device class attached, because we did not want to use them at the moment.
Adding a "fake" class to them fixed the mgr's prometheus interface.
[20:23:10] server1.place6:~# ceph osd crush set-device-class notinuse 12 14 11 13 25 4
set osd(s) 4,11,12,13,14,25 to class 'notinuse'
[20:25:20] server1.place6:~# ceph osd tree
ID CLASS     WEIGHT    TYPE NAME        STATUS REWEIGHT PRI-AFF
-1           125.59419 root default
-7            46.60368     host server2
15 hdd-big     9.09511         osd.15       up  1.00000 1.00000
20 hdd-big     9.09511         osd.20       up  1.00000 1.00000
21 hdd-big     9.09511         osd.21       up  1.00000 1.00000
 7 hdd-small   4.54776         osd.7        up  1.00000 1.00000
 8 hdd-small   4.54776         osd.8        up  1.00000 1.00000
10 hdd-small   4.54776         osd.10       up  1.00000 1.00000
12 notinuse    0.21767         osd.12       up  1.00000 1.00000
14 notinuse    5.45741         osd.14       up  1.00000 1.00000
-5            42.50967     host server3
 9 hdd-big     9.09511         osd.9        up  1.00000 1.00000
16 hdd-big     9.09511         osd.16       up  1.00000 1.00000
19 hdd-big     9.09511         osd.19       up  1.00000 1.00000
 3 hdd-small   4.54776         osd.3        up  1.00000 1.00000
 5 hdd-small   4.54776         osd.5        up  1.00000 1.00000
 6 hdd-small   4.54776         osd.6        up  1.00000 1.00000
11 notinuse    0.45424         osd.11       up  1.00000 1.00000
13 notinuse    0.90907         osd.13       up  1.00000 1.00000
25 notinuse    0.21776         osd.25       up  1.00000 1.00000
-2            36.48083     host server4
 2 hdd-big     9.09511         osd.2        up  1.00000 1.00000
17 hdd-big     9.09511         osd.17       up  1.00000 1.00000
18 hdd-big     9.09511         osd.18       up  1.00000 1.00000
 0 hdd-small   4.54776         osd.0        up  1.00000 1.00000
 1 hdd-small   4.54776         osd.1        up  1.00000 1.00000
 4 notinuse    0.09999         osd.4        up  1.00000 1.00000
[20:26:39] server1.place6:~#
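For anyone hitting the same crash, the classless OSDs can be found programmatically instead of by eyeballing `ceph osd tree`. A small sketch that filters the JSON form of the tree; the `nodes`/`device_class` field names are an assumption based on luminous-era `ceph osd tree -f json` output, so verify them against your own cluster:

```python
import json

# Sample data in the assumed shape of `ceph osd tree -f json` output
# (trimmed to a few nodes; real output has one entry per bucket and OSD).
osd_tree_json = """
{
  "nodes": [
    {"id": -1, "type": "root", "name": "default"},
    {"id": 15, "type": "osd",  "name": "osd.15", "device_class": "hdd-big"},
    {"id": 12, "type": "osd",  "name": "osd.12", "device_class": ""},
    {"id": 4,  "type": "osd",  "name": "osd.4"}
  ]
}
"""

def osds_without_class(tree):
    # Report OSDs whose device class is missing or empty; these are the
    # ones that trip the prometheus module's dev_class['class'] lookup.
    return [n["name"] for n in tree["nodes"]
            if n.get("type") == "osd" and not n.get("device_class")]

print(osds_without_class(json.loads(osd_tree_json)))  # ['osd.12', 'osd.4']
```

Each reported OSD can then be given a class with `ceph osd crush set-device-class`, as in the log above.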
Updated by John Spray about 6 years ago
- Category set to prometheus module
- Status changed from New to Duplicate
This was fixed in master recently and is being backported to luminous here: https://github.com/ceph/ceph/pull/20642