Bug #56239
crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample)
Status:
Closed
% Done:
0%
Source:
Telemetry
Tags:
backport_processed
Backport:
reef,quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
51858
Crash signature (v1):
372a820cbfc5af971785d9b6af2a345a1670c04429583dc564e357c04a53cf64
8f6cf6368e0ca8ac93beab8b45a0d5013805b9ef39286850ba17798f822e180c
d0ea52fbf30312347be61ce51cd1f6c5483dfaba1767a0eb62791d1f194f3381
ef43174c3be0e2b9ccb951f18b2301de313327d53325698fe20fbd29db555a38
2364791fa429f484e2ac788d520a6c4752a9e95983682b39f621373401ca0734
Crash signature (v2):
Description
Sanitized backtrace:
File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample)
File "mgr/devicehealth/module.py", in _get_device_metrics: with self._db_lock, self.db:
File "mgr/mgr_module.py", in db: raise MgrDBNotReady();
Crash dump sample:
{
    "archived": "2022-06-19 10:32:56.076950",
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 764, in get_recent_device_metrics\n return self._get_device_metrics(devid, min_sample=min_sample)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 553, in _get_device_metrics\n with self._db_lock, self.db:",
        " File \"/usr/share/ceph/mgr/mgr_module.py\", line 1203, in db\n raise MgrDBNotReady();",
        "<redacted>"
    ],
    "ceph_version": "17.2.0",
    "crash_id": "2022-06-18T19:09:19.112675Z_db7d5934-7e5a-4ee8-908e-4ee606f9dd1c",
    "entity_name": "mgr.8db3d30b2fe0f2dc446f5bc8b03f08b697cf9f58",
    "mgr_module": "devicehealth",
    "mgr_module_caller": "ActivePyModule::dispatch_remote get_recent_device_metrics",
    "mgr_python_exception": "MgrDBNotReady",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "bb14694bacd8d2b1a934cf4a3f4a27f50f27e160354c2f796b64991db731505e",
    "timestamp": "2022-06-18T19:09:19.112675Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-39-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022"
}
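The crash record attributes MgrDBNotReady to the devicehealth module, with ActivePyModule::dispatch_remote get_recent_device_metrics as the caller, so the exception escaped a method that another module invoked remotely. As a minimal sketch of one caller-side mitigation (an illustration only, not necessarily the upstream fix), the remote entry point can map "database not ready" to an empty result. DeviceHealthSketch, the db_ready flag and the placeholder metrics layout are hypothetical; MgrDBNotReady here is a local stand-in for mgr_module.MgrDBNotReady.

```python
import logging
import threading
from typing import Any, Dict

log = logging.getLogger("devicehealth-sketch")


class MgrDBNotReady(Exception):
    """Local stand-in for mgr_module.MgrDBNotReady."""


class DeviceHealthSketch:
    """Hypothetical, heavily simplified stand-in for the devicehealth module."""

    def __init__(self, db_ready: bool) -> None:
        self._db_lock = threading.Lock()
        self._db_ready = db_ready  # hypothetical flag: would the db property succeed?

    def _get_device_metrics(self, devid: str, min_sample: Any = None) -> Dict[str, Any]:
        with self._db_lock:
            if not self._db_ready:
                # Mirrors the backtrace: the db property raises before any query runs.
                raise MgrDBNotReady()
            # Placeholder result; the real metrics layout is not shown in this issue.
            return {"devid": devid, "samples": []}

    def get_recent_device_metrics(self, devid: str, min_sample: Any = None) -> Dict[str, Any]:
        try:
            return self._get_device_metrics(devid, min_sample=min_sample)
        except MgrDBNotReady:
            # Report "no metrics yet" instead of letting the exception
            # escape to the remote caller and be recorded as a crash.
            log.warning("device health db not ready; no metrics for %s", devid)
            return {}


if __name__ == "__main__":
    sketch = DeviceHealthSketch(db_ready=False)
    print(sketch.get_recent_device_metrics("TOSHIBA_MG04ACA1_Y9I3K2IYF6XF"))
```

Returning an empty result keeps the caller's outcome the same (no metrics for that device) while avoiding a crash report; the trade-off is that a persistently unavailable database becomes quieter, which is why the sketch still logs a warning.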
Updated by Telemetry Bot almost 2 years ago
- Crash signature (v1) updated (diff)
- Affected Versions v17.2.1, v17.2.2 added
Updated by Telemetry Bot 12 months ago
- Crash signature (v1) updated (diff)
- Affected Versions v17.2.3, v17.2.4, v17.2.5, v17.2.6 added
Updated by Laura Flores 11 months ago
- Crash signature (v1) updated (diff)
Happened in the gibba cluster:
[lflores@gibba001 ~]$ sudo ceph -s
  cluster:
    id: 5363501e-fdf2-11ed-bac8-3cecef3d8fb8
    health: HEALTH_WARN
            1 pool(s) do not have an application enabled
            1 mgr modules have recently crashed

  services:
    mon: 5 daemons, quorum gibba001,gibba002,gibba005,gibba003,gibba004 (age 38h)
    mgr: gibba006.afdywy(active, since 38h), standbys: gibba008.nemumh
    osd: 62 osds: 62 up (since 38h), 62 in (since 38h); 18 remapped pgs
    rgw: 6 daemons active (6 hosts, 1 zones)

  data:
    pools: 6 pools, 257 pgs
    objects: 83.37M objects, 318 GiB
    usage: 1.1 TiB used, 9.4 TiB / 11 TiB avail
    pgs: 20809893/250109739 objects misplaced (8.320%)
         239 active+clean
         18 active+remapped+backfilling

  io:
    client: 63 KiB/s rd, 0 B/s wr, 63 op/s rd, 42 op/s wr
    recovery: 1.0 MiB/s, 266 objects/s

  progress:
    Global Recovery Event (0s)
      [............................]
[lflores@gibba001 ~]$ sudo ceph health detail
HEALTH_WARN 1 pool(s) do not have an application enabled; 1 mgr modules have recently crashed
[WRN] POOL_APP_NOT_ENABLED: 1 pool(s) do not have an application enabled
    application not enabled on pool 'foo'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
[WRN] RECENT_MGR_MODULE_CRASH: 1 mgr modules have recently crashed
    mgr module devicehealth crashed in daemon mgr.gibba001.nkuepu on host gibba001 at 2023-05-29T07:32:20.873598Z
[lflores@gibba001 ~]$ sudo ceph crash info 2023-05-29T07:32:20.873598Z_0465ae2d-0220-4d9b-9ef8-debf2e6a5d70
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 764, in get_recent_device_metrics\n return self._get_device_metrics(devid, min_sample=min_sample)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 553, in _get_device_metrics\n with self._db_lock, self.db:",
        " File \"/usr/share/ceph/mgr/mgr_module.py\", line 1233, in db\n raise MgrDBNotReady();",
        "mgr_module.MgrDBNotReady"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-05-29T07:32:20.873598Z_0465ae2d-0220-4d9b-9ef8-debf2e6a5d70",
    "entity_name": "mgr.gibba001.nkuepu",
    "mgr_module": "devicehealth",
    "mgr_module_caller": "ActivePyModule::dispatch_remote get_recent_device_metrics",
    "mgr_python_exception": "MgrDBNotReady",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "fbbc6a4724a20738af8118fb5d84831008735002870daa3a76853a0dcaaa3f92",
    "timestamp": "2023-05-29T07:32:20.873598Z",
    "utsname_hostname": "gibba001",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-301.1.el8.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Tue Apr 13 16:24:22 UTC 2021"
}
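The two reports in this issue (17.2.0 in the description, 17.2.6 on gibba) carry different stack_sig values and point at different lines of mgr_module.py (1203 vs. 1233), yet name the same exception and the same chain of functions. A minimal heuristic for grouping such reports, similar in spirit to the sanitized backtrace in the description (which omits line numbers), is to compare the exception plus (file, function) frames. This is an assumption for illustration, not how the telemetry server computes stack_sig or crash signatures; normalize_backtrace and same_crash are hypothetical helpers.

```python
import re
from typing import Any, Dict, List, Tuple

# One Python frame as it appears in a ceph crash "backtrace" entry.
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line \d+, in (?P<func>\S+)')

Frame = Tuple[str, str]


def normalize_backtrace(backtrace: List[str]) -> Tuple[Frame, ...]:
    """Keep (path relative to .../share/ceph, function) per frame; drop line numbers."""
    frames = []
    for entry in backtrace:
        m = FRAME_RE.search(entry)
        if m:  # skips entries such as "<redacted>" or the bare exception name
            path = m.group("path").split("/share/ceph/")[-1]
            frames.append((path, m.group("func")))
    return tuple(frames)


def same_crash(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Heuristic: same exception type and same frames, ignoring line numbers."""
    return (a["mgr_python_exception"] == b["mgr_python_exception"]
            and normalize_backtrace(a["backtrace"]) == normalize_backtrace(b["backtrace"]))


if __name__ == "__main__":
    # Frames copied from the two reports above; only the mgr_module.py line differs.
    common = [
        ' File "/usr/share/ceph/mgr/devicehealth/module.py", line 764, in get_recent_device_metrics\n'
        ' return self._get_device_metrics(devid, min_sample=min_sample)',
        ' File "/usr/share/ceph/mgr/devicehealth/module.py", line 553, in _get_device_metrics\n'
        ' with self._db_lock, self.db:',
    ]
    v17_2_0 = {"mgr_python_exception": "MgrDBNotReady", "backtrace": common + [
        ' File "/usr/share/ceph/mgr/mgr_module.py", line 1203, in db\n raise MgrDBNotReady();',
        "<redacted>"]}
    v17_2_6 = {"mgr_python_exception": "MgrDBNotReady", "backtrace": common + [
        ' File "/usr/share/ceph/mgr/mgr_module.py", line 1233, in db\n raise MgrDBNotReady();',
        "mgr_module.MgrDBNotReady"]}
    print(normalize_backtrace(v17_2_6["backtrace"]))
    print(same_crash(v17_2_0, v17_2_6))
```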
From the mgr log:
2023-05-29T07:32:20.746+0000 7fe13d427700 0 [telemetry INFO root] Compiling and sending report to https://telemetry.ceph.com/report
2023-05-29T07:32:20.764+0000 7fe13d427700 0 [telemetry INFO root] Sending ceph report to: https://telemetry.ceph.com/report
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev c158f0be-5ee5-43ec-9dc4-5754658550ba does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev b16c5b1b-f70c-4902-a80a-58955b08c131 does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev d8460a9b-583b-4f9d-849c-3ed28768bbff does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev fbae7d8f-22a6-4ca3-8304-18a178d62c55 does not exist
2023-05-29T07:32:20.796+0000 7fe15c602700 0 [progress WARNING root] complete: ev 91c8d6fc-a976-4651-84f9-72dbc59c52b5 does not exist
2023-05-29T07:32:20.797+0000 7fe15c602700 0 [progress WARNING root] complete: ev 12f6ceb0-d855-4345-95cf-616f4429160b does not exist
2023-05-29T07:32:20.797+0000 7fe15c602700 0 [progress WARNING root] complete: ev b9af52da-d16d-4106-89b2-eb2220aff415 does not exist
2023-05-29T07:32:20.797+0000 7fe15c602700 0 [progress WARNING root] complete: ev 40bdf7b1-80d7-4fd3-beb6-069b394d7f31 does not exist
2023-05-29T07:32:20.821+0000 7fe1843a6700 0 [prometheus INFO cherrypy.error] [29/May/2023:07:32:20] ENGINE Serving on http://:::9283
2023-05-29T07:32:20.821+0000 7fe1843a6700 0 [prometheus INFO cherrypy.error] [29/May/2023:07:32:20] ENGINE Bus STARTED
2023-05-29T07:32:20.821+0000 7fe1843a6700 0 [prometheus INFO root] Engine started.
2023-05-29T07:32:20.871+0000 7fe13d427700 0 [telemetry INFO root] Sent report to https://telemetry.ceph.com/report
2023-05-29T07:32:20.872+0000 7fe13d427700 -1 Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 764, in get_recent_device_metrics
    return self._get_device_metrics(devid, min_sample=min_sample)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 553, in _get_device_metrics
    with self._db_lock, self.db:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1233, in db
    raise MgrDBNotReady();
mgr_module.MgrDBNotReady
2023-05-29T07:32:20.872+0000 7fe13d427700 0 [telemetry ERROR root] Unable to get recent metrics from device with id "TOSHIBA_MG04ACA1_Y9I3K2IYF6XF": Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 764, in get_recent_device_metrics
    return self._get_device_metrics(devid, min_sample=min_sample)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 553, in _get_device_metrics
    with self._db_lock, self.db:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1233, in db
    raise MgrDBNotReady();
mgr_module.MgrDBNotReady
2023-05-29T07:32:20.872+0000 7fe13d427700 0 [telemetry ERROR root] Unable to send device report: Device channel is on, but the generated report was empty.
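The log shows the cascade from the telemetry side: the remote call into devicehealth fails for a device, telemetry logs "Unable to get recent metrics", and the device report ends up empty, which is consistent with a failing device simply being skipped. The following is a hypothetical reconstruction of that control flow, not the telemetry module's actual code; build_device_report, RemoteCallError and db_not_ready are invented names.

```python
import logging
from typing import Callable, Dict, List, NoReturn

log = logging.getLogger("telemetry-sketch")


class RemoteCallError(Exception):
    """Hypothetical stand-in for the error seen when a remote mgr method throws."""


def build_device_report(devids: List[str],
                        fetch_metrics: Callable[[str], Dict]) -> Dict[str, Dict]:
    """Collect per-device metrics, skipping devices whose fetch fails."""
    report: Dict[str, Dict] = {}
    for devid in devids:
        try:
            metrics = fetch_metrics(devid)
        except RemoteCallError as e:
            # Corresponds to the "Unable to get recent metrics" ERROR line.
            log.error('Unable to get recent metrics from device with id "%s": %s', devid, e)
            continue
        if metrics:
            report[devid] = metrics
    return report


def db_not_ready(devid: str) -> NoReturn:
    # Simulates devicehealth raising MgrDBNotReady on every call.
    raise RemoteCallError("Remote method threw exception: mgr_module.MgrDBNotReady")


if __name__ == "__main__":
    report = build_device_report(["TOSHIBA_MG04ACA1_Y9I3K2IYF6XF"], db_not_ready)
    if not report:
        log.error("Unable to send device report: Device channel is on, "
                  "but the generated report was empty.")
```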
Updated by Laura Flores 11 months ago
Could this be an sqlite issue rather than a problem with the devicehealth module?
src/pybind/mgr/mgr_module.py
1223     @property
1224     def db(self) -> sqlite3.Connection:
1225         assert self._db_lock.locked()
1226         if self._db is not None:
1227             return self._db
1228         db_allowed = self.get_ceph_option("mgr_pool")
1229         if not db_allowed:
1230             raise MgrDBNotReady();
1231         self._db = self.open_db()
1232         if self._db is None:
1233             raise MgrDBNotReady();
1234         return self._db
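Because the property has two separate `raise MgrDBNotReady()` statements, the traceback line number tells them apart: if this listing matches the 17.2.6 build, line 1233 is the second raise, meaning mgr_pool was enabled but open_db() returned None, rather than the database being disabled. A standalone mimic of the property that makes both paths observable follows. DBPropertySketch, its constructor flags, which_path and the exception messages are hypothetical (the real code raises without a message), and an in-memory sqlite3 connection stands in for the real open_db(), which goes through Ceph's libcephsqlite.

```python
import sqlite3
import threading
from typing import Optional


class MgrDBNotReady(Exception):
    """Local stand-in for mgr_module.MgrDBNotReady."""


class DBPropertySketch:
    """Standalone mimic of the quoted MgrModule.db property; not ceph code."""

    def __init__(self, mgr_pool: bool, open_succeeds: bool) -> None:
        self._db_lock = threading.Lock()
        self._db: Optional[sqlite3.Connection] = None
        self._mgr_pool = mgr_pool            # stands in for get_ceph_option("mgr_pool")
        self._open_succeeds = open_succeeds  # whether open_db() yields a handle

    def open_db(self) -> Optional[sqlite3.Connection]:
        # Hypothetical: an in-memory database replaces the RADOS-backed one.
        return sqlite3.connect(":memory:") if self._open_succeeds else None

    @property
    def db(self) -> sqlite3.Connection:
        assert self._db_lock.locked()
        if self._db is not None:
            return self._db  # cached handle from an earlier successful open
        if not self._mgr_pool:
            raise MgrDBNotReady("mgr_pool disabled")       # first raise (line 1230)
        self._db = self.open_db()
        if self._db is None:
            raise MgrDBNotReady("open_db() returned None")  # second raise (line 1233)
        return self._db


def which_path(m: DBPropertySketch) -> str:
    """Access m.db under the lock, as _get_device_metrics does, and report the outcome."""
    with m._db_lock:
        try:
            m.db
            return "opened"
        except MgrDBNotReady as e:
            return str(e)


if __name__ == "__main__":
    print(which_path(DBPropertySketch(mgr_pool=True, open_succeeds=False)))
```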
Updated by Yaarit Hatuka 11 months ago
- Project changed from mgr to cephsqlite
- Category deleted (devicehealth module)
Looks like a sqlite issue; Patrick, can you please take a look?
Updated by Patrick Donnelly 11 months ago
- Status changed from New to Fix Under Review
- Assignee set to Patrick Donnelly
- Target version set to v19.0.0
- Backport set to reef,quincy,pacific
- Pull request ID set to 51858
Updated by Patrick Donnelly 11 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 11 months ago
- Copied to Backport #61834: quincy: crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61835: pacific: crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample) added
Updated by Backport Bot 11 months ago
- Copied to Backport #61836: reef: crash: File "mgr/devicehealth/module.py", in get_recent_device_metrics: return self._get_device_metrics(devid, min_sample=min_sample) added
Updated by Patrick Donnelly 6 months ago
- Status changed from Pending Backport to Resolved