Bug #58652

closed

Module 'devicehealth' has failed: disk I/O error

Added by satish patel about 1 year ago. Updated about 1 year ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: ceph-mgr
Target version: -
% Done: 0%
Source: Community (user)
Tags: Quincy
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I am running the Quincy (17.2.5) release on a 3-node Ceph cluster on top of Ubuntu 22.04, and today I encountered the following error:

root@ceph1:~# ceph -s
  cluster:
    id:     cd748128-a3ea-11ed-9e46-c309158fad32
    health: HEALTH_ERR
            1 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 2d)
    mgr: ceph1.ckfkeb(active, since 6h), standbys: ceph2.aaptny
    osd: 9 osds: 9 up (since 2d), 9 in (since 2d)

  data:
    pools:   4 pools, 128 pgs
    objects: 1.18k objects, 4.7 GiB
    usage:   17 GiB used, 16 TiB / 16 TiB avail
    pgs:     128 active+clean

root@ceph1:~# ceph health
HEALTH_ERR Module 'devicehealth' has failed: disk I/O error; 1 mgr modules have recently crashed
root@ceph1:~# ceph crash ls
ID ENTITY NEW
2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035 mgr.ceph1.ckfkeb *
root@ceph1:~# ceph crash info 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
{
"backtrace": [
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
"sqlite3.OperationalError: disk I/O error"
],
"ceph_version": "17.2.5",
"crash_id": "2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035",
"entity_name": "mgr.ceph1.ckfkeb",
"mgr_module": "devicehealth",
"mgr_module_caller": "PyModuleRunner::serve",
"mgr_python_exception": "OperationalError",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "7e506cc2729d5a18403f0373447bb825b42aafa2405fb0e5cfffc2896b093ed8",
"timestamp": "2023-02-07T00:07:12.739187Z",
"utsname_hostname": "ceph1",
"utsname_machine": "x86_64",
"utsname_release": "5.15.0-58-generic",
"utsname_sysname": "Linux",
"utsname_version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"
}

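For readers triaging similar reports, the fields that matter in the crash dump above are the failing module, the Python exception type, and the last backtrace line. The following is a minimal sketch, not part of the original report, that pulls those fields out of the JSON. The helper name `summarize_crash` and the abridged `CRASH_INFO` sample are hypothetical and introduced only for illustration; the field names are the ones shown in the dump.

```python
import json

# Abridged copy of the crash dump quoted in this report (illustrative sample only).
CRASH_INFO = r'''
{
  "backtrace": [
    "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n    cursor = self.db.execute(SQL, (devid,))",
    "sqlite3.OperationalError: disk I/O error"
  ],
  "ceph_version": "17.2.5",
  "entity_name": "mgr.ceph1.ckfkeb",
  "mgr_module": "devicehealth",
  "mgr_python_exception": "OperationalError"
}
'''


def summarize_crash(raw: str) -> dict:
    """Hypothetical helper: reduce a crash dump to the fields useful for triage."""
    info = json.loads(raw)
    backtrace = info.get("backtrace", [])
    return {
        "entity": info.get("entity_name"),
        "module": info.get("mgr_module"),
        "exception": info.get("mgr_python_exception"),
        # The final backtrace entry carries the exception message itself.
        "error": backtrace[-1] if backtrace else None,
        "version": info.get("ceph_version"),
    }


if __name__ == "__main__":
    for key, value in summarize_crash(CRASH_INFO).items():
        print(f"{key}: {value}")
```

In this report the summary points at the `devicehealth` module failing inside an SQLite call, which matches the health message.
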
My auth for MGR:

mgr.ceph1.ckfkeb
key: AQCrSN1jQBmkGxAAcJgLndLU7r4uMETZOWmFwg==
caps: [mds] allow *
caps: [mon] profile mgr
caps: [osd] allow *
mgr.ceph2.aaptny
key: AQBOS91jadj8GhAAfizKqqFJ1C5UJgX6+msb4Q==
caps: [mds] allow *
caps: [mon] profile mgr
caps: [osd] allow *


Related issues 1 (0 open, 1 closed)

Is duplicate of cephsqlite - Bug #55606: [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown (Resolved, Patrick Donnelly)

#1

Updated by Patrick Donnelly about 1 year ago

  • Status changed from New to Duplicate
#2

Updated by Patrick Donnelly about 1 year ago

  • Is duplicate of Bug #55606: [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown added