Bug #58652: Module 'devicehealth' has failed: disk I/O error - mgr - Ceph


Bug #58652

closed

Module 'devicehealth' has failed: disk I/O error

Added by satish patel about 1 year ago. Updated about 1 year ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: ceph-mgr
Target version:
% Done:
Source: Community (user)
Tags: Quincy
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I am running the Quincy (17.2.5) release on a 3-node Ceph cluster on top of Ubuntu 22.04, and today I encountered the following error:

root@ceph1:~# ceph -s
  cluster:
    id:     cd748128-a3ea-11ed-9e46-c309158fad32
    health: HEALTH_ERR
            1 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 2d)
    mgr: ceph1.ckfkeb(active, since 6h), standbys: ceph2.aaptny
    osd: 9 osds: 9 up (since 2d), 9 in (since 2d)

  data:
    pools:   4 pools, 128 pgs
    objects: 1.18k objects, 4.7 GiB
    usage:   17 GiB used, 16 TiB / 16 TiB avail
    pgs:     128 active+clean

root@ceph1:~# ceph health
HEALTH_ERR Module 'devicehealth' has failed: disk I/O error; 1 mgr modules have recently crashed
root@ceph1:~# ceph crash ls
ID                                                                ENTITY            NEW
2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035  mgr.ceph1.ckfkeb  *
root@ceph1:~# ceph crash info 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
        "sqlite3.OperationalError: disk I/O error"
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035",
    "entity_name": "mgr.ceph1.ckfkeb",
    "mgr_module": "devicehealth",
    "mgr_module_caller": "PyModuleRunner::serve",
    "mgr_python_exception": "OperationalError",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "7e506cc2729d5a18403f0373447bb825b42aafa2405fb0e5cfffc2896b093ed8",
    "timestamp": "2023-02-07T00:07:12.739187Z",
    "utsname_hostname": "ceph1",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-58-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"
}
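In this report, the most informative parts of the `ceph crash info` output are the last two `backtrace` entries: the exception (`sqlite3.OperationalError: disk I/O error`) and the innermost frame that raised it (`_create_device`, at the `self.db.execute(...)` call). The following is a minimal Python sketch that extracts those fields, assuming the report JSON has been captured as a string. `CRASH_REPORT` is a trimmed copy of the report above, and `summarize_crash` is a hypothetical helper for illustration. Neither is part of Ceph.

```python
import json

# Trimmed copy of the `ceph crash info` report from this issue. Only the
# fields used below are kept (hypothetical sample, not a Ceph API).
CRASH_REPORT = r'''
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
        " File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
        "sqlite3.OperationalError: disk I/O error"
    ],
    "entity_name": "mgr.ceph1.ckfkeb",
    "mgr_module": "devicehealth",
    "mgr_python_exception": "OperationalError"
}
'''


def summarize_crash(report: dict) -> dict:
    """Return the crashing entity, module, exception and innermost frame."""
    backtrace = report["backtrace"]
    # The last backtrace entry is the exception message itself.
    exception_line = backtrace[-1]
    # The entry before it is the innermost frame. Keep only its
    # 'File ..., line N, in func' header and drop the source line.
    innermost_frame = backtrace[-2].split("\n")[0].strip()
    # The function name follows the final " in " of the frame header.
    function = innermost_frame.rsplit(" in ", 1)[-1]
    return {
        "entity": report["entity_name"],
        "module": report["mgr_module"],
        "exception": exception_line,
        "raised_in": innermost_frame,
        "function": function,
    }


if __name__ == "__main__":
    summary = summarize_crash(json.loads(CRASH_REPORT))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

For this crash, the summary points at `_create_device` in `devicehealth/module.py`, where the module's SQLite write failed.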

My auth entries for the mgr daemons:

mgr.ceph1.ckfkeb
    key: AQCrSN1jQBmkGxAAcJgLndLU7r4uMETZOWmFwg==
    caps: [mds] allow *
    caps: [mon] profile mgr
    caps: [osd] allow *
mgr.ceph2.aaptny
    key: AQBOS91jadj8GhAAfizKqqFJ1C5UJgX6+msb4Q==
    caps: [mds] allow *
    caps: [mon] profile mgr
    caps: [osd] allow *

Related issues: 1 (0 open, 1 closed)