Bug #55142 (closed)
[ERR] : Unhandled exception from module 'devicehealth' while running on mgr.gibba002.nzpbzu: disk I/O error
Description
2022-03-31T00:15:17.829+0000 7fcf56511700 0 [balancer INFO root] ceph osd pg-upmap-items 204.c4d mappings [{'from': 878, 'to': 191}]
2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 client.1735696: SimpleRADOSStriper: lock: main.db: lock failed: (108) Cannot send after transport endpoint shutdown
2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.gibba002.nzpbzu: disk I/O error
2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 devicehealth.serve:
2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 373, in serve
    self.scrape_all()
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 425, in scrape_all
    self.put_device_metrics(device, data)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 500, in put_device_metrics
    self._create_device(devid)
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 487, in _create_device
    cursor = self.db.execute(SQL, (devid,))
sqlite3.OperationalError: disk I/O error
2022-03-31T00:15:18.568+0000 7fcff717d700 -1 mgr handle_mgr_map I was active but no longer am
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn e: '/usr/bin/ceph-mgr'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 0: '/usr/bin/ceph-mgr'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 1: '-n'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 2: 'mgr.gibba002.nzpbzu'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 3: '-f'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 4: '--setuser'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 5: 'ceph'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 6: '--setgroup'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 7: 'ceph'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 8: '--default-log-to-file=false'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 9: '--default-log-to-journald=true'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn 10: '--default-log-to-stderr=false'
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn respawning with exe /usr/bin/ceph-mgr
2022-03-31T00:15:18.568+0000 7fcff717d700 1 mgr respawn exe_path /proc/self/exe
2022-03-31T00:15:19.967+0000 7f2c2da44000 0 ceph version 17.1.0-138-g723fda64 (723fda64a662bb79871e590698268007049bcf7f) quincy (stable), process ceph-mgr, pid 8
2022-03-31T00:15:19.967+0000 7f2c2da44000 0 pidfile_write: ignore empty --pid-file
2022-03-31T00:15:21.569+0000 7f2c2da44000 1 mgr[py] Loading python module 'mirroring'
2022-03-31T00:15:22.569+0000 7f2c2da44000 1 mgr[py] Loading python module 'stats'
Updated by Yaarit Hatuka about 2 years ago
- Project changed from mgr to cephsqlite
- Category deleted (devicehealth module)
- Assignee changed from Yaarit Hatuka to Venky Shankar
- Source set to Development
- Backport set to pacific, quincy
- Affected Versions v16.2.7 added
I tried to reproduce it on the gibba cluster by scraping all devices (with `sudo ceph device scrape-health-metrics`), but the exception did not appear in the logs again.
The following implies that the error relates to a locking issue in src/SimpleRADOSStriper.cc:
2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 client.1735696: SimpleRADOSStriper: lock: main.db: lock failed: (108) Cannot send after transport endpoint shutdown
Venky, can you please take a look?
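For anyone triaging similar logs, here is a minimal sketch of pulling the errno out of a SimpleRADOSStriper lock failure line like the one quoted above. The regular expression and the helper name `parse_lock_failure` are illustrative assumptions based only on the single log line in this tracker, not on a documented Ceph log format. On Linux, errno 108 is ESHUTDOWN, which matches the message text in the log.

```python
import re

# Assumed pattern, derived from the one log line quoted in this tracker:
# "... SimpleRADOSStriper: lock: main.db: lock failed: (108) Cannot send after ..."
LOCK_FAILED = re.compile(
    r"SimpleRADOSStriper: lock: (?P<name>\S+): lock failed: "
    r"\((?P<code>\d+)\) (?P<message>.+)$"
)


def parse_lock_failure(line):
    """Return the object name, errno, and message from a lock failure line.

    Returns None when the line is not a SimpleRADOSStriper lock failure.
    (Hypothetical helper for log triage, not part of Ceph.)
    """
    match = LOCK_FAILED.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "errno": int(match.group("code")),
        "message": match.group("message").strip(),
    }


if __name__ == "__main__":
    sample = (
        "2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 client.1735696: "
        "SimpleRADOSStriper: lock: main.db: lock failed: (108) "
        "Cannot send after transport endpoint shutdown"
    )
    print(parse_lock_failure(sample))
```

Filtering mgr logs this way makes it easier to tell whether a later `disk I/O error` from the devicehealth module was preceded by a lock failure on the same thread.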
Updated by Venky Shankar about 2 years ago
- Assignee changed from Venky Shankar to Patrick Donnelly
Yaarit Hatuka wrote:
I tried to reproduce it on the gibba cluster by scraping all devices (with `sudo ceph device scrape-health-metrics`), but the exception did not appear in the logs again.
The following implies that the error relates to a locking issue in src/SimpleRADOSStriper.cc:
2022-03-31T00:15:18.419+0000 7fcf49cf8700 -1 client.1735696: SimpleRADOSStriper: lock: main.db: lock failed: (108) Cannot send after transport endpoint shutdown
Venky, can you please take a look?
This is (most likely) not related to CephFS, so I'm probably not the intended assignee for this tracker.
A quick check of `src/SimpleRADOSStriper.cc` shows Patrick Donnelly as the author. He works on CephFS, but I'm pretty sure cephsqlite was developed as a standalone project rather than as part of CephFS.
Assigning to Patrick (who is on PTO until Mayish, so this might take a while to be looked into).
Updated by Yaarit Hatuka almost 2 years ago
- Related to Bug #55606: [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown added
Updated by Patrick Donnelly almost 2 years ago
- Status changed from New to Need More Info
This error is generated when the cephsqlite RADOS instance is blocklisted. So this is likely a symptom and not a bug.
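To illustrate the "symptom, not a bug" reading, here is a rough sketch of how a caller could separate this particular sqlite3 failure from other errors instead of letting it surface as an unhandled exception. The helper names `is_disk_io_error` and `run_guarded` are hypothetical, and this is not the devicehealth module's actual handling or the fix applied in Ceph. It only assumes what the traceback shows: the failure arrives as `sqlite3.OperationalError` with the message `disk I/O error`.

```python
import sqlite3


def is_disk_io_error(exc):
    """True if exc looks like the sqlite3 'disk I/O error' from the traceback."""
    return isinstance(exc, sqlite3.OperationalError) and "disk I/O error" in str(exc)


def run_guarded(operation):
    """Run operation() and report disk I/O errors instead of raising them.

    Returns (True, result) on success and (False, exc) when the call fails
    with a disk I/O error. Any other exception propagates unchanged.
    (Hypothetical wrapper for illustration only.)
    """
    try:
        return True, operation()
    except sqlite3.OperationalError as exc:
        if is_disk_io_error(exc):
            return False, exc
        raise


if __name__ == "__main__":
    def failing_write():
        # Simulates the failure seen when the backing store rejects I/O.
        raise sqlite3.OperationalError("disk I/O error")

    ok, err = run_guarded(failing_write)
    print(ok, err)
```

A caller could use the `False` result to log the condition and retry after the database connection is re-established, rather than treating it as a module crash.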
Updated by Laura Flores almost 2 years ago
/a/yuriw-2022-05-27_21:59:17-rados-wip-yuri-testing-2022-05-27-0934-distro-default-smithi/6851244
Updated by Laura Flores almost 2 years ago
/a/yuriw-2022-06-09_22:06:32-rados-wip-yuri3-testing-2022-06-09-1314-distro-default-smithi/6871541
Updated by Laura Flores almost 2 years ago
- Subject changed from [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.gibba002.nzpbzu: disk I/O er ror to [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.gibba002.nzpbzu: disk I/O error
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944298/
Updated by Neha Ojha over 1 year ago
/a/yuriw-2022-09-15_17:53:16-rados-quincy-release-distro-default-smithi/7034360
Updated by Laura Flores over 1 year ago
/a/yuriw-2022-09-29_16:44:24-rados-wip-lflores-testing-distro-default-smithi/7048202
Updated by Patrick Donnelly about 1 year ago
- Is duplicate of Bug #55606: [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown added
Updated by Patrick Donnelly about 1 year ago
- Related to deleted (Bug #55606: [ERR] Unhandled exception from module ''devicehealth'' while running on mgr.y: unknown)
Updated by Patrick Donnelly about 1 year ago
- Status changed from Need More Info to Duplicate
Updated by Laura Flores about 1 year ago
- Tags set to test-failure
/a/lflores-2023-03-27_20:42:09-rados-wip-aclamk-bs-elastic-shared-blob-quincy-distro-default-smithi/7221723