Bug #59313

open

Module 'devicehealth' has failed: unable to open database file

Added by Jamin Collins about 1 year ago. Updated 12 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm experiencing an error with the devicehealth module. ceph status indicates that devicehealth has failed and can't open its database file. However, I can't find any information on where this database file should be, so I can't verify that it exists or create it.

$ ceph versions
{
    "mon": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 16
    },
    "mds": {},
    "overall": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 20
    }
}

Attempting to turn monitoring off or on results in a similar error.

$ ceph device monitoring off
Error EIO: Module 'devicehealth' has experienced an error and cannot handle commands: unable to open database file

$ ceph device monitoring on
Error EIO: Module 'devicehealth' has experienced an error and cannot handle commands: unable to open database file

I'm happy to provide any additional details or information needed.

Any guidance would be most appreciated.

Actions #1

Updated by Peter Pavlisko 12 months ago

I have exactly the same problem with exactly the same Ceph version. It started after a manual upgrade from 16.2.10 to 17.2.5, for which I followed the upgrade guide: https://ceph.io/en/news/blog/2022/v17-2-0-quincy-released/#upgrading-non-cephadm-clusters

The error message after restarting was:

Module 'devicehealth' has failed: unable to open database file

When I emerged (compiled on Gentoo Linux) sys-cluster/ceph without the "sqlite" USE flag, the message changed slightly to this:

Module 'devicehealth' has failed: no such vfs: ceph

Particularly troubling is that the whole cluster is now in HEALTH_ERR state because of it. We have the ceph status reading wired into our internal monitoring software, and it now shows the Ceph cluster as failed. We can ignore the reading, but if something bad actually happened to the cluster, we would have no way to distinguish it from the current HEALTH_ERR state.

I would appreciate any lead. Do I need to create some files, delete some files, run some commands? How do I get rid of this error? Or, if this is a bug, how can I be helpful in locating it?
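As a stopgap for the monitoring concern above, a check could look at which health checks are actually raised rather than only at the overall HEALTH_ERR state. The following is a minimal sketch, not a tested monitoring integration. It assumes that the plain-text output of "ceph health detail" lists each raised check on a line of the form "[ERR] CHECK_CODE: summary", and that the devicehealth failure is reported under the MGR_MODULE_ERROR check; verify both against your release. The function names are hypothetical.

```shell
# Print the health check codes found in `ceph health detail` text on stdin.
# Assumption: each raised check appears as "[SEVERITY] CHECK_CODE: summary".
health_check_codes() {
    sed -n 's/^\[[A-Z]*\] \([A-Z0-9_]*\):.*/\1/p'
}

# Succeed only if MGR_MODULE_ERROR is the single check raised.
# Assumption: the devicehealth failure is reported under that code.
only_mgr_module_error() {
    codes=$(health_check_codes | sort -u)
    [ "$codes" = "MGR_MODULE_ERROR" ]
}
```

Possible usage: "ceph health detail | only_mgr_module_error && echo 'only the mgr module error is raised'". Any additional check makes the function fail, so a real problem would still show up.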

Actions #2

Updated by Peter Pavlisko 12 months ago

To any unfortunate soul like me googling the same error message: here is the solution (workaround?) that worked for me:

  1. stop all mgr daemons
  2. delete the ".mgr" pool, where the sqlite databases for mgr modules live:
    ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
    ceph osd pool delete .mgr .mgr --yes-i-really-really-mean-it
    ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'

    (if you are unsure about deleting the pool, you can just rename it to something else - "ceph osd pool rename ...")
  3. start all mgr daemons

The daemon will recreate the pool and populate it again.
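The pool commands from step 2 can be wrapped in a small script, sketched below. This is only an illustration of the steps above: the function names and the DRY_RUN variable are hypothetical, and DRY_RUN defaults to 1 so that the commands are printed rather than executed. Stopping and starting the mgr daemons (steps 1 and 3) is left out because it depends on how the daemons are managed on your system.

```shell
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 2: delete the .mgr pool. Run this only while all mgr daemons are stopped.
reset_mgr_pool() {
    run ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=true'
    run ceph osd pool delete .mgr .mgr --yes-i-really-really-mean-it
    # Re-disable pool deletion even if the delete above failed.
    run ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=false'
}

# More cautious alternative: rename the pool instead of deleting it.
# $1 is the new pool name (any unused name of your choosing).
rename_mgr_pool() {
    if [ -z "$1" ]; then
        echo "usage: rename_mgr_pool NEW_NAME" >&2
        return 1
    fi
    run ceph osd pool rename .mgr "$1"
}
```

Calling reset_mgr_pool with the default DRY_RUN=1 prints the three commands so they can be reviewed first. Because deleting a pool is irreversible, the rename variant may be the safer first attempt.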

Actions #3

Updated by Alex Moore 12 months ago

Thanks a lot - I hit the same issue, and these steps also resolved it for me.

Note that before finding this post, I had separately noticed that after the upgrade to quincy, the output of "ceph config get mgr mgr/devicehealth/pool_name" was still "device_health_metrics", yet that pool didn't exist, and the quincy release notes implied it had been renamed to ".mgr". So I tried running "ceph config set mgr mgr/devicehealth/pool_name .mgr" and then restarting all mgr daemons, but that didn't help. However, following Peter's steps above did fix the issue.
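The mismatch described above (a configured pool name that does not exist) can be checked with a short sketch like the following. It is diagnostic only: as noted, correcting the setting did not resolve the error by itself. The function names are hypothetical; the ceph commands are the one quoted above plus "ceph osd pool ls", which is assumed to print one pool name per line.

```shell
# Succeed if the pool name in $1 appears as an exact line on stdin.
pool_in_list() {
    grep -Fxq -- "$1"
}

# Compare the pool name configured for devicehealth with the existing pools.
check_devicehealth_pool() {
    configured=$(ceph config get mgr mgr/devicehealth/pool_name)
    if ceph osd pool ls | pool_in_list "$configured"; then
        echo "pool '$configured' exists"
    else
        echo "pool '$configured' is configured but does not exist"
    fi
}
```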
