Bug #58316

open

Ceph health metric scraping still broken

Added by Janek Bevendorff over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This was already reported in #46285, but that issue was marked as rejected.

When I run ceph device scrape-health-metrics HGST_HUH721010AL5200_7JKMZYKG to collect SMART metrics for a device and then list them with ceph device get-health-metrics HGST_HUH721010AL5200_7JKMZYKG, I only get:

{
    "20221220-090607": {
        "dev": "/dev/sdd",
        "error": "smartctl failed",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "hgst",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" 
    }
}
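To check every stored scrape for this kind of failure instead of reading the output by eye, a minimal Python sketch follows. It assumes the `ceph` CLI is on PATH and that `ceph device get-health-metrics` prints a JSON object keyed by timestamp, as in the output above. The helper names `failed_entries` and `get_health_metrics` are hypothetical and not part of Ceph.

```python
import json
import subprocess
import sys
from typing import Dict


def failed_entries(metrics_json: str) -> Dict[str, str]:
    """Return {timestamp: error message} for every scrape entry that has an "error" key.

    Assumes the JSON layout shown in this report: a top-level object mapping
    timestamps to per-scrape objects.
    """
    metrics = json.loads(metrics_json)
    return {
        ts: entry["error"]
        for ts, entry in metrics.items()
        if isinstance(entry, dict) and "error" in entry
    }


def get_health_metrics(devid: str) -> str:
    """Run `ceph device get-health-metrics <devid>` and return its raw stdout.

    Requires a reachable cluster and credentials; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["ceph", "device", "get-health-metrics", devid],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python check_health.py HGST_HUH721010AL5200_7JKMZYKG
    for ts, err in failed_entries(get_health_metrics(sys.argv[1])).items():
        print(f"{ts}: {err}")
```

Only the pure parsing step is separated from the CLI call, so it can be tested against saved output without a cluster.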

The device is NOT an NVMe drive; it is a SAS-attached spinning disk. The same happens for ALL other (SAS) devices in our cluster. In fact, it has behaved this way since the device health feature was introduced, and I have been waiting for a fix ever since, but the issue is still there.

I am running the latest Pacific release and smartmontools 7.1.
