Bug #58316: Ceph health metric scraping still broken - RADOS - Ceph

Bug #58316

open

Ceph health metric scraping still broken

Added by Janek Bevendorff over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Severity: 2 - major
Affected Versions: Ceph - v16.2.10
Component(RADOS): OSD

Description

This was already reported in #46285, but that issue was marked as rejected.

When I run ceph device scrape-health-metrics HGST_HUH721010AL5200_7JKMZYKG to collect SMART metrics for a device and then list them with ceph device get-health-metrics HGST_HUH721010AL5200_7JKMZYKG, all I get is:

{
    "20221220-090607": {
        "dev": "/dev/sdd",
        "error": "smartctl failed",
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code": -22,
        "nvme_vendor": "hgst",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" 
    }
}

The device is NOT an NVMe drive; it's a SAS-attached spinning disk. The same happens for ALL other (SAS) devices in our cluster. In fact, it has behaved this way ever since the device health feature was introduced. I have been waiting for it to be fixed eventually, but the issue is still there.

I am running the latest Pacific release and smartmontools 7.1.
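
To spot failed scrapes quickly across many devices, the JSON returned by ceph device get-health-metrics can be post-processed. The following is a minimal Python sketch, assuming the output is a JSON object keyed by scrape timestamp, like the one above. The function name find_failed_scrapes and the device-name heuristic (Linux NVMe namespaces are named /dev/nvme<N>n<M>, SCSI/SAS disks /dev/sd<X>) are hypothetical illustrations, not part of Ceph.

```python
import json
import re

# Hypothetical heuristic: Linux names NVMe namespaces /dev/nvme<ctrl>n<ns>,
# while SCSI/SATA/SAS disks appear as /dev/sd<letters>.
NVME_DEV_RE = re.compile(r"^/dev/nvme\d+n\d+")


def find_failed_scrapes(metrics_json):
    """Summarize every scrape entry that reports an error.

    Assumes `metrics_json` is a JSON object mapping timestamp strings
    to per-scrape dicts, as printed by `ceph device get-health-metrics`.
    """
    data = json.loads(metrics_json)
    failures = []
    for timestamp, entry in sorted(data.items()):
        if "error" not in entry:
            continue  # successful scrape, nothing to report
        dev = entry.get("dev", "")
        # Collect NVMe-specific keys, which should not appear for a SAS disk.
        nvme_keys = [key for key in entry if key.startswith("nvme_")]
        failures.append({
            "timestamp": timestamp,
            "dev": dev,
            "error": entry["error"],
            # True when NVMe results show up on a non-NVMe device name.
            "nvme_keys_on_non_nvme_dev": bool(nvme_keys) and not NVME_DEV_RE.match(dev),
        })
    return failures
```

For example, passing the saved output for HGST_HUH721010AL5200_7JKMZYKG to find_failed_scrapes would return one entry for 20221220-090607 with error "smartctl failed" and nvme_keys_on_non_nvme_dev set to True, since /dev/sdd is not an NVMe device name.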

Updated by Janek Bevendorff over 1 year ago