Bug #48604

closed

orchestrator: query-daemon-health-metrics fails, no smartctl output

Added by Volker Theile over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Running 'ceph device query-daemon-health-metrics' fails, and the smartctl_output field in the result contains no helpful information. As far as I understand the C++ code, the text after 'stdout:' should contain the smartctl output, but it is empty.

https://github.com/ceph/ceph/blob/octopus/src/common/blkdev.cc#L728
https://github.com/ceph/ceph/blob/octopus/src/common/blkdev.cc#L758

 :~ # ceph device query-daemon-health-metrics osd.6
{
    "HUH721010ALE600______00YK043D7A01892LEN_1EK70PSZ" : {
        "dev" : "/dev/sdc",
        "error" : "smartctl failed",
        "nvme_smart_health_information_add_log_error" : "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code" : -22,
        "nvme_vendor" : "ata",
        "smartctl_error_code" : -22,
        "smartctl_output" : "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" 
    },
    "KCM51VUG800G_79M0A01PTZZF" : {
        "dev" : "/dev/nvme1n1",
        "error" : "smartctl failed",
        "nvme_smart_health_information_add_log_error" : "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code" : -22,
        "nvme_vendor" : "lvm",
        "smartctl_error_code" : -22,
        "smartctl_output" : "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" 
    }
}
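
To make the symptom easier to check across many devices, the report above can be scanned for entries whose 'stdout:' part is empty. This is only a minimal sketch: the helper names are hypothetical, the sample is an abbreviated copy of the output above, and it assumes a failed query always stores smartctl_output as a string with the 'stderr:' / 'stdout:' layout shown here.

```python
import json

# Abbreviated copy of the report above (raw string keeps the JSON "\n" escapes).
SAMPLE_REPORT = r'''
{
    "HUH721010ALE600______00YK043D7A01892LEN_1EK70PSZ": {
        "dev": "/dev/sdc",
        "error": "smartctl failed",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
    },
    "KCM51VUG800G_79M0A01PTZZF": {
        "dev": "/dev/nvme1n1",
        "error": "smartctl failed",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
    }
}
'''


def split_smartctl_output(message):
    """Split the error string into its header, stderr and stdout parts.

    Assumes the 'header stderr:\\n...stdout:\\n...' layout seen in this report.
    """
    header, _, rest = message.partition("stderr:\n")
    stderr, _, stdout = rest.partition("stdout:\n")
    return {
        "header": header.strip(),
        "stderr": stderr.strip(),
        "stdout": stdout.strip(),
    }


def devices_with_empty_stdout(report_json):
    """Return the device paths whose captured smartctl stdout is empty."""
    report = json.loads(report_json)
    empty = []
    for dev_id, info in report.items():
        output = info.get("smartctl_output")
        # Only error strings are inspected; anything else is skipped.
        if isinstance(output, str) and "stdout:" in output:
            if not split_smartctl_output(output)["stdout"]:
                empty.append(info.get("dev", dev_id))
    return empty
```

For the report in this issue, both /dev/sdc and /dev/nvme1n1 end up in the list, and the stderr part of each entry holds only the sudo exit status.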

Running smartctl manually (smartctl -a --json=o /dev/sdc) works without problems:

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4917",
    "platform_info": "x86_64-linux-5.3.18-24.37-default",
    "build_info": "(SUSE RPM)",
    "argv": [
      "smartctl",
      "-a",
      "--json=o",
      "/dev/sdc" 
    ],
    "output": [
      "smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-24.37-default] (SUSE RPM)",
      "Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org",
      "",
      "=== START OF INFORMATION SECTION ===",
      "Device Model:     HUH721010ALE600      00YK043D7A01892LEN",
      "Serial Number:    1EK70PSZ",
      "LU WWN Device Id: 5 000cca 27eed77be",
      "Firmware Version: LHGNK9Q7",
      "User Capacity:    10,000,831,348,736 bytes [10.0 TB]",
      "Sector Sizes:     512 bytes logical, 4096 bytes physical",
      "Rotation Rate:    7200 rpm",
      "Form Factor:      3.5 inches",
      "Device is:        Not in smartctl database [for details use: -P showall]",
      "ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4",
      "SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)",
      "Local Time is:    Tue Nov 24 09:55:24 2020 GMT",
      "SMART support is: Available - device has SMART capability.",
      "SMART support is: Enabled",
      "",
      "=== START OF READ SMART DATA SECTION ===",
      "SMART overall-health self-assessment test result: PASSED",
      "",
      "General SMART Values:",
      "Offline data collection status:  (0x82)\tOffline data collection activity",
      "\t\t\t\t\twas completed without error.",
      "\t\t\t\t\tAuto Offline Data Collection: Enabled.",
      "Self-test execution status:      (   0)\tThe previous self-test routine completed",
      "\t\t\t\t\twithout error or no self-test has ever ",
      "\t\t\t\t\tbeen run.",
      "Total time to complete Offline ",
      "data collection: \t\t(   93) seconds.",
      "Offline data collection",
      "capabilities: \t\t\t (0x5b) SMART execute Offline immediate.",
      "\t\t\t\t\tAuto Offline data collection on/off support.",
      "\t\t\t\t\tSuspend Offline collection upon new",
      "\t\t\t\t\tcommand.",
      "\t\t\t\t\tOffline surface scan supported.",
      "\t\t\t\t\tSelf-test supported.",
      "\t\t\t\t\tNo Conveyance Self-test supported.",
      "\t\t\t\t\tSelective Self-test supported.",
      "SMART capabilities:            (0x0003)\tSaves SMART data before entering",
      "\t\t\t\t\tpower-saving mode.",
      "\t\t\t\t\tSupports SMART auto save timer.",
      "Error logging capability:        (0x01)\tError logging supported.",
      "\t\t\t\t\tGeneral Purpose Logging supported.",
      "Short self-test routine ",
      "recommended polling time: \t (   2) minutes.",
      "Extended self-test routine",
      "recommended polling time: \t (1105) minutes.",
      "SCT capabilities: \t       (0x003d)\tSCT Status supported.",
      "\t\t\t\t\tSCT Error Recovery Control supported.",
      "\t\t\t\t\tSCT Feature Control supported.",
      ...
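
To compare the manual run with what the daemon reports, stdout and stderr of the command can be captured separately. The following is a sketch only: run_capture and SMARTCTL_ARGV are hypothetical names, and the argument list is an assumption pieced together from the argv shown above plus the 'sudo' mentioned in the error message; it is not taken from the Ceph sources.

```python
import subprocess

# Assumed invocation: "sudo" from the error message, the rest from the argv above.
SMARTCTL_ARGV = ["sudo", "smartctl", "-a", "--json=o", "/dev/sdc"]


def run_capture(argv, timeout=30):
    """Run a command and return its exit status, stdout and stderr separately."""
    proc = subprocess.run(
        argv,
        capture_output=True,  # keep both streams instead of inheriting the terminal
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if the command hangs
    )
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Calling run_capture(SMARTCTL_ARGV) as the same user the OSD daemon runs as would show whether stdout is really empty in that environment or whether the output is lost somewhere between the child process and the report.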

Related issues 1 (0 open, 1 closed)

Copied to Ceph - Backport #48737: octopus: orchestrator: query-daemon-health-metrics fails, no smartctl output (Resolved, Nathan Cutler)