Bug #46285

osd: error from smartctl is always reported as invalid JSON

Added by Josh Durgin over 3 years ago. Updated over 3 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When smartctl returns an error, the OSD always reports it as invalid JSON. We meant to give a better error message, but the conditional here is incorrect: https://github.com/ceph/ceph/pull/28848/files#diff-14b6f2e2d1fee0fc37c98071f13661f4R738-R739
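The distinction the conditional was meant to make can be sketched as follows. This is a hypothetical Python illustration, not the actual OSD code (which is C++): the function name `classify_smartctl_result` is made up, while the `error` and `output` keys mirror the ones discussed in this issue. The idea is that a non-zero exit status and unparseable stdout are separate conditions, so "invalid JSON" should only be reported when parsing really fails.

```python
import json


def classify_smartctl_result(returncode: int, stdout: str) -> dict:
    """Hypothetical sketch: build a health-metrics record from one smartctl run.

    A non-zero exit status and unparseable stdout are handled separately,
    so "invalid JSON" is only reported when parsing actually fails.
    """
    try:
        parsed = json.loads(stdout)
    except json.JSONDecodeError:
        # Keep the raw text under "output" so the real cause stays visible.
        return {"error": "smartctl returned invalid JSON", "output": stdout}

    if returncode != 0:
        # smartctl printed valid JSON but signalled a problem through its
        # exit status; report that instead of mislabelling the output.
        return {"error": f"smartctl returned an error ({returncode})",
                "output": parsed}

    return parsed


if __name__ == "__main__":
    print(classify_smartctl_result(4, '{"smartctl": {"exit_status": 4}}')["error"])
    print(classify_smartctl_result(1, "plain text, not JSON")["error"])
```

With this split, a run like the one below (valid JSON, exit status 4) would be reported as an error from smartctl rather than as invalid JSON.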

For example:

[root@extensa003 ~]# ceph device get-health-metrics AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
{
    "20200629-001032": {
        "dev": "/dev/sdg", 
        "error": "smartctl returned invalid JSON", 
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231", 
        "nvme_smart_health_information_add_log_error_code": -22, 
        "nvme_vendor": "avago" 
    }, 
    "20200630-000823": {
        "dev": "/dev/sdg", 
        "error": "smartctl returned invalid JSON", 
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231", 
        "nvme_smart_health_information_add_log_error_code": -22, 
        "nvme_vendor": "avago" 
    }
}
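To scan many snapshots like the ones above, a small helper can pull out the error fields per timestamp. This is a hypothetical sketch: `summarize_health_errors` is not part of Ceph, and it assumes, following the usual Ceph convention, that the `*_error_code` values are negated errno numbers (so -22 is EINVAL).

```python
import errno


def summarize_health_errors(report: dict) -> dict:
    """Hypothetical helper: collect the error fields of each health-metrics snapshot.

    `report` is the JSON object printed by `ceph device get-health-metrics`,
    keyed by snapshot timestamp.
    """
    summary = {}
    for timestamp, record in sorted(report.items()):
        errors = []
        for key, value in record.items():
            if key == "error":
                errors.append(value)
            elif key.endswith("_error_code") and isinstance(value, int) and value < 0:
                # Assumption: codes are negated errno values (-22 -> EINVAL).
                name = errno.errorcode.get(-value, str(value))
                errors.append(f"{key}: {name}")
        summary[timestamp] = errors
    return summary


if __name__ == "__main__":
    sample = {
        "20200629-001032": {
            "dev": "/dev/sdg",
            "error": "smartctl returned invalid JSON",
            "nvme_smart_health_information_add_log_error_code": -22,
        }
    }
    print(summarize_health_errors(sample))
```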

[root@mero007 ~]# smartctl -a --json "/dev/sdg" 
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-a",
      "--json",
      "/dev/sdg" 
    ],
    "exit_status": 4
  },
  "device": {
    "name": "/dev/sdg",
    "info_name": "/dev/sdg",
    "type": "scsi",
    "protocol": "SCSI" 
  },
  "vendor": "AVAGO",
  "product": "SMC3108",
  "model_name": "AVAGO SMC3108",
  "revision": "4.68",
  "scsi_version": "SPC-3",
  "user_capacity": {
    "blocks": 15626993664,
    "bytes": 8001020755968
  },
  "logical_block_size": 512,
  "physical_block_size": 4096,
  "serial_number": "00f11da416efd1fc2200d36c23800403",
  "device_type": {
    "scsi_value": 0,
    "name": "disk" 
  },
  "local_time": {
    "time_t": 1593500798,
    "asctime": "Tue Jun 30 07:06:38 2020 UTC" 
  },
  "temperature": {
    "current": 0,
    "drive_trip": 0
  }
}
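Note that this output is well-formed JSON even though `"exit_status"` is 4. smartctl's exit status is a bitmask, so 4 means only bit 2 is set (a SMART or other command to the device failed). The sketch below decodes it; the bit descriptions are paraphrased from the RETURN VALUES section of the smartctl(8) man page, and the names `SMARTCTL_EXIT_BITS` and `decode_exit_status` are hypothetical.

```python
from typing import List

# Bit meanings paraphrased from the RETURN VALUES section of smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "a SMART or other command to the disk failed, or a SMART data checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> List[str]:
    """Return the meaning of every bit set in a smartctl exit status."""
    return [desc for bit, desc in sorted(SMARTCTL_EXIT_BITS.items())
            if status & (1 << bit)]


if __name__ == "__main__":
    # "exit_status": 4 sets only bit 2.
    print(decode_exit_status(4))
```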

History

#1 Updated by Yaarit Hatuka over 3 years ago

Which version is this cluster running?

I would expect to see this "output" key in the command's output:
https://github.com/ceph/ceph/pull/28848/files#diff-14b6f2e2d1fee0fc37c98071f13661f4R750

Is the cluster containerized?
If so, the smartctl version in the container might be old (meaning --json is indeed an invalid option), unlike the smartctl version on the host (which is 7.0 and has the --json option).
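One way to check this is to compare the `smartctl --version` output inside the container and on the host. The sketch below is hypothetical: it assumes the first line of that output begins with `smartctl <major>.<minor>`, and relies on JSON output (`-j`/`--json`) having been introduced in smartmontools 7.0.

```python
import re

# Assumption: the first line of `smartctl --version` begins with
# "smartctl <major>.<minor>", e.g. "smartctl 7.0 ...".
_VERSION_RE = re.compile(r"^smartctl\s+(\d+)\.(\d+)")


def supports_json(version_output: str) -> bool:
    """Return True if the reported smartctl version is 7.0 or newer (has --json)."""
    lines = version_output.splitlines()
    match = _VERSION_RE.match(lines[0]) if lines else None
    if match is None:
        # Unrecognized output: assume JSON output is not available.
        return False
    return (int(match.group(1)), int(match.group(2))) >= (7, 0)
```

Feeding it the captured output from both environments would show whether the container's smartctl is the one rejecting --json.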

By the way, the OSD also reports other smartctl errors, for example:

{
    "dev": "/dev/sdg",
    "error": "smartctl failed",
    "host_id": "obfuscated here",
    "nvme_vendor": "lsi",
    "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n",
    "smartctl_error_code": -22,
    "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
    "nvme_smart_health_information_add_log_error_code": -22
}

(taken from a telemetry device report of a hardware RAID controller)

#2 Updated by Josh Durgin over 3 years ago

  • Status changed from New to Rejected

It turns out the report was from an earlier version (it did not contain the 'output' key).

#3 Updated by Daniël Vos over 3 years ago

Yaarit Hatuka wrote:

> Is the cluster containerized?
> If so, the smartctl version in the container might be old (meaning --json is indeed an invalid option), unlike the smartctl version on the host (which is 7.0 and has the --json option).

The smartmontools package in the ceph/ceph:latest image is indeed version 6.6, which predates the --json option added in smartmontools 7.0.
