Bug #46285
osd: error from smartctl is always reported as invalid JSON
Description
When smartctl returns an error, the OSD always reports it as invalid JSON. We meant to report a more specific error, but the conditional that is supposed to select it is incorrect: https://github.com/ceph/ceph/pull/28848/files#diff-14b6f2e2d1fee0fc37c98071f13661f4R738-R739
For example:
[root@extensa003 ~]# ceph device get-health-metrics AVAGO_SMC3108_00f11da416efd1fc2200d36c23800403
{
  "20200629-001032": {
    "dev": "/dev/sdg",
    "error": "smartctl returned invalid JSON",
    "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
    "nvme_smart_health_information_add_log_error_code": -22,
    "nvme_vendor": "avago"
  },
  "20200630-000823": {
    "dev": "/dev/sdg",
    "error": "smartctl returned invalid JSON",
    "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231",
    "nvme_smart_health_information_add_log_error_code": -22,
    "nvme_vendor": "avago"
  }
}
[root@mero007 ~]# smartctl -a --json "/dev/sdg"
{
  "json_format_version": [1, 0],
  "smartctl": {
    "version": [7, 0],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-3.10.0-1062.7.1.el7.x86_64",
    "build_info": "(local build)",
    "argv": ["smartctl", "-a", "--json", "/dev/sdg"],
    "exit_status": 4
  },
  "device": {
    "name": "/dev/sdg",
    "info_name": "/dev/sdg",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "AVAGO",
  "product": "SMC3108",
  "model_name": "AVAGO SMC3108",
  "revision": "4.68",
  "scsi_version": "SPC-3",
  "user_capacity": {
    "blocks": 15626993664,
    "bytes": 8001020755968
  },
  "logical_block_size": 512,
  "physical_block_size": 4096,
  "serial_number": "00f11da416efd1fc2200d36c23800403",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1593500798,
    "asctime": "Tue Jun 30 07:06:38 2020 UTC"
  },
  "temperature": {
    "current": 0,
    "drive_trip": 0
  }
}
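The intended behaviour, reporting a specific error instead of always blaming the JSON, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the OSD's actual C++ logic: the function name classify_smartctl_result and the exit_status/output fields it emits are hypothetical. It tries to parse the output first, so a run like the one above (valid JSON with exit_status 4) is kept, and only falls back to an error when parsing fails, distinguishing a failed smartctl run from output that is merely not JSON.

```python
import json
from typing import Any, Dict


def classify_smartctl_result(returncode: int, stdout: str) -> Dict[str, Any]:
    """Turn one `smartctl -a --json` run into a health-metrics entry.

    Hypothetical sketch: the field names and messages are illustrative and
    are not taken from Ceph's OSD code.
    """
    try:
        parsed = json.loads(stdout)
    except json.JSONDecodeError:
        parsed = None

    # Valid JSON is returned as-is, even when informational exit-status bits
    # are set (e.g. exit status 4).
    if isinstance(parsed, dict):
        return parsed

    # A negative code (killed by a signal) or bit 0/1 set (command line did
    # not parse / device open failed) means smartctl itself failed, so report
    # that instead of blaming the JSON.
    if returncode < 0 or returncode & 0b11:
        return {"error": "smartctl failed", "exit_status": returncode,
                "output": stdout}

    # smartctl claimed success but printed something that is not a JSON object.
    return {"error": "smartctl returned invalid JSON", "output": stdout}
```

Checking the parse result before the exit status is a deliberate choice here: smartctl sets non-fatal bits (such as 4) even when it prints a complete report, so the exit status alone is not a reliable error signal.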
Updated by Yaarit Hatuka almost 4 years ago
Which version is this cluster running?
I would expect to see this "output" key in the command's output:
https://github.com/ceph/ceph/pull/28848/files#diff-14b6f2e2d1fee0fc37c98071f13661f4R750
Is the cluster containerized?
If so, the smartctl version inside the container might be old (so it does not support --json, and its output is indeed invalid JSON), whereas the smartctl version on the host is 7.0, which supports the --json option.
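One way to test this hypothesis is to check the smartctl version inside the container before relying on --json, which smartmontools added in release 7.0. The sketch below assumes the first line of smartctl --version looks like "smartctl 7.0 2018-12-30 r4883 [...]"; the helper names parse_smartctl_version and supports_json are hypothetical.

```python
import re
from typing import Optional, Tuple

# smartmontools added JSON output (-j / --json) in release 7.0.
MIN_JSON_VERSION = (7, 0)


def parse_smartctl_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the output of `smartctl --version`.

    Assumes the banner starts like "smartctl 7.0 2018-12-30 r4883 [...]".
    Returns None if the text does not look like a smartctl banner.
    """
    m = re.match(r"smartctl\s+(\d+)\.(\d+)", banner.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def supports_json(banner: str) -> bool:
    """True only if the banner reports smartctl 7.0 or newer."""
    version = parse_smartctl_version(banner)
    return version is not None and version >= MIN_JSON_VERSION
```

Inside a container, the banner could be captured with, for example, subprocess.run(["smartctl", "--version"], capture_output=True, text=True).stdout and passed to supports_json().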
By the way, the OSD also reports other smartctl errors, for example:
{
"dev": "/dev/sdg",
"error": "smartctl failed",
"host_id": "obfuscated here",
"nvme_vendor": "lsi",
"smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n",
"smartctl_error_code": -22,
"nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 1",
"nvme_smart_health_information_add_log_error_code": -22
}
(taken from a telemetry device report for a hardware RAID controller)
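smartctl's exit status is a bitmask, which helps when reading messages like "smartctl returned an error (1)" above and the exit_status 4 in the earlier smartctl output. The sketch below decodes it; the bit meanings are paraphrased from the "RETURN VALUES" section of the smartctl(8) man page, and treating bits 0 and 1 as fatal is an assumption of this sketch. If the (1) above is smartctl's own exit status, it decodes to "command line did not parse", which would be consistent with an older smartctl rejecting --json.

```python
from typing import List

# Meaning of each bit in smartctl's exit status, paraphrased from the
# smartctl(8) man page (bit 0 first).
SMARTCTL_EXIT_BITS = [
    "command line did not parse",
    "device open failed, device did not return IDENTIFY DEVICE data, "
    "or device is in a low-power mode",
    "some SMART or other ATA command to the disk failed, "
    "or a SMART data structure had a checksum error",
    "SMART status check returned 'DISK FAILING'",
    "prefail attributes are at or below threshold",
    "SMART status is OK, but some attributes were at or below "
    "threshold at some time in the past",
    "device error log contains records of errors",
    "self-test log contains records of errors",
]


def decode_smartctl_exit_status(status: int) -> List[str]:
    """Return the meaning of every bit set in a smartctl exit status."""
    return [msg for bit, msg in enumerate(SMARTCTL_EXIT_BITS)
            if status & (1 << bit)]


def is_fatal(status: int) -> bool:
    # Assumption of this sketch: bits 0 and 1 mean no device data was read.
    return bool(status & 0b11)
```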
Updated by Josh Durgin almost 4 years ago
- Status changed from New to Rejected
It turns out the report was from an earlier version (it did not contain the 'output' key).
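The diagnosis above (no 'output' key, hence an earlier version) can be applied mechanically to a saved ceph device get-health-metrics report. This is a hypothetical helper built only on the heuristic stated in this thread: it flags entries that report invalid JSON without the raw smartctl output attached.

```python
from typing import Any, Dict, List


def entries_from_old_osd(report: Dict[str, Any]) -> List[str]:
    """Return timestamps whose entry says 'invalid JSON' without an 'output' key.

    Heuristic from this thread: newer OSDs attach the raw smartctl output
    under "output", so its absence suggests the entry came from an older OSD.
    """
    return sorted(
        ts for ts, entry in report.items()
        if isinstance(entry, dict)
        and entry.get("error") == "smartctl returned invalid JSON"
        and "output" not in entry
    )
```

For example, load the command's JSON output with json.load() and pass the resulting dict to entries_from_old_osd(); both entries in the report at the top of this issue would be flagged.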
Updated by Daniël Vos over 3 years ago
Yaarit Hatuka wrote:
Is the cluster containerized?
If so - smartctl version in the container might be old (meaning --json is indeed invalid), versus smartctl version outside (which is 7.0 and has --json option).
smartmontools in the ceph/ceph:latest image is indeed version 6.6, which predates the --json option added in 7.0.