Bug #48604
Updated by Volker Theile over 3 years ago
Running 'ceph device query-daemon-health-metrics' fails, but smartctl_output does not contain any helpful information. As far as I understand the C++ code, the text after 'stdout:' should contain the smartctl output, but it is empty.

https://github.com/ceph/ceph/blob/octopus/src/common/blkdev.cc#L728
https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L735
https://github.com/ceph/ceph/blob/octopus/src/common/blkdev.cc#L758
https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L753

<pre>
:~ # ceph device query-daemon-health-metrics osd.6
{
    "HUH721010ALE600______00YK043D7A01892LEN_1EK70PSZ" : {
        "dev" : "/dev/sdc",
        "error" : "smartctl failed",
        "nvme_smart_health_information_add_log_error" : "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code" : -22,
        "nvme_vendor" : "ata",
        "smartctl_error_code" : -22,
        "smartctl_output" : "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
    },
    "KCM51VUG800G_79M0A01PTZZF" : {
        "dev" : "/dev/nvme1n1",
        "error" : "smartctl failed",
        "nvme_smart_health_information_add_log_error" : "nvme returned an error: sudo: exit status: 1",
        "nvme_smart_health_information_add_log_error_code" : -22,
        "nvme_vendor" : "lvm",
        "smartctl_error_code" : -22,
        "smartctl_output" : "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
    }
}
</pre>

It is possible to run smartctl manually without problems:
<pre>
{
  "json_format_version": [ 1, 0 ],
  "smartctl": {
    "version": [ 7, 0 ],
    "svn_revision": "4917",
    "platform_info": "x86_64-linux-5.3.18-24.37-default",
    "build_info": "(SUSE RPM)",
    "argv": [ "smartctl", "-a", "--json=o", "/dev/sdc" ],
    "output": [
      "smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-24.37-default] (SUSE RPM)",
      "Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org",
      "",
      "=== START OF INFORMATION SECTION ===",
      "Device Model: HUH721010ALE600 00YK043D7A01892LEN",
      "Serial Number: 1EK70PSZ",
      "LU WWN Device Id: 5 000cca 27eed77be",
      "Firmware Version: LHGNK9Q7",
      "User Capacity: 10,000,831,348,736 bytes [10.0 TB]",
      "Sector Sizes: 512 bytes logical, 4096 bytes physical",
      "Rotation Rate: 7200 rpm",
      "Form Factor: 3.5 inches",
      "Device is: Not in smartctl database [for details use: -P showall]",
      "ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4",
      "SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)",
      "Local Time is: Tue Nov 24 09:55:24 2020 GMT",
      "SMART support is: Available - device has SMART capability.",
      "SMART support is: Enabled",
      "",
      "=== START OF READ SMART DATA SECTION ===",
      "SMART overall-health self-assessment test result: PASSED",
      "",
      "General SMART Values:",
      "Offline data collection status: (0x82)\tOffline data collection activity",
      "\t\t\t\t\twas completed without error.",
      "\t\t\t\t\tAuto Offline Data Collection: Enabled.",
      "Self-test execution status: ( 0)\tThe previous self-test routine completed",
      "\t\t\t\t\twithout error or no self-test has ever ",
      "\t\t\t\t\tbeen run.",
      "Total time to complete Offline ",
      "data collection: \t\t( 93) seconds.",
      "Offline data collection",
      "capabilities: \t\t\t (0x5b) SMART execute Offline immediate.",
      "\t\t\t\t\tAuto Offline data collection on/off support.",
      "\t\t\t\t\tSuspend Offline collection upon new",
      "\t\t\t\t\tcommand.",
      "\t\t\t\t\tOffline surface scan supported.",
      "\t\t\t\t\tSelf-test supported.",
      "\t\t\t\t\tNo Conveyance Self-test supported.",
      "\t\t\t\t\tSelective Self-test supported.",
      "SMART capabilities: (0x0003)\tSaves SMART data before entering",
      "\t\t\t\t\tpower-saving mode.",
      "\t\t\t\t\tSupports SMART auto save timer.",
      "Error logging capability: (0x01)\tError logging supported.",
      "\t\t\t\t\tGeneral Purpose Logging supported.",
      "Short self-test routine ",
      "recommended polling time: \t ( 2) minutes.",
      "Extended self-test routine",
      "recommended polling time: \t (1105) minutes.",
      "SCT capabilities: \t (0x003d)\tSCT Status supported.",
      "\t\t\t\t\tSCT Error Recovery Control supported.",
      "\t\t\t\t\tSCT Feature Control supported.",
      ...
</pre>
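
To check more than one OSD for the same symptom, the query result can be scanned for devices whose smartctl_output has nothing after 'stdout:'. The following is only a rough sketch in Python, not part of Ceph: the helper names (split_smartctl_output, devices_with_empty_stdout) are hypothetical, and the "stderr:\n...stdout:\n..." layout of the string is an assumption taken from the sample output in this report.

```python
import json

# Sample taken from the 'ceph device query-daemon-health-metrics osd.6'
# output quoted in this report (trimmed to the fields used here).
SAMPLE = json.loads(r'''
{
  "HUH721010ALE600______00YK043D7A01892LEN_1EK70PSZ": {
    "dev": "/dev/sdc",
    "error": "smartctl failed",
    "smartctl_error_code": -22,
    "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
  },
  "KCM51VUG800G_79M0A01PTZZF": {
    "dev": "/dev/nvme1n1",
    "error": "smartctl failed",
    "smartctl_error_code": -22,
    "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
  }
}
''')


def split_smartctl_output(text):
    """Split a smartctl_output error string into (stderr, stdout).

    Assumes the layout seen in this report:
    "<prefix> stderr:\n<stderr text>stdout:\n<stdout text>".
    Returns (None, None) if the 'stdout:' marker is missing.
    """
    head, marker, stdout = text.partition("stdout:\n")
    if not marker:
        return None, None
    _, marker, stderr = head.partition("stderr:\n")
    return (stderr if marker else None), stdout


def devices_with_empty_stdout(metrics):
    """Return (device id, device path) pairs whose smartctl call failed
    and whose captured stdout section is empty."""
    hits = []
    for devid, info in metrics.items():
        if info.get("error") != "smartctl failed":
            continue
        _, stdout = split_smartctl_output(info.get("smartctl_output", ""))
        if stdout is not None and stdout.strip() == "":
            hits.append((devid, info.get("dev")))
    return hits


for devid, dev in devices_with_empty_stdout(SAMPLE):
    print(f"{dev}: smartctl failed, stdout section is empty ({devid})")
```

On a live cluster the JSON printed by the ceph command could be passed to json.loads() in place of SAMPLE.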