Bug #48604

Updated by Volker Theile over 3 years ago

Running 'ceph device query-daemon-health-metrics' reports a failure for every device, but smartctl_output does not contain any helpful information. As far as I understand the C++ code, the text after 'stdout:' should contain the smartctl output, but it is empty; the only detail reported is 'sudo: exit status: 1' on stderr.

 https://github.com/ceph/ceph/blob/octopus/src/common/blkdev.cc#L728 https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L735 
 https://github.com/ceph/ceph/blob/octopus/src/common/blkdev.cc#L758 https://github.com/ceph/ceph/blob/master/src/common/blkdev.cc#L753 

 <pre> 
  :~ # ceph device query-daemon-health-metrics osd.6 
 { 
     "HUH721010ALE600______00YK043D7A01892LEN_1EK70PSZ" : { 
         "dev" : "/dev/sdc", 
         "error" : "smartctl failed", 
         "nvme_smart_health_information_add_log_error" : "nvme returned an error: sudo: exit status: 1", 
         "nvme_smart_health_information_add_log_error_code" : -22, 
         "nvme_vendor" : "ata", 
         "smartctl_error_code" : -22, 
         "smartctl_output" : "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" 
     }, 
     "KCM51VUG800G_79M0A01PTZZF" : { 
         "dev" : "/dev/nvme1n1", 
         "error" : "smartctl failed", 
         "nvme_smart_health_information_add_log_error" : "nvme returned an error: sudo: exit status: 1", 
         "nvme_smart_health_information_add_log_error_code" : -22, 
         "nvme_vendor" : "lvm", 
         "smartctl_error_code" : -22, 
         "smartctl_output" : "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n" 
     } 
 } 
 </pre> 
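
 To check many OSDs for the same symptom, the report can be scanned with a small script. The following is only a sketch, assuming the JSON layout shown above. The helper names `failed_devices` and `stdout_is_empty` are made up for illustration and are not part of Ceph, and `SAMPLE` is an abbreviated copy of one entry from the output above.

```python
import json

# Abbreviated copy of one entry from the report above (raw string keeps the JSON escapes).
SAMPLE = r'''
{
    "HUH721010ALE600______00YK043D7A01892LEN_1EK70PSZ": {
        "dev": "/dev/sdc",
        "error": "smartctl failed",
        "smartctl_error_code": -22,
        "smartctl_output": "smartctl returned an error (1): stderr:\nsudo: exit status: 1\nstdout:\n"
    }
}
'''


def failed_devices(report_text):
    """Return (device_id, dev, smartctl_output) for every entry that has an "error" key."""
    report = json.loads(report_text)
    failures = []
    for device_id, info in report.items():
        if "error" in info:
            failures.append((device_id, info.get("dev"), info.get("smartctl_output", "")))
    return failures


def stdout_is_empty(smartctl_output):
    """True if the text contains a 'stdout:' marker with nothing but whitespace after it."""
    _, marker, tail = smartctl_output.partition("stdout:")
    return marker != "" and tail.strip() == ""


if __name__ == "__main__":
    for device_id, dev, output in failed_devices(SAMPLE):
        note = "stdout is empty" if stdout_is_empty(output) else "stdout has content"
        print(f"{device_id} ({dev}): {note}")
```

 In practice the text would come from the command itself, for example by saving its output to a file and passing the file contents to `failed_devices`.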

 Running smartctl manually on the same host works without problems. The following output is from 'smartctl -a --json=o /dev/sdc' (see the argv field):

 <pre> 
 { 
   "json_format_version": [ 
     1, 
     0 
   ], 
   "smartctl": { 
     "version": [ 
       7, 
       0 
     ], 
     "svn_revision": "4917", 
     "platform_info": "x86_64-linux-5.3.18-24.37-default", 
     "build_info": "(SUSE RPM)", 
     "argv": [ 
       "smartctl", 
       "-a", 
       "--json=o", 
       "/dev/sdc" 
     ], 
     "output": [ 
       "smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-24.37-default] (SUSE RPM)", 
       "Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org", 
       "", 
       "=== START OF INFORMATION SECTION ===", 
       "Device Model:       HUH721010ALE600        00YK043D7A01892LEN", 
       "Serial Number:      1EK70PSZ", 
       "LU WWN Device Id: 5 000cca 27eed77be", 
       "Firmware Version: LHGNK9Q7", 
       "User Capacity:      10,000,831,348,736 bytes [10.0 TB]", 
       "Sector Sizes:       512 bytes logical, 4096 bytes physical", 
       "Rotation Rate:      7200 rpm", 
       "Form Factor:        3.5 inches", 
       "Device is:          Not in smartctl database [for details use: -P showall]", 
       "ATA Version is:     ACS-2, ATA8-ACS T13/1699-D revision 4", 
       "SATA Version is:    SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)", 
       "Local Time is:      Tue Nov 24 09:55:24 2020 GMT", 
       "SMART support is: Available - device has SMART capability.", 
       "SMART support is: Enabled", 
       "", 
       "=== START OF READ SMART DATA SECTION ===", 
       "SMART overall-health self-assessment test result: PASSED", 
       "", 
       "General SMART Values:", 
       "Offline data collection status:    (0x82)\tOffline data collection activity", 
       "\t\t\t\t\twas completed without error.", 
       "\t\t\t\t\tAuto Offline Data Collection: Enabled.", 
       "Self-test execution status:        (     0)\tThe previous self-test routine completed", 
       "\t\t\t\t\twithout error or no self-test has ever ", 
       "\t\t\t\t\tbeen run.", 
       "Total time to complete Offline ", 
       "data collection: \t\t(     93) seconds.", 
       "Offline data collection", 
       "capabilities: \t\t\t (0x5b) SMART execute Offline immediate.", 
       "\t\t\t\t\tAuto Offline data collection on/off support.", 
       "\t\t\t\t\tSuspend Offline collection upon new", 
       "\t\t\t\t\tcommand.", 
       "\t\t\t\t\tOffline surface scan supported.", 
       "\t\t\t\t\tSelf-test supported.", 
       "\t\t\t\t\tNo Conveyance Self-test supported.", 
       "\t\t\t\t\tSelective Self-test supported.", 
       "SMART capabilities:              (0x0003)\tSaves SMART data before entering", 
       "\t\t\t\t\tpower-saving mode.", 
       "\t\t\t\t\tSupports SMART auto save timer.", 
       "Error logging capability:          (0x01)\tError logging supported.", 
       "\t\t\t\t\tGeneral Purpose Logging supported.", 
       "Short self-test routine ", 
       "recommended polling time: \t (     2) minutes.", 
       "Extended self-test routine", 
       "recommended polling time: \t (1105) minutes.", 
       "SCT capabilities: \t         (0x003d)\tSCT Status supported.", 
       "\t\t\t\t\tSCT Error Recovery Control supported.", 
       "\t\t\t\t\tSCT Feature Control supported.", 
       ... 
 </pre>
