Bug #43006

open

Device monitoring - get-health-metrics - json parse error

Added by Olivier Sauzet over 4 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Monitoring
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

My Ceph nodes run Ubuntu 18.04 with the 4.15.0-66-generic kernel.

I installed smartmontools 7.0 like this:

wget http://launchpadlibrarian.net/425861623/smartmontools_7.0-0ubuntu1~ubuntu18.04.1_amd64.deb
dpkg -i smartmontools_7.0-0ubuntu1~ubuntu18.04.1_amd64.deb

My device list:

 ceph device ls
DEVICE                                 HOST:DEV   DAEMONS LIFE EXPECTANCY 
ATA_HGST_HUS726040AL_K3G1BXMB          weil02:sdf osd.7                   
ATA_ST2000VX008-2E31_Z5232C5P          weil02:sdg osd.11                  
ATA_ST4000NM0004-1FT_Z4F04GEK          weil02:sde osd.6                   
HGST_HUS726020ALA610_K5HKP7YD          weil01:sdc osd.3                   
HGST_HUS726040ALA610_K4H4WG8B          weil01:sdb osd.2                   
Hitachi_HUA722020ALA330_JK1151YAHB9EJZ weil04:sdb osd.10                  
Hitachi_HUA722020ALA330_JK1151YAHL1MEZ weil04:sda osd.9                   
Hitachi_HUA722020ALA330_JK1151YAHL7GXZ weil04:sdd osd.0                   
SEAGATE_ST2000NM0023_Z1X1C7GD          weil02:sdd osd.5                   
ST2000NM0033-9ZM175_Z1X0RFXY           weil04:sdc osd.8                   
WDC_WD2003FZEX-00Z4SA0_WD-WCC5C53H4FE0 weil01:sdd osd.4           

Some disks report an error, like this one (osd.3):

ceph device info HGST_HUS726020ALA610_K5HKP7YD
device HGST_HUS726020ALA610_K5HKP7YD
attachment weil01:sdc
daemons osd.3
ceph device get-health-metrics HGST_HUS726020ALA610_K5HKP7YD
    "20191121-132728": {
        "nvme_smart_health_information_add_log_error_code": -22, 
        "nvme_vendor": "hgst_hus726020ala610", 
        "nvme_smart_health_information_add_log_error": "nvme returned an error: sudo: exit status: 231", 
        "dev": "/dev/sdc", 
        "error": "smartctl returned invalid JSON" 
    }, 
  • This is strange, because another drive, osd.2 (same model family, same host), does return health information, even though its JSON output contains the same nvme error. The output is in the attached file HGST_output_K4H4WG8B.txt:
    ceph device get-health-metrics HGST_HUS726040ALA610_K4H4WG8B
    
  • The SMART output for osd.3:
    smartctl -a --json /dev/sdc

    (the output is in the attached file: output_smart_K5HKP7YD.txt)
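To narrow down where the "smartctl returned invalid JSON" error comes from, the two outputs above can be checked offline. The following is only a minimal sketch and not part of Ceph or smartmontools: the helper names check_json and failed_samples are hypothetical, and it assumes that get-health-metrics returns a JSON object keyed by timestamp, as the fragment quoted above suggests.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, where it failed)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at the first character the parser could not accept
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, None


def failed_samples(metrics):
    """Map timestamp -> error message for samples that carry an "error" key.

    Assumes metrics is a dict keyed by timestamp (hypothetical shape,
    inferred from the get-health-metrics fragment in this report).
    """
    return {
        stamp: sample["error"]
        for stamp, sample in metrics.items()
        if isinstance(sample, dict) and "error" in sample
    }


if __name__ == "__main__":
    # Sample shaped like the osd.3 entry quoted in this report
    metrics = {
        "20191121-132728": {
            "dev": "/dev/sdc",
            "error": "smartctl returned invalid JSON",
        }
    }
    print(failed_samples(metrics))

    # A deliberately truncated document, to show how a failure is reported
    print(check_json('{"device": {"name": "/dev/sdc"}'))
```

Running check_json on the saved output of smartctl -a --json /dev/sdc should show whether that output is valid JSON on its own, and if not, the position where parsing stops.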

Files

HGST_output_K4H4WG8B.txt (96.6 KB) health-metrics HGST_HUS726040ALA610_K4H4WG8B Olivier Sauzet, 11/25/2019 12:53 PM
output_smart_K5HKP7YD.txt (23 KB) smart output for HGST_HUS726020ALA610_K5HKP7YD Olivier Sauzet, 11/25/2019 12:56 PM