Bug #51554

mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get'

Added by Robert Sander over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: devicehealth module
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific, quincy
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

health warning caused by AttributeError: 'NoneType' object has no attribute 'get'

The cluster status is not healthy because the devicehealth module throws an exception.

Environment

  • ceph version string: 16.2.4
  • Platform (OS/distro/release): Container images from docker.io/ceph/ceph

How reproducible

Restarting the mgr containers does not fix the issue.

Actual results

Jun 30 16:07:09 al111 bash171790: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 devicehealth.serve:
Jun 30 16:07:09 al111 bash171790: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 Traceback (most recent call last):
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 330, in serve
Jun 30 16:07:09 al111 bash171790: self.scrape_all()
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 390, in scrape_all
Jun 30 16:07:09 al111 bash171790: self.put_device_metrics(ioctx, device, data)
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 477, in put_device_metrics
Jun 30 16:07:09 al111 bash171790: wear_level = get_ata_wear_level(data)
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 33, in get_ata_wear_level
Jun 30 16:07:09 al111 bash171790: if page.get("number") != 7:
Jun 30 16:07:09 al111 bash171790: AttributeError: 'NoneType' object has no attribute 'get'

Expected results

No Python exception.

eVtran-smartctl.json (37.2 KB) Rob Logan, 02/22/2022 03:26 PM

smart.json - "smartctl -x --json" output (44.2 KB) Alfredo Rezinovsky, 02/22/2022 03:29 PM


Related issues

Copied to mgr - Backport #54394: quincy: mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get' Resolved
Copied to mgr - Backport #54395: pacific: mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get' Resolved

History

#1 Updated by Stefan Fleischmann over 2 years ago

Same problem here with Ceph 16.2.5. Is someone looking into this?

#2 Updated by Yaarit Hatuka over 2 years ago

Thanks, Robert, Stefan, for reporting this.

This looks like nonstandard output from the smartctl command.

Can you please share the output of smartctl on the device where this happens?
Specifically:
  • the vendor and model of this device
  • the entire content of 'ata_device_statistics' key
  • smartctl version

#3 Updated by Robert Sander over 2 years ago

Yaarit Hatuka wrote:

  • the entire content of 'ata_device_statistics' key

Where do I find this information?

#4 Updated by Michael Wodniok over 2 years ago

Robert Sander wrote:

Yaarit Hatuka wrote:

  • the entire content of 'ata_device_statistics' key

Where do I find this information?

The problem is: how do you know which disk causes the error?
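One way to narrow this down, sketched under assumptions: the traceback fails on page.get(...), so the suspect devices are those whose "ata_device_statistics" pages list contains an entry that is not a JSON object. The helper below is hypothetical (it is not part of Ceph) and assumes you have already collected the "smartctl -x --json" output of each device into a dict keyed by device name.

```python
from typing import Any, Dict, List


def devices_with_null_pages(reports: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return device names whose smartctl JSON has non-object statistics pages.

    `reports` maps a device name to its parsed `smartctl -x --json` output.
    This is an illustrative helper, not a Ceph API.
    """
    suspects = []
    for name, report in reports.items():
        # Either key may be missing or null; treat both cases as "no pages".
        stats = report.get("ata_device_statistics") or {}
        pages = stats.get("pages") or []
        if any(not isinstance(page, dict) for page in pages):
            suspects.append(name)
    return sorted(suspects)


# Made-up reports for illustration only.
reports = {
    "/dev/sda": {"ata_device_statistics": {"pages": [{"number": 1}, None]}},
    "/dev/sdb": {"ata_device_statistics": {"pages": [{"number": 1}]}},
    "/dev/sde": {},  # e.g. a SAS drive with no ATA statistics at all
}
print(devices_with_null_pages(reports))
```

A device with no "ata_device_statistics" key at all is not flagged, because the loop in the traceback would simply have nothing to iterate over for it.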

We have several disk types in use and here is one which does not have any tabular SMART data available in ceph:

root@rz2b-cn11:~# cephadm shell --fsid 41902fa4-3ecf-11eb-94ef-258486fe8a0f -c /etc/ceph/ceph.conf -n osd.3 -- smartctl -a /dev/sde
Using recent ceph image ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
smartctl 7.1 2020-04-05 r5049 [x86_64-linux-5.4.0-81-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST1000NX0323
Revision:             K002
Compliance:           SPC-4
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5007f260653
Serial number:        S4700YQR0000J507296Q
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Aug 30 09:07:02 2021 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     34 C
Drive Trip Temperature:        60 C

Manufactured in week 01 of year 2015
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  32
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  2067
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 3465542296
  Blocks received from initiator = 1594710387
  Blocks read from cache and sent to initiator = 119140210
  Number of read and write commands whose size <= segment size = 37058510
  Number of read and write commands whose size > segment size = 82828

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 41393.57
  number of minutes until next internal SMART test = 31

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2659695438        0         0  2659695438          0      14194.861           0
write:         0        0         0         0          0       6539.483           0

Non-medium error count:       84

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged

As you can see, there isn't any SMART data listed in tabular form. Could this cause the issue?

#5 Updated by Steven Lewis over 2 years ago

Hi

I have had the same problem and have tracked it down to the parsing of the JSON response from the "smartctl -x --json" call.

I have attached the JSON output of "smartctl -x --json /dev/sda" from my node.

It looks like the code loops over the pages to find page 7, but some of the pages are null rather than objects, so page is None and page.get() raises the AttributeError.

Hope that helps.

From File "/usr/share/ceph/mgr/devicehealth/module.py"

    for page in data.get("ata_device_statistics", {}).get("pages", []):
        if page.get("number") != 7:
            continue
        for item in page.get("table", []):
            if item["offset"] == 8:
                return item["value"] / 100.0
    return None

Cut-down JSON showing the null pages

{
  "ata_device_statistics": {
    "pages": [
      {
        "number": 1,
        "name": "General Statistics",
        "revision": 2,
        "table": [
        ]
      },
      null,
      null,
      {
        "number": 4,
        "name": "General Errors Statistics",
        "revision": 1,
        "table": [
        ]
      },
      {
        "number": 5,
        "name": "Temperature Statistics",
        "revision": 1,
        "table": [
        ]
      },
      {
        "number": 6,
        "name": "Transport Statistics",
        "revision": 1,
        "table": [
        ]
      },
      {
        "number": 7,
        "name": "Solid State Device Statistics",
        "revision": 1,
        "table": [
        ]
      }
    ]
  }
}
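A minimal sketch of how the loop quoted above could tolerate such null entries. This is an illustration under assumptions, not necessarily the change that was merged for this bug: it skips any page or table item that is not a JSON object and otherwise keeps the original behaviour. The sample value at offset 8 is made up, since the cut-down JSON above has empty tables.

```python
from typing import Any, Dict, Optional


def get_ata_wear_level(data: Dict[str, Any]) -> Optional[float]:
    """Return the wear level as a fraction, or None if it is not reported."""
    stats = data.get("ata_device_statistics") or {}
    for page in stats.get("pages") or []:
        # smartctl can emit null for pages the drive does not provide,
        # so skip anything that is not an object before calling .get().
        if not isinstance(page, dict) or page.get("number") != 7:
            continue
        for item in page.get("table") or []:
            if isinstance(item, dict) and item.get("offset") == 8:
                return item["value"] / 100.0
    return None


# Shaped like the cut-down JSON above; the table entry is hypothetical.
sample = {
    "ata_device_statistics": {
        "pages": [
            {"number": 1, "table": []},
            None,
            None,
            {"number": 7, "table": [{"offset": 8, "value": 3}]},
        ]
    }
}
print(get_ata_wear_level(sample))
```

With the null pages skipped, the function either finds the page 7 entry or falls through to return None, so the serve loop no longer raises the AttributeError.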

Full JSON dump

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      2
    ],
    "svn_revision": "5155",
    "platform_info": "x86_64-linux-5.11.22-4-pve",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-x",
      "--json",
      "/dev/sda" 
    ],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/sda",
    "info_name": "/dev/sda [SAT]",
    "type": "sat",
    "protocol": "ATA" 
  },
  "model_family": "Intel 540 Series SSDs",
  "model_name": "INTEL SSDSC2KW480H6",
  "serial_number": "CVLT7311029M480EGN",
  "wwn": {
    "naa": 5,
    "oui": 6083300,
    "id": 5612639719
  },
  "firmware_version": "LSF036C",
  "user_capacity": {
    "blocks": 937703088,
    "bytes": 480103981056
  },
  "logical_block_size": 512,
  "physical_block_size": 512,
  "rotation_rate": 0,
  "form_factor": {
    "ata_value": 3,
    "name": "2.5 inches" 
  },
  "trim": {
    "supported": true,
    "deterministic": true,
    "zeroed": false
  },
  "in_smartctl_database": true,
  "ata_version": {
    "string": "ACS-3 (minor revision not indicated)",
    "major_value": 2044,
    "minor_value": 65535
  },
  "sata_version": {
    "string": "SATA 3.2",
    "value": 255
  },
  "interface_speed": {
    "max": {
      "sata_value": 14,
      "string": "6.0 Gb/s",
      "units_per_second": 60,
      "bits_per_unit": 100000000
    },
    "current": {
      "sata_value": 3,
      "string": "6.0 Gb/s",
      "units_per_second": 60,
      "bits_per_unit": 100000000
    }
  },
  "local_time": {
    "time_t": 1632228178,
    "asctime": "Tue Sep 21 13:42:58 2021 BST" 
  },
  "ata_apm": {
    "enabled": true,
    "level": 254,
    "string": "maximum performance",
    "max_performance": true,
    "min_power": false,
    "with_standby": false
  },
  "read_lookahead": {
    "enabled": true
  },
  "write_cache": {
    "enabled": true
  },
  "ata_security": {
    "state": 41,
    "string": "Disabled, frozen [SEC2]",
    "enabled": false,
    "frozen": true
  },
  "smart_status": {
    "passed": true
  },
  "ata_smart_data": {
    "offline_data_collection": {
      "status": {
        "value": 0,
        "string": "was never started" 
      },
      "completion_seconds": 0
    },
    "self_test": {
      "status": {
        "value": 0,
        "string": "completed without error",
        "passed": true
      },
      "polling_minutes": {
        "short": 2,
        "extended": 30
      }
    },
    "capabilities": {
      "values": [
        83,
        3
      ],
      "exec_offline_immediate_supported": true,
      "offline_is_aborted_upon_new_cmd": false,
      "offline_surface_scan_supported": false,
      "self_tests_supported": true,
      "conveyance_self_test_supported": false,
      "selective_self_test_supported": true,
      "attribute_autosave_enabled": true,
      "error_logging_supported": true,
      "gp_logging_supported": true
    }
  },
  "ata_sct_capabilities": {
    "value": 57,
    "error_recovery_control_supported": true,
    "feature_control_supported": true,
    "data_table_supported": true
  },
  "ata_smart_attributes": {
    "revision": 1,
    "table": [
      {
        "id": 5,
        "name": "Reallocated_Sector_Ct",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 78,
          "string": "78" 
        }
      },
      {
        "id": 9,
        "name": "Power_On_Hours_and_Msec",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 652,
          "string": "652h+00m+00.000s" 
        }
      },
      {
        "id": 12,
        "name": "Power_Cycle_Count",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 12,
          "string": "12" 
        }
      },
      {
        "id": 170,
        "name": "Available_Reservd_Space",
        "value": 99,
        "worst": 99,
        "thresh": 10,
        "when_failed": "",
        "flags": {
          "value": 51,
          "string": "PO--CK ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 171,
        "name": "Program_Fail_Count",
        "value": 100,
        "worst": 100,
        "thresh": 10,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 172,
        "name": "Erase_Fail_Count",
        "value": 100,
        "worst": 100,
        "thresh": 10,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 174,
        "name": "Unexpect_Power_Loss_Ct",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 4,
          "string": "4" 
        }
      },
      {
        "id": 183,
        "name": "SATA_Downshift_Count",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 184,
        "name": "End-to-End_Error",
        "value": 100,
        "worst": 100,
        "thresh": 90,
        "when_failed": "",
        "flags": {
          "value": 51,
          "string": "PO--CK ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 187,
        "name": "Uncorrectable_Error_Cnt",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 190,
        "name": "Airflow_Temperature_Cel",
        "value": 34,
        "worst": 38,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 111671640098,
          "string": "34 (Min/Max 26/38)" 
        }
      },
      {
        "id": 192,
        "name": "Power-Off_Retract_Count",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 4,
          "string": "4" 
        }
      },
      {
        "id": 199,
        "name": "UDMA_CRC_Error_Count",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 225,
        "name": "Host_Writes_32MiB",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 105048,
          "string": "105048" 
        }
      },
      {
        "id": 226,
        "name": "Workld_Media_Wear_Indic",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 227,
        "name": "Workld_Host_Reads_Perc",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 228,
        "name": "Workload_Minutes",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 232,
        "name": "Available_Reservd_Space",
        "value": 99,
        "worst": 99,
        "thresh": 10,
        "when_failed": "",
        "flags": {
          "value": 51,
          "string": "PO--CK ",
          "prefailure": true,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 233,
        "name": "Media_Wearout_Indicator",
        "value": 99,
        "worst": 99,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 0,
          "string": "0" 
        }
      },
      {
        "id": 241,
        "name": "Total_LBAs_Written",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 105048,
          "string": "105048" 
        }
      },
      {
        "id": 242,
        "name": "Total_LBAs_Read",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 244998,
          "string": "244998" 
        }
      },
      {
        "id": 249,
        "name": "NAND_Writes_1GiB",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 2720,
          "string": "2720" 
        }
      },
      {
        "id": 252,
        "name": "Unknown_Attribute",
        "value": 100,
        "worst": 100,
        "thresh": 0,
        "when_failed": "",
        "flags": {
          "value": 50,
          "string": "-O--CK ",
          "prefailure": false,
          "updated_online": true,
          "performance": false,
          "error_rate": false,
          "event_count": true,
          "auto_keep": true
        },
        "raw": {
          "value": 7,
          "string": "7" 
        }
      }
    ]
  },
  "power_on_time": {
    "hours": 652,
    "minutes": 0
  },
  "power_cycle_count": 12,
  "temperature": {
    "current": 42,
    "power_cycle_min": 36,
    "power_cycle_max": 46,
    "lifetime_min": 44,
    "lifetime_max": 46,
    "op_limit_min": 0,
    "op_limit_max": 85,
    "limit_min": 0,
    "limit_max": 100,
    "lifetime_over_limit_minutes": 0,
    "lifetime_under_limit_minutes": 0
  },
  "ata_log_directory": {
    "gp_dir_version": 1,
    "smart_dir_version": 1,
    "smart_dir_multi_sector": true,
    "table": [
      {
        "address": 0,
        "name": "Log Directory",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 1,
        "name": "Summary SMART error log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 2,
        "name": "Comprehensive SMART error log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 3,
        "name": "Ext. Comprehensive SMART error log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 4,
        "name": "Device Statistics log",
        "read": true,
        "write": false,
        "gp_sectors": 8,
        "smart_sectors": 8
      },
      {
        "address": 6,
        "name": "SMART self-test log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 7,
        "name": "Extended self-test log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 9,
        "name": "Selective self-test log",
        "read": true,
        "write": true,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 16,
        "name": "NCQ Command Error log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 17,
        "name": "SATA Phy Event Counters log",
        "read": true,
        "write": false,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 48,
        "name": "IDENTIFY DEVICE data log",
        "read": true,
        "write": false,
        "gp_sectors": 9,
        "smart_sectors": 9
      },
      {
        "address": 128,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 129,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 130,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 131,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 132,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 133,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 134,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 135,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 136,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 137,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 138,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 139,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 140,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 141,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 142,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 143,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 144,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 145,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 146,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 147,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 148,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 149,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 150,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 151,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 152,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 153,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 154,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 155,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 156,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 157,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 158,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 159,
        "name": "Host vendor specific log",
        "read": true,
        "write": true,
        "gp_sectors": 16,
        "smart_sectors": 16
      },
      {
        "address": 223,
        "name": "Device vendor specific log",
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 224,
        "name": "SCT Command/Status",
        "read": true,
        "write": true,
        "gp_sectors": 1,
        "smart_sectors": 1
      },
      {
        "address": 225,
        "name": "SCT Data Transfer",
        "read": true,
        "write": true,
        "gp_sectors": 1,
        "smart_sectors": 1
      }
    ]
  },
  "ata_smart_error_log": {
    "extended": {
      "revision": 1,
      "sectors": 1,
      "count": 0
    }
  },
  "ata_smart_self_test_log": {
    "extended": {
      "revision": 1,
      "sectors": 1,
      "count": 0
    }
  },
  "ata_smart_selective_self_test_log": {
    "revision": 1,
    "table": [
      {
        "lba_min": 70403103932424,
        "lba_max": 70403103932424,
        "status": {
          "value": 0,
          "string": "Not_testing" 
        }
      },
      {
        "lba_min": 70403103932424,
        "lba_max": 70403103932424,
        "status": {
          "value": 0,
          "string": "Not_testing" 
        }
      },
      {
        "lba_min": 70403103932424,
        "lba_max": 70403103932424,
        "status": {
          "value": 0,
          "string": "Not_testing" 
        }
      },
      {
        "lba_min": 70403103932424,
        "lba_max": 70403103932424,
        "status": {
          "value": 0,
          "string": "Not_testing" 
        }
      },
      {
        "lba_min": 70403103932424,
        "lba_max": 70403103932424,
        "status": {
          "value": 0,
          "string": "Not_testing" 
        }
      }
    ],
    "flags": {
      "value": 16392,
      "remainder_scan_enabled": false
    },
    "power_up_scan_resume_minutes": 0
  },
  "ata_sct_status": {
    "format_version": 3,
    "sct_version": 0,
    "device_state": {
      "value": 0,
      "string": "Active" 
    },
    "temperature": {
      "current": 42,
      "power_cycle_min": 36,
      "power_cycle_max": 46,
      "lifetime_min": 33,
      "lifetime_max": 46,
      "under_limit_count": 0,
      "over_limit_count": 0
    }
  },
  "ata_sct_temperature_history": {
    "version": 2,
    "sampling_period_minutes": 1,
    "logging_interval_minutes": 1,
    "temperature": {
      "op_limit_min": 0,
      "op_limit_max": 100,
      "limit_min": 0,
      "limit_max": 100
    },
    "size": 128,
    "index": 54,
    "table": [
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      41,
      42,
      42,
      42,
      42,
      42,
      41,
      41,
      41,
      41,
      42,
      42,
      42,
      42,
      42,
      42,
      42,
      42,
      41
    ]
  },
  "ata_sct_erc": {
    "read": {
      "enabled": false
    },
    "write": {
      "enabled": false
    }
  },
  "ata_device_statistics": {
    "pages": [
      {
        "number": 1,
        "name": "General Statistics",
        "revision": 2,
        "table": [
          {
            "offset": 8,
            "name": "Lifetime Power-On Resets",
            "size": 4,
            "value": 12,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Power-on Hours",
            "size": 4,
            "value": 652,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Logical Sectors Written",
            "size": 6,
            "value": 6884434936,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 32,
            "name": "Number of Write Commands",
            "size": 6,
            "value": 91239448,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 40,
            "name": "Logical Sectors Read",
            "size": 6,
            "value": 16056245581,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 48,
            "name": "Number of Read Commands",
            "size": 6,
            "value": 85003167,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      null,
      null,
      {
        "number": 4,
        "name": "General Errors Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Number of Reported Uncorrectable Errors",
            "size": 4,
            "value": 75,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Resets Between Cmd Acceptance and Completion",
            "size": 4,
            "value": 4,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 5,
        "name": "Temperature Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Current Temperature",
            "size": 1,
            "value": 42,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 16,
            "name": "Average Short Term Temperature",
            "size": 1,
            "value": 43,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Average Long Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 32,
            "name": "Highest Temperature",
            "size": 1,
            "value": 46,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 40,
            "name": "Lowest Temperature",
            "size": 1,
            "value": 44,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 48,
            "name": "Highest Average Short Term Temperature",
            "size": 1,
            "value": 43,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 56,
            "name": "Lowest Average Short Term Temperature",
            "size": 1,
            "value": 43,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 64,
            "name": "Highest Average Long Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 72,
            "name": "Lowest Average Long Term Temperature",
            "size": 1,
            "flags": {
              "value": 128,
              "string": "---- ",
              "valid": false,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 80,
            "name": "Time in Over-Temperature",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 88,
            "name": "Specified Maximum Operating Temperature",
            "size": 1,
            "value": 85,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 96,
            "name": "Time in Under-Temperature",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 104,
            "name": "Specified Minimum Operating Temperature",
            "size": 1,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 6,
        "name": "Transport Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Number of Hardware Resets",
            "size": 4,
            "value": 206,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },
          {
            "offset": 24,
            "name": "Number of Interface CRC Errors",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      {
        "number": 7,
        "name": "Solid State Device Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Percentage Used Endurance Indicator",
            "size": 1,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      }
    ]
  },
  "sata_phy_event_counters": {
    "table": [
      {
        "id": 1,
        "name": "Command failed due to ICRC error",
        "size": 2,
        "value": 0,
        "overflow": false
      },
      {
        "id": 5,
        "name": "R_ERR response for non-data FIS",
        "size": 2,
        "value": 0,
        "overflow": false
      },
      {
        "id": 10,
        "name": "Device-to-host register FISes sent due to a COMRESET",
        "size": 2,
        "value": 19,
        "overflow": false
      }
    ],
    "reset": false
  }
}

#6 Updated by Neha Ojha over 2 years ago

  • Assignee changed from Sage Weil to Yaarit Hatuka

#7 Updated by Rob Logan about 2 years ago

I'm experiencing the same issue. Here are the nulls in the output of smartctl -x --json /dev/sda:

          {
            "offset": 48,
            "name": "Number of Read Commands",
            "size": 6,
            "value": 811764,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      null,
      null,
      {
        "number": 4,
        "name": "General Errors Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Number of Reported Uncorrectable Errors",
            "size": 4,
            "value": 0,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },

          {
            "offset": 16,
            "name": "Resets Between Cmd Acceptance and Completion",
            "size": 4,
            "value": 50,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          }
        ]
      },
      null,
      {
        "number": 6,
        "name": "Transport Statistics",
        "revision": 1,
        "table": [
          {
            "offset": 8,
            "name": "Number of Hardware Resets",
            "size": 4,
            "value": 58,
            "flags": {
              "value": 192,
              "string": "V--- ",
              "valid": true,
              "normalized": false,
              "supports_dsn": false,
              "monitored_condition_met": false
            }
          },

#8 Updated by Alfredo Rezinovsky about 2 years ago

Anyone? I'm no developer, but this seems like a very simple thing to fix.
I cannot even disable the module to get rid of the HEALTH_ERR.

#9 Updated by Yaarit Hatuka about 2 years ago

Apologies for the delay!

Thanks Rob, in the output of "ata_device_statistics" above I don't see page number 7 ("Solid State Device Statistics"). Can you please specify the vendor and model for this device?

I will push the fix for it shortly.
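To answer the question about page 7 for a given drive, the smartctl JSON can be inspected directly. The following is a minimal sketch, not part of the devicehealth module: the helper name `summarize_statistics_pages` is hypothetical, and the sample input is a trimmed stand-in for the excerpt quoted in this thread, keeping only the `number` and `name` keys.

```python
import json


def summarize_statistics_pages(smart_data):
    """Return (null_count, present_page_numbers) for ata_device_statistics.

    Hypothetical helper for illustration; it only reads the
    "ata_device_statistics" -> "pages" list seen in the dumps in this thread.
    """
    stats = smart_data.get("ata_device_statistics") or {}
    pages = stats.get("pages") or []
    # Entries can be JSON null, which json.loads turns into None.
    null_count = sum(1 for page in pages if page is None)
    present = [page.get("number") for page in pages if isinstance(page, dict)]
    return null_count, present


# Trimmed sample shaped like the excerpt above: pages 1, 4 and 6 are present.
sample = json.loads("""
{"ata_device_statistics": {"pages": [
  {"number": 1, "name": "General Statistics"},
  null,
  null,
  {"number": 4, "name": "General Errors Statistics"},
  null,
  {"number": 6, "name": "Transport Statistics"}
]}}
""")

null_count, present = summarize_statistics_pages(sample)
print("null pages:", null_count)
print("pages present:", present)
print("page 7 present:", 7 in present)
```

For a real device, the same helper could be fed the parsed output of smartctl -x --json for that device.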

#10 Updated by Alfredo Rezinovsky about 2 years ago

I have a

"model_family": "Kingston SSDNow UV400",
"model_name": "KINGSTON SUV400S37240G",
"serial_number": "50026B766A0429A5",

Attached full file.

#11 Updated by Rob Logan about 2 years ago

For the eVtran drive, ata_device_statistics contains three nulls, and number 7's name is "Percentage Used Endurance Indicator". Attached is the full dump. Thanks for looking into this; I'd be happy to test /usr/share/ceph/mgr/devicehealth/module.py prior to commit. Running ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable) via Proxmox.

#12 Updated by Yaarit Hatuka about 2 years ago

  • Status changed from New to Fix Under Review
  • Backport set to pacific, quincy
  • Pull request ID set to 45121
  • Affected Versions v16.2.5, v16.2.6, v16.2.7 added

Thanks for your input, everyone. I pushed a fix that skips those null pages and tested it on the input you provided.
Rob, Alfredo, please check whether this fix helps or whether there are additional issues.
Thanks!
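For readers following along, the idea of skipping null pages can be sketched as below. This is an illustration only, not the code from the pull request: the function name `wear_level_skipping_null_pages` is hypothetical, and picking the entry at offset 8 of page 7 and returning its raw value are assumptions based on the attached dump, where that entry is "Percentage Used Endurance Indicator".

```python
from typing import Any, Dict, Optional


def wear_level_skipping_null_pages(data: Dict[str, Any]) -> Optional[int]:
    """Return the raw page-7 endurance value, or None if it is unavailable.

    Hypothetical sketch: null entries in "pages" are skipped instead of
    being dereferenced, which is what raised the AttributeError.
    """
    stats = data.get("ata_device_statistics") or {}
    for page in stats.get("pages") or []:
        # The guard: a None page has no .get(), so skip it first.
        if page is None or page.get("number") != 7:
            continue
        for item in page.get("table") or []:
            # Assumption: offset 8 is the endurance indicator, as in the dump.
            if item.get("offset") == 8:
                return item.get("value")
    return None


# Shape taken from the attachment: nulls followed by a page 7.
with_page_7 = {
    "ata_device_statistics": {
        "pages": [
            {"number": 1, "table": []},
            None,
            None,
            {"number": 7, "table": [{"offset": 8, "value": 0}]},
        ]
    }
}
# Shape taken from the quoted excerpt: nulls and no page 7 at all.
without_page_7 = {
    "ata_device_statistics": {"pages": [{"number": 1, "table": []}, None]}
}

print(wear_level_skipping_null_pages(with_page_7))
print(wear_level_skipping_null_pages(without_page_7))
```

Returning None when page 7 is missing lets the caller treat the wear level as unknown rather than failing the whole scrape.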

#13 Updated by Rob Logan about 2 years ago

Yup, works great! Thanks!

2022-02-22T15:18:03.927303-0500 mon.nuc8a (mon.0) 271918 : cluster [INF] Manager daemon nuc8a is unresponsive, replacing it with standby daemon nuc10a
2022-02-22T15:18:03.937499-0500 mon.nuc8a (mon.0) 271919 : cluster [DBG] osdmap e575612: 7 total, 7 up, 7 in
2022-02-22T15:18:03.947206-0500 mon.nuc8a (mon.0) 271920 : cluster [DBG] mgrmap e694: nuc10a(active, starting, since 0.0199344s), standbys: nuc10b
2022-02-22T15:18:03.982817-0500 mon.nuc8a (mon.0) 271921 : cluster [INF] Manager daemon nuc10a is now available
2022-02-22T15:18:04.963010-0500 mon.nuc8a (mon.0) 271926 : cluster [DBG] mgrmap e695: nuc10a(active, since 1.03574s), standbys: nuc10b
2022-02-22T15:18:04.979763-0500 mon.nuc8a (mon.0) 271928 : cluster [DBG] osdmap e575613: 7 total, 7 up, 7 in
2022-02-22T15:18:05.978103-0500 mon.nuc8a (mon.0) 271932 : cluster [INF] Health check cleared: MGR_MODULE_ERROR (was: Module 'devicehealth' has failed: 'NoneType' object has no attribute 'get')
2022-02-22T15:18:05.978119-0500 mon.nuc8a (mon.0) 271933 : cluster [INF] Cluster is now healthy
2022-02-22T15:18:05.991664-0500 mon.nuc8a (mon.0) 271935 : cluster [DBG] osdmap e575614: 7 total, 7 up, 7 in
2022-02-22T15:18:07.033717-0500 mon.nuc8a (mon.0) 271936 : cluster [DBG] mgrmap e696: nuc10a(active, since 3s), standbys: nuc10b

#14 Updated by Yaarit Hatuka about 2 years ago

Great, thanks for checking!

#15 Updated by Steven Lewis about 2 years ago

I can confirm that the fix is working for me as well.

#16 Updated by Sridhar Seshasayee about 2 years ago

  • Status changed from Fix Under Review to Pending Backport

#17 Updated by Sridhar Seshasayee about 2 years ago

  • Copied to Backport #54394: quincy: mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get' added

#18 Updated by Sridhar Seshasayee about 2 years ago

  • Copied to Backport #54395: pacific: mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get' added

#19 Updated by Yaarit Hatuka about 2 years ago

  • Status changed from Pending Backport to Resolved

all backports are merged; resolving
