Bug #51554
mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get'
Description
health warning caused by AttributeError: 'NoneType' object has no attribute 'get'
The cluster status is not healthy because the devicehealth module throws an exception.
Environment
- ceph version string: 16.2.4
- Platform (OS/distro/release): Container images from docker.io/ceph/ceph
How reproducible
Restarting the mgr containers does not fix the issue.
Actual results
Jun 30 16:07:09 al111 bash171790: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 devicehealth.serve:
Jun 30 16:07:09 al111 bash171790: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 Traceback (most recent call last):
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 330, in serve
Jun 30 16:07:09 al111 bash171790: self.scrape_all()
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 390, in scrape_all
Jun 30 16:07:09 al111 bash171790: self.put_device_metrics(ioctx, device, data)
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 477, in put_device_metrics
Jun 30 16:07:09 al111 bash171790: wear_level = get_ata_wear_level(data)
Jun 30 16:07:09 al111 bash171790: File "/usr/share/ceph/mgr/devicehealth/module.py", line 33, in get_ata_wear_level
Jun 30 16:07:09 al111 bash171790: if page.get("number") != 7:
Jun 30 16:07:09 al111 bash171790: AttributeError: 'NoneType' object has no attribute 'get'
Expected results
No Python exception.
Related issues
History
#1 Updated by Stefan Fleischmann over 1 year ago
Same problem here with Ceph 16.2.5. Is someone looking into this?
#2 Updated by Yaarit Hatuka over 1 year ago
Thanks, Robert, Stefan, for reporting this.
This looks like nonstandard output from the smartctl command.
Can you please share the output of smartctl on the device where this happens? Specifically:
- the vendor and model of this device
- the entire content of 'ata_device_statistics' key
- smartctl version
#3 Updated by Robert Sander over 1 year ago
Yaarit Hatuka wrote:
- the entire content of 'ata_device_statistics' key
Where do I find this information?
#4 Updated by Michael Wodniok over 1 year ago
Robert Sander wrote:
Yaarit Hatuka wrote:
- the entire content of 'ata_device_statistics' key
Where do I find this information?
The problem is: how do you know which disk causes the error?
We have several disk types in use and here is one which does not have any tabular SMART data available in ceph:
root@rz2b-cn11:~# cephadm shell --fsid 41902fa4-3ecf-11eb-94ef-258486fe8a0f -c /etc/ceph/ceph.conf -n osd.3 -- smartctl -a /dev/sde
Using recent ceph image ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
smartctl 7.1 2020-04-05 r5049 [x86_64-linux-5.4.0-81-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST1000NX0323
Revision:             K002
Compliance:           SPC-4
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5007f260653
Serial number:        S4700YQR0000J507296Q
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Aug 30 09:07:02 2021 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     34 C
Drive Trip Temperature:        60 C

Manufactured in week 01 of year 2015
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  32
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  2067
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 3465542296
  Blocks received from initiator = 1594710387
  Blocks read from cache and sent to initiator = 119140210
  Number of read and write commands whose size <= segment size = 37058510
  Number of read and write commands whose size > segment size = 82828

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 41393.57
  number of minutes until next internal SMART test = 31

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2659695438        0         0  2659695438          0      14194.861           0
write:           0        0         0           0          0       6539.483           0

Non-medium error count:       84

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged
As you can see, there is no SMART data listed in tabular form. Could this cause the issue?
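One possible way to narrow down which disk triggers the exception is to check each device's JSON output for entries in `ata_device_statistics.pages` that are not objects. This is a sketch only, under the assumption (suggested by a later comment in this thread) that such null entries are what make `page.get("number")` fail. The helper names `null_page_indexes` and `report` are hypothetical, and the input is assumed to be the text of `smartctl -x --json <device>` collected per device by some other means.

```python
import json
from typing import Any, Dict, List


def null_page_indexes(smart: Dict[str, Any]) -> List[int]:
    """Return the positions in ata_device_statistics.pages that are not JSON objects."""
    stats = smart.get("ata_device_statistics") or {}
    pages = stats.get("pages") or []
    return [i for i, page in enumerate(pages) if not isinstance(page, dict)]


def report(dumps: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each device to its non-object page positions; clean devices are omitted.

    `dumps` maps a device name to the raw JSON text captured for that device
    (hypothetical input format, chosen for illustration).
    """
    result: Dict[str, List[int]] = {}
    for device, text in dumps.items():
        bad = null_page_indexes(json.loads(text))
        if bad:
            result[device] = bad
    return result
```

A device whose output has no `ata_device_statistics` key at all (for example a SAS disk) is reported as clean by this check, because the loop in the module would have nothing to iterate over for it.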
#5 Updated by Steven Lewis over 1 year ago
Hi
I have had the same problem and have tracked it down to the parsing of the JSON response from the "smartctl -x --json" call.
I have attached the JSON from "smartctl -x --json /dev/sda" on my node.
It looks like the code loops over the pages to find page 7, but some of the pages are null rather than objects, which results in a NoneType.
Hope that helps.
From file "/usr/share/ceph/mgr/devicehealth/module.py":
    for page in data.get("ata_device_statistics", {}).get("pages", []):
        if page.get("number") != 7:
            continue
        for item in page.get("table", []):
            if item["offset"] == 8:
                return item["value"] / 100.0
    return None
Cut-down JSON showing the null pages
{
  "ata_device_statistics": {
    "pages": [
      {
        "number": 1,
        "name": "General Statistics",
        "revision": 2,
        "table": [
        ]
      },
      null,
      null,
      {
        "number": 4,
        "name": "General Errors Statistics",
        "revision": 1,
        "table": [
        ]
      },
      {
        "number": 5,
        "name": "Temperature Statistics",
        "revision": 1,
        "table": [
        ]
      },
      {
        "number": 6,
        "name": "Transport Statistics",
        "revision": 1,
        "table": [
        ]
      },
      {
        "number": 7,
        "name": "Solid State Device Statistics",
        "revision": 1,
        "table": [
        ]
      }
    ]
  }
}
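A minimal sketch of how the lookup could tolerate such null entries, assuming the intent is simply to skip any page that is not an object. This is not the fix that was committed to Ceph; the function name `get_ata_wear_level_safe` is hypothetical, and the extra guards on `table` and `item` go beyond what the traceback shows to be necessary.

```python
from typing import Any, Dict, Optional


def get_ata_wear_level_safe(data: Dict[str, Any]) -> Optional[float]:
    """Return the SSD wear level as a fraction, or None if it is not reported."""
    stats = data.get("ata_device_statistics") or {}
    for page in stats.get("pages") or []:
        # Skip null (None) entries and every page other than page 7.
        if not isinstance(page, dict) or page.get("number") != 7:
            continue
        for item in page.get("table") or []:
            # Offset 8 is the entry the original code reads; guard against
            # malformed items and items without a "value" key.
            if isinstance(item, dict) and item.get("offset") == 8 and "value" in item:
                return item["value"] / 100.0
    return None
```

With the cut-down structure above, the two null entries are skipped and page 7 is still reached; a device with no page 7 at all yields None instead of raising.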
Full JSON dump
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
2
],
"svn_revision": "5155",
"platform_info": "x86_64-linux-5.11.22-4-pve",
"build_info": "(local build)",
"argv": [
"smartctl",
"-x",
"--json",
"/dev/sda"
],
"exit_status": 0
},
"device": {
"name": "/dev/sda",
"info_name": "/dev/sda [SAT]",
"type": "sat",
"protocol": "ATA"
},
"model_family": "Intel 540 Series SSDs",
"model_name": "INTEL SSDSC2KW480H6",
"serial_number": "CVLT7311029M480EGN",
"wwn": {
"naa": 5,
"oui": 6083300,
"id": 5612639719
},
"firmware_version": "LSF036C",
"user_capacity": {
"blocks": 937703088,
"bytes": 480103981056
},
"logical_block_size": 512,
"physical_block_size": 512,
"rotation_rate": 0,
"form_factor": {
"ata_value": 3,
"name": "2.5 inches"
},
"trim": {
"supported": true,
"deterministic": true,
"zeroed": false
},
"in_smartctl_database": true,
"ata_version": {
"string": "ACS-3 (minor revision not indicated)",
"major_value": 2044,
"minor_value": 65535
},
"sata_version": {
"string": "SATA 3.2",
"value": 255
},
"interface_speed": {
"max": {
"sata_value": 14,
"string": "6.0 Gb/s",
"units_per_second": 60,
"bits_per_unit": 100000000
},
"current": {
"sata_value": 3,
"string": "6.0 Gb/s",
"units_per_second": 60,
"bits_per_unit": 100000000
}
},
"local_time": {
"time_t": 1632228178,
"asctime": "Tue Sep 21 13:42:58 2021 BST"
},
"ata_apm": {
"enabled": true,
"level": 254,
"string": "maximum performance",
"max_performance": true,
"min_power": false,
"with_standby": false
},
"read_lookahead": {
"enabled": true
},
"write_cache": {
"enabled": true
},
"ata_security": {
"state": 41,
"string": "Disabled, frozen [SEC2]",
"enabled": false,
"frozen": true
},
"smart_status": {
"passed": true
},
"ata_smart_data": {
"offline_data_collection": {
"status": {
"value": 0,
"string": "was never started"
},
"completion_seconds": 0
},
"self_test": {
"status": {
"value": 0,
"string": "completed without error",
"passed": true
},
"polling_minutes": {
"short": 2,
"extended": 30
}
},
"capabilities": {
"values": [
83,
3
],
"exec_offline_immediate_supported": true,
"offline_is_aborted_upon_new_cmd": false,
"offline_surface_scan_supported": false,
"self_tests_supported": true,
"conveyance_self_test_supported": false,
"selective_self_test_supported": true,
"attribute_autosave_enabled": true,
"error_logging_supported": true,
"gp_logging_supported": true
}
},
"ata_sct_capabilities": {
"value": 57,
"error_recovery_control_supported": true,
"feature_control_supported": true,
"data_table_supported": true
},
"ata_smart_attributes": {
"revision": 1,
"table": [
{
"id": 5,
"name": "Reallocated_Sector_Ct",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 78,
"string": "78"
}
},
{
"id": 9,
"name": "Power_On_Hours_and_Msec",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 652,
"string": "652h+00m+00.000s"
}
},
{
"id": 12,
"name": "Power_Cycle_Count",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 12,
"string": "12"
}
},
{
"id": 170,
"name": "Available_Reservd_Space",
"value": 99,
"worst": 99,
"thresh": 10,
"when_failed": "",
"flags": {
"value": 51,
"string": "PO--CK ",
"prefailure": true,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 171,
"name": "Program_Fail_Count",
"value": 100,
"worst": 100,
"thresh": 10,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 172,
"name": "Erase_Fail_Count",
"value": 100,
"worst": 100,
"thresh": 10,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 174,
"name": "Unexpect_Power_Loss_Ct",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 4,
"string": "4"
}
},
{
"id": 183,
"name": "SATA_Downshift_Count",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 184,
"name": "End-to-End_Error",
"value": 100,
"worst": 100,
"thresh": 90,
"when_failed": "",
"flags": {
"value": 51,
"string": "PO--CK ",
"prefailure": true,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 187,
"name": "Uncorrectable_Error_Cnt",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 190,
"name": "Airflow_Temperature_Cel",
"value": 34,
"worst": 38,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 111671640098,
"string": "34 (Min/Max 26/38)"
}
},
{
"id": 192,
"name": "Power-Off_Retract_Count",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 4,
"string": "4"
}
},
{
"id": 199,
"name": "UDMA_CRC_Error_Count",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 225,
"name": "Host_Writes_32MiB",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 105048,
"string": "105048"
}
},
{
"id": 226,
"name": "Workld_Media_Wear_Indic",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 227,
"name": "Workld_Host_Reads_Perc",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 228,
"name": "Workload_Minutes",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 232,
"name": "Available_Reservd_Space",
"value": 99,
"worst": 99,
"thresh": 10,
"when_failed": "",
"flags": {
"value": 51,
"string": "PO--CK ",
"prefailure": true,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 233,
"name": "Media_Wearout_Indicator",
"value": 99,
"worst": 99,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 0,
"string": "0"
}
},
{
"id": 241,
"name": "Total_LBAs_Written",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 105048,
"string": "105048"
}
},
{
"id": 242,
"name": "Total_LBAs_Read",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 244998,
"string": "244998"
}
},
{
"id": 249,
"name": "NAND_Writes_1GiB",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 2720,
"string": "2720"
}
},
{
"id": 252,
"name": "Unknown_Attribute",
"value": 100,
"worst": 100,
"thresh": 0,
"when_failed": "",
"flags": {
"value": 50,
"string": "-O--CK ",
"prefailure": false,
"updated_online": true,
"performance": false,
"error_rate": false,
"event_count": true,
"auto_keep": true
},
"raw": {
"value": 7,
"string": "7"
}
}
]
},
"power_on_time": {
"hours": 652,
"minutes": 0
},
"power_cycle_count": 12,
"temperature": {
"current": 42,
"power_cycle_min": 36,
"power_cycle_max": 46,
"lifetime_min": 44,
"lifetime_max": 46,
"op_limit_min": 0,
"op_limit_max": 85,
"limit_min": 0,
"limit_max": 100,
"lifetime_over_limit_minutes": 0,
"lifetime_under_limit_minutes": 0
},
"ata_log_directory": {
"gp_dir_version": 1,
"smart_dir_version": 1,
"smart_dir_multi_sector": true,
"table": [
{
"address": 0,
"name": "Log Directory",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 1,
"name": "Summary SMART error log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 2,
"name": "Comprehensive SMART error log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 3,
"name": "Ext. Comprehensive SMART error log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 4,
"name": "Device Statistics log",
"read": true,
"write": false,
"gp_sectors": 8,
"smart_sectors": 8
},
{
"address": 6,
"name": "SMART self-test log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 7,
"name": "Extended self-test log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 9,
"name": "Selective self-test log",
"read": true,
"write": true,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 16,
"name": "NCQ Command Error log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 17,
"name": "SATA Phy Event Counters log",
"read": true,
"write": false,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 48,
"name": "IDENTIFY DEVICE data log",
"read": true,
"write": false,
"gp_sectors": 9,
"smart_sectors": 9
},
{
"address": 128,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 129,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 130,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 131,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 132,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 133,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 134,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 135,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 136,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 137,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 138,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 139,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 140,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 141,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 142,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 143,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 144,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 145,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 146,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 147,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 148,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 149,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 150,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 151,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 152,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 153,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 154,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 155,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 156,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 157,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 158,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 159,
"name": "Host vendor specific log",
"read": true,
"write": true,
"gp_sectors": 16,
"smart_sectors": 16
},
{
"address": 223,
"name": "Device vendor specific log",
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 224,
"name": "SCT Command/Status",
"read": true,
"write": true,
"gp_sectors": 1,
"smart_sectors": 1
},
{
"address": 225,
"name": "SCT Data Transfer",
"read": true,
"write": true,
"gp_sectors": 1,
"smart_sectors": 1
}
]
},
"ata_smart_error_log": {
"extended": {
"revision": 1,
"sectors": 1,
"count": 0
}
},
"ata_smart_self_test_log": {
"extended": {
"revision": 1,
"sectors": 1,
"count": 0
}
},
"ata_smart_selective_self_test_log": {
"revision": 1,
"table": [
{
"lba_min": 70403103932424,
"lba_max": 70403103932424,
"status": {
"value": 0,
"string": "Not_testing"
}
},
{
"lba_min": 70403103932424,
"lba_max": 70403103932424,
"status": {
"value": 0,
"string": "Not_testing"
}
},
{
"lba_min": 70403103932424,
"lba_max": 70403103932424,
"status": {
"value": 0,
"string": "Not_testing"
}
},
{
"lba_min": 70403103932424,
"lba_max": 70403103932424,
"status": {
"value": 0,
"string": "Not_testing"
}
},
{
"lba_min": 70403103932424,
"lba_max": 70403103932424,
"status": {
"value": 0,
"string": "Not_testing"
}
}
],
"flags": {
"value": 16392,
"remainder_scan_enabled": false
},
"power_up_scan_resume_minutes": 0
},
"ata_sct_status": {
"format_version": 3,
"sct_version": 0,
"device_state": {
"value": 0,
"string": "Active"
},
"temperature": {
"current": 42,
"power_cycle_min": 36,
"power_cycle_max": 46,
"lifetime_min": 33,
"lifetime_max": 46,
"under_limit_count": 0,
"over_limit_count": 0
}
},
"ata_sct_temperature_history": {
"version": 2,
"sampling_period_minutes": 1,
"logging_interval_minutes": 1,
"temperature": {
"op_limit_min": 0,
"op_limit_max": 100,
"limit_min": 0,
"limit_max": 100
},
"size": 128,
"index": 54,
"table": [
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
41,
42,
42,
42,
42,
42,
41,
41,
41,
41,
42,
42,
42,
42,
42,
42,
42,
42,
41
]
},
"ata_sct_erc": {
"read": {
"enabled": false
},
"write": {
"enabled": false
}
},
"ata_device_statistics": {
"pages": [
{
"number": 1,
"name": "General Statistics",
"revision": 2,
"table": [
{
"offset": 8,
"name": "Lifetime Power-On Resets",
"size": 4,
"value": 12,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 16,
"name": "Power-on Hours",
"size": 4,
"value": 652,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 24,
"name": "Logical Sectors Written",
"size": 6,
"value": 6884434936,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 32,
"name": "Number of Write Commands",
"size": 6,
"value": 91239448,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 40,
"name": "Logical Sectors Read",
"size": 6,
"value": 16056245581,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 48,
"name": "Number of Read Commands",
"size": 6,
"value": 85003167,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
null,
null,
{
"number": 4,
"name": "General Errors Statistics",
"revision": 1,
"table": [
{
"offset": 8,
"name": "Number of Reported Uncorrectable Errors",
"size": 4,
"value": 75,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 16,
"name": "Resets Between Cmd Acceptance and Completion",
"size": 4,
"value": 4,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
{
"number": 5,
"name": "Temperature Statistics",
"revision": 1,
"table": [
{
"offset": 8,
"name": "Current Temperature",
"size": 1,
"value": 42,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 16,
"name": "Average Short Term Temperature",
"size": 1,
"value": 43,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 24,
"name": "Average Long Term Temperature",
"size": 1,
"flags": {
"value": 128,
"string": "---- ",
"valid": false,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 32,
"name": "Highest Temperature",
"size": 1,
"value": 46,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 40,
"name": "Lowest Temperature",
"size": 1,
"value": 44,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 48,
"name": "Highest Average Short Term Temperature",
"size": 1,
"value": 43,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 56,
"name": "Lowest Average Short Term Temperature",
"size": 1,
"value": 43,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 64,
"name": "Highest Average Long Term Temperature",
"size": 1,
"flags": {
"value": 128,
"string": "---- ",
"valid": false,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 72,
"name": "Lowest Average Long Term Temperature",
"size": 1,
"flags": {
"value": 128,
"string": "---- ",
"valid": false,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 80,
"name": "Time in Over-Temperature",
"size": 4,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 88,
"name": "Specified Maximum Operating Temperature",
"size": 1,
"value": 85,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 96,
"name": "Time in Under-Temperature",
"size": 4,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 104,
"name": "Specified Minimum Operating Temperature",
"size": 1,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
{
"number": 6,
"name": "Transport Statistics",
"revision": 1,
"table": [
{
"offset": 8,
"name": "Number of Hardware Resets",
"size": 4,
"value": 206,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 24,
"name": "Number of Interface CRC Errors",
"size": 4,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
{
"number": 7,
"name": "Solid State Device Statistics",
"revision": 1,
"table": [
{
"offset": 8,
"name": "Percentage Used Endurance Indicator",
"size": 1,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
}
]
},
"sata_phy_event_counters": {
"table": [
{
"id": 1,
"name": "Command failed due to ICRC error",
"size": 2,
"value": 0,
"overflow": false
},
{
"id": 5,
"name": "R_ERR response for non-data FIS",
"size": 2,
"value": 0,
"overflow": false
},
{
"id": 10,
"name": "Device-to-host register FISes sent due to a COMRESET",
"size": 2,
"value": 19,
"overflow": false
}
],
"reset": false
}
}
#6 Updated by Neha Ojha over 1 year ago
- Assignee changed from Sage Weil to Yaarit Hatuka
#7 Updated by Rob Logan about 1 year ago
I'm experiencing the same issue. Here are the nulls from smartctl -x --json /dev/sda:
{
"offset": 48,
"name": "Number of Read Commands",
"size": 6,
"value": 811764,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
null,
null,
{
"number": 4,
"name": "General Errors Statistics",
"revision": 1,
"table": [
{
"offset": 8,
"name": "Number of Reported Uncorrectable Errors",
"size": 4,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
{
"offset": 16,
"name": "Resets Between Cmd Acceptance and Completion",
"size": 4,
"value": 50,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
null,
{
"number": 6,
"name": "Transport Statistics",
"revision": 1,
"table": [
{
"offset": 8,
"name": "Number of Hardware Resets",
"size": 4,
"value": 58,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
#8 Updated by Alfredo Rezinovsky about 1 year ago
Anyone? I'm no developer, but this seems like a very simple thing to fix.
I cannot even disable the module to get rid of the HEALTH_ERR.
#9 Updated by Yaarit Hatuka about 1 year ago
Rob Logan wrote:
I'm experiencing the same issue. here are nulls from
smartctl -x --json /dev/sda
{
"offset": 48,
"name": "Number of Read Commands",
"size": 6,
"value": 811764,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
null,
null, {
"number": 4,
"name": "General Errors Statistics",
"revision": 1,
"table": [ {
"offset": 8,
"name": "Number of Reported Uncorrectable Errors",
"size": 4,
"value": 0,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
@@ {
"offset": 16,
"name": "Resets Between Cmd Acceptance and Completion",
"size": 4,
"value": 50,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
}
]
},
null, {
"number": 6,
"name": "Transport Statistics",
"revision": 1,
"table": [ {
"offset": 8,
"name": "Number of Hardware Resets",
"size": 4,
"value": 58,
"flags": {
"value": 192,
"string": "V--- ",
"valid": true,
"normalized": false,
"supports_dsn": false,
"monitored_condition_met": false
}
},
@
Apologies for the delay!
Thanks, Rob. In the output of "ata_device_statistics" above, I don't see page number 7 ("Solid State Device Statistics"). Can you please specify the vendor and model for this device?
I will push the fix for it shortly.
#10 Updated by Alfredo Rezinovsky about 1 year ago
- File smart.json added
I have a
"model_family": "Kingston SSDNow UV400",
"model_name": "KINGSTON SUV400S37240G",
"serial_number": "50026B766A0429A5",
Attached full file.
#11 Updated by Rob Logan about 1 year ago
- File eVtran-smartctl.json added
For the eVtran drive, ata_device_statistics contains three nulls, with number 7 named "Percentage Used Endurance Indicator". The full dump is attached. Thanks for looking into this; I'd be happy to test /usr/share/ceph/mgr/devicehealth/module.py prior to commit. I'm running ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable) via Proxmox.
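To check another dump for the same pattern, the null entries can be listed with a short script. This is a minimal sketch, not part of the devicehealth module: the helper name summarize_pages is hypothetical, and it assumes the statistics pages sit in a list under the "pages" key of "ata_device_statistics", as the smartctl excerpts in this thread suggest.

```python
from typing import Any, Dict, List, Tuple


def summarize_pages(data: Dict[str, Any]) -> Tuple[List[int], Dict[int, str]]:
    """Return (list positions holding null, {page number: page name}).

    Hypothetical helper; assumes the layout
    data["ata_device_statistics"]["pages"] == [page_or_None, ...].
    """
    stats = data.get("ata_device_statistics") or {}
    pages = stats.get("pages") or []

    # Positions in the list where smartctl emitted null instead of a page.
    null_positions = [i for i, page in enumerate(pages) if page is None]

    # Pages that are actually present, keyed by their "number" field.
    present = {
        page["number"]: page.get("name", "")
        for page in pages
        if page is not None and "number" in page
    }
    return null_positions, present


if __name__ == "__main__":
    # Made-up sample mirroring the shape of the excerpts above.
    sample = {
        "ata_device_statistics": {
            "pages": [
                {"number": 1, "name": "General Statistics", "table": []},
                None,
                None,
                {"number": 4, "name": "General Errors Statistics", "table": []},
                None,
                {"number": 6, "name": "Transport Statistics", "table": []},
            ]
        }
    }
    nulls, present = summarize_pages(sample)
    print("null entries at list positions:", nulls)
    print("pages present:", present)
    print("page 7 present:", 7 in present)
```

The same function could be applied to a real dump by passing it the result of json.load on the saved smartctl -x --json output.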
#12 Updated by Yaarit Hatuka about 1 year ago
- Status changed from New to Fix Under Review
- Backport set to pacific, quincy
- Pull request ID set to 45121
- Affected Versions v16.2.5, v16.2.6, v16.2.7 added
Thanks for your input, everyone. I pushed a fix that skips those null pages and tested it on the input you provided.
Rob, Alfredo, please check whether this fix helps or whether there are additional issues.
Thanks!
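For readers following along, the idea of skipping null pages can be sketched as below. This is an illustration based on the traceback in the description, not the patch from the pull request: the function name get_ata_wear_level comes from the traceback, while the "pages" key, the choice of the offset 8 entry on page 7, and the division by 100 are assumptions about the smartctl layout and about how the module scales the value.

```python
from typing import Any, Dict, Optional


def get_ata_wear_level(data: Dict[str, Any]) -> Optional[float]:
    """Extract a wear level from smartctl -x --json output, tolerating nulls.

    Sketch only. Assumes page 7 carries the endurance indicator at
    offset 8 as a percentage; returns it as a fraction, or None.
    """
    stats = data.get("ata_device_statistics") or {}
    for page in stats.get("pages") or []:
        # Some drives report null in place of an unsupported page;
        # calling .get() on it is what raised the AttributeError.
        if page is None or page.get("number") != 7:
            continue
        for item in page.get("table") or []:
            if item is not None and item.get("offset") == 8:
                return item["value"] / 100.0
    return None


if __name__ == "__main__":
    # Made-up input: null pages followed by a page 7 entry.
    with_page_7 = {
        "ata_device_statistics": {
            "pages": [
                None,
                {"number": 4, "table": [{"offset": 8, "value": 0}]},
                None,
                {"number": 7, "table": [{"offset": 8, "value": 50}]},
            ]
        }
    }
    print(get_ata_wear_level(with_page_7))

    # Input like the excerpts in this thread: nulls and no page 7.
    without_page_7 = {"ata_device_statistics": {"pages": [None, None]}}
    print(get_ata_wear_level(without_page_7))
```

With a guard like this, a drive that reports nulls and no page 7 yields None instead of an exception, so the caller can simply skip recording a wear level for that device.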
#13 Updated by Rob Logan about 1 year ago
Yup, works great! Thanks!
2022-02-22T15:18:03.927303-0500 mon.nuc8a (mon.0) 271918 : cluster [INF] Manager daemon nuc8a is unresponsive, replacing it with standby daemon nuc10a
2022-02-22T15:18:03.937499-0500 mon.nuc8a (mon.0) 271919 : cluster [DBG] osdmap e575612: 7 total, 7 up, 7 in
2022-02-22T15:18:03.947206-0500 mon.nuc8a (mon.0) 271920 : cluster [DBG] mgrmap e694: nuc10a(active, starting, since 0.0199344s), standbys: nuc10b
2022-02-22T15:18:03.982817-0500 mon.nuc8a (mon.0) 271921 : cluster [INF] Manager daemon nuc10a is now available
2022-02-22T15:18:04.963010-0500 mon.nuc8a (mon.0) 271926 : cluster [DBG] mgrmap e695: nuc10a(active, since 1.03574s), standbys: nuc10b
2022-02-22T15:18:04.979763-0500 mon.nuc8a (mon.0) 271928 : cluster [DBG] osdmap e575613: 7 total, 7 up, 7 in
2022-02-22T15:18:05.978103-0500 mon.nuc8a (mon.0) 271932 : cluster [INF] Health check cleared: MGR_MODULE_ERROR (was: Module 'devicehealth' has failed: 'NoneType' object has no attribute 'get')
2022-02-22T15:18:05.978119-0500 mon.nuc8a (mon.0) 271933 : cluster [INF] Cluster is now healthy
2022-02-22T15:18:05.991664-0500 mon.nuc8a (mon.0) 271935 : cluster [DBG] osdmap e575614: 7 total, 7 up, 7 in
2022-02-22T15:18:07.033717-0500 mon.nuc8a (mon.0) 271936 : cluster [DBG] mgrmap e696: nuc10a(active, since 3s), standbys: nuc10b
#14 Updated by Yaarit Hatuka about 1 year ago
Great, thanks for checking!
#15 Updated by Steven Lewis about 1 year ago
I can confirm that the fix is working for me as well.
#16 Updated by Sridhar Seshasayee about 1 year ago
- Status changed from Fix Under Review to Pending Backport
#17 Updated by Sridhar Seshasayee about 1 year ago
- Copied to Backport #54394: quincy: mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get' added
#18 Updated by Sridhar Seshasayee about 1 year ago
- Copied to Backport #54395: pacific: mgr/devicehealth: health warning caused by AttributeError: 'NoneType' object has no attribute 'get' added
#19 Updated by Yaarit Hatuka about 1 year ago
- Status changed from Pending Backport to Resolved
All backports are merged; resolving.