Bug #46977

closed

mgr/dashboard: telemetry module throws error "list index out of range"

Added by Xen Gi over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ahoi,
I'm using cephadm on aarch64 with the podman backend. It has worked well so far. This is the output when I try to interact with the telemetry module:

$ ceph telemetry show
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
WARNING: The same type, major and minor should not be used for multiple devices.
Error EIO: Module 'telemetry' has experienced an error and cannot handle commands: list index out of range

In the Dashboard web UI I get the following error:

MGR_MODULE_ERROR: Module 'telemetry' has failed: list index out of range

I haven't seen any related messages in the logs yet, but I didn't look too deeply because I'm not yet familiar enough with it.


Related issues 2 (0 open, 2 closed)

Copied to Dashboard - Backport #47192: octopus: mgr/dashboard: telemetry module throws error "list index out of range" (Resolved, Kiefer Chang)
Copied to Dashboard - Backport #47193: nautilus: mgr/dashboard: telemetry module throws error "list index out of range" (Resolved, Yaarit Hatuka)
Actions #1

Updated by Xen Gi over 3 years ago

After reloading the mgr module, the error is cleared and ceph telemetry show gives output without errors. However, ceph telemetry show-device gives me this error:

$ ceph telemetry show-device
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1167, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/telemetry/module.py", line 767, in handle_command
    return 0, json.dumps(self.get_report('device'), indent=4, sort_keys=True), ''
  File "/usr/share/ceph/mgr/telemetry/module.py", line 786, in get_report
    return self.gather_device_report()
  File "/usr/share/ceph/mgr/telemetry/module.py", line 413, in gather_device_report
    serial = devid.rsplit('_', 1)[1]
IndexError: list index out of range

Also configuring telemetry in the dashboard web UI gives the error: The configuration could not be loaded.

Actions #2

Updated by Kiefer Chang over 3 years ago

Thanks for reporting the issue. Can you post the output of `ceph device ls`?
It looks like there is a problem when getting the serial number of the disks.

Actions #3

Updated by Xen Gi over 3 years ago

Sure, here it is:

$ ceph device ls
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
WARNING: The same type, major and minor should not be used for multiple devices.
DEVICE                                HOST:DEV       DAEMONS    LIFE EXPECTANCY
0x855ef43d                            alpha:mmcblk0  mon.alpha
0x855ef460                            beta:mmcblk0   mon.beta
EFRX-68EUZN0_235678D4BE20             alpha:sdb      osd.3
Hitachi_HUA723020ALA641_YGHVJYXA      alpha:sda      osd.0
ST31000333AS_9TE1QTWR                 beta:sda       osd.2
WDC_WD20EARX-00PASB0_WD-WCAZAE138541  beta:sdc       osd.1
Actions #4

Updated by Neha Ojha over 3 years ago

  • Assignee set to Yaarit Hatuka

Yaarit, could you please take a look?

Actions #5

Updated by Yaarit Hatuka over 3 years ago

Sure, thanks!

Actions #6

Updated by Yaarit Hatuka over 3 years ago

  • Status changed from New to Fix Under Review
  • Backport set to octopus, nautilus
  • Pull request ID set to 36855

Anonymizing the serial number in the device id string fails in the rare cases where both 'vendor' and 'model' are missing from the device id string.
Ideally, the device id is generated (in blkdev.cc) as 'vendor_model_serial', which happens when all fields are successfully retrieved from the device. When they are not, the device id can also be generated as 'model_serial' or just 'serial'. Splitting by '_' and taking the second element fails in the 'serial'-only case, since 'serial' is the only element in the string.
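
The failure can be reproduced outside the mgr with device ids from the `ceph device ls` output earlier in this ticket. The following is a minimal sketch for illustration, not the code from the fix; `split_serial` is a hypothetical helper name that does not exist in the telemetry module.

```python
# Device ids copied from the `ceph device ls` output in this ticket.
DEVICE_IDS = [
    "0x855ef43d",                        # 'serial' only, no underscore
    "ST31000333AS_9TE1QTWR",             # 'model_serial'
    "Hitachi_HUA723020ALA641_YGHVJYXA",  # 'vendor_model_serial'
]


def split_serial(devid):
    """Return (prefix, serial); prefix is '' when the id has no '_'.

    Hypothetical helper for illustration only.
    """
    # str.rpartition never raises: without a separator it returns
    # ('', '', devid), so the whole id is treated as the serial.
    prefix, _, serial = devid.rpartition("_")
    return prefix, serial


for devid in DEVICE_IDS:
    try:
        # The expression from the traceback: rsplit returns a one-element
        # list when the id contains no '_', so index 1 does not exist.
        serial = devid.rsplit("_", 1)[1]
    except IndexError:
        serial = None
    print(devid, "->", serial, split_serial(devid))
```

Only the two mmcblk device ids in the listing lack an underscore, which matches the description of the serial-only case.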

In order to anonymize the serial number in smartctl reports we now rely on the serial number value as retrieved from the raw smartctl report itself (as opposed to the one in device id). That's in order to prevent possible inconsistencies between the serial retrieved from device id and the one in the report.
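
The approach described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the pull request: `anonymize_report` and `_scrub` are hypothetical names, the report is assumed to be a parsed smartctl JSON dictionary with a top-level `serial_number` key, and the replacement string `deleted` is chosen only for the example.

```python
def _scrub(value, replacements):
    """Recursively apply (old, new) substitutions to every string value."""
    if isinstance(value, str):
        for old, new in replacements:
            value = value.replace(old, new)
        return value
    if isinstance(value, list):
        return [_scrub(item, replacements) for item in value]
    if isinstance(value, dict):
        return {key: _scrub(item, replacements) for key, item in value.items()}
    return value  # numbers, booleans and None are left untouched


def anonymize_report(report, devid, anon_devid):
    """Return a scrubbed copy of a smartctl-style report (hypothetical helper)."""
    # Read the serial from the report itself instead of parsing devid,
    # so a device id without '_' cannot cause an IndexError.
    serial = report.get("serial_number")
    replacements = [(devid, anon_devid)]
    if serial:
        replacements.append((serial, "deleted"))
    cleaned = _scrub(report, replacements)
    if "serial_number" in cleaned:
        # Mask the field explicitly, which also covers ids that are
        # nothing but the serial.
        cleaned["serial_number"] = "deleted"
    return cleaned
```

Because the serial is looked up by key, the function behaves the same for 'vendor_model_serial', 'model_serial' and 'serial'-only device ids, and a report without a serial is simply left with only the device id replaced.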

Actions #7

Updated by Yaarit Hatuka over 3 years ago

  • Subject changed from octopus: mgr/dashboard: telemetry module throws error "list index out of range" to mgr/dashboard: telemetry module throws error "list index out of range"
  • Target version deleted (v15.2.4)
Actions #8

Updated by Yaarit Hatuka over 3 years ago

Hi backport team,

The backports to Octopus and Nautilus will have conflicts. Please assign the backport tickets to me and I'll fix them.

Thanks,
Yaarit

Actions #9

Updated by Nathan Cutler over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #10

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47192: octopus: mgr/dashboard: telemetry module throws error "list index out of range" added
Actions #11

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47193: nautilus: mgr/dashboard: telemetry module throws error "list index out of range" added
Actions #12

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #13

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #14

Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #15

Updated by Ernesto Puerta about 3 years ago

  • Project changed from mgr to Dashboard
  • Category deleted (telemetry module)