Bug #53538

closed

mgr/stats: ZeroDivisionError

Added by Sebastian Wagner over 2 years ago. Updated 4 months ago.

Status:
Resolved
Priority:
Urgent
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
low-hanging-fruit backport_processed
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

root@service-01-08020:~# ceph osd status storage-01-08002
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1623, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 416, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/status/module.py", line 338, in handle_osd_status
    wr_ops_rate = (self.get_rate("osd", osd_id.__str__(), "osd.op_w") +
  File "/usr/share/ceph/mgr/status/module.py", line 28, in get_rate
    return (data[-1][1] - data[-2][1]) // int(data[-1][0] - data[-2][0])
ZeroDivisionError: integer division or modulo by zero

Since those PRs:

no one had the patience to look into this all over again.


Related issues 4 (0 open, 4 closed)

Related to mgr - Feature #40365: mgr: Add get_rates_from_data from the dashboard to the mgr_util.py (Resolved, Stephan Müller)
Has duplicate mgr - Bug #54213: ceph osd status - ZeroDivisionError: integer division or modulo by zero (Duplicate)
Copied to mgr - Backport #54281: pacific: mgr/stats: ZeroDivisionError (Resolved, Nitzan Mordechai)
Copied to mgr - Backport #54282: quincy: mgr/stats: ZeroDivisionError (Resolved, Nitzan Mordechai)
Actions #1

Updated by Sebastian Wagner over 2 years ago

  • Related to Feature #40365: mgr: Add get_rates_from_data from the dashboard to the mgr_util.py added
Actions #2

Updated by Sebastian Wagner over 2 years ago

  • Description updated (diff)
Actions #3

Updated by Neha Ojha over 2 years ago

  • Tags set to low-hanging-fruit
Actions #4

Updated by Neha Ojha over 2 years ago

  • Priority changed from Normal to Urgent
[ubuntu@gibba001 ~]$ sudo ceph osd status|grep gibba043|wc -l
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1648, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 434, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/status/module.py", line 338, in handle_osd_status
    wr_ops_rate = (self.get_rate("osd", osd_id.__str__(), "osd.op_w") +
  File "/usr/share/ceph/mgr/status/module.py", line 28, in get_rate
    return (data[-1][1] - data[-2][1]) // int(data[-1][0] - data[-2][0])
ZeroDivisionError: integer division or modulo by zero
Actions #5

Updated by Neha Ojha about 2 years ago

  • Assignee set to Nitzan Mordechai
Actions #6

Updated by Nitzan Mordechai about 2 years ago

From my understanding, the stats update interval is set by the mgr_stats_period configuration value. The division fails when int(data[-1][0] - data[-2][0]) == 0, where data[-1][0] and data[-2][0] hold the timestamps of the last two stats updates. The subtraction truncates to 0 when either data[-1][0] == data[-2][0] or data[-1][0] - data[-2][0] < 1, and we check only for the first condition. Both cases can only happen when the stats are updated fast enough, meaning we had mgr_stats_period = 1 or something else caused the stats to be updated less than 1 second apart.
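
The failure mode described above can be reproduced outside the mgr with a few lines of Python. This is only a sketch: the function name get_rate_unguarded and the sample values are made up for illustration, and it assumes each sample is a (timestamp, value) pair with the timestamp as a float in seconds. Only the arithmetic is copied from the failing line in the traceback.

```python
def get_rate_unguarded(data):
    # Same arithmetic as the failing line in the traceback:
    # counter delta floor-divided by the truncated time delta.
    return (data[-1][1] - data[-2][1]) // int(data[-1][0] - data[-2][0])


# Hypothetical samples taken 5 seconds apart: the division works.
spaced = [(1000.0, 500), (1005.0, 520)]
rate_spaced = get_rate_unguarded(spaced)  # 20 // 5

# Hypothetical samples taken less than 1 second apart. The timestamps
# differ, so an equality check would not catch this case, but int()
# truncates the sub-second delta to 0.
close = [(1000.2, 500), (1000.9, 520)]
error = None
try:
    get_rate_unguarded(close)
except ZeroDivisionError as exc:
    error = exc

print(rate_spaced)
print(repr(error))
```

The second case is the one the equality check misses: the two timestamps are not equal, yet the truncated difference is still 0.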

Actions #7

Updated by Sebastian Wagner about 2 years ago

Nitzan Mordechai wrote:

From my understanding, the stats update interval is set by the mgr_stats_period configuration value. The division fails when int(data[-1][0] - data[-2][0]) == 0, where data[-1][0] and data[-2][0] hold the timestamps of the last two stats updates. The subtraction truncates to 0 when either data[-1][0] == data[-2][0] or data[-1][0] - data[-2][0] < 1, and we check only for the first condition. Both cases can only happen when the stats are updated fast enough, meaning we had mgr_stats_period = 1 or something else caused the stats to be updated less than 1 second apart.

Yes, and we already had this problem in the dashboard. Our solution was https://github.com/ceph/ceph/pull/28603 . If you use the same function in the stats module as well, things should work properly.
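
As a rough illustration of the kind of guard being suggested, a rate helper can divide by the untruncated time delta and fall back to zero when the delta is not positive. This is not the function from the pull request above; safe_rate is a hypothetical name, and the (timestamp, value) sample layout with float timestamps is an assumption.

```python
def safe_rate(data):
    """Return the per-second rate between the last two samples.

    Hypothetical helper for illustration only. `data` is assumed to be
    a list of (timestamp, value) pairs, oldest first.
    """
    if len(data) < 2:
        # Not enough samples to compute a rate.
        return 0.0
    (t1, v1), (t2, v2) = data[-2], data[-1]
    elapsed = t2 - t1
    if elapsed <= 0:
        # Identical or out-of-order timestamps: avoid dividing by zero.
        return 0.0
    # Divide by the real (float) delta instead of a truncated int.
    return (v2 - v1) / elapsed


print(safe_rate([(1000.0, 500), (1005.0, 520)]))  # samples 5 seconds apart
print(safe_rate([(1000.2, 500), (1000.9, 520)]))  # samples under 1 second apart
print(safe_rate([(1000.0, 500), (1000.0, 520)]))  # identical timestamps
```

Returning a float changes the result type compared with the floor division in the traceback, so callers that expect an integer would need to round or convert it.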

Actions #8

Updated by Neha Ojha about 2 years ago

  • Status changed from New to Fix Under Review
  • Backport set to pacific,quincy
  • Pull request ID set to 44752
Actions #9

Updated by Neha Ojha about 2 years ago

  • Has duplicate Bug #54213: ceph osd status - ZeroDivisionError: integer division or modulo by zero added
Actions #10

Updated by Neha Ojha about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #11

Updated by Backport Bot about 2 years ago

Actions #12

Updated by Backport Bot about 2 years ago

Actions #13

Updated by Backport Bot over 1 year ago

  • Tags changed from low-hanging-fruit to low-hanging-fruit backport_processed
Actions #14

Updated by Konstantin Shalygin 4 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100