Bug #44186

Module 'pg_autoscaler' has failed: division by zero

Added by Sage Weil about 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags: low-hanging-fruit
Backport: nautilus,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-02-18T15:25:54.452+0000 7fe0f37fa700 10 module pg_autoscaler health checks:
{
    "severity": "HEALTH_WARN",
    "summary": {
        "message": "1 pools have both target_size_bytes and target_size_ratio set",
        "count": 1
    },
    "detail": [
        {
            "message": "Pool a has target_size_bytes and target_size_ratio set" 
        }
    ]
}

2020-02-18T15:25:54.452+0000 7fe0f37fa700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'pg_autoscaler' while running on mgr.x: division by zero
2020-02-18T15:25:54.452+0000 7fe0f37fa700 -1 pg_autoscaler.serve:
2020-02-18T15:25:54.452+0000 7fe0f37fa700 -1 ZeroDivisionError: division by zero

/a/sage-2020-02-18_14:47:43-rados-wip-sage2-testing-2020-02-17-2124-distro-basic-smithi/4777440

description: rados/singleton/{all/pg-autoscaler.yaml msgr-failures/many.yaml msgr/async.yaml
objectstore/bluestore-avl.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml}}


Related issues

Related to mgr - Bug #46487: pybind/mgr/pg_autoscaler/module.py: do not update event if ev.pg_num== ev.pg_num_target Resolved
Copied to mgr - Backport #44219: nautilus: Module 'pg_autoscaler' has failed: division by zero Resolved
Copied to mgr - Backport #46196: octopus: Module 'pg_autoscaler' has failed: division by zero Resolved

History

#1 Updated by Sage Weil about 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 33402

#2 Updated by Sage Weil about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus

#3 Updated by Konstantin Shalygin about 4 years ago

  • Copied to Backport #44219: nautilus: Module 'pg_autoscaler' has failed: division by zero added

#4 Updated by Nathan Cutler almost 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#5 Updated by Sage Weil almost 4 years ago

  • Status changed from Resolved to Need More Info

hrm, another instance: /a/sage-2020-04-01_21:31:45-rados-wip-sage3-testing-2020-04-01-1428-distro-basic-smithi/4914981

#6 Updated by Josh Durgin almost 4 years ago

  • Tags set to low-hanging-fruit

#7 Updated by Kefu Chai almost 4 years ago

  • Status changed from Need More Info to New
  • Backport changed from nautilus to nautilus,octopus

2020-04-13T03:24:57.574 INFO:tasks.ceph.mgr.x.smithi052.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 207, in serve
2020-04-13T03:24:57.574 INFO:tasks.ceph.mgr.x.smithi052.stderr:    self._update_progress_events()
2020-04-13T03:24:57.574 INFO:tasks.ceph.mgr.x.smithi052.stderr:  File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 415, in _update_progress_events
2020-04-13T03:24:57.574 INFO:tasks.ceph.mgr.x.smithi052.stderr:    ev.update(self, (ev.pg_num - pool_data['pg_num']) / (ev.pg_num - ev.pg_num_target))
2020-04-13T03:24:57.575 INFO:tasks.ceph.mgr.x.smithi052.stderr:ZeroDivisionError: division by zero

/a/kchai-2020-04-10_10:07:46-rados-wip-kefu-testing-2020-04-10-1430-distro-basic-smithi/4942579
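
The expression in the traceback above divides by `ev.pg_num - ev.pg_num_target`, which is zero whenever an event's starting pg_num equals its target. A minimal sketch of a guard for that case follows, in the spirit of the title of related Bug #46487 (skip the update when the two values are equal). `PgAdjustmentEvent` and `adjustment_progress` are hypothetical names used only for illustration; this is not the module's actual code, and the merged fix may look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PgAdjustmentEvent:
    """Hypothetical stand-in for the progress event seen in the traceback."""
    pg_num: int         # pg_num when the adjustment started
    pg_num_target: int  # pg_num the adjustment is heading toward


def adjustment_progress(ev: PgAdjustmentEvent,
                        current_pg_num: int) -> Optional[float]:
    """Return the completed fraction of the adjustment, or None if undefined."""
    span = ev.pg_num - ev.pg_num_target
    if span == 0:
        # Start and target are equal, so the ratio is undefined;
        # signal the caller to skip the update instead of dividing.
        return None
    return (ev.pg_num - current_pg_num) / span
```

A caller would skip the progress update whenever the function returns None, for example when an event was created with pg_num 32 and pg_num_target 32.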

#8 Updated by Neha Ojha almost 4 years ago

/a/teuthology-2020-06-05_07:01:02-rados-master-distro-basic-smithi/5119405

#9 Updated by Brad Hubbard almost 4 years ago

/a/yuriw-2020-05-29_15:51:00-rados-wip-yuri-testing-2020-05-28-2238-octopus-distro-basic-smithi/5103378

#10 Updated by Neha Ojha almost 4 years ago

  • Assignee set to Neha Ojha

/a/nojha-2020-06-17_16:38:44-rados:singleton-master-distro-basic-smithi/5158406

#11 Updated by Neha Ojha almost 4 years ago

  • Status changed from New to Fix Under Review

#12 Updated by Kefu Chai almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport

#13 Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #46196: octopus: Module 'pg_autoscaler' has failed: division by zero added

#14 Updated by Nathan Cutler over 3 years ago

  • Status changed from Pending Backport to Resolved

#15 Updated by Nathan Cutler over 3 years ago

  • Related to Bug #46487: pybind/mgr/pg_autoscaler/module.py: do not update event if ev.pg_num== ev.pg_num_target added

#16 Updated by Nathan Cutler over 3 years ago

@Neha - reopening an old ticket that has already been backported makes it difficult to backport the second round of fixes. It's better to open a new ticket and mark it as "Related" to the old one. I went ahead and did that in this case so your fix can be backported via the usual backporting workflows.
