Bug #9052

ceph-mon crashes with *** Caught signal (Floating point exception) **

Added by Jamin Collins over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Monitor
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: firefly
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

I've found that I can crash ceph-mon by attempting to change pool values (such as pg_num) before adding OSDs to the cluster. Examples of the crash and command:

Crash: http://pastebin.com/LpF0gHNY
Command: http://pastebin.com/8jJ80MK2

Associated revisions

Revision 239401db (diff)
Added by Sage Weil over 5 years ago

mon: fix divide by zero when pg_num adjusted and no osds

Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>

Revision 38c3a3c0 (diff)
Added by Sage Weil over 5 years ago

mon: fix divide by zero when pg_num adjusted and no osds

Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>
(cherry picked from commit 239401db7b51541a57c59a261b89e0f05347c32d)

Revision aaeebceb (diff)
Added by Sage Weil over 5 years ago

mon: fix divide by zero when pg_num adjusted and no osds

Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>

Manual backport of 239401db7b51541a57c59a261b89e0f05347c32d

History

#1 Updated by Dan Mick over 5 years ago

  • Category set to Monitor
  • Priority changed from Normal to High
  • Target version set to 0.84
  • Source changed from other to Community (user)
  • Backport set to firefly

#2 Updated by Dan Mick over 5 years ago

With no OSDs in the cluster, the pgs_per_osd calculation can divide by zero. It is an integer division, but integer division by zero is still reported as SIGFPE, hence the "Floating point exception" in the crash:

    int expected_osds = MIN(p.get_pg_num(), osdmap.get_num_osds());
    int64_t new_pgs = n - p.get_pg_num();
    int64_t pgs_per_osd = new_pgs / expected_osds;

expected_osds can be zero.
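
As an illustration only, the division above could be guarded along the following lines. This is a minimal sketch, not the committed fix: the helper name pgs_per_osd_checked, its parameters, and the choice to report failure through a bool return are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not from the Ceph tree): computes how many new PGs
// each OSD would receive. Returns false and leaves *out untouched when
// there is nothing to divide by, instead of faulting with SIGFPE.
bool pgs_per_osd_checked(int64_t requested_pg_num,
                         int64_t current_pg_num,
                         int64_t num_osds,
                         int64_t *out) {
  // Mirrors MIN(p.get_pg_num(), osdmap.get_num_osds()) from the snippet above.
  int64_t expected_osds = std::min(current_pg_num, num_osds);
  if (expected_osds <= 0) {
    return false;  // no OSDs (or no PGs): the division would be by zero
  }
  int64_t new_pgs = requested_pg_num - current_pg_num;
  *out = new_pgs / expected_osds;
  return true;
}
```

With zero OSDs the caller gets false and can skip the per-OSD limit check or reject the command, whichever the monitor prefers; with 4 OSDs and pg_num going from 64 to 128 the result is 16.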

Looking briefly, there are a few other places in OSDMonitor where division by zero looks possible:

    float up_ratio = (float)up / (float)osdmap.get_num_osds();

    float in_ratio = (float)in / (float)osdmap.get_num_osds();

and two instances of:

    double halflife = (double)g_conf->mon_osd_laggy_halflife;
    double decay_k = ::log(.5) / halflife;
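
Under default IEEE 754 behaviour, these floating-point divisions produce inf or NaN rather than raising SIGFPE, so they would not crash the way the integer case does. They would still yield values that are useless in later comparisons. A hedged sketch of how they might be guarded follows. The helper names osd_ratio and decay_constant, and the choice of 0 as the fallback value, are assumptions for illustration and not taken from OSDMonitor.

```cpp
#include <cmath>

// Hypothetical helper: fraction of OSDs in some state (up, in, ...).
// Falls back to 0 when the cluster has no OSDs, avoiding a 0/0 NaN.
float osd_ratio(int count, int num_osds) {
  if (num_osds <= 0) {
    return 0.0f;
  }
  return static_cast<float>(count) / static_cast<float>(num_osds);
}

// Hypothetical helper: exponential decay constant for a given half-life.
// Falls back to 0 (no decay) when the configured half-life is not positive,
// avoiding a division by zero that would give -inf.
double decay_constant(double halflife) {
  if (halflife <= 0.0) {
    return 0.0;
  }
  return std::log(0.5) / halflife;
}
```

Whether 0 is the right fallback depends on how the callers use the result. For the ratios, treating an empty cluster as "nothing up, nothing in" may or may not match the intent of the surrounding checks.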

It might be worth reviewing the Coverity results and raising the priority of division-by-zero warnings.

#3 Updated by Sage Weil over 5 years ago

  • Priority changed from High to Urgent

#4 Updated by Sage Weil over 5 years ago

  • Assignee set to Sage Weil

#5 Updated by Sage Weil over 5 years ago

  • Status changed from New to Resolved
