Bug #9052

ceph-mon crashes with *** Caught signal (Floating point exception) **

Added by Jamin Collins over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: Sage Weil
Category: Monitor
Target version: 0.84
% Done: 0%
Source: Community (user)
Backport: firefly
Severity: 3 - minor

Description

I've found that I can crash ceph-mon by attempting to change pool values (such as pg_num) before any OSDs have been added to the cluster. Examples of the crash output and the triggering command:

Crash: http://pastebin.com/LpF0gHNY
Command: http://pastebin.com/8jJ80MK2

Actions #1

Updated by Dan Mick over 9 years ago

  • Category set to Monitor
  • Priority changed from Normal to High
  • Target version set to 0.84
  • Source changed from other to Community (user)
  • Backport set to firefly
Actions #2

Updated by Dan Mick over 9 years ago

With no OSDs in the cluster, the pgs_per_osd calculation can divide by zero (it's an integer division, but that still raises SIGFPE).

    int expected_osds = MIN(p.get_pg_num(), osdmap.get_num_osds());
    int64_t new_pgs = n - p.get_pg_num();
    int64_t pgs_per_osd = new_pgs / expected_osds;

expected_osds can be zero.
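
As a sketch of the kind of guard that would avoid the crash (the standalone function, its name, and the -EINVAL return are illustrative, not the actual fix):

    #include <algorithm>
    #include <cerrno>
    #include <cstdint>

    // Hypothetical standalone version of the calculation above; pg_num,
    // num_osds, and new_pg_num stand in for p.get_pg_num(),
    // osdmap.get_num_osds(), and n from the snippet.
    int pgs_per_osd_checked(int pg_num, int num_osds, int64_t new_pg_num,
                            int64_t *pgs_per_osd_out) {
      int expected_osds = std::min(pg_num, num_osds);
      if (expected_osds <= 0)
        return -EINVAL;  // no OSDs yet: reject the request instead of SIGFPE
      int64_t new_pgs = new_pg_num - pg_num;
      *pgs_per_osd_out = new_pgs / expected_osds;  // divisor now non-zero
      return 0;
    }

Whatever the real fix in OSDMonitor looks like, the shape is the same: validate the divisor and return an error to the client before dividing.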

Looking briefly, there are a few other places in OSDMonitor where /0 looks possible:

float up_ratio = (float)up / (float)osdmap.get_num_osds();

float in_ratio = (float)in / (float)osdmap.get_num_osds();

two instances of:

double halflife = (double)g_conf->mon_osd_laggy_halflife;
double decay_k = ::log(.5) / halflife;
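
Under default floating-point settings the float cases produce inf/nan rather than a signal, but the results are still wrong. A small helper along these lines (safe_div is a hypothetical name, not existing Ceph code) would cover all the sites listed above:

    #include <cmath>

    // Hypothetical helper: return a caller-chosen fallback when the divisor
    // is zero. Call sites with integer operands also avoid SIGFPE, since
    // the division happens in double.
    static inline double safe_div(double num, double den, double fallback) {
      return den != 0.0 ? num / den : fallback;
    }

    // e.g. float up_ratio = safe_div(up, osdmap.get_num_osds(), 0.0);
    //      double decay_k = safe_div(::log(.5), halflife, 0.0);
    // (with mon_osd_laggy_halflife == 0, decay_k would otherwise be -inf)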

It might be good to review the Coverity results and perhaps raise the priority assigned to divide-by-zero warnings like these.

Actions #3

Updated by Sage Weil over 9 years ago

  • Priority changed from High to Urgent
Actions #4

Updated by Sage Weil over 9 years ago

  • Assignee set to Sage Weil
Actions #5

Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved