Bug #9052
ceph-mon crashes with *** Caught signal (Floating point exception) **
Description
I've found that I can crash ceph-mon by attempting to change pool values (such as pg_num) before adding OSDs to the cluster. Examples of the crash and command:
Crash: http://pastebin.com/LpF0gHNY
Command: http://pastebin.com/8jJ80MK2
Associated revisions
mon: fix divide by zero when pg_num adjusted and no osds
Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
mon: fix divide by zero when pg_num adjusted and no osds
Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 239401db7b51541a57c59a261b89e0f05347c32d)
mon: fix divide by zero when pg_num adjusted and no osds
Fixes: #9052
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
Manual backport of 239401db7b51541a57c59a261b89e0f05347c32d
History
#1 Updated by Dan Mick over 8 years ago
- Category set to Monitor
- Priority changed from Normal to High
- Target version set to 0.84
- Source changed from other to Community (user)
- Backport set to firefly
#2 Updated by Dan Mick over 8 years ago
With no OSDs in the cluster, the calculations for pgs_per_osd can divide by zero (integer, but that still causes the FPE).
int expected_osds = MIN(p.get_pg_num(), osdmap.get_num_osds());
int64_t new_pgs = n - p.get_pg_num();
int64_t pgs_per_osd = new_pgs / expected_osds;
expected_osds can be zero.
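For illustration only, here is a minimal sketch of how the integer division could be guarded. This is not the committed fix; the helper name compute_pgs_per_osd and its parameters are hypothetical stand-ins for the values used in the snippet above, and how the caller should react to an empty cluster is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not actual OSDMonitor code.
// Computes the number of new PGs per expected OSD and stores it in *out.
// Returns false (leaving *out untouched) when there is nothing to divide by,
// e.g. when the cluster has no OSDs yet.
bool compute_pgs_per_osd(int64_t current_pg_num,
                         int64_t requested_pg_num,
                         int64_t num_osds,
                         int64_t *out) {
  // Mirrors MIN(p.get_pg_num(), osdmap.get_num_osds()) from the snippet above.
  int64_t expected_osds = std::min(current_pg_num, num_osds);
  if (expected_osds <= 0) {
    return false;  // integer division by zero would raise SIGFPE here
  }
  int64_t new_pgs = requested_pg_num - current_pg_num;
  *out = new_pgs / expected_osds;
  return true;
}
```

A caller could treat a false return as "skip the per-OSD limit check", since there are no OSDs to overload; whether that is the right policy is a judgment call for the monitor code.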
Looking briefly, there are a few other places in OSDMonitor where /0 looks possible:
float up_ratio = (float)up / (float)osdmap.get_num_osds();
float in_ratio = (float)in / (float)osdmap.get_num_osds();
two instances of:
double halflife = (double)g_conf->mon_osd_laggy_halflife;
double decay_k = ::log(.5) / halflife;
It might be good to review Coverity and maybe increase the priority of such warnings.
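A note on the floating-point cases listed above: on typical IEEE 754 platforms with default settings, dividing a float or double by zero does not raise SIGFPE the way the integer division does; it yields inf or NaN, which can then propagate silently into later comparisons. The sketch below shows one way such divisions could be guarded. The helper names and the choice of 0 as the fallback value are assumptions for illustration, not anything taken from OSDMonitor.

```cpp
#include <cmath>

// Hypothetical helper: fraction of OSDs in some state (up, in, ...).
// Falls back to 0 when the cluster has no OSDs, instead of producing NaN.
float osd_ratio(int count, int num_osds) {
  if (num_osds <= 0) {
    return 0.0f;
  }
  return static_cast<float>(count) / static_cast<float>(num_osds);
}

// Hypothetical helper: decay constant for a given half-life.
// Falls back to 0 (no decay) when the configured half-life is not positive,
// instead of producing -inf.
double decay_constant(double halflife) {
  if (halflife <= 0.0) {
    return 0.0;
  }
  return std::log(0.5) / halflife;
}
```

Whether 0 is a sensible fallback depends on how the result is used; for the ratios it would make an empty cluster look like "no OSDs up/in", which may or may not be the desired behaviour.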
#3 Updated by Sage Weil over 8 years ago
- Priority changed from High to Urgent
#4 Updated by Sage Weil over 8 years ago
- Assignee set to Sage Weil
#5 Updated by Sage Weil over 8 years ago
- Status changed from New to Resolved