Bug #36361

open

upmap balancer: crash in calc_pg_upmaps if there's a destroyed osd in the tree

Added by Dan van der Ster over 5 years ago. Updated almost 3 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: balancer module
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If there is a "destroyed" OSD in the CRUSH tree, calc_pg_upmaps crashes with a segmentation fault in the mgr balancer thread:

2018-10-09 15:47:17.800180 7f0637355700  4 mgr[balancer] pools ['cinder-critical', 'volumes', 'test', 'images']
2018-10-09 15:47:18.050334 7f0637355700 -1 *** Caught signal (Segmentation fault) ** in thread 7f0637355700 thread_name:balancer
 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
 1: (()+0x3f40c1) [0x5562647db0c1]
 2: (()+0xf6d0) [0x7f064fb116d0]
 3: (std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)+0xf) [0x7f064f43ee5f]
 4: (CrushWrapper::_get_take_weight_osd_map(int, std::map<int, float, std::less<int>, std::allocator<std::pair<int const, float> > >*) const+0x20f) [0x556264ab0e6f]
 5: (CrushWrapper::get_rule_weight_osd_map(unsigned int, std::map<int, float, std::less<int>, std::allocator<std::pair<int const, float> > >*) const+0x180) [0x556264ab1150]
 6: (OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set<long, std::less<long>, std::allocator<long> > const&, OSDMap::Incremental*)+0x4da) [0x55626490b18a]
 7: (()+0x2e7974) [0x5562646ce974]
 8: (PyEval_EvalFrameEx()+0x6df0) [0x7f0651a77cf0]
 9: (PyEval_EvalCodeEx()+0x7ed) [0x7f0651a7a03d]
 10: (PyEval_EvalFrameEx()+0x663c) [0x7f0651a7753c]
 11: (PyEval_EvalFrameEx()+0x67bd) [0x7f0651a776bd]
 12: (PyEval_EvalFrameEx()+0x67bd) [0x7f0651a776bd]
 13: (PyEval_EvalCodeEx()+0x7ed) [0x7f0651a7a03d]
 14: (()+0x70978) [0x7f0651a03978]
 15: (PyObject_Call()+0x43) [0x7f06519dea63]

Here is the destroyed osd:

1455   hdd    5.45999                 osd.1455             destroyed        0 1.00000
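To check whether a cluster is in the same situation before the balancer runs, the tree listing can be scanned for rows like the one above. The following is only a minimal sketch: the helper name `find_destroyed_osds` is hypothetical, and it assumes each OSD row has the whitespace-separated layout shown above, with the status word directly after the `osd.<id>` name. It does not call Ceph itself; it only parses text that has already been captured.

```python
def find_destroyed_osds(tree_text):
    """Return the names of OSD rows whose status column reads 'destroyed'.

    Assumption: the status word immediately follows the 'osd.<id>' name,
    as in the row quoted in this report.
    """
    destroyed = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Look at each field together with the one that follows it.
        for name, status in zip(fields, fields[1:]):
            if name.startswith("osd.") and status == "destroyed":
                destroyed.append(name)
    return destroyed


# The row quoted in this report.
sample = "1455   hdd    5.45999                 osd.1455             destroyed        0 1.00000"
print(find_destroyed_osds(sample))  # ['osd.1455']
```

If the list is non-empty on an affected version, the upmap balancer may hit the crash described here.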

Actions #1

Updated by Dan van der Ster over 5 years ago

This may in fact be related to, or caused by, http://tracker.ceph.com/issues/36378

Actions #2

Updated by Sebastian Wagner almost 5 years ago

  • Category changed from ceph-mgr to balancer module

Actions #3

Updated by Konstantin Shalygin almost 3 years ago

  • Backport deleted (luminous,mimic)

Actions #4

Updated by Neha Ojha almost 3 years ago

  • Status changed from New to Need More Info

Dan, have you seen this in newer versions?
