Bug #23386 (closed)

crush device class: Monitor Crash when moving Bucket into Default root

Added by Warren Jeffs about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: Administration/Usability
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: mimic,luminous
Regression: No
Severity: 2 - major
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): Monitor
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Moving prestaged hosts (with disks) that sit outside of a root into that root causes the monitor to crash.
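For reference, a minimal sketch of the kind of command sequence involved (bucket and host names are illustrative placeholders, not the real ones; on my cluster the prestaged hosts already contained OSDs with device classes assigned):

 # Prestage a rack and a host outside the default root (placeholder names)
 ceph osd crush add-bucket rack1 rack
 ceph osd crush add-bucket host1 host
 ceph osd crush move host1 rack=rack1

 # Moving the populated bucket under the default root is what crashes the mon
 ceph osd crush move rack1 root=default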

I have tried moving an empty rack into a building under the root and the same issue occurs, but I can move hosts between racks outside of the root. I have not tried moving anything inside the root, as this is currently a production cluster.

I did raise this via the mailing list but had no replies: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025537.html

Mon crash dump from moving a rack containing a host into the correct building under the default root (names and IPs have been shortened):

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0x8f59b1) [0x55f6c06079b1]
 2: (()+0xf5e0) [0x7f51c001c5e0]
 3: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0xa87) [0x55f6c057fb27]
 4: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0x305) [0x55f6c057f3a5]
 5: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0x305) [0x55f6c057f3a5]
 6: (CrushWrapper::populate_classes(std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&)+0x1cf) [0x55f6c058012f]
 7: (CrushWrapper::rebuild_roots_with_classes()+0xfe) [0x55f6c05802de]
 8: (CrushWrapper::insert_item(CephContext*, int, float, std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)+0x7af) [0x55f6c058203f]
 9: (CrushWrapper::move_bucket(CephContext*, int, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)+0xc1) [0x55f6c0582b41]
 10: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > >&)+0x4eee) [0x55f6c024a72e]
 11: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x647) [0x55f6c0265807]
 12: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x39e) [0x55f6c0265f6e]
 13: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x55f6c01f26a8]
 14: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1d3e) [0x55f6c00cd75e]
 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x919) [0x55f6c00d3009]
 16: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55f6c00d428b]
 17: (Monitor::handle_forward(boost::intrusive_ptr<MonOpRequest>)+0xa8d) [0x55f6c00d5b9d]
 18: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xdbd) [0x55f6c00d34ad]
 19: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55f6c00d428b]
 20: (Monitor::ms_dispatch(Message*)+0x23) [0x55f6c01003f3]
 21: (DispatchQueue::entry()+0x792) [0x55f6c05b2d92]
 22: (DispatchQueue::DispatchThread::entry()+0xd) [0x55f6c03aa7fd]
 23: (()+0x7e25) [0x7f51c0014e25]
 24: (clone()+0x6d) [0x7f51bd18c34d]

When I was first doing this, the cluster status was in an error state due to backfill_full disks. I have since brought the cluster back to HEALTH_OK and the problem still persists.

I have tried running this from one admin/mgr node and from two different monitors. If I keep the command going, it will slowly take down all the monitors.

This error can be reproduced every time.
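For anyone who wants to look at this without touching the live monitors, the standard offline crushtool workflow should work against a copy of the map (file paths are placeholders):

 # Export and decompile the current CRUSH map
 ceph osd getcrushmap -o crushmap.bin
 crushtool -d crushmap.bin -o crushmap.txt

 # After editing the text map, recompile and test the mappings offline
 crushtool -c crushmap.txt -o crushmap.new
 crushtool -i crushmap.new --test --show-bad-mappings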


Related issues 3 (0 open, 3 closed)

Has duplicate: Ceph - Bug #23836: Moving rack bucket to default root is not possible (Duplicate, 04/24/2018)
Copied to: RADOS - Backport #24258: luminous: crush device class: Monitor Crash when moving Bucket into Default root (Resolved, Prashant D)
Copied to: RADOS - Backport #24259: mimic: crush device class: Monitor Crash when moving Bucket into Default root (Resolved, Kefu Chai)