Bug #23386
Updated by John Spray almost 6 years ago
Moving prestaged hosts with disks that sit outside of a root into the root causes the monitor to crash. I have tried moving an empty rack into a building under the root and the same issue occurs, but I can move hosts between racks outside of the root. I have not tried to move anything inside the root, as this is currently a production cluster.

I raised this on the mailing list but had no replies: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025537.html

Mon crash dump from moving a rack with a host into the correct building under the default root (names and IPs have been shortened):

<pre>
 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0x8f59b1) [0x55f6c06079b1]
 2: (()+0xf5e0) [0x7f51c001c5e0]
 3: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0xa87) [0x55f6c057fb27]
 4: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0x305) [0x55f6c057f3a5]
 5: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0x305) [0x55f6c057f3a5]
 6: (CrushWrapper::populate_classes(std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&)+0x1cf) [0x55f6c058012f]
 7: (CrushWrapper::rebuild_roots_with_classes()+0xfe) [0x55f6c05802de]
 8: (CrushWrapper::insert_item(CephContext*, int, float, std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)+0x7af) [0x55f6c058203f]
 9: (CrushWrapper::move_bucket(CephContext*, int, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)+0xc1) [0x55f6c0582b41]
 10: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > >&)+0x4eee) [0x55f6c024a72e]
 11: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x647) [0x55f6c0265807]
 12: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x39e) [0x55f6c0265f6e]
 13: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x55f6c01f26a8]
 14: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1d3e) [0x55f6c00cd75e]
 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x919) [0x55f6c00d3009]
 16: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55f6c00d428b]
 17: (Monitor::handle_forward(boost::intrusive_ptr<MonOpRequest>)+0xa8d) [0x55f6c00d5b9d]
 18: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xdbd) [0x55f6c00d34ad]
 19: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55f6c00d428b]
 20: (Monitor::ms_dispatch(Message*)+0x23) [0x55f6c01003f3]
 21: (DispatchQueue::entry()+0x792) [0x55f6c05b2d92]
 22: (DispatchQueue::DispatchThread::entry()+0xd) [0x55f6c03aa7fd]
 23: (()+0x7e25) [0x7f51c0014e25]
 24: (clone()+0x6d) [0x7f51bd18c34d]
</pre>

https://pastebin.com/mHfkEp3X

When I first did this, the cluster status was Error due to backfill_full disks. I have since got the cluster back to HEALTH_OK and the problem still persists. I have tried running this from 1 admin/mgr node and 2 different monitors. If I leave the command running, it slowly takes down all the monitors. This error can be reproduced every time.