Bug #23386

Updated by John Spray almost 6 years ago

Moving prestaged hosts (with disks) that sit outside of a root into the root causes the monitor to crash.

 I have tried moving an empty rack into a building under the root and the same issue occurs, but I can move hosts between racks outside of the root. I have not tried moving anything within the root, as this is currently a production cluster.

 I did raise this via the mailing list but had no replies: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025537.html

 Mon crash dump from moving a rack with a host into the correct building under the default root (names and IPs have been shortened):
 <pre> 
  ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable) 
  1: (()+0x8f59b1) [0x55f6c06079b1] 
  2: (()+0xf5e0) [0x7f51c001c5e0] 
  3: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, 
  int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<i 
 nt> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> 
 >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0xa87) [0x55f6c057fb27] 
  4: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, 
  int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<i 
 nt> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> 
 >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0x305) [0x55f6c057f3a5] 
  5: (CrushWrapper::device_class_clone(int, int, std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, 
  int, std::less<int>, std::allocator<std::pair<int const, int> > > > > > const&, std::set<int, std::less<int>, std::allocator<int> > const&, int*, std::map<int, std::map<int, std::vector<int, std::allocator<i 
 nt> >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, std::vector<int, std::allocator<int> 
 >, std::less<int>, std::allocator<std::pair<int const, std::vector<int, std::allocator<int> > > > > > > >*)+0x305) [0x55f6c057f3a5] 
  6: (CrushWrapper::populate_classes(std::map<int, std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >, std::less<int>, std::allocator<std::pair<int const, std::map<int, int, std::l 
 ess<int>, std::allocator<std::pair<int const, int> > > > > > const&)+0x1cf) [0x55f6c058012f] 
  7: (CrushWrapper::rebuild_roots_with_classes()+0xfe) [0x55f6c05802de] 
  8: (CrushWrapper::insert_item(CephContext*, int, float, std::string, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)+0x7af) [0x 
 55f6c058203f] 
  9: (CrushWrapper::move_bucket(CephContext*, int, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)+0xc1) [0x55f6c0582b41] 
  10: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::ve 
 ctor<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vect 
 or<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > >&)+0x4eee) [0x55f6c024a72e] 
  11: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x647) [0x55f6c0265807] 
  12: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x39e) [0x55f6c0265f6e] 
  13: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x55f6c01f26a8] 
  14: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1d3e) [0x55f6c00cd75e] 
  15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x919) [0x55f6c00d3009] 
  16: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55f6c00d428b] 
  17: (Monitor::handle_forward(boost::intrusive_ptr<MonOpRequest>)+0xa8d) [0x55f6c00d5b9d] 
  18: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xdbd) [0x55f6c00d34ad] 
  19: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55f6c00d428b] 
  20: (Monitor::ms_dispatch(Message*)+0x23) [0x55f6c01003f3] 
  21: (DispatchQueue::entry()+0x792) [0x55f6c05b2d92] 
  22: (DispatchQueue::DispatchThread::entry()+0xd) [0x55f6c03aa7fd] 
  23: (()+0x7e25) [0x7f51c0014e25] 
  24: (clone()+0x6d) [0x7f51bd18c34d] 
 </pre> 


 https://pastebin.com/mHfkEp3X 

 When I was doing this, the cluster status was Error due to backfill_full disks. I have since got the cluster back to HEALTH_OK and the problem still persists.

 I have tried running this from one admin/mgr node and two different monitors. If I leave the command running, it slowly takes down all the monitors.

 This error can be reproduced every time.
