Subtask #2615

Updated by Joao Eduardo Luis almost 12 years ago


 MDSMap info, dumped from MDSMap::get_health() just before the assert is triggered: 

 <pre> 
 epoch          51 
 flags     0 
 created 2012-06-14 08:42:54.627948 
 modified          2012-06-19 16:22:40.450568 
 tableserver       0 
 root      0 
 session_timeout 60 
 session_autoclose         300 
 last_failure      0 
 last_failure_osd_epoch    16 
 compat    compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object} 
 max_mds 3 
 in        0,1,2 
 up        {0=4803,1=4397,2=4701} 
 failed 
 stopped 
 data_pools        [0] 
 metadata_pool     1 
 mds_info.size() 2 
 5408:     127.0.0.1:6800/4834 'a' mds.-1.0 up:standby seq 7335 
 5414:     127.0.0.1:6801/5009 'b' mds.-1.0 up:standby seq 158 
 </pre> 

 The assert: 

 <pre> 
 mds/MDSMap.cc: In function 'void MDSMap::get_health(std::list<std::pair<health_status_t, std::basic_string<char> > >&, std::list<std::pair<health_status_t, std::basic_string<char> > >*) const' thread 7f9271482700 time 2012-06-19 09:09:56.399299 
 mds/MDSMap.cc: 254: FAILED assert(m != m_end) 
  ceph version c36f301faf59ce560059a8039eaad58f083f53e (commit:8c36f301faf59ce560059a8039eaad58f083f53e) 
  1: (MDSMap::get_health(std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >&, std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >*) const+0x141f) [0x5912bf] 
  2: (Monitor::get_health(std::string&, ceph::buffer::list*)+0x76) [0x4818b6] 
  3: (Monitor::handle_command(MMonCommand*)+0x936) [0x4831d6] 
  4: (Monitor::_ms_dispatch(Message*)+0x106b) [0x49156b] 
  5: (Monitor::ms_dispatch(Message*)+0x32) [0x4a03c2] 
  6: (SimpleMessenger::dispatch_entry()+0x863) [0x5f6dc3] 
  7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5c900d] 
  8: (()+0x7e9a) [0x7f9276558e9a] 
  9: (clone()+0x6d) [0x7f9274f7e4bd] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 2012-06-19 09:09:56.400566 7f9271482700 -1 mds/MDSMap.cc: In function 'void MDSMap::get_health(std::list<std::pair<health_status_t, std::basic_string<char> > >&, std::list<std::pair<health_status_t, std::basic_string<char> > >*) const' thread 7f9271482700 time 2012-06-19 09:09:56.399299 
 mds/MDSMap.cc: 254: FAILED assert(m != m_end) 
 </pre> 

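 The problem appears to be that the 'mds_info' map contains only two MDSs, with gids 5408 and 5414 (both up:standby), while the 'up' map lists three MDSs (ranks 0, 1 and 2, with gids 4803, 4397 and 4701). None of the gids in 'up' match those in 'mds_info', so a lookup of an 'up' gid in 'mds_info' would come back empty, which is consistent with assert(m != m_end) failing. 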
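 The mismatch can be checked mechanically. The following is a minimal sketch, not the actual MDSMap code: the type aliases and the helper names (find_dangling_up_entries, reported_up, reported_mds_info) are hypothetical, and the maps are simplified to rank -> gid and gid -> daemon name. It collects every 'up' entry whose gid is missing from 'mds_info' so the inconsistency can be reported rather than asserted on.

```cpp
#include <cassert>   // for callers that prefer to fail hard on a mismatch
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the two MDSMap members discussed above.
// These aliases are assumptions for illustration, not the Ceph definitions.
using UpMap = std::map<int32_t, uint64_t>;        // rank -> gid
using InfoMap = std::map<uint64_t, std::string>;  // gid  -> daemon name

// Return every (rank, gid) pair in 'up' whose gid has no entry in 'mds_info'.
std::vector<std::pair<int32_t, uint64_t>>
find_dangling_up_entries(const UpMap& up, const InfoMap& mds_info) {
  std::vector<std::pair<int32_t, uint64_t>> dangling;
  for (const auto& entry : up) {
    if (mds_info.find(entry.second) == mds_info.end()) {
      dangling.emplace_back(entry.first, entry.second);
    }
  }
  return dangling;
}

// Values copied from the 'up' and 'mds_info' lines of the dump in this report.
inline UpMap reported_up() { return {{0, 4803}, {1, 4397}, {2, 4701}}; }
inline InfoMap reported_mds_info() { return {{5408, "a"}, {5414, "b"}}; }
```

 Running find_dangling_up_entries(reported_up(), reported_mds_info()) on the dumped values flags all three ranks, since neither 5408 nor 5414 appears in 'up'.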
