Subtask #2615

closed

Feature #2611: mon: Single-Paxos

mon: Single-Paxos: MDSMap::get_health() asserting

Added by Joao Eduardo Luis almost 12 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

MDSMap contents, dumped in MDSMap::get_health() just before the assert is triggered:

epoch        51
flags   0
created 2012-06-14 08:42:54.627948
modified        2012-06-19 16:22:40.450568
tableserver     0
root    0
session_timeout 60
session_autoclose       300
last_failure    0
last_failure_osd_epoch  16
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
max_mds 3
in      0,1,2
up      {0=4803,1=4397,2=4701}
failed
stopped
data_pools      [0]
metadata_pool   1
mds_info.size() 2
5408:   127.0.0.1:6800/4834 'a' mds.-1.0 up:standby seq 7335
5414:   127.0.0.1:6801/5009 'b' mds.-1.0 up:standby seq 158

The assert:

mds/MDSMap.cc: In function 'void MDSMap::get_health(std::list<std::pair<health_status_t, std::basic_string<char> > >&, std::list<std::pair<health_status_t, std::basic_string<char> > >*) const' thread 7f9271482700 time 2012-06-19 09:09:56.399299
mds/MDSMap.cc: 254: FAILED assert(m != m_end)
 ceph version c36f301faf59ce560059a8039eaad58f083f53e (commit:8c36f301faf59ce560059a8039eaad58f083f53e)
 1: (MDSMap::get_health(std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >&, std::list<std::pair<health_status_t, std::string>, std::allocator<std::pair<health_status_t, std::string> > >*) const+0x141f) [0x5912bf]
 2: (Monitor::get_health(std::string&, ceph::buffer::list*)+0x76) [0x4818b6]
 3: (Monitor::handle_command(MMonCommand*)+0x936) [0x4831d6]
 4: (Monitor::_ms_dispatch(Message*)+0x106b) [0x49156b]
 5: (Monitor::ms_dispatch(Message*)+0x32) [0x4a03c2]
 6: (SimpleMessenger::dispatch_entry()+0x863) [0x5f6dc3]
 7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5c900d]
 8: (()+0x7e9a) [0x7f9276558e9a]
 9: (clone()+0x6d) [0x7f9274f7e4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-06-19 09:09:56.400566 7f9271482700 -1 mds/MDSMap.cc: In function 'void MDSMap::get_health(std::list<std::pair<health_status_t, std::basic_string<char> > >&, std::list<std::pair<health_status_t, std::basic_string<char> > >*) const' thread 7f9271482700 time 2012-06-19 09:09:56.399299
mds/MDSMap.cc: 254: FAILED assert(m != m_end)

The problem appears to be that the 'mds_info' map contains only two MDSs, with gids 5408 and 5414 (both up:standby), while the 'up' map lists three MDSs with gids 4803, 4397 and 4701. None of the gids in 'up' has an entry in 'mds_info', which would explain why assert(m != m_end) fails.
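
The mismatch can be reproduced outside the monitor with a small sketch. This is not the Ceph code: MiniMDSMap, dangling_up_ranks() and map_from_dump() are hypothetical names, and the two std::map members are simplified stand-ins for the 'up' and 'mds_info' maps, filled with the values from the dump above. It only illustrates the kind of lookup that an assert such as assert(m != m_end) would guard.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the maps shown in the dump; NOT the real Ceph types.
typedef std::uint64_t Gid;
typedef int Rank;

struct MiniMDSMap {
    std::map<Rank, Gid> up;               // rank -> gid, as on the 'up' line
    std::map<Gid, std::string> mds_info;  // gid -> daemon name, as in the mds_info entries
};

// Return every rank in 'up' whose gid has no entry in 'mds_info'.
std::vector<Rank> dangling_up_ranks(const MiniMDSMap& m) {
    std::vector<Rank> missing;
    for (std::map<Rank, Gid>::const_iterator u = m.up.begin(); u != m.up.end(); ++u) {
        if (m.mds_info.find(u->second) == m.mds_info.end()) {
            missing.push_back(u->first);
        }
    }
    return missing;
}

// Hypothetical strict check: aborts if any 'up' gid is unknown to 'mds_info'.
void assert_up_consistent(const MiniMDSMap& m) {
    assert(dangling_up_ranks(m).empty());
}

// Build the map using the values from the dump in this report.
MiniMDSMap map_from_dump() {
    MiniMDSMap m;
    m.up[0] = 4803;
    m.up[1] = 4397;
    m.up[2] = 4701;
    m.mds_info[5408] = "a";
    m.mds_info[5414] = "b";
    return m;
}
```

With the dumped values, dangling_up_ranks(map_from_dump()) reports all three ranks (0, 1 and 2) as having no matching 'mds_info' entry, so assert_up_consistent() would abort on this map. Returning the dangling ranks instead of asserting is a choice made for this sketch only, to make the inconsistency easy to log.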

Actions #1

Updated by Joao Eduardo Luis almost 12 years ago

  • Description updated (diff)

This issue stopped appearing after we changed the criteria for proposing queued proposals and restarted testing with a fresh store.

My suspicion is that we did not yet have a valid map when MDSMap::get_health() was called. If the problem shows up again, this issue will be updated.

Actions #2

Updated by Joao Eduardo Luis almost 12 years ago

  • Status changed from In Progress to Closed