Bug #42289
mds: rejoin_gather_finish() core
Description
rejoin_ack_gather is empty when rejoin_gather_finish() runs, so assert(rejoin_ack_gather.count(mds->get_nodeid())) fails and the MDS dumps core.
Updated by guodong xiao over 4 years ago
We recently hit this core dump while frequently switching/restarting MDS daemons. I found that one OSD fetch took 5 minutes, and rejoin_ack_gather may have been cleared unexpectedly before rejoin_gather_finish() ran.
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to Need More Info
Can you share a coredump or backtrace?
Logs would be helpful too.
Updated by guodong xiao over 4 years ago
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x5646b7223935]
2: (MDCache::rejoin_gather_finish()+0x2b7) [0x5646b6e3a8e7]
3: (FunctionContext::finish(int)+0x2a) [0x5646b6cc65aa]
4: (Context::complete(int)+0x9) [0x5646b6c87ec9]
5: (MDSInternalContextBase::complete(int)+0x1d3) [0x5646b6fe7c33]
6: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::sub_finish(MDSInternalContextBase*, int)+0x2df) [0x5646b6cd16cf]
7: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::finish(int)+0x12) [0x5646b6cd17c2]
8: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::complete(int)+0x9) [0x5646b6cc1dd9]
9: (MDSRank::_advance_queues()+0x502) [0x5646b6ca9b92]
10: (MDSRank::ProgressThread::entry()+0x4a) [0x5646b6ca9f9a]
11: (()+0x7e25) [0x7f93f9db8e25]
12: (clone()+0x6d) [0x7f93f7ef934d]