Bug #42289

mds: rejoin_gather_finish() core

Added by guodong xiao 3 months ago. Updated 3 months ago.

Status: Need More Info
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

rejoin_ack_gather is empty when rejoin_gather_finish() runs, so assert(rejoin_ack_gather.count(mds->get_nodeid())) fails and the MDS dumps core.

History

#1 Updated by guodong xiao 3 months ago

We recently hit this crash while frequently switching over and restarting MDS daemons. In one case an OSD fetch took 5 minutes, so I suspect rejoin_ack_gather is being cleared unexpectedly before rejoin_gather_finish() runs.
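To make the suspected ordering concrete, here is a minimal, hypothetical C++ model. It is not Ceph's MDCache code: RejoinModel, finish_precondition_holds() and run_sequence() are illustrative names, and the "reset during the slow fetch" is an assumption standing in for whatever actually clears the set. It only shows why the assert fires if rejoin_ack_gather loses this rank's id between queuing the finish callback and running it.

```cpp
#include <functional>
#include <set>

// Hypothetical, simplified stand-in for the state named in this report.
struct RejoinModel {
  int self_id;                      // this MDS's rank (mds->get_nodeid())
  std::set<int> rejoin_ack_gather;  // ranks whose rejoin ack is still pending

  // The condition asserted in rejoin_gather_finish(): our own rank must
  // still be present in the gather set when the callback runs.
  bool finish_precondition_holds() const {
    return rejoin_ack_gather.count(self_id) > 0;
  }
};

// Models the ordering suspected above: the finish callback is queued behind
// a slow OSD fetch, and (assumption) something clears the set meanwhile.
bool run_sequence(bool cleared_during_fetch) {
  RejoinModel m{0, {0, 1, 2}};
  std::function<bool()> on_fetch_done = [&m] {
    return m.finish_precondition_holds();
  };
  if (cleared_during_fetch) {
    m.rejoin_ack_gather.clear();  // hypothetical reset during the fetch
  }
  return on_fetch_done();  // false here is where the real assert would fire
}
```

With no intervening reset the precondition holds; with the reset it does not, which matches the reported empty rejoin_ack_gather at the time of the crash.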

#2 Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Need More Info

Can you share a coredump or backtrace?

Logs would be helpful too.

#3 Updated by Zheng Yan 3 months ago

  • Assignee set to Zheng Yan

#4 Updated by guodong xiao 3 months ago

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x5646b7223935]
2: (MDCache::rejoin_gather_finish()+0x2b7) [0x5646b6e3a8e7]
3: (FunctionContext::finish(int)+0x2a) [0x5646b6cc65aa]
4: (Context::complete(int)+0x9) [0x5646b6c87ec9]
5: (MDSInternalContextBase::complete(int)+0x1d3) [0x5646b6fe7c33]
6: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::sub_finish(MDSInternalContextBase*, int)+0x2df) [0x5646b6cd16cf]
7: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::finish(int)+0x12) [0x5646b6cd17c2]
8: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::complete(int)+0x9) [0x5646b6cc1dd9]
9: (MDSRank::_advance_queues()+0x502) [0x5646b6ca9b92]
10: (MDSRank::ProgressThread::entry()+0x4a) [0x5646b6ca9f9a]
11: (()+0x7e25) [0x7f93f9db8e25]
12: (clone()+0x6d) [0x7f93f7ef934d]
