Bug #42289: mds: rejoin_gather_finish() core - CephFS - Ceph

Added by guodong xiao over 4 years ago. Updated over 3 years ago.

Status: Need More Info
Priority: Normal
Severity: 3 - minor
Component(FS): MDS

Description

rejoin_ack_gather is empty when rejoin_gather_finish() runs, so assert(rejoin_ack_gather.count(mds->get_nodeid())) fails and the MDS dumps core.

Updated by guodong xiao over 4 years ago

We recently hit this core dump while frequently switching over and restarting the MDS. I found that one OSD fetch took 5 minutes; rejoin_ack_gather may have been cleared unexpectedly before rejoin_gather_finish() ran.
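
The suspected ordering can be modeled in a few lines. This is a simplified sketch, not Ceph code: only rejoin_ack_gather and the self-rank membership check come from the report, while RejoinModel, deferred, finish_precondition_holds() and simulate() are hypothetical names invented for illustration. It assumes the finish step is queued as a callback and that something empties the set while that callback is still waiting.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <set>
#include <utility>

// Hypothetical, heavily simplified stand-in for the MDS rejoin state.
struct RejoinModel {
    int self_rank;                               // stands in for mds->get_nodeid()
    std::set<int> rejoin_ack_gather;             // ranks whose ack is still awaited
    std::queue<std::function<void()>> deferred;  // callbacks waiting to run

    // The condition that the reported assert checks.
    bool finish_precondition_holds() const {
        return rejoin_ack_gather.count(self_rank) > 0;
    }
};

// Runs one scenario and reports whether the precondition held at the
// moment the deferred finish callback actually executed.
bool simulate(bool cleared_while_waiting) {
    RejoinModel m;
    m.self_rank = 0;
    m.rejoin_ack_gather = {0, 1, 2};

    bool held = false;
    // The finish step is queued behind a slow operation (the 5-minute fetch).
    m.deferred.push([&m, &held] { held = m.finish_precondition_holds(); });

    // Assumed trigger: the set is emptied while the callback is pending.
    if (cleared_while_waiting) {
        m.rejoin_ack_gather.clear();
    }

    // Drain the queue, as a progress thread would.
    while (!m.deferred.empty()) {
        std::function<void()> fn = std::move(m.deferred.front());
        m.deferred.pop();
        fn();
    }
    return held;
}
```

In this model simulate(false) reports that the check holds and simulate(true) reports that it does not, which is the situation where the real assert would abort the daemon. What actually clears the set in the MDS is not established by this report.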

Updated by Patrick Donnelly over 4 years ago

Status changed from New to Need More Info

Can you share a coredump or backtrace?

Logs would be helpful too.

Updated by Zheng Yan over 4 years ago

Assignee set to Zheng Yan

Updated by guodong xiao over 4 years ago

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x5646b7223935]
2: (MDCache::rejoin_gather_finish()+0x2b7) [0x5646b6e3a8e7]
3: (FunctionContext::finish(int)+0x2a) [0x5646b6cc65aa]
4: (Context::complete(int)+0x9) [0x5646b6c87ec9]
5: (MDSInternalContextBase::complete(int)+0x1d3) [0x5646b6fe7c33]
6: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::sub_finish(MDSInternalContextBase*, int)+0x2df) [0x5646b6cd16cf]
7: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::finish(int)+0x12) [0x5646b6cd17c2]
8: (C_GatherBase<MDSInternalContextBase, MDSInternalContextGather>::C_GatherSub::complete(int)+0x9) [0x5646b6cc1dd9]
9: (MDSRank::_advance_queues()+0x502) [0x5646b6ca9b92]
10: (MDSRank::ProgressThread::entry()+0x4a) [0x5646b6ca9f9a]
11: (()+0x7e25) [0x7f93f9db8e25]
12: (clone()+0x6d) [0x7f93f7ef934d]
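
Frames 2 through 8 suggest that rejoin_gather_finish() ran as the finisher of a gather whose last sub-context completed on the MDSRank progress thread. The sketch below shows that general pattern only. It is a hypothetical miniature, not the real C_GatherBase: the Gather class and its new_sub(), activate() and pending() members are illustrative assumptions.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical miniature of a gather: the finisher runs once, after the
// gather has been activated and every registered sub-operation is done.
class Gather {
public:
    explicit Gather(std::function<void()> on_finish)
        : on_finish_(std::move(on_finish)) {}

    // Register one sub-operation and return the callback that marks it done.
    // The Gather object must outlive every callback it hands out.
    std::function<void()> new_sub() {
        ++pending_;
        return [this] { sub_finish(); };
    }

    // No more subs will be added; fire immediately if none are outstanding.
    void activate() {
        activated_ = true;
        maybe_finish();
    }

    int pending() const { return pending_; }

private:
    void sub_finish() {
        assert(pending_ > 0);
        --pending_;
        maybe_finish();
    }

    void maybe_finish() {
        if (activated_ && pending_ == 0 && !fired_) {
            fired_ = true;
            on_finish_();  // in the backtrace, this role is played by the finisher
        }
    }

    std::function<void()> on_finish_;
    int pending_ = 0;
    bool activated_ = false;
    bool fired_ = false;
};
```

The relevant point for this bug is that the finisher runs whenever the last sub completes, which can be long after the gather was set up, so any state it asserts on must still be valid at that later time.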

Updated by Patrick Donnelly over 3 years ago

Assignee deleted (~~Zheng Yan~~)
