Bug #58489 (closed)

mds stuck in 'up:replay' and crashed.

Added by Kotresh Hiremath Ravishankar over 1 year ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (user)
Tags: backport_processed
Backport: reef, pacific, quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This issue was reported by an upstream community user.

The cluster had two filesystems, and the active MDS of each filesystem was stuck in 'up:replay'.
This persisted for around two days. Later, one of the active MDSs (still stuck in up:replay) crashed
with the stack trace below.

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/mds/journal.cc:
In function 'void EMetaBlob::replay(MDSRank*, LogSegment*,
MDPeerUpdate*)' thread 7fccc7153700 time 2023-01-17T10:05:15.420191+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/mds/journal.cc:
1625: FAILED ceph_assert(g_conf()->mds_wipe_sessions)

  ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy
(stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x135) [0x7fccd759943f]
  2: /usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7fccd7599605]
  3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*)+0x5e5c)
[0x55fb2b98e89c]
  4: (EUpdate::replay(MDSRank*)+0x40) [0x55fb2b98f5a0]
  5: (MDLog::_replay_thread()+0x9b3) [0x55fb2b915443]
  6: (MDLog::ReplayThread::entry()+0x11) [0x55fb2b5d1e31]
  7: /lib64/libpthread.so.0(+0x81ca) [0x7fccd65891ca]
  8: clone()

The upstream mailing-list discussion can be found at https://www.spinics.net/lists/ceph-users/msg75472.html
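
For context, the failed assertion ceph_assert(g_conf()->mds_wipe_sessions) fires inside EMetaBlob::replay() during journal replay. Judging by the option name, it appears to guard a session-map consistency check: if the session map version recorded in the journal entry cannot be reconciled with the table the MDS loaded from disk, replay can only continue when the operator has explicitly allowed the session table to be wiped (mds_wipe_sessions); otherwise the daemon aborts. The following is a minimal, self-contained C++ sketch of that pattern under that assumption; the types, names, and version arithmetic are simplified stand-ins, not the actual Ceph source.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Simplified stand-ins for the MDS session map and config
// (hypothetical names, not the real Ceph types).
struct SessionMap {
    uint64_t version = 0;
    void wipe() { version = 0; }
};

struct Config {
    // Plays the role of the mds_wipe_sessions option: allow replay to
    // discard a session map that is inconsistent with the journal.
    bool mds_wipe_sessions = false;
};

// Sketch of the replay-time consistency check the crash report points at:
// if the journaled session-map version cannot be reconciled with the
// on-disk one, replay either wipes the table (when allowed) or asserts.
void replay_sessionmap(SessionMap& sm, uint64_t journaled_v, const Config& conf) {
    if (journaled_v == sm.version + 1) {
        // Normal case: the journal entry advances the table by one version.
        sm.version = journaled_v;
    } else if (journaled_v <= sm.version) {
        // Already reflected in the table; nothing to do.
    } else {
        std::cerr << "journal replay sessionmap v " << journaled_v
                  << " > table v " << sm.version << "\n";
        // Analogue of FAILED ceph_assert(g_conf()->mds_wipe_sessions):
        // without the escape hatch, replay aborts here.
        assert(conf.mds_wipe_sessions);
        sm.wipe();
        sm.version = journaled_v;
    }
}

int main() {
    SessionMap sm;   // on-disk table is at version 0
    Config conf;     // mds_wipe_sessions left at its default (false)
    replay_sessionmap(sm, 1, conf);   // consistent entry: version advances
    replay_sessionmap(sm, 5, conf);   // inconsistent entry: assert fires, like the reported crash
}
```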


Files

mds01.ceph04.logaa.bz2 (879 KB) - Thomas Widhalm, 01/19/2023 01:15 PM
mds01.ceph04.logab.bz2 (756 KB) - Thomas Widhalm, 01/19/2023 01:15 PM
mds01.ceph06.log.bz2 (681 KB) - Thomas Widhalm, 01/19/2023 01:15 PM

Related issues 6 (1 open, 5 closed)

Related to CephFS - Bug #59768: crash: void EMetaBlob::replay(MDSRank*, LogSegment*, MDPeerUpdate*): assert(g_conf()->mds_wipe_sessions) (Duplicate, Neeraj Pratap Singh)

Related to CephFS - Bug #61009: crash: void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]: assert(p->first <= start) (Fix Under Review, Venky Shankar)

Related to CephFS - Bug #63103: mds: disable delegating inode ranges to clients (Rejected, Venky Shankar)

Copied to CephFS - Backport #59006: quincy: mds stuck in 'up:replay' and crashed. (Resolved, Xiubo Li)
Copied to CephFS - Backport #59007: pacific: mds stuck in 'up:replay' and crashed. (Resolved, Xiubo Li)
Copied to CephFS - Backport #59404: reef: mds stuck in 'up:replay' and crashed. (Resolved, Xiubo Li)