Bug #51589 (closed)

mds: crash when journaling during replay

Added by 伟杰 谭 almost 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags: backport_processed
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

MDS version: ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)

With 200 clients, the MDS crashed after writing for many days.

I don't know what caused the MDS to crash.

[twj@xxxxxxxxx-MN-001.sn.cn ~]$ sudo ceph fs status
cephfs - 200 clients
======
+------+----------------+------------------------+----------+-------+-------+
| Rank |     State      |          MDS           | Activity |  dns  |  inos |
+------+----------------+------------------------+----------+-------+-------+
|  0   |    resolve     | xxxxxxxxxxMN-002.sn.cn |          |    0  |    3  |
|  1   | resolve(laggy) | xxxxxxxxxxMN-003.sn.cn |          |    0  |    0  |
+------+----------------+------------------------+----------+-------+-------+
+----------------------+----------+-------+-------+
|         Pool         |   type   |  used | avail |
+----------------------+----------+-------+-------+
| cephfs.metadata.pool | metadata | 70.5G |  793G |
|  cephfs.data.pool1   |   data   |  183T | 1115T |
|  cephfs.data.pool2   |   data   |  299T | 1042T |
+----------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)

All MDS daemons crashed with the same assertion failure:

    -1> 2021-07-08 15:14:13.283 7f3804255700 -1 /builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7f3804255700 time 2021-07-08 15:14:13.283719
/builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: 288: FAILED ceph_assert(!segments.empty())

 ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f380d72cfe7]
 2: (()+0x25d1af) [0x7f380d72d1af]
 3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x599) [0x557471ec5959]
 4: (Server::journal_close_session(Session*, int, Context*)+0x9ed) [0x557471c7e02d]
 5: (Server::kill_session(Session*, Context*)+0x234) [0x557471c81914]
 6: (Server::apply_blacklist(std::set<entity_addr_t, std::less<entity_addr_t>, std::allocator<entity_addr_t> > const&)+0x14d) [0x557471c8449d]
 7: (MDSRank::reconnect_start()+0xcf) [0x557471c49c5f]
 8: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x1c29) [0x557471c57979]
 9: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xa9b) [0x557471c3091b]
 10: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0xed) [0x557471c3216d]
 11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xc3) [0x557471c32983]
 12: (DispatchQueue::entry()+0x1699) [0x7f380d952b79]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f380da008ed]
 14: (()+0x7ea5) [0x7f380b5eeea5]
 15: (clone()+0x6d) [0x7f380a29e96d]
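
For context on where the assert fires: the backtrace shows kill_session() (triggered by apply_blacklist() during reconnect) trying to journal a session-close event via MDLog::_submit_entry() while the MDS log has no open segments yet, which trips ceph_assert(!segments.empty()). Below is a minimal sketch of that invariant, using simplified hypothetical types rather than the actual Ceph sources:

    // Hypothetical, simplified sketch of the failing invariant; not the real
    // MDLog implementation. Journaling an event requires at least one open
    // log segment, but during reconnect/replay no segment has been opened yet.
    #include <cassert>
    #include <list>
    #include <string>

    struct LogEvent   { std::string type; };
    struct LogSegment { /* journal segment state */ };

    struct MiniMDLog {
      std::list<LogSegment> segments;   // still empty before the log is opened

      void submit_entry(LogEvent *e) {
        // Mirrors FAILED ceph_assert(!segments.empty()) from the report:
        // submitting an event before any segment exists aborts the daemon.
        assert(!segments.empty());
        (void)e; // ... append *e to the current segment and schedule a flush ...
      }
    };

    int main() {
      MiniMDLog log;                       // rank is still in reconnect
      LogEvent close_session{"session close"};
      log.submit_entry(&close_session);    // aborts, matching the backtrace
      return 0;
    }

On a healthy active rank the log has already been opened (or replayed), so the same call would not trip the assert; the failure is specific to journaling during the reconnect/replay window shown in the `ceph fs status` output above.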

Related issues (2): 0 open, 2 closed

Copied to CephFS - Backport #52952: pacific: mds: crash when journaling during replay (Resolved)
Copied to CephFS - Backport #52953: octopus: mds: crash when journaling during replay (Resolved)
