Bug #51589
Updated by Patrick Donnelly almost 3 years ago
MDS version: ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)

With 200 clients writing, the MDS crashed after running for many days. I don't know what caused the crash.

<pre>
[twj@xxxxxxxxx-MN-001.sn.cn ~]$ sudo ceph fs status
cephfs - 200 clients
======
+------+----------------+------------------------+----------+-------+-------+
| Rank | State          | MDS                    | Activity | dns   | inos  |
+------+----------------+------------------------+----------+-------+-------+
| 0    | resolve        | xxxxxxxxxxMN-002.sn.cn |          | 0     | 3     |
| 1    | resolve(laggy) | xxxxxxxxxxMN-003.sn.cn |          | 0     | 0     |
+------+----------------+------------------------+----------+-------+-------+
+----------------------+----------+-------+-------+
| Pool                 | type     | used  | avail |
+----------------------+----------+-------+-------+
| cephfs.metadata.pool | metadata | 70.5G | 793G  |
| cephfs.data.pool1    | data     | 183T  | 1115T |
| cephfs.data.pool2    | data     | 299T  | 1042T |
+----------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)
</pre>

All MDS daemons crashed with the same assertion failure:

<pre>
-1> 2021-07-08 15:14:13.283 7f3804255700 -1 /builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7f3804255700 time 2021-07-08 15:14:13.283719
/builddir/build/BUILD/ceph-14.2.20/src/mds/MDLog.cc: 288: FAILED ceph_assert(!segments.empty())

 ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f380d72cfe7]
 2: (()+0x25d1af) [0x7f380d72d1af]
 3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x599) [0x557471ec5959]
 4: (Server::journal_close_session(Session*, int, Context*)+0x9ed) [0x557471c7e02d]
 5: (Server::kill_session(Session*, Context*)+0x234) [0x557471c81914]
 6: (Server::apply_blacklist(std::set<entity_addr_t, std::less<entity_addr_t>, std::allocator<entity_addr_t> > const&)+0x14d) [0x557471c8449d]
 7: (MDSRank::reconnect_start()+0xcf) [0x557471c49c5f]
 8: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x1c29) [0x557471c57979]
 9: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xa9b) [0x557471c3091b]
 10: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0xed) [0x557471c3216d]
 11: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xc3) [0x557471c32983]
 12: (DispatchQueue::entry()+0x1699) [0x7f380d952b79]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f380da008ed]
 14: (()+0x7ea5) [0x7f380b5eeea5]
 15: (clone()+0x6d) [0x7f380a29e96d]
</pre>
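To make the failing invariant concrete, here is a toy C++ model of the check at MDLog.cc:288. It is not Ceph code: `ToyJournal`, `open_segment`, `submit_entry` and `try_close_session` are hypothetical names. The model captures only the rule the assertion enforces, namely that an entry can be appended only to an open log segment. In the trace, the append is reached from `MDSRank::reconnect_start()` through `Server::apply_blacklist()`, `Server::kill_session()` and `Server::journal_close_session()` while `segments` was empty. The report does not establish why no segment existed at that point.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>

// Toy model only (hypothetical names, not Ceph's MDLog).
// Mirrors the invariant behind ceph_assert(!segments.empty()) in
// MDLog::_submit_entry: events may only be appended to an open segment.
struct ToySegment {
  std::deque<std::string> events;
};

class ToyJournal {
 public:
  // Start a new segment; later entries go into the newest one.
  void open_segment() { segments_.emplace_back(); }

  // Append an event to the current (last) segment.
  // Throws instead of aborting so the failure can be observed in a test;
  // the real ceph_assert aborts the daemon.
  void submit_entry(const std::string& event) {
    if (segments_.empty()) {
      throw std::logic_error("FAILED assert(!segments.empty())");
    }
    segments_.back().events.push_back(event);
  }

  std::size_t segment_count() const { return segments_.size(); }

 private:
  std::deque<ToySegment> segments_;
};

// Hypothetical stand-in for journaling a session close (the step at
// Server::journal_close_session in the trace). Returns false if the
// journal had no open segment to append to.
bool try_close_session(ToyJournal& journal) {
  try {
    journal.submit_entry("session close");
    return true;
  } catch (const std::logic_error&) {
    return false;
  }
}
```

With a freshly constructed `ToyJournal`, `try_close_session` returns `false`. After `open_segment()` it returns `true`. Because the real check is an assertion rather than an error return, this condition takes down the whole MDS, which is consistent with every rank crashing at the same line.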