Bug #47833

mds FAILED ceph_assert(sessions != 0) in function 'void SessionMap::hit_session(Session*)'

Added by Dan van der Ster over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: nautilus,octopus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are not able to decrease max_mds from 2 to 1 on our CephFS cluster.

As soon as we decrease max_mds, the MDS goes to up:stopping, the migrator starts exporting a few dirs, and then this assert fires:

2020-10-12 15:53:18.353 7ffae5ee4700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mds/SessionMap.cc: In function 'void SessionMap::hit_session(Session*)' thread 7ffae5ee4700 time 2020-10-12 15:53:18.349619
/builddir/build/BUILD/ceph-14.2.11/src/mds/SessionMap.cc: 1019: FAILED ceph_assert(sessions != 0)

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7ffaef177025]
 2: (()+0x25c1ed) [0x7ffaef1771ed]
 3: (()+0x3c2bfa) [0x559ca2efebfa]
 4: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl>&, boost::intrusive_ptr<MClientReply> const&)+0xb45) [0x559ca2ce3035]
 5: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl>&, int)+0x1c9) [0x559ca2ce32e9]
 6: (MDSContext::complete(int)+0x74) [0x559ca2f0a054]
 7: (MDCache::_do_find_ino_peer(MDCache::find_ino_peer_info_t&)+0x516) [0x559ca2d753d6]
 8: (MDCache::handle_find_ino_reply(boost::intrusive_ptr<MMDSFindInoReply const> const&)+0x502) [0x559ca2d810f2]
 9: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x1a7) [0x559ca2db6b07]
 10: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message const> const&)+0x48a) [0x559ca2c9692a]
 11: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7ea) [0x559ca2c98fda]
 12: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x41) [0x559ca2c99441]
 13: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x108) [0x559ca2c86508]
 14: (DispatchQueue::entry()+0x1699) [0x7ffaef397d69]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ffaef4451ed]
 16: (()+0x7ea5) [0x7ffaed02dea5]
 17: (clone()+0x6d) [0x7ffaebcdb8dd]

The standby MDS gets the same assert.

The full mds log is at ceph-post-file: 12f9e692-2727-4311-944f-9aa7b8da4499
And the log of the standby mds is at ceph-post-file: 0037c9fd-de03-4018-8433-455f1c94c456

We set max_mds back to 2 and the crashes stop.
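For reference, the trigger and the workaround above map onto the standard ceph CLI as follows; the filesystem name `cephfs` is a placeholder for the affected filesystem:

```shell
# Reduce the number of active MDS ranks; rank 1 enters up:stopping
# and begins migrating its subtrees to rank 0 — the point at which
# the hit_session() assert fired on this cluster.
ceph fs set cephfs max_mds 1

# Watch rank states while the stop is in progress.
ceph fs status cephfs

# Workaround that stopped the crash loop: restore two active ranks.
ceph fs set cephfs max_mds 2
```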

Maybe relevant: on this cluster, we have directories manually pinned to rank 0 and 1.
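For context, manual pinning of the kind mentioned above is done through the `ceph.dir.pin` virtual extended attribute on a mounted CephFS tree; the directory paths below are illustrative placeholders, not the actual pinned directories on this cluster:

```shell
# Pin one subtree to rank 0 and another to rank 1
# (paths are examples, not the cluster's real directories).
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/project-a
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/project-b

# A value of -1 removes the pin and restores default balancing.
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/project-a
```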


Related issues: 2 (0 open, 2 closed)

Copied to CephFS - Backport #47935: nautilus: mds FAILED ceph_assert(sessions != 0) in function 'void SessionMap::hit_session(Session*)' (Resolved, Nathan Cutler)
Copied to CephFS - Backport #47936: octopus: mds FAILED ceph_assert(sessions != 0) in function 'void SessionMap::hit_session(Session*)' (Resolved, Nathan Cutler)
