Bug #56282
Bug #53741 (closed): crash just after MDS become active
crash: void Locker::file_recover(ScatterLock*): assert(lock->get_state() == LOCK_PRE_SCAN)
856add216d1d5c19b711e57e00a3e46cd2607a6c0531c2253972b4511ad8f43f
Description
Assert condition: lock->get_state() == LOCK_PRE_SCAN
Assert function: void Locker::file_recover(ScatterLock*)
Sanitized backtrace:
pthread_kill()
raise()
Locker::file_recover(ScatterLock*)
MDCache::start_files_to_recover()
MDSRank::recovery_done(int)
MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)
MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)
MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)
MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
DispatchQueue::entry()
DispatchQueue::DispatchThread::entry()
Crash dump sample:
{
  "assert_condition": "lock->get_state() == LOCK_PRE_SCAN",
  "assert_file": "mds/Locker.cc",
  "assert_func": "void Locker::file_recover(ScatterLock*)",
  "assert_line": 5685,
  "assert_msg": "mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f36927fc640 time 2022-05-03T17:08:12.576384-0400\nmds/Locker.cc: 5685: FAILED ceph_assert(lock->get_state() == LOCK_PRE_SCAN)",
  "assert_thread_name": "ms_dispatch",
  "backtrace": [
    "/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f36a4943520]",
    "pthread_kill()",
    "raise()",
    "abort()",
    "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x182) [0x7f36a50fa3c3]",
    "/usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.2(+0x257525) [0x7f36a50fa525]",
    "(Locker::file_recover(ScatterLock*)+0x1d8) [0x55de4b048048]",
    "(MDCache::start_files_to_recover()+0xe4) [0x55de4af4ee24]",
    "(MDSRank::recovery_done(int)+0x168) [0x55de4ae6a478]",
    "(MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x202a) [0x55de4ae7459a]",
    "(MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xc54) [0x55de4ae48544]",
    "(MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x311) [0x55de4ae4b541]",
    "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x152) [0x55de4ae4bb22]",
    "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x450) [0x7f36a5345fe0]",
    "(DispatchQueue::entry()+0x5ff) [0x7f36a53433cf]",
    "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f36a5406361]",
    "/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f36a4995b43]",
    "/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f36a4a27a00]"
  ],
  "ceph_version": "17.1.0",
  "crash_id": "2022-05-03T21:08:12.628409Z_044282a2-52d2-4b76-a5ca-5115490464be",
  "entity_name": "mds.68fbb5af681d621f13431b4a83c75ba54371499b",
  "os_id": "22.04",
  "os_name": "Ubuntu 22.04 LTS",
  "os_version": "22.04 (Jammy Jellyfish)",
  "os_version_id": "22.04",
  "process_name": "ceph-mds",
  "stack_sig": "856add216d1d5c19b711e57e00a3e46cd2607a6c0531c2253972b4511ad8f43f",
  "timestamp": "2022-05-03T21:08:12.628409Z",
  "utsname_machine": "x86_64",
  "utsname_release": "5.15.0-25-generic",
  "utsname_sysname": "Linux",
  "utsname_version": "#25-Ubuntu SMP Wed Mar 30 15:54:22 UTC 2022"
}
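For anyone triaging a dump like the one above by hand, a small script can pull out the assert location and reduce the raw frames to bare symbol names. The following is only a rough sketch: `frame_symbol` and `summarize` are hypothetical helper names, the frame patterns are assumed from the sample in this report, and it does not try to reproduce the tracker's own sanitization or the `stack_sig` computation.

```python
import json
import re

# Assumed frame shapes, taken from the sample dump in this report:
#   "(symbol+0xOFFSET) [0xADDRESS]"   resolved frame
#   "/path/lib.so(+0xOFFSET) [0x...]" unresolved frame
#   "raise()"                         already a bare name
_ADDRESS = re.compile(r"\s*\[0x[0-9a-fA-F]+\]$")
_SYMBOL = re.compile(r"\((.+)\+0x[0-9a-fA-F]+\)")


def frame_symbol(frame):
    """Return the bare symbol of one backtrace frame, or None if unresolved."""
    frame = _ADDRESS.sub("", frame.strip())  # drop the trailing address
    if not frame.startswith(("(", "/")):
        return frame  # already a bare name such as "raise()"
    match = _SYMBOL.fullmatch(frame)
    return match.group(1) if match else None


def summarize(crash):
    """Collect the assert location, condition, and resolved frames."""
    symbols = [frame_symbol(f) for f in crash.get("backtrace", [])]
    return {
        "location": "{}:{}".format(crash.get("assert_file"),
                                   crash.get("assert_line")),
        "condition": crash.get("assert_condition"),
        "frames": [s for s in symbols if s is not None],
    }


# Abbreviated excerpt of the crash dump from this report.
SAMPLE = json.loads("""
{
  "assert_condition": "lock->get_state() == LOCK_PRE_SCAN",
  "assert_file": "mds/Locker.cc",
  "assert_line": 5685,
  "backtrace": [
    "/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f36a4943520]",
    "pthread_kill()",
    "(Locker::file_recover(ScatterLock*)+0x1d8) [0x55de4b048048]",
    "(MDCache::start_files_to_recover()+0xe4) [0x55de4af4ee24]"
  ]
}
""")

print(json.dumps(summarize(SAMPLE), indent=2))
```

Note that the sanitized backtrace listed in the description also leaves out some resolved frames (for example abort() and the assert helper), so the output of this sketch will not match it exactly.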
Updated by Telemetry Bot almost 2 years ago
Updated by Venky Shankar almost 2 years ago
- Category set to Correctness/Safety
- Assignee set to Xiubo Li
- Target version set to v18.0.0
- Backport set to quincy, pacific
- Component(FS) MDS added
- Labels (FS) crash added
Xiubo, please take a look.
Updated by Xiubo Li almost 2 years ago
Venky Shankar wrote:
Xiubo, please take a look.
Sure.
Updated by Xiubo Li almost 2 years ago
- Status changed from Triaged to In Progress
Updated by Xiubo Li almost 2 years ago
- Status changed from In Progress to Duplicate
- Parent task set to #53741
This is a known bug that has already been fixed upstream. The backport PR is still under review: https://tracker.ceph.com/issues/56015.
Updated by Yaarit Hatuka 12 months ago
Since this issue is marked as "Duplicate", it needs to specify which issue it duplicates in the "Related Issues" field.
The tracker throws this error when trying to populate the "Related Issues" field:
An issue cannot be linked to one of its subtasks
since the Parent Task here is set to https://tracker.ceph.com/issues/53741.