Bug #56282

Bug #53741: crash just after MDS become active

crash: void Locker::file_recover(ScatterLock*): assert(lock->get_state() == LOCK_PRE_SCAN)

Added by Telemetry Bot 5 months ago. Updated 5 months ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Telemetry
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1): 856add216d1d5c19b711e57e00a3e46cd2607a6c0531c2253972b4511ad8f43f


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9e99e620a470c067176ebf0e315183a95d9f2bbec28c29b8192bec180cb56890

Assert condition: lock->get_state() == LOCK_PRE_SCAN
Assert function: void Locker::file_recover(ScatterLock*)

Sanitized backtrace:

    pthread_kill()
    raise()
    Locker::file_recover(ScatterLock*)
    MDCache::start_files_to_recover()
    MDSRank::recovery_done(int)
    MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)
    MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)
    MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)
    MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
    DispatchQueue::entry()
    DispatchQueue::DispatchThread::entry()

Crash dump sample:
{
    "assert_condition": "lock->get_state() == LOCK_PRE_SCAN",
    "assert_file": "mds/Locker.cc",
    "assert_func": "void Locker::file_recover(ScatterLock*)",
    "assert_line": 5685,
    "assert_msg": "mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f36927fc640 time 2022-05-03T17:08:12.576384-0400\nmds/Locker.cc: 5685: FAILED ceph_assert(lock->get_state() == LOCK_PRE_SCAN)",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f36a4943520]",
        "pthread_kill()",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x182) [0x7f36a50fa3c3]",
        "/usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.2(+0x257525) [0x7f36a50fa525]",
        "(Locker::file_recover(ScatterLock*)+0x1d8) [0x55de4b048048]",
        "(MDCache::start_files_to_recover()+0xe4) [0x55de4af4ee24]",
        "(MDSRank::recovery_done(int)+0x168) [0x55de4ae6a478]",
        "(MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x202a) [0x55de4ae7459a]",
        "(MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xc54) [0x55de4ae48544]",
        "(MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x311) [0x55de4ae4b541]",
        "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x152) [0x55de4ae4bb22]",
        "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x450) [0x7f36a5345fe0]",
        "(DispatchQueue::entry()+0x5ff) [0x7f36a53433cf]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f36a5406361]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f36a4995b43]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f36a4a27a00]" 
    ],
    "ceph_version": "17.1.0",
    "crash_id": "2022-05-03T21:08:12.628409Z_044282a2-52d2-4b76-a5ca-5115490464be",
    "entity_name": "mds.68fbb5af681d621f13431b4a83c75ba54371499b",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04 LTS",
    "os_version": "22.04 (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mds",
    "stack_sig": "856add216d1d5c19b711e57e00a3e46cd2607a6c0531c2253972b4511ad8f43f",
    "timestamp": "2022-05-03T21:08:12.628409Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-25-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#25-Ubuntu SMP Wed Mar 30 15:54:22 UTC 2022" 
}
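For readers comparing the raw "backtrace" array with the sanitized list above, the following is a minimal sketch of how the frame strings could be reduced to bare symbol names. It is an illustration only, not the telemetry service's actual sanitizer: the helper names `strip_frame` and `strip_backtrace` and both regular expressions are assumptions made for this example.

```python
import re
from typing import List, Optional

# Hypothetical pattern for a symbolized frame such as
# "(Foo::bar(int)+0x1d8) [0x55de4b048048]": symbol, offset, address.
_SYMBOL_FRAME = re.compile(r"^\((?P<sym>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def strip_frame(frame: str) -> Optional[str]:
    """Return the bare symbol for one frame, or None if it has no symbol."""
    frame = frame.strip()
    match = _SYMBOL_FRAME.match(frame)
    if match:
        # Drop the "+0x..." offset and the "[0x...]" address.
        return match.group("sym")
    if frame.startswith("/"):
        # Unsymbolized library frame, e.g. "/lib/.../libc.so.6(+0x42520) [...]".
        return None
    # Already a bare name such as "raise()".
    return frame


def strip_backtrace(frames: List[str]) -> List[str]:
    """Apply strip_frame to every frame and drop the unsymbolized ones."""
    return [sym for sym in map(strip_frame, frames) if sym is not None]


if __name__ == "__main__":
    sample = [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f36a4943520]",
        "pthread_kill()",
        "raise()",
        "(Locker::file_recover(ScatterLock*)+0x1d8) [0x55de4b048048]",
        "(MDCache::start_files_to_recover()+0xe4) [0x55de4af4ee24]",
    ]
    for name in strip_backtrace(sample):
        print(name)
```

Applied to the full dump, this sketch would still keep frames such as abort(), ceph::__ceph_assert_fail(...) and Messenger::ms_deliver_dispatch(...), which the sanitized backtrace in this report omits, so the real sanitizer evidently applies further filtering that is not reproduced here.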

History

#1 Updated by Telemetry Bot 5 months ago

  • Crash signature (v1) updated
  • Crash signature (v2) updated
  • Affected Versions v17.1.0 added

#2 Updated by Venky Shankar 5 months ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v18.0.0
  • Backport set to quincy, pacific
  • Component(FS) MDS added
  • Labels (FS) crash added

Xiubo, please take a look.

#3 Updated by Venky Shankar 5 months ago

  • Status changed from New to Triaged

#4 Updated by Xiubo Li 5 months ago

Venky Shankar wrote:

Xiubo, please take a look.

Sure.

#5 Updated by Xiubo Li 5 months ago

  • Status changed from Triaged to In Progress

#6 Updated by Xiubo Li 5 months ago

  • Status changed from In Progress to Duplicate
  • Parent task set to #53741

This is a known bug that has already been fixed upstream. The backport PR is still under review: https://tracker.ceph.com/issues/56015.
