Bug #56282

closed

Bug #53741: crash just after MDS become active

crash: void Locker::file_recover(ScatterLock*): assert(lock->get_state() == LOCK_PRE_SCAN)

Added by Telemetry Bot almost 2 years ago. Updated 12 months ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Telemetry
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1): 856add216d1d5c19b711e57e00a3e46cd2607a6c0531c2253972b4511ad8f43f


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=9e99e620a470c067176ebf0e315183a95d9f2bbec28c29b8192bec180cb56890

Assert condition: lock->get_state() == LOCK_PRE_SCAN
Assert function: void Locker::file_recover(ScatterLock*)

Sanitized backtrace:

    pthread_kill()
    raise()
    Locker::file_recover(ScatterLock*)
    MDCache::start_files_to_recover()
    MDSRank::recovery_done(int)
    MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)
    MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)
    MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)
    MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
    DispatchQueue::entry()
    DispatchQueue::DispatchThread::entry()

Crash dump sample:
{
    "assert_condition": "lock->get_state() == LOCK_PRE_SCAN",
    "assert_file": "mds/Locker.cc",
    "assert_func": "void Locker::file_recover(ScatterLock*)",
    "assert_line": 5685,
    "assert_msg": "mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f36927fc640 time 2022-05-03T17:08:12.576384-0400\nmds/Locker.cc: 5685: FAILED ceph_assert(lock->get_state() == LOCK_PRE_SCAN)",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f36a4943520]",
        "pthread_kill()",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x182) [0x7f36a50fa3c3]",
        "/usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.2(+0x257525) [0x7f36a50fa525]",
        "(Locker::file_recover(ScatterLock*)+0x1d8) [0x55de4b048048]",
        "(MDCache::start_files_to_recover()+0xe4) [0x55de4af4ee24]",
        "(MDSRank::recovery_done(int)+0x168) [0x55de4ae6a478]",
        "(MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x202a) [0x55de4ae7459a]",
        "(MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xc54) [0x55de4ae48544]",
        "(MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x311) [0x55de4ae4b541]",
        "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x152) [0x55de4ae4bb22]",
        "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x450) [0x7f36a5345fe0]",
        "(DispatchQueue::entry()+0x5ff) [0x7f36a53433cf]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f36a5406361]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f36a4995b43]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f36a4a27a00]" 
    ],
    "ceph_version": "17.1.0",
    "crash_id": "2022-05-03T21:08:12.628409Z_044282a2-52d2-4b76-a5ca-5115490464be",
    "entity_name": "mds.68fbb5af681d621f13431b4a83c75ba54371499b",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04 LTS",
    "os_version": "22.04 (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mds",
    "stack_sig": "856add216d1d5c19b711e57e00a3e46cd2607a6c0531c2253972b4511ad8f43f",
    "timestamp": "2022-05-03T21:08:12.628409Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-25-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#25-Ubuntu SMP Wed Mar 30 15:54:22 UTC 2022" 
}

#1

Updated by Telemetry Bot almost 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v17.1.0 added
#2

Updated by Venky Shankar almost 2 years ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v18.0.0
  • Backport set to quincy, pacific
  • Component(FS) MDS added
  • Labels (FS) crash added

Xiubo, please take a look.

#3

Updated by Venky Shankar almost 2 years ago

  • Status changed from New to Triaged
#4

Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

Xiubo, please take a look.

Sure.

#5

Updated by Xiubo Li almost 2 years ago

  • Status changed from Triaged to In Progress
#6

Updated by Xiubo Li almost 2 years ago

  • Status changed from In Progress to Duplicate
  • Parent task set to #53741

This is a known bug and has already been fixed upstream. The backport, tracked in https://tracker.ceph.com/issues/56015, is still under review.

#7

Updated by Yaarit Hatuka 12 months ago

Since this issue is marked as "Duplicate", it should specify which issue it duplicates in the "Related Issues" field.

The tracker returns this error when trying to populate the "Related Issues" field:

An issue cannot be linked to one of its subtasks

This happens because the parent task here is already set to https://tracker.ceph.com/issues/53741.
