Bug #53741

crash just after MDS becomes active

Added by 玮文 胡 about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: Xiubo Li
Category: Correctness/Safety
Target version: v17.0.0
% Done: 100%
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Component(FS): MDS
Labels (FS): crash
Pull request ID: 44655

Description

FAILED ceph_assert(lock->get_state() == LOCK_PRE_SCAN) at mds/Locker.cc:5682

   -21> 2021-12-28T16:16:00.058+0000 7f2bcdc30700  1 mds.cephfs.gpu024.rpfbnh Updating MDS map to version 83164 from mon.1
   -20> 2021-12-28T16:16:00.058+0000 7f2bcdc30700  1 mds.1.83152 handle_mds_map i am now mds.1.83152
   -19> 2021-12-28T16:16:00.058+0000 7f2bcdc30700  1 mds.1.83152 handle_mds_map state change up:rejoin --> up:active
   -18> 2021-12-28T16:16:00.058+0000 7f2bcdc30700  1 mds.1.83152 recovery_done -- successful recovery!
   -17> 2021-12-28T16:16:00.058+0000 7f2bd0c36700 10 monclient: handle_auth_request added challenge on 0x564e62589400
   -16> 2021-12-28T16:16:00.058+0000 7f2bd0c36700 10 monclient: handle_auth_request added challenge on 0x564e5aa34800
   -15> 2021-12-28T16:16:00.062+0000 7f2bd0435700  5 mds.beacon.cephfs.gpu024.rpfbnh received beacon reply up:active seq 3491 rtt 0.644012
   -14> 2021-12-28T16:16:00.158+0000 7f2bccc2e700 10 monclient: tick
   -13> 2021-12-28T16:16:00.158+0000 7f2bccc2e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-12-28T16:15:30.162996+0000)
   -12> 2021-12-28T16:16:00.698+0000 7f2bc6c22700  5 mds.1.log _submit_thread 3269020692085~6713 : EOpen [metablob 0x10000000001, 7 dirs], 1 open files
   -11> 2021-12-28T16:16:01.158+0000 7f2bccc2e700 10 monclient: tick
   -10> 2021-12-28T16:16:01.158+0000 7f2bccc2e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-12-28T16:15:31.163197+0000)
    -9> 2021-12-28T16:16:01.166+0000 7f2bc6c22700  5 mds.1.log _submit_thread 3269020698818~7668 : EOpen [metablob 0x10000000001, 8 dirs], 1 open files
    -8> 2021-12-28T16:16:02.158+0000 7f2bccc2e700 10 monclient: tick
    -7> 2021-12-28T16:16:02.158+0000 7f2bccc2e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-12-28T16:15:32.163403+0000)
    -6> 2021-12-28T16:16:02.630+0000 7f2bcdc30700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.7/rpm/el8/BUILD/ceph-16.2.7/src/mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7f2bcdc30700 time 2021-12-28T16:16:02.632125+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.7/rpm/el8/BUILD/ceph-16.2.7/src/mds/Locker.cc: 5682: FAILED ceph_assert(lock->get_state() == LOCK_PRE_SCAN)
 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f2bd6446b52]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x276d6c) [0x7f2bd6446d6c]
 3: (Locker::file_recover(ScatterLock*)+0x1bf) [0x564e571a4ecf]
 4: (MDCache::start_files_to_recover()+0x10b) [0x564e570a2c3b]
 5: (MDSRank::recovery_done(int)+0x6f) [0x564e56fca61f]
 6: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x207d) [0x564e56fdbb2d]
 7: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0xeee) [0x564e56faf27e]
 8: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0xcd) [0x564e56fb2a3d]
 9: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xc3) [0x564e56fb3593]
 10: (DispatchQueue::entry()+0x126a) [0x7f2bd668aaba]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f2bd673c5d1]
 12: /lib64/libpthread.so.0(+0x817a) [0x7f2bd542a17a]
 13: clone()
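
The backtrace shows the assert firing in Locker::file_recover(), reached from MDSRank::recovery_done() via MDCache::start_files_to_recover() as soon as the rank transitions to up:active. Below is a minimal, self-contained C++ sketch of that check, not the actual Ceph implementation: the class shapes and the LOCK_SYNC/LOCK_SCAN transitions are simplified assumptions, and only the asserted precondition mirrors src/mds/Locker.cc:5682.

    // Minimal sketch of the failed assert (not the real Ceph code).
    #include <cassert>
    #include <cstdio>

    enum LockState { LOCK_SYNC, LOCK_PRE_SCAN, LOCK_SCAN };

    class ScatterLock {
      LockState state;
    public:
      explicit ScatterLock(LockState s) : state(s) {}
      LockState get_state() const { return state; }
      void set_state(LockState s) { state = s; }
    };

    struct Locker {
      // Analogue of Locker::file_recover(): by the time recovery_done()
      // -> start_files_to_recover() gets here, every recovering file's
      // lock is expected to already be in LOCK_PRE_SCAN.
      void file_recover(ScatterLock *lock) {
        assert(lock->get_state() == LOCK_PRE_SCAN); // the failed assert
        lock->set_state(LOCK_SCAN);
        std::printf("lock moved to LOCK_SCAN, recovery queued\n");
      }
    };

    int main() {
      Locker locker;
      ScatterLock good(LOCK_PRE_SCAN);
      locker.file_recover(&good);   // ok: precondition holds

      ScatterLock bad(LOCK_SYNC);   // precondition missed (this bug)
      locker.file_recover(&bad);    // aborts, matching the MDS crash
      return 0;
    }

In other words, the trace implies that at least one recovering file's lock was never moved to LOCK_PRE_SCAN before the rank went active.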

Subtasks

Bug #56282: crash: void Locker::file_recover(ScatterLock*): assert(lock->get_state() == LOCK_PRE_SCAN) (Duplicate, assigned to Xiubo Li)


Related issues

Copied to CephFS - Backport #56015: quincy: crash just after MDS becomes active (Resolved)
Copied to CephFS - Backport #56016: pacific: crash just after MDS becomes active (Resolved)

History

#1 Updated by Venky Shankar about 2 years ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Xiubo Li
  • Target version set to v17.0.0
  • Backport set to pacific,octopus

Xiubo, please take a look.

#2 Updated by Xiubo Li about 2 years ago

Venky Shankar wrote:

Xiubo, please take a look.

Sure.

#3 Updated by Xiubo Li about 2 years ago

  • Status changed from Triaged to In Progress

#4 Updated by Xiubo Li about 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 44655

#5 Updated by Venky Shankar about 2 years ago

  • Backport changed from pacific,octopus to pacific,quincy

#6 Updated by Venky Shankar almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56015: quincy: crash just after MDS becomes active added

#8 Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56016: pacific: crash just after MDS becomes active added

#9 Updated by Xiubo Li over 1 year ago

  • Status changed from Pending Backport to Resolved
