Bug #54833
Open
crash: void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&): assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2)
100%
0253f38e2bd872a2f74cdd3714f166d569c2c8e87301d000df3d58bdfa526c70
4133b924526414f2bdd07b685dca1b5221ec81576d91e562bb6e4df3a2b561c4
Description
Assert condition: lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2
Assert function: void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&)
Sanitized backtrace:
pthread_kill()
raise()
Locker::handle_file_lock(ScatterLock*, boost::intrusive_ptr<MLock const> const&)
Locker::dispatch(boost::intrusive_ptr<Message const> const&)
MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)
MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)
MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)
DispatchQueue::entry()
DispatchQueue::DispatchThread::entry()
Crash dump sample:
{
    "assert_condition": "lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2",
    "assert_file": "mds/Locker.cc",
    "assert_func": "void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&)",
    "assert_line": 5732,
    "assert_msg": "mds/Locker.cc: In function 'void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&)' thread 7f6645486640 time 2022-02-20T22:30:46.711103-0500\nmds/Locker.cc: 5732: FAILED ceph_assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2)",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x46520) [0x7f664afac520]",
        "pthread_kill()",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x172) [0x7f664b736487]",
        "/usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.2(+0x24b5e6) [0x7f664b7365e6]",
        "(Locker::handle_file_lock(ScatterLock*, boost::intrusive_ptr<MLock const> const&)+0x18b3) [0x556fb61a5393]",
        "(Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0x64) [0x556fb61ad254]",
        "(MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x5b7) [0x556fb5fba477]",
        "(MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x53) [0x556fb5fbaa83]",
        "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x1bf) [0x556fb5fa643f]",
        "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x450) [0x7f664b968420]",
        "(DispatchQueue::entry()+0x647) [0x7f664b965807]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f664ba26561]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x98927) [0x7f664affe927]",
        "clone()"
    ],
    "ceph_version": "16.2.7",
    "crash_id": "2022-02-21T03:30:46.752836Z_d902f215-af3f-4fbb-89d9-7ad51c405b67",
    "entity_name": "mds.dc0af19b7efe873f7993738fb083813d67ff43e2",
    "os_id": "22.04",
    "os_name": "Ubuntu Jammy Jellyfish (development branch)",
    "os_version": "22.04 (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mds",
    "stack_sig": "0253f38e2bd872a2f74cdd3714f166d569c2c8e87301d000df3d58bdfa526c70",
    "timestamp": "2022-02-21T03:30:46.752836Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-17-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#17-Ubuntu SMP Thu Jan 13 16:27:23 UTC 2022"
}
Updated by Telemetry Bot about 2 years ago
Updated by Telemetry Bot almost 2 years ago
- Project changed from RADOS to CephFS
- Crash signature (v1) updated (diff)
Updated by Milind Changire 10 months ago
- Related to Bug #54834: crash: void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&): assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2) added
Updated by Venky Shankar 9 months ago
- Category set to Correctness/Safety
- Assignee set to Xiubo Li
- Target version set to v19.0.0
- Backport set to reef,quincy,pacific
- Component(FS) MDS added
- Labels (FS) crash added
Updated by Xiubo Li 9 months ago
There are no logs, but I went through the MDS locker code and it seems buggy here:
4740 void Locker::simple_excl(SimpleLock *lock, bool *need_issue)
4741 {
4742   dout(7) << "simple_excl on " << *lock << " on " << *lock->get_parent() << dendl;
4743   ceph_assert(lock->get_parent()->is_auth());
4744   ceph_assert(lock->is_stable());
4745
4746   CInode *in = 0;
4747   if (lock->get_cap_shift())
4748     in = static_cast<CInode *>(lock->get_parent());
4749
4750   switch (lock->get_state()) {
4751   case LOCK_LOCK: lock->set_state(LOCK_LOCK_EXCL); break;
4752   case LOCK_SYNC: lock->set_state(LOCK_SYNC_EXCL); break;
4753   case LOCK_XSYN: lock->set_state(LOCK_XSYN_EXCL); break;
4754   default: ceph_abort();
4755   }
4756
4757   int gather = 0;
4758   if (lock->is_rdlocked())
4759     gather++;
4760   if (lock->is_wrlocked())
4761     gather++;
4762   if (gather && lock->is_cached())
4763     invalidate_lock_caches(lock);
4764
4765   if (lock->get_parent()->is_replicated() &&
4766       lock->get_state() != LOCK_LOCK_EXCL &&
4767       lock->get_state() != LOCK_XSYN_EXCL) {
4768     send_lock_message(lock, LOCK_AC_LOCK);
4769     lock->init_gather();
4770     gather++;
4771   }
When the old lock state is LOCK_SYNC, simple_excl() will set the lock state directly to LOCK_SYNC_EXCL on the current MDS. Meanwhile, from mds/locks.c:
 84 const struct sm_state_t filelock[LOCK_MAX] = {
 85                   // stable  loner  rep state  r    rp  rd   wr  fwr  l    x   caps(any,loner,xlocker,replica)
 86   [LOCK_SYNC] =    { 0,      false, LOCK_SYNC, ANY, 0,  ANY, 0,  0,   ANY, 0,  CEPH_CAP_GSHARED|CEPH_CAP_GCACHE|CEPH_CAP_GRD|CEPH_CAP_GLAZYIO,0,0,CEPH_CAP_GSHARED|CEPH_CAP_GCACHE|CEPH_CAP_GRD },
We can see that if the auth MDS's old lock state is LOCK_SYNC, then the replicas' lock state should also be LOCK_SYNC.
So, via Line#4752 and Line#4758 in simple_excl(), the auth MDS will send the LOCK_AC_LOCK lock request to all the replica MDSs, and on a replica MDS this could trigger the assert:
5744 void Locker::handle_file_lock(ScatterLock *lock, const cref_t<MLock> &m)
5745 {
5746   CInode *in = static_cast<CInode*>(lock->get_parent());
5747   int from = m->get_asker();
5748
5749   if (mds->is_rejoin()) {
5750     if (in->is_rejoining()) {
5751       dout(7) << "handle_file_lock still rejoining " << *in
5752               << ", dropping " << *m << dendl;
5753       return;
5754     }
5755   }
5756
5757   dout(7) << "handle_file_lock a=" << lock->get_lock_action_name(m->get_action())
5758           << " on " << *lock
5759           << " from mds." << from << " "
5760           << *in << dendl;
5761
5762   bool caps = lock->get_cap_shift();
5763
5764   switch (m->get_action()) {
5765     // -- replica --
5766   case LOCK_AC_SYNC:
5767     ceph_assert(lock->get_state() == LOCK_LOCK ||
5768                 lock->get_state() == LOCK_MIX ||
5769                 lock->get_state() == LOCK_MIX_SYNC2);
5770   ...
Updated by Venky Shankar 9 months ago
If the auth is sending `LOCK_AC_LOCK` then in the replica it should be handled here:
  case LOCK_AC_LOCK:
    switch (lock->get_state()) {
    case LOCK_SYNC: lock->set_state(LOCK_SYNC_LOCK); break;
    case LOCK_MIX: lock->set_state(LOCK_MIX_LOCK); break;
    default: ceph_abort();
    }
Updated by Xiubo Li 6 months ago
Found another case:
First, on the auth MDS, simple_sync() transitions the filelock state LOCK_MIX --> LOCK_MIX_SYNC and then sends a LOCK_AC_SYNC message to a replica MDS:
4636 bool Locker::simple_sync(SimpleLock *lock, bool *need_issue)
4637 {
...
4650   switch (lock->get_state()) {
4651   case LOCK_MIX: lock->set_state(LOCK_MIX_SYNC); break;
4652   case LOCK_LOCK: lock->set_state(LOCK_LOCK_SYNC); break;
4653   case LOCK_XSYN: lock->set_state(LOCK_XSYN_SYNC); break;
4654   case LOCK_EXCL: lock->set_state(LOCK_EXCL_SYNC); break;
4655   default: ceph_abort();
4656   }
...
4676   if (lock->get_parent()->is_replicated() && old_state == LOCK_MIX) {
4677     send_lock_message(lock, LOCK_AC_SYNC);
4678     lock->init_gather();
4679     gather++;
4680   }
...
4708   if (gather) {
4709     lock->get_parent()->auth_pin(lock);
4710     if (need_recover)
4711       mds->mdcache->do_file_recover();
4712     return false;
4713   }
4715
4716   if (lock->get_parent()->is_replicated()) { // FIXME
4717     bufferlist data;
4718     lock->encode_locked_state(data);
4719     send_lock_message(lock, LOCK_AC_SYNC, data);
4720   }
4721   lock->set_state(LOCK_SYNC);
4722   lock->finish_waiters(SimpleLock::WAIT_RD|SimpleLock::WAIT_STABLE);
4723   if (in && in->is_head()) {
4724     if (need_issue)
4725       *need_issue = true;
4726     else
4727       issue_caps(in);
4728   }
4729   return true;
4730 }
However, if for some reason the active MDS restarts or fails over just before the replica MDS receives the LOCK_AC_SYNC msg, the replica MDS will detect that the auth MDS state is less than MDSMap::STATE_REJOIN. In that case handle_file_lock() sets the filelock state to LOCK_MIX_SYNC, but eval_gather() misses pushing it further to LOCK_MIX_SYNC2:
5743 void Locker::handle_file_lock(ScatterLock *lock, const cref_t<MLock> &m)
5744 {
5745   CInode *in = static_cast<CInode*>(lock->get_parent());
5746   int from = m->get_asker();
5747
5748   if (mds->is_rejoin()) {
5749     if (in->is_rejoining()) {
5750       dout(7) << "handle_file_lock still rejoining " << *in
5751               << ", dropping " << *m << dendl;
5752       return;
5753     }
5754   }
5755
5756   dout(7) << "handle_file_lock a=" << lock->get_lock_action_name(m->get_action())
5757           << " on " << *lock
5758           << " from mds." << from << " "
5759           << *in << dendl;
5760
5761   bool caps = lock->get_cap_shift();
5762
5763   switch (m->get_action()) {
5764     // -- replica --
5765   case LOCK_AC_SYNC:
5766     ceph_assert(lock->get_state() == LOCK_LOCK ||
5767                 lock->get_state() == LOCK_MIX ||
5768                 lock->get_state() == LOCK_MIX_SYNC2);
5769
5770     if (lock->get_state() == LOCK_MIX) {
5771       lock->set_state(LOCK_MIX_SYNC);   // ===> set the state to LOCK_MIX_SYNC
5772       eval_gather(lock, true);          // ===> then call eval_gather()
5773       if (lock->is_unstable_and_locked()) {
5774         if (lock->is_cached())
5775           invalidate_lock_caches(lock);
5776         mds->mdlog->flush();
5777       }
5778       break;
5779     }
...
1066 void Locker::eval_gather(SimpleLock *lock, bool first, bool *pneed_issue, MDSContext::vec *pfinishers)
1067 {
...
1141     if (!mds->is_cluster_degraded() ||
1142         mds->mdsmap->get_state(auth) >= MDSMap::STATE_REJOIN) {  // ===> skipped: cluster is degraded and auth MDS is not ready
1143       switch (lock->get_state()) {
1144       case LOCK_SYNC_LOCK:
1145         mds->send_message_mds(make_message<MLock>(lock, LOCK_AC_LOCKACK, mds->get_nodeid()), auth);
1146         break;
1147
1148       case LOCK_MIX_SYNC:
1149         {
1150           auto reply = make_message<MLock>(lock, LOCK_AC_SYNCACK, mds->get_nodeid());
1151           lock->encode_locked_state(reply->get_data());
1152           mds->send_message_mds(reply, auth);
1153           next = LOCK_MIX_SYNC2;
1154           (static_cast<ScatterLock *>(lock))->start_flush();
1155         }
1156         break;
...
What if the active MDS then recovers and sends a new LOCK_AC_SYNC msg? In that case handle_file_lock() will fail the assert, because the lock is still in LOCK_MIX_SYNC:
5743 void Locker::handle_file_lock(ScatterLock *lock, const cref_t<MLock> &m)
5744 {
...
5763   switch (m->get_action()) {
5764     // -- replica --
5765   case LOCK_AC_SYNC:
5766     ceph_assert(lock->get_state() == LOCK_LOCK ||
5767                 lock->get_state() == LOCK_MIX ||
5768                 lock->get_state() == LOCK_MIX_SYNC2);
5769
5770     if (lock->get_state() == LOCK_MIX) {
5771       lock->set_state(LOCK_MIX_SYNC);
5772       eval_gather(lock, true);
5773       if (lock->is_unstable_and_locked()) {
5774         if (lock->is_cached())
5775           invalidate_lock_caches(lock);
5776         mds->mdlog->flush();
5777       }
5778       break;
5779     }
...
Updated by Venky Shankar 2 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from reef,quincy,pacific to reef,quincy,squid
Updated by Backport Bot 2 months ago
- Copied to Backport #64584: reef: crash: void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&): assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2) added
Updated by Backport Bot 2 months ago
- Copied to Backport #64585: squid: crash: void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&): assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2) added
Updated by Backport Bot 2 months ago
- Copied to Backport #64586: quincy: crash: void Locker::handle_file_lock(ScatterLock*, ceph::cref_t<MLock>&): assert(lock->get_state() == LOCK_LOCK || lock->get_state() == LOCK_MIX || lock->get_state() == LOCK_MIX_SYNC2) added