Actions
Bug #65606
openworkload fails due to slow ops, assert in logs mds/Locker.cc: 551 FAILED ceph_assert(!lock->is_waiter_for(SimpleLock::WAIT_WR) || lock->is_waiter_for(SimpleLock::WAIT_XLOCK))
% Done:
0%
Source:
Q/A
Tags:
backport_processed
Backport:
squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, quiesce
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The first instance of a slow op reported by teuthology:
2024-04-21T12:18:20.317 INFO:journalctl@ceph.mon.c.smithi129.stdout:Apr 21 12:18:20 smithi129 ceph-mon[31790]: 1 slow requests, 1 included below; oldest blocked for > 199.712049 secs 2024-04-21T12:18:20.318 INFO:journalctl@ceph.mon.c.smithi129.stdout:Apr 21 12:18:20 smithi129 ceph-mon[31790]: slow request 199.712049 seconds old, received at 2024-04-21T12:15:00.098184+0000: peer_request:mds.0:1981 currently dispatched 2024-04-21T12:18:20.348 INFO:journalctl@ceph.mon.b.smithi073.stdout:Apr 21 12:18:20 smithi073 ceph-mon[31345]: 1 slow requests, 1 included below; oldest blocked for > 199.712049 secs 2024-04-21T12:18:20.348 INFO:journalctl@ceph.mon.b.smithi073.stdout:Apr 21 12:18:20 smithi073 ceph-mon[31345]: slow request 199.712049 seconds old, received at 2024-04-21T12:15:00.098184+0000: peer_request:mds.0:1981 currently dispatched 2024-04-21T12:18:20.443 INFO:journalctl@ceph.mon.a.smithi046.stdout:Apr 21 12:18:20 smithi046 ceph-mon[27983]: 1 slow requests, 1 included below; oldest blocked for > 199.712049 secs 2024-04-21T12:18:20.444 INFO:journalctl@ceph.mon.a.smithi046.stdout:Apr 21 12:18:20 smithi046 ceph-mon[27983]: slow request 199.712049 seconds old, received at 2024-04-21T12:15:00.098184+0000: peer_request:mds.0:1981 currently dispatched
leonidus@teuthology:/a/leonidus-2024-04-21_11:37:13-fs-wip-lusov-quiescer-distro-default-smithi/7666598 $ zgrep slow remote/smithi073/log/88391cde-ffd5-11ee-bc93-c7b262605968/ceph-mds.b.log.gz
2024-04-21T12:18:24.468+0000 7fa9f195d700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 210.002867 secs 2024-04-21T12:18:24.468+0000 7fa9f195d700 0 log_channel(cluster) log [WRN] : slow request 210.002867 seconds old, received at 2024-04-21T12:14:54.467714+0000: internal op exportdir:mds.0:1981 currently failed to acquire quiesce lock
The same log shows traces of a crash:
2024-04-21T12:18:40.865+0000 7fa9ec953700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/19.0.0-3154-ga630765c/rpm/el8/BUILD/ceph -19.0.0-3154-ga630765c/src/mds/Locker.cc: In function 'void Locker::local_wrlock_finish(const lock_iterator&, MutationImpl*)' thread 7fa9ec953700 time 2024-04-21T12:18:40.864289+0000 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/19.0.0-3154-ga630765c/rpm/el8/BUILD/ceph-19.0.0-3154-ga630765c/src/mds/Locker.cc: 551 3: FAILED ceph_assert(!lock->is_waiter_for(SimpleLock::WAIT_WR) || lock->is_waiter_for(SimpleLock::WAIT_XLOCK)) ceph version 19.0.0-3154-ga630765c (a630765c1f93c6e61d9b40c4b76437611d5ef29e) squid (dev) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7fa9fd7893dd] 2: /usr/lib64/ceph/libceph-common.so.2(+0x2d15a3) [0x7fa9fd7895a3] 3: (Locker::local_wrlock_finish(std::_Rb_tree_const_iterator<MutationImpl::LockOp> const&, MutationImpl*)+0x7b4) [0x55cad974fec4] 4: (Locker::_drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*, bool)+0x1ca) [0x55cad977063a] 5: (Locker::drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x5f) [0x55cad97764cf] 6: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl> const&, boost::intrusive_ptr<MClientReply> const&)+0x4af) [0x55cad95c3a5f] 7: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl> const&, int)+0x3b8) [0x55cad95c5458] 8: (Server::_link_remote_finish(boost::intrusive_ptr<MDRequestImpl> const&, bool, CDentry*, CInode*, unsigned long)+0x165) [0x55cad95c87f5] 9: (MDSContext::complete(int)+0x5f) [0x55cad98b05ff] 10: (MDSIOContextBase::complete(int)+0x3b4) [0x55cad98b0b04] 11: (MDSLogContextBase::complete(int)+0x5b) [0x55cad98b0cfb] 12: (Finisher::finisher_thread_entry()+0x19d) [0x7fa9fd829d1d] 13: /lib64/libpthread.so.0(+0x81ca) [0x7fa9fc5071ca] 14: clone()
Actions