Bug #65606

workload fails due to slow ops, assert in logs mds/Locker.cc: 5513: FAILED ceph_assert(!lock->is_waiter_for(SimpleLock::WAIT_WR) || lock->is_waiter_for(SimpleLock::WAIT_XLOCK))

Added by Leonid Usov 26 days ago. Updated 16 days ago.

Status: Pending Backport
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags: backport_processed
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, quiesce
Labels (FS): qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/leonidus-2024-04-21_11:37:13-fs-wip-lusov-quiescer-distro-default-smithi/7666598/

The first instance of a slow op reported by teuthology:

2024-04-21T12:18:20.317 INFO:journalctl@ceph.mon.c.smithi129.stdout:Apr 21 12:18:20 smithi129 ceph-mon[31790]: 1 slow requests, 1 included below; oldest blocked for > 199.712049 secs
2024-04-21T12:18:20.318 INFO:journalctl@ceph.mon.c.smithi129.stdout:Apr 21 12:18:20 smithi129 ceph-mon[31790]: slow request 199.712049 seconds old, received at 2024-04-21T12:15:00.098184+0000: peer_request:mds.0:1981 currently dispatched
2024-04-21T12:18:20.348 INFO:journalctl@ceph.mon.b.smithi073.stdout:Apr 21 12:18:20 smithi073 ceph-mon[31345]: 1 slow requests, 1 included below; oldest blocked for > 199.712049 secs
2024-04-21T12:18:20.348 INFO:journalctl@ceph.mon.b.smithi073.stdout:Apr 21 12:18:20 smithi073 ceph-mon[31345]: slow request 199.712049 seconds old, received at 2024-04-21T12:15:00.098184+0000: peer_request:mds.0:1981 currently dispatched
2024-04-21T12:18:20.443 INFO:journalctl@ceph.mon.a.smithi046.stdout:Apr 21 12:18:20 smithi046 ceph-mon[27983]: 1 slow requests, 1 included below; oldest blocked for > 199.712049 secs
2024-04-21T12:18:20.444 INFO:journalctl@ceph.mon.a.smithi046.stdout:Apr 21 12:18:20 smithi046 ceph-mon[27983]: slow request 199.712049 seconds old, received at 2024-04-21T12:15:00.098184+0000: peer_request:mds.0:1981 currently dispatched

leonidus@teuthology:/a/leonidus-2024-04-21_11:37:13-fs-wip-lusov-quiescer-distro-default-smithi/7666598
$ zgrep slow remote/smithi073/log/88391cde-ffd5-11ee-bc93-c7b262605968/ceph-mds.b.log.gz
2024-04-21T12:18:24.468+0000 7fa9f195d700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 210.002867 secs
2024-04-21T12:18:24.468+0000 7fa9f195d700  0 log_channel(cluster) log [WRN] : slow request 210.002867 seconds old, received at 2024-04-21T12:14:54.467714+0000: internal op exportdir:mds.0:1981 currently failed to acquire quiesce lock

The same log shows traces of a crash:

2024-04-21T12:18:40.865+0000 7fa9ec953700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/19.0.0-3154-ga630765c/rpm/el8/BUILD/ceph-19.0.0-3154-ga630765c/src/mds/Locker.cc: In function 'void Locker::local_wrlock_finish(const lock_iterator&, MutationImpl*)' thread 7fa9ec953700 time 2024-04-21T12:18:40.864289+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/19.0.0-3154-ga630765c/rpm/el8/BUILD/ceph-19.0.0-3154-ga630765c/src/mds/Locker.cc: 5513: FAILED ceph_assert(!lock->is_waiter_for(SimpleLock::WAIT_WR) || lock->is_waiter_for(SimpleLock::WAIT_XLOCK))

 ceph version 19.0.0-3154-ga630765c (a630765c1f93c6e61d9b40c4b76437611d5ef29e) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7fa9fd7893dd]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2d15a3) [0x7fa9fd7895a3]
 3: (Locker::local_wrlock_finish(std::_Rb_tree_const_iterator<MutationImpl::LockOp> const&, MutationImpl*)+0x7b4) [0x55cad974fec4]
 4: (Locker::_drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*, bool)+0x1ca) [0x55cad977063a]
 5: (Locker::drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x5f) [0x55cad97764cf]
 6: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl> const&, boost::intrusive_ptr<MClientReply> const&)+0x4af) [0x55cad95c3a5f]
 7: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl> const&, int)+0x3b8) [0x55cad95c5458]
 8: (Server::_link_remote_finish(boost::intrusive_ptr<MDRequestImpl> const&, bool, CDentry*, CInode*, unsigned long)+0x165) [0x55cad95c87f5]
 9: (MDSContext::complete(int)+0x5f) [0x55cad98b05ff]
 10: (MDSIOContextBase::complete(int)+0x3b4) [0x55cad98b0b04]
 11: (MDSLogContextBase::complete(int)+0x5b) [0x55cad98b0cfb]
 12: (Finisher::finisher_thread_entry()+0x19d) [0x7fa9fd829d1d]
 13: /lib64/libpthread.so.0(+0x81ca) [0x7fa9fc5071ca]
 14: clone()
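
For context, here is a minimal, self-contained C++ sketch of the invariant the failing ceph_assert encodes; ToyLock, WaitMask and the flag values are hypothetical stand-ins for SimpleLock, not Ceph code. The idea, taken from the assert text above, is that when a local wrlock is dropped any queued WAIT_WR waiter is expected to be paired with a WAIT_XLOCK waiter; a lone WAIT_WR waiter (e.g. one queued while an op is stuck on the quiesce lock, as in the slow-request warning) trips the assertion, presumably on the _link_remote_finish path shown in the backtrace.

// Toy model of the asserted invariant; names and flag values are invented.
#include <cassert>
#include <cstdint>
#include <iostream>

enum WaitMask : uint32_t {
  WAIT_RD    = 1u << 0,
  WAIT_WR    = 1u << 1,   // someone queued waiting for a write lock
  WAIT_XLOCK = 1u << 2,   // someone queued waiting for an exclusive lock
};

struct ToyLock {
  uint32_t waiters = 0;   // bitmask of queued waiter types
  bool is_waiter_for(uint32_t mask) const { return (waiters & mask) != 0; }
  void add_waiter(uint32_t mask) { waiters |= mask; }
};

// Stand-in for Locker::local_wrlock_finish(): dropping the wrlock asserts
// that a WAIT_WR waiter only exists together with a WAIT_XLOCK waiter.
void local_wrlock_finish(ToyLock& lock) {
  assert(!lock.is_waiter_for(WAIT_WR) || lock.is_waiter_for(WAIT_XLOCK));
  // ... drop the wrlock and wake compatible waiters ...
}

int main() {
  ToyLock ok;
  ok.add_waiter(WAIT_WR | WAIT_XLOCK);
  local_wrlock_finish(ok);     // passes: WR waiter is paired with an XLOCK waiter

  ToyLock bad;
  bad.add_waiter(WAIT_WR);     // a lone WR waiter, as in the crash above
  local_wrlock_finish(bad);    // assertion fires, mirroring the MDS abort
  std::cout << "not reached" << std::endl;
}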

Related issues: 1 (1 open, 0 closed)

Copied to CephFS - Backport #65710: squid: workload fails due to slow ops, assert in logs mds/Locker.cc: 5513: FAILED ceph_assert(!lock->is_waiter_for(SimpleLock::WAIT_WR) || lock->is_waiter_for(SimpleLock::WAIT_XLOCK)) (Fix Under Review, Leonid Usov)