Bug #45268

[librbd]assert at Notifier::notify's aio_notify_locker

Added by haitao chen 4 months ago. Updated 20 days ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
nautilus,octopus
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

ceph version 14.2.5
OS: centos 7.6
platform: aarch64
Procedure:
    librbd returns ETIMEDOUT to tcmu-runner; tcmu-runner then closes and reopens the rbd image in its recovery thread.
The assert backtrace is listed below:

   Thread 1 (Thread 0xffeaaa75f0c0 (LWP 384891)):
#0  0x0000ffff8df050e8 in raise () from /lib64/libc.so.6
#1  0x0000ffff8df06760 in abort () from /lib64/libc.so.6
#2  0x0000ffff776def38 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib64/ceph/libceph-common.so.0
#3  0x0000ffff776df0b0 in ceph::__ceph_assert_fail(ceph::assert_data const&) () from /usr/lib64/ceph/libceph-common.so.0
#4  0x0000ffff77753290 in Mutex::lock(bool) () from /usr/lib64/ceph/libceph-common.so.0
#5  0x0000ffff872fa464 in lock_guard (__m=..., this=<synthetic pointer>) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librbd/watcher/Notifier.cc:64
#6  librbd::watcher::Notifier::notify (this=this@entry=0xffcd0c005660, bl=..., response=response@entry=0x0, on_finish=on_finish@entry=0xffccd06c3920) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librbd/watcher/Notifier.cc:64
#7  0x0000ffff871646e0 in librbd::Watcher::send_notify (this=this@entry=0xffcd0c0055e0, payload=..., response=response@entry=0x0, on_finish=on_finish@entry=0xffccd06c3920) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librbd/Watcher.cc:346
#8  0x0000ffff8710af80 in librbd::ImageWatcher<librbd::ImageCtx>::send_notify (this=this@entry=0xffcd0c0055e0, payload=..., ctx=ctx@entry=0xffccd06c3920) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librbd/WatchNotifyTypes.h:393
#9  0x0000ffff8710fab4 in librbd::ImageWatcher<librbd::ImageCtx>::notify_async_complete (this=0xffcd0c0055e0, request=..., r=-85) at /opt/rh/devtoolset-8/root/usr/include/c++/8/new:169
#10 0x0000ffff870ff798 in operator() (a0=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-14.2.5-1.0.3/build/boost/include/boost/function/function_base.hpp:614
#11 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-14.2.5-1.0.3/src/include/Context.h:487
#12 0x0000ffff870e62e4 in Context::complete (this=0xffea840061f0, r=<optimized out>) at /usr/src/debug/ceph-14.2.5-1.0.3/src/include/Context.h:77
#13 0x0000ffff77721038 in Finisher::finisher_thread_entry() () from /usr/lib64/ceph/libceph-common.so.0
#14 0x0000ffff8e147c48 in start_thread () from /lib64/libpthread.so.0
#15 0x0000ffff8dfaf600 in thread_start () from /lib64/libc.so.6

#0  0x0000ffff8e14c008 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff86a90ca4 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x0000ffff77720bdc in Finisher::wait_for_empty() () from /usr/lib64/ceph/libceph-common.so.0
#3  0x0000ffff87106ff4 in librbd::TaskFinisherSingleton::~TaskFinisherSingleton (this=0xffcd0c005c20, __in_chrg=<optimized out>) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librbd/TaskFinisher.h:36
#4  0x0000ffff777b09d8 in std::_Rb_tree<std::pair<std::string, std::type_index>, std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> >, std::_Select1st<std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> > >, CephContext::associated_objs_cmp, std::allocator<std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> > >*) () from /usr/lib64/ceph/libceph-common.so.0
#5  0x0000ffff777b09c0 in std::_Rb_tree<std::pair<std::string, std::type_index>, std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> >, std::_Select1st<std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> > >, CephContext::associated_objs_cmp, std::allocator<std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<std::string, std::type_index> const, ceph::immobile_any<576ul> > >*) () from /usr/lib64/ceph/libceph-common.so.0
#6  0x0000ffff777acf50 in CephContext::~CephContext() () from /usr/lib64/ceph/libceph-common.so.0
#7  0x0000ffff777ad2e8 in CephContext::put() () from /usr/lib64/ceph/libceph-common.so.0
#8  0x0000ffff86f78da0 in operator() (__args#0=<optimized out>, this=0xffe238ac8fa8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#9  ~unique_ptr (this=0xffe238ac8fa8, __in_chrg=<optimized out>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/unique_ptr.h:274
#10 librados::v14_2_0::RadosClient::~RadosClient (this=0xffe238ac8f90, __in_chrg=<optimized out>) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librados/RadosClient.cc:483
#11 0x0000ffff86f78e1c in librados::v14_2_0::RadosClient::~RadosClient (this=0xffe238ac8f90, __in_chrg=<optimized out>) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librados/RadosClient.cc:483
#12 0x0000ffff86f15820 in _rados_shutdown (cluster=0xffe238ac8f90) at /usr/src/debug/ceph-14.2.5-1.0.3/src/librados/librados_c.cc:189
#13 0x0000ffff875225b8 in tcmu_rbd_image_close (dev=0x34682890) at /usr/src/debug/tcmu-runner-1.4.1/rbd.c:365
#14 0x0000ffff87523ca8 in tcmu_rbd_close (dev=0x34682890) at /usr/src/debug/tcmu-runner-1.4.1/rbd.c:982
#15 0x000000000040e928 in __tcmu_reopen_dev (dev=0x34682890, retries=-1) at /usr/src/debug/tcmu-runner-1.4.1/tcmur_device.c:98
#16 0x000000000040fe34 in tgt_port_grp_recovery_thread_fn (arg=0xffcd0c071210) at /usr/src/debug/tcmu-runner-1.4.1/target.c:254
#17 0x0000ffff8e147c48 in start_thread () from /lib64/libpthread.so.0
#18 0x0000ffff8dfaf600 in thread_start () from /lib64/libc.so.6

Probable cause:
    The Notifier's m_aio_notify_lock is destroyed during the rbd image close flow, but the m_finisher inside ImageWatcher's m_task_finisher is only destroyed later, during the rados shutdown flow.
    If a task is still queued in m_finisher when the rbd close flow has already completed, running it trips the assert in Mutex::lock() (mutex.cc), because the lock it tries to take has already been destroyed.


Related issues

Copied to rbd - Backport #46719: octopus: [librbd]assert at Notifier::notify's aio_notify_locker New
Copied to rbd - Backport #46720: nautilus: [librbd]assert at Notifier::notify's aio_notify_locker New

History

#1 Updated by Greg Farnum 3 months ago

  • Project changed from Ceph to rbd
  • Category deleted (librbd)

#2 Updated by Mykola Golub about 1 month ago

  • Status changed from New to In Progress
  • Assignee set to Mykola Golub

#3 Updated by Mykola Golub about 1 month ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 35981

#4 Updated by Jason Dillaman 20 days ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to nautilus,octopus

#5 Updated by Nathan Cutler 14 days ago

  • Copied to Backport #46719: octopus: [librbd]assert at Notifier::notify's aio_notify_locker added

#6 Updated by Nathan Cutler 14 days ago

  • Copied to Backport #46720: nautilus: [librbd]assert at Notifier::notify's aio_notify_locker added
