Bug #19276

closed

librbd: multithread of 'class ThreadPoolSingleton' can result in assertion failure

Added by xin mycho about 7 years ago. Updated over 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

librbd dumps core on an assertion failure if ThreadPoolSingleton constructs its ThreadPool with 3 or more threads (the hard-coded value is 1).
The class ThreadPoolSingleton starts a single thread by hard-coding the 4th parameter of the ThreadPool constructor:
explicit ThreadPoolSingleton(CephContext *cct)
: ThreadPool(cct, "librbd::thread_pool", "tp_librbd", 1,
"rbd_op_threads") {
start();
}

To improve I/O performance, I changed it to use multiple threads (3 here):
explicit ThreadPoolSingleton(CephContext *cct)
: ThreadPool(cct, "librbd::thread_pool", "tp_librbd", 3,
"rbd_op_threads") {
start();
}
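
The report does not say why the thread count is hard-coded to 1. As a rough illustration of what changes when it is raised, here is a minimal sketch using a hypothetical `TinyPool` (an assumption for this example, not Ceph's ThreadPool or ContextWQ). With one worker, queued callbacks never overlap; with three, up to three can run at once, so any code that relied on being serialized by the pool would need its own locking.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a work-queue thread pool; NOT Ceph's ThreadPool.
class TinyPool {
public:
  explicit TinyPool(unsigned num_threads) {
    for (unsigned i = 0; i < num_threads; ++i)
      workers_.emplace_back([this] { worker(); });
  }

  // Drains the remaining queue, then joins every worker.
  ~TinyPool() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_all();
    for (auto &t : workers_) t.join();
  }

  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(lock_);
      items_.push(std::move(fn));
    }
    cond_.notify_one();
  }

private:
  void worker() {
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> l(lock_);
        cond_.wait(l, [this] { return stopping_ || !items_.empty(); });
        if (items_.empty()) return;  // stopping and nothing left to run
        fn = std::move(items_.front());
        items_.pop();
      }
      fn();  // executed outside the pool lock
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::queue<std::function<void()>> items_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};

// Queues `tasks` short callbacks and returns the highest number of
// callbacks observed running at the same time.
int max_concurrency(unsigned num_threads, int tasks) {
  std::atomic<int> running{0};
  std::atomic<int> peak{0};
  {
    TinyPool pool(num_threads);
    for (int i = 0; i < tasks; ++i) {
      pool.queue([&running, &peak] {
        int now = ++running;
        int prev = peak.load();
        while (now > prev && !peak.compare_exchange_weak(prev, now)) {
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        --running;
      });
    }
  }  // pool destructor waits for all callbacks to finish
  return peak.load();
}
```

Calling `max_concurrency(1, 20)` always yields 1, while `max_concurrency(3, 20)` can yield any value from 1 to 3 depending on scheduling.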

I ran 3 fio instances, then repeatedly stopped one of them with Ctrl-C and restarted it. Repeating this loop results in a core dump:
common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fffc1ffb700 time 2017-03-14 20:23:13.003766
common/Mutex.cc: 113: FAILED assert(r == 0)
ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
1: (()+0x436c31) [0x7fffee3e0c31]
2: (()+0x3b8b36) [0x7fffee362b36]
3: (()+0x95131) [0x7fffee03f131]
4: (()+0xcb4e5) [0x7fffee0754e5]
5: (()+0xca7df) [0x7fffee0747df]
6: (()+0x1b24e4) [0x7fffee15c4e4]
7: (()+0x1b2267) [0x7fffee15c267]
8: (()+0x1b4d9e) [0x7fffee15ed9e]
9: (()+0x8d92b) [0x7fffee03792b]
10: (()+0xdb72e) [0x7fffee08572e]
11: (()+0xe605e) [0x7fffee09005e]
12: (()+0x4241e1) [0x7fffee3ce1e1]
13: (()+0x428693) [0x7fffee3d2693]
14: (()+0x417723) [0x7fffee3c1723]
15: (()+0x417658) [0x7fffee3c1658]
16: (()+0x8184) [0x7fffe32cc184]
17: (clone()+0x6d) [0x7fffe2df537d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Program received signal SIGABRT, Aborted.

==========================================
The backtrace:
#0 0x00007fffe2d31c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fffe2d35028 in __GI_abort () at abort.c:89
#2 0x00007fffee3e0eb4 in ceph::__ceph_assert_fail (assertion=0x7fffee81040b "r == 0", file=0x7fffee8103fb "common/Mutex.cc", line=113, func=0x7fffee8104b0 <Mutex::Lock(bool)::__PRETTY_FUNCTION__> "void Mutex::Lock(bool)") at common/assert.cc:78
#3 0x00007fffee362b36 in Mutex::Lock (this=0x7fffa0001768, no_lockdep=false) at common/Mutex.cc:113
#4 0x00007fffee03f131 in Mutex::Locker::Locker (this=0x7fffc1ff8370, m=...) at ./common/Mutex.h:115
#5 0x00007fffee0754e5 in librbd::ExclusiveLock<librbd::ImageCtx>::handle_peer_notification (this=0x7fffa0001760, r=0) at librbd/ExclusiveLock.cc:237
#6 0x00007fffee0747df in librbd::ExclusiveLock<librbd::ImageCtx>::shut_down (this=0x7fffa0001760, on_shut_down=0x7fffbc000be0) at librbd/ExclusiveLock.cc:141
#7 0x00007fffee15c4e4 in librbd::image::CloseRequest<librbd::ImageCtx>::send_shut_down_exclusive_lock (this=0x838b70) at librbd/image/CloseRequest.cc:132
#8 0x00007fffee15c267 in librbd::image::CloseRequest<librbd::ImageCtx>::handle_shut_down_aio_queue (this=0x838b70, r=0) at librbd/image/CloseRequest.cc:105
#9 0x00007fffee15ed9e in librbd::util::detail::C_CallbackAdapter<librbd::image::CloseRequest<librbd::ImageCtx>, &librbd::image::CloseRequest<librbd::ImageCtx>::handle_shut_down_aio_queue>::finish (this=0x7fffbc000ba0, r=0) at ./librbd/Utils.h:53
#10 0x00007fffee03792b in Context::complete (this=0x7fffbc000ba0, r=0) at ./include/Context.h:64
#11 0x00007fffee08572e in ContextWQ::process (this=0x838430, ctx=0x7fffbc000ba0) at ./common/WorkQueue.h:608
#12 0x00007fffee09005e in ThreadPool::PointerWQ<Context>::_void_process (this=0x838430, item=0x7fffbc000ba0, handle=...) at ./common/WorkQueue.h:396
#13 0x00007fffee3ce1e1 in ThreadPool::worker (this=0x837840, wt=0x837f90) at common/WorkQueue.cc:128
#14 0x00007fffee3d2693 in ThreadPool::WorkThread::entry (this=0x837f90) at common/WorkQueue.h:445
#15 0x00007fffee3c1723 in Thread::entry_wrapper (this=0x837f90) at common/Thread.cc:87
#16 0x00007fffee3c1658 in Thread::_entry_func (arg=0x837f90) at common/Thread.cc:67
#17 0x00007fffe32cc184 in start_thread (arg=0x7fffc1ffb700) at pthread_create.c:312
#18 0x00007fffe2df537d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
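
The failed assertion in frame #2 is `r == 0` inside `Mutex::Lock`. Assuming `r` holds the return value of `pthread_mutex_lock` (an assumption; the report does not show Mutex.cc), the sketch below shows one documented way that call can return non-zero: relocking an error-checking mutex from the thread that already holds it returns EDEADLK. This is only an illustration of the condition the assert checks, not a claim about the cause of this crash. The helper name `relock_result` is hypothetical.

```cpp
#include <cerrno>
#include <pthread.h>

// Locks an error-checking mutex twice from the same thread and returns
// the result of the second pthread_mutex_lock call.
int relock_result() {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

  pthread_mutex_t m;
  pthread_mutex_init(&m, &attr);
  pthread_mutexattr_destroy(&attr);

  int first = pthread_mutex_lock(&m);   // succeeds, returns 0
  int second = pthread_mutex_lock(&m);  // same thread relocks: error code
  pthread_mutex_unlock(&m);             // release the single held lock
  pthread_mutex_destroy(&m);

  return first == 0 ? second : first;
}
```

A wrapper that asserts on this return value would abort at this point in the same way the trace above does.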
Actions #1

Updated by Nathan Cutler about 7 years ago

  • Tracker changed from Tasks to Bug
  • Project changed from Stable releases to rbd
  • Regression set to No
  • Severity set to 3 - minor
Actions #2

Updated by Jason Dillaman about 7 years ago

  • Assignee deleted (Sage Weil)
  • Priority changed from Immediate to Normal
Actions #3

Updated by Jason Dillaman about 7 years ago

  • Status changed from New to Fix Under Review
Actions #4

Updated by Jason Dillaman about 7 years ago

  • Release deleted (jewel)
Actions #5

Updated by Jason Dillaman over 6 years ago

  • Status changed from Fix Under Review to Duplicate

Tracking multi-thread issues under #17379
