Bug #13636

rbd: pure virtual method called

Added by Josh Durgin almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Target version: -
Start date: 10/29/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: infernalis,hammer,firefly
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

From http://qa-proxy.ceph.com/teuthology/teuthology-2015-10-23_23:00:08-rbd-master---basic-multi/1122358/teuthology.log :

2015-10-26T01:48:55.719 INFO:tasks.workunit.client.0.mira071.stderr:+ rbd rm image.32
2015-10-26T01:48:55.778 INFO:tasks.workunit.client.0.mira071.stderr:pure virtual method called
2015-10-26T01:48:55.778 INFO:tasks.workunit.client.0.mira071.stderr:terminate called without an active exception
2015-10-26T01:48:55.778 INFO:tasks.workunit.client.0.mira071.stderr:*** Caught signal (Aborted) **
2015-10-26T01:48:55.778 INFO:tasks.workunit.client.0.mira071.stderr: in thread 7f0c9a7fc700
2015-10-26T01:48:55.810 INFO:tasks.workunit.client.0.mira071.stderr: ceph version 9.1.0-313-g252ffe0 (252ffe0155b2f3b91aecc9eab5c785ad6f652dbe)
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 1: (()+0x1ab86a) [0x7f0cad50186a]
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 2: (()+0x10340) [0x7f0ca7b76340]
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 3: (gsignal()+0x39) [0x7f0ca6634cc9]
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 4: (abort()+0x148) [0x7f0ca66380d8]
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f0ca6f3f535]
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 6: (()+0x5e6d6) [0x7f0ca6f3d6d6]
2015-10-26T01:48:55.811 INFO:tasks.workunit.client.0.mira071.stderr: 7: (()+0x5e703) [0x7f0ca6f3d703]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr: 8: (()+0x5f1bf) [0x7f0ca6f3e1bf]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr: 9: (()+0x79fe4) [0x7f0caa7acfe4]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr: 10: (()+0x177e84) [0x7f0caa8aae84]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr: 11: (()+0x179220) [0x7f0caa8ac220]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr: 12: (()+0x8182) [0x7f0ca7b6e182]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr: 13: (clone()+0x6d) [0x7f0ca66f847d]
2015-10-26T01:48:55.812 INFO:tasks.workunit.client.0.mira071.stderr:2015-10-26 01:48:55.790939 7f0c9a7fc700 -1 *** Caught signal (Aborted) **
2015-10-26T01:48:55.813 INFO:tasks.workunit.client.0.mira071.stderr: in thread 7f0c9a7fc700

Related issues

Copied to rbd - Backport #13757: rbd: pure virtual method called Rejected
Copied to rbd - Backport #13758: rbd: pure virtual method called Resolved
Copied to rbd - Backport #13759: rbd: pure virtual method called Resolved

Associated revisions

Revision 3e78b18b (diff)
Added by Jason Dillaman almost 3 years ago

WorkQueue: new PointerWQ base class for ContextWQ

The existing work queues do not properly function if added to a running
thread pool. librbd uses a singleton thread pool which requires
dynamically adding/removing work queues as images are opened and closed.

Fixes: #13636
Signed-off-by: Jason Dillaman <>

Revision 112c686f (diff)
Added by Jason Dillaman almost 3 years ago

WorkQueue: new PointerWQ base class for ContextWQ

The existing work queues do not properly function if added to a running
thread pool. librbd uses a singleton thread pool which requires
dynamically adding/removing work queues as images are opened and closed.

Fixes: #13636
Signed-off-by: Jason Dillaman <>
(cherry picked from commit 3e78b18b09d75626ca2599bac3b9f9c9889507a5)

Conflicts:
    src/common/WorkQueue.h
    Trivial merge conflict at class `ContextWQ` initialization

Revision ad84753a (diff)
Added by Jason Dillaman over 2 years ago

WorkQueue: new PointerWQ base class for ContextWQ

The existing work queues do not properly function if added to a running
thread pool. librbd uses a singleton thread pool which requires
dynamically adding/removing work queues as images are opened and closed.

Fixes: #13636
Signed-off-by: Jason Dillaman <>
(cherry picked from commit 3e78b18b09d75626ca2599bac3b9f9c9889507a5)

History

#1 Updated by Jason Dillaman almost 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#2 Updated by Jason Dillaman almost 3 years ago

#0  0x00007f0ca7b7620b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x00007f0cad50192d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3  <signal handler called>
#4  0x00007f0ca6634cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007f0ca66380d8 in __GI_abort () at abort.c:89
#6  0x00007f0ca6f3f535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f0ca6f3d6d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f0ca6f3d703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f0ca6f3e1bf in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007f0caa7acfe4 in ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_dequeue (
    this=0x7f0cb09d6360) at ./common/WorkQueue.h:197
#11 0x00007f0caa8aae84 in ThreadPool::worker (this=0x7f0cb09cfdf0, wt=0x7f0cb09c1970) at common/WorkQueue.cc:120
#12 0x00007f0caa8ac220 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:362
#13 0x00007f0ca7b6e182 in start_thread (arg=0x7f0c9a7fc700) at pthread_create.c:312
#14 0x00007f0ca66f847d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

#3 Updated by Jason Dillaman almost 3 years ago

ThreadPool::WorkQueueVal @ 0x7f0cb09d6360 is the ImageCtx::op_work_queue from the active image. Thread 4 shows that the image is still open:

Thread 4 (Thread 0x7f0cad7597c0 (LWP 7008)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f0ca8251559 in Wait (mutex=..., this=0x7ffec3688ec0) at ./common/Cond.h:55
#2  librados::IoCtxImpl::operate_read (this=0x7f0cb09cf930, oid=..., o=<optimized out>, pbl=pbl@entry=0x7ffec36890c0, flags=flags@entry=0)
    at librados/IoCtxImpl.cc:560
#3  0x00007f0ca82187b6 in librados::IoCtx::operate (this=this@entry=0x7f0cb09cf1d0, oid=..., o=o@entry=0x7ffec3689030, 
    pbl=pbl@entry=0x7ffec36890c0) at librados/librados.cc:1355
#4  0x00007f0caab224a3 in librbd::cls_client::get_stripe_unit_count (ioctx=ioctx@entry=0x7f0cb09cf1d0, oid=..., 
    stripe_unit=stripe_unit@entry=0x7f0cb09cf5b0, stripe_count=stripe_count@entry=0x7f0cb09cf5b8) at cls/rbd/cls_rbd_client.cc:574
#5  0x00007f0caa7abbaf in librbd::ImageCtx::init (this=this@entry=0x7f0cb09cf050) at librbd/ImageCtx.cc:174
#6  0x00007f0caa7d7bfa in librbd::open_image (ictx=ictx@entry=0x7f0cb09cf050) at librbd/internal.cc:2679
#7  0x00007f0caa7d98a7 in librbd::remove (io_ctx=..., imgname=imgname@entry=0x7f0cb09c0cc0 "image.32", prog_ctx=...)
    at librbd/internal.cc:1813
#8  0x00007f0caa7805f0 in librbd::RBD::remove_with_progress (this=<optimized out>, io_ctx=..., name=0x7f0cb09c0cc0 "image.32", pctx=...)
    at librbd/librbd.cc:346
#9  0x00007f0cad4215c0 in do_delete (rbd=..., io_ctx=..., imgname=<optimized out>) at rbd.cc:691
#10 0x00007f0cad42f7ea in main (argc=<optimized out>, argv=<optimized out>) at rbd.cc:3796

Dumping the vtable from the aborted thread indicates that the vtable is accurate for the "_empty" method at the time of core generation:

p /a (*(void ***)this)[0]@10
$11 = {0x7f0caa7ade40 <ContextWQ::~ContextWQ()>, 0x7f0caa7ad910 <ContextWQ::~ContextWQ()>, 
  0x7f0caa7acb10 <ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_clear()>, 
  0x7f0caa7acaa0 <ContextWQ::_empty()>, 
  0x7f0caa7acfc0 <ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_dequeue()>, 
  0x7f0caa7acde0 <ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_process(void*, ThreadPool::TPHandle&)>, 0x7f0caa7acd50 <ThreadPool::WorkQueueVal<std::pair<Context*, int>, std::pair<Context*, int> >::_void_process_finish(void*)>, 
  0x7f0caa7aced0 <ContextWQ::_enqueue(std::pair<Context*, int>)>, 0x7f0caa7acf20 <ContextWQ::_enqueue_front(std::pair<Context*, int>)>, 
  0x7f0caa7acb90 <ContextWQ::_dequeue()>}

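Therefore, the vtable must not have been fully initialized when the work queue was registered with the ThreadPool. It turns out that the parent class of ContextWQ improperly adds itself to the ThreadPool within its own constructor. This creates a race: a worker thread can call a virtual method on the queue before the ContextWQ constructor has finished installing the derived vtable.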
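
The constructor-registration problem described above can be illustrated with a small, single-threaded sketch. This is not the Ceph code: `Pool`, `Queue`, `EagerBase`, `EagerDerived` and `LateDerived` are hypothetical names invented for the illustration. The base method is deliberately given a default body instead of being pure virtual, so the program reports which override was visible at registration time rather than aborting. In the real bug the method is pure virtual and the call comes from a worker thread, which is why the process terminated with "pure virtual method called".

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Queue;

// Hypothetical stand-in for a thread pool. It polls a queue as soon as the
// queue is registered, which is roughly what a running worker thread may do.
struct Pool {
    std::vector<Queue*> queues;
    std::vector<std::string> seen;  // what each queue reported at registration
    void add(Queue* q);
};

struct Queue {
    virtual ~Queue() = default;
    // Non-pure on purpose (assumption for the demo): a pure virtual here
    // would abort the process instead of returning "base".
    virtual std::string name() const { return "base"; }
};

void Pool::add(Queue* q) {
    assert(q != nullptr);
    queues.push_back(q);
    seen.push_back(q->name());  // virtual call made at registration time
}

// Problematic pattern: the base class registers itself in its constructor,
// before any derived constructor has run.
struct EagerBase : Queue {
    explicit EagerBase(Pool& p) { p.add(this); }
};

struct EagerDerived : EagerBase {
    explicit EagerDerived(Pool& p) : EagerBase(p) {}
    std::string name() const override { return "derived"; }
};

// Safer pattern: construct the object completely, then register it.
struct LateDerived : Queue {
    std::string name() const override { return "derived"; }
};

std::string observed_when_registered_in_base_ctor() {
    Pool pool;
    EagerDerived q(pool);     // registration happens inside EagerBase()
    return pool.seen.back();  // the derived override is not yet visible
}

std::string observed_when_registered_after_construction() {
    Pool pool;
    LateDerived q;            // fully constructed first
    pool.add(&q);             // then registered
    return pool.seen.back();
}
```

Calling `observed_when_registered_in_base_ctor()` yields `"base"`, because the object's dynamic type is still the base class while the base constructor runs. Calling `observed_when_registered_after_construction()` yields `"derived"`. Separating construction from registration is one general way to avoid this class of bug. The commits associated with this issue describe the actual fix only as a new PointerWQ base class for ContextWQ.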

#4 Updated by Jason Dillaman almost 3 years ago

  • Backport set to infernalis,hammer,firefly

#5 Updated by Nathan Cutler almost 3 years ago

Hi - if you target infernalis with the fix, we would only have to backport to hammer and firefly. AFAIK master is still being regularly synched with infernalis. Just an idea.

#6 Updated by Jason Dillaman almost 3 years ago

infernalis is effectively closed pending its imminent release. After its release, a jewel branch will be created from master and all bug fixes should be targeted to it.

#7 Updated by Jason Dillaman almost 3 years ago

  • Status changed from In Progress to Need Review

#8 Updated by Loic Dachary almost 3 years ago

#9 Updated by Loic Dachary almost 3 years ago

#10 Updated by Loic Dachary almost 3 years ago

#11 Updated by Jason Dillaman over 2 years ago

  • Status changed from Need Review to Pending Backport

#12 Updated by Loic Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved
