Project

General

Profile

Bug #8885

SIGABRT in TrackedOp::dump() via dump_ops_in_flight()

Added by Sage Weil almost 3 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
07/20/2014
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
hammer
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-07-19_23:15:03-rados-wip-sage-testing-testing-basic-plana/370540

 -2631> 2014-07-20 03:37:26.447023 7f7ff5f15700 -1 *** Caught signal (Aborted) **
 in thread 7f7ff5f15700

 ceph version 0.82-697-g4438ab1 (4438ab1b7ea41df6e18e9cd17d9a69f34c925c83)
 1: ceph-osd() [0x9aa88a]
 2: (()+0xfcb0) [0x7f7ff8c25cb0]
 3: (gsignal()+0x35) [0x7f7ff736d425]
 4: (abort()+0x17b) [0x7f7ff7370b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f7ff7cc069d]
 6: (()+0xb5846) [0x7f7ff7cbe846]
 7: (()+0xb5873) [0x7f7ff7cbe873]
 8: (()+0xb628f) [0x7f7ff7cbf28f]
 9: (TrackedOp::dump(utime_t, ceph::Formatter*) const+0x1d0) [0x6cfd60]
 10: (OpTracker::dump_ops_in_flight(ceph::Formatter*)+0xa7) [0x6d07e7]
 11: (OSD::asok_command(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::varia
nt::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::va
riant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::voi
d_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::
void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, std::ostream&)+0x2f0) [0x6262e0]
 12: (OSDSocketHook::call(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::var
iant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::
variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::v
oid_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant
::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, ceph::buffer::list&)+0x7d) [0x68b90d]
 13: (AdminSocket::do_accept()+0x1084) [0xa88674]
 14: (AdminSocket::entry()+0x130) [0xa89d70]
 15: (()+0x7e9a) [0x7f7ff8c1de9a]
 16: (clone()+0x6d) [0x7f7ff742b3fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

this is the second time i've seen this


Related issues

Related to Bug #14598: osd: pure virtual method called, abrt in TrackedOp::dump Duplicate 02/01/2016
Copied to Backport #16225: hammer: SIGABRT in TrackedOp::dump() via dump_ops_in_flight() Resolved

Associated revisions

Revision ad13e054 (diff)
Added by David Zafman about 1 year ago

common/TrackedOp: Handle dump racing with constructor

Use is_tracked to prevent TrackedOp::dump() from trying to call
virtual function while still in OpRequest constructor.

Fixes: #8885

Signed-off-by: David Zafman <>

Revision c7220ccc (diff)
Added by David Zafman 6 months ago

common/TrackedOp: Handle dump racing with constructor

Use is_tracked to prevent TrackedOp::dump() from trying to call
virtual function while still in OpRequest constructor.

Fixes: #8885

Signed-off-by: David Zafman <>
(cherry picked from commit ad13e05499669a79bde9a219ba1089f929e0388e)

Conflicts:
src/common/TrackedOp.cc
src/common/TrackedOp.h
Hammer can't use atomic<bool> so use atomic_t instead

History

#1 Updated by Sage Weil almost 3 years ago

  • Assignee set to Greg Farnum

#2 Updated by Greg Farnum almost 3 years ago

  • Assignee changed from Greg Farnum to Sage Weil

Have you seen this on a mainstream branch?

Looking at the dump() function, the only plausible candidate for failing is that the OpRequest::_dump_op_descriptor() function calls the Message::print() function. You've got a lot of Messenger and OSD changes running around which could corrupt that. ;)

If it is on mainline, I guess we are clearing the Message's payload and data when the Op is unregistered, but that's not the scenario we're in here (dump_ops_in_flight) and I don't think any Messages depend on that for their printing?

#3 Updated by Sage Weil almost 3 years ago

  • Status changed from Verified to Need More Info
  • Assignee deleted (Sage Weil)

let's wait for this to trigger on master or next

#4 Updated by Sage Weil almost 3 years ago

  • Priority changed from Urgent to High

#5 Updated by Sage Weil almost 3 years ago

  • Status changed from Need More Info to Can't reproduce

#6 Updated by Sage Weil over 2 years ago

  • Status changed from Can't reproduce to Verified

on master this time: description: rados/thrash/{clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml
thrashers/default.yaml workloads/ec-rados-plugin=jerasure-k=3-m=1.yaml}

ubuntu@teuthology:/a/sage-2014-08-30_20:39:25-rados-wip-sage-testing-testing-basic-multi/462360

2014-08-30 22:16:23.688881 7f520521c700 -1 *** Caught signal (Aborted) **
 in thread 7f520521c700

 ceph version 0.84-933-g4c4f35e (4c4f35e9099313ca5f86dec4f1ea045513d9648b)
 1: ceph-osd() [0x9b5b6a]
 2: (()+0xfcb0) [0x7f5208d0dcb0]
 3: (gsignal()+0x35) [0x7f52075f9425]
 4: (abort()+0x17b) [0x7f52075fcb8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f5207f4c69d]
 6: (()+0xb5846) [0x7f5207f4a846]
 7: (()+0xb5873) [0x7f5207f4a873]
 8: (()+0xb628f) [0x7f5207f4b28f]
 9: (TrackedOp::dump(utime_t, ceph::Formatter*) const+0x1d0) [0x6c8520]
 10: (OpTracker::dump_ops_in_flight(ceph::Formatter*)+0xaf) [0x6c8f6f]
 11: (OSD::asok_command(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, std::ostream&)+0x300) [0x61e950]
 12: (OSDSocketHook::call(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, ceph::buffer::list&)+0x7d) [0x67fcdd]
 13: (AdminSocket::do_accept()+0x109a) [0xa95f2a]
 14: (AdminSocket::entry()+0x130) [0xa97550]
 15: (()+0x7e9a) [0x7f5208d05e9a]
 16: (clone()+0x6d) [0x7f52076b73fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#7 Updated by Sage Weil over 2 years ago

  • Assignee set to Somnath Roy

#8 Updated by Somnath Roy over 2 years ago

It's the same issue as #9384

Here is the pull request for the same.

https://github.com/ceph/ceph/pull/2440

#9 Updated by Somnath Roy over 2 years ago

  • Status changed from Verified to Resolved

#10 Updated by Samuel Just over 2 years ago

  • Status changed from Resolved to Verified
  • Priority changed from High to Urgent

/a/samuelj-2014-10-21_16:45:57-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/564093/remote

#11 Updated by Samuel Just over 2 years ago

  • Status changed from Verified to Can't reproduce

#12 Updated by Brad Hubbard over 1 year ago

Seeing this on 0.94.3.3

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa20592]
 2: (()+0xf100) [0x7f4f5e2e4100]
 3: (gsignal()+0x37) [0x7f4f5ccfc5f7]
 4: (abort()+0x148) [0x7f4f5ccfdce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4f5d6009d5]
 6: (()+0x5e946) [0x7f4f5d5fe946]
 7: (()+0x5e973) [0x7f4f5d5fe973]
 8: (()+0x5f4df) [0x7f4f5d5ff4df]
 9: (TrackedOp::dump(utime_t, ceph::Formatter*) const+0x238) [0x6e8278]
 10: (OpTracker::dump_ops_in_flight(ceph::Formatter*)+0xa7) [0x6e8d17]
 11: (OSD::asok_command(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, std::ostream&)+0x216) [0x63bb96]
 12: (OSDSocketHook::call(std::string, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::less<std::string>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> > > >&, std::string, ceph::buffer::list&)+0x25b) [0x6a764b]
 13: (AdminSocket::do_accept()+0xf36) [0xb13e36]
 14: (AdminSocket::entry()+0x280) [0xb15960]
 15: (()+0x7dc5) [0x7f4f5e2dcdc5]
 16: (clone()+0x6d) [0x7f4f5cdbd21d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

$ git describe --contains a8e6de307f2e4f4ff5e7faad3809ed708e05fdde
v0.88~171

# gdb -q /usr/bin/ceph-osd
Reading symbols from /usr/bin/ceph-osd...Reading symbols from /usr/lib/debug/usr/bin/ceph-osd.debug...done.
done.
(gdb) disass /m 0x6e8278
Dump of assembler code for function TrackedOp::dump(utime_t, ceph::Formatter*) const:
322     {
   0x00000000006e8040 <+0>:     push   %r15
   0x00000000006e8042 <+2>:     push   %r14
   0x00000000006e8044 <+4>:     push   %r13
   0x00000000006e8046 <+6>:     push   %r12
   0x00000000006e8048 <+8>:     mov    %rdi,%r12
   0x00000000006e804b <+11>:    push   %rbp
   0x00000000006e804c <+12>:    mov    %rdx,%rbp
   0x00000000006e804f <+15>:    push   %rbx
   0x00000000006e8050 <+16>:    sub    $0x248,%rsp
   0x00000000006e8057 <+23>:    mov    %fs:0x28,%rax
   0x00000000006e8060 <+32>:    mov    %rax,0x238(%rsp)
   0x00000000006e8068 <+40>:    xor    %eax,%eax
   0x00000000006e807a <+58>:    mov    %rsi,0x18(%rsp)
   0x00000000006e807f <+63>:    mov    %esi,0x30(%rsp)

323       stringstream name;
324       _dump_op_descriptor_unlocked(name);
   0x00000000006e8265 <+549>:   mov    0x10(%rsp),%rcx
   0x00000000006e826a <+554>:   mov    (%r12),%rax
   0x00000000006e826e <+558>:   mov    %r12,%rdi
   0x00000000006e8271 <+561>:   lea    0x10(%rcx),%rsi
   0x00000000006e8275 <+565>:   callq  *0x10(%rax)

325       f->dump_string("description", name.str().c_str()); // this TrackedOp
   0x00000000006e8278 <+568>:   mov    0x0(%rbp),%rax     <----------HERE

Breakpoint 1, TrackedOp::dump (this=0x623fc00, now=now@entry=..., f=f@entry=0x5614680) at common/TrackedOp.cc:322
322     {
(gdb) n
...
(gdb) n
325       f->dump_string("description", name.str().c_str()); // this TrackedOp
(gdb) i r rbp
rbp            0x5614680        0x5614680
(gdb) p f
$4 = (ceph::Formatter *) 0x5614680

So this is a problem with the formatter being invalid IMHO.

#13 Updated by Samuel Just over 1 year ago

  • Status changed from Can't reproduce to New
  • Assignee changed from Somnath Roy to David Zafman

David, can you take a look?

#14 Updated by David Zafman over 1 year ago

I wasn't able to reproduce.

Based on the location of what I assume is a segfault the value of register bp was corrupted when getting popped off the stack coming back from OpRequest::_dump_op_descriptor_unlocked().

Another way to look at the latest stack is that we actually crashed while still still in _dump_op_descriptor_unlocked() which peeks at the value of Message * without holding locks. If a Message were to get reused and some random Message * was attempting to call print() that could cause the problem Maybe gdb is unable to interpret the stack there so it isn't showing the other function names in frames 6, 7 and 8.

5: (_gnu_cxx::_verbose_terminate_handler()+0x165) [0x7f4f5d6009d5]
6: (()+0x5e946) [0x7f4f5d5fe946]
7: (()+0x5e973) [0x7f4f5d5fe973]
8: (()+0x5f4df) [0x7f4f5d5ff4df]
9: (TrackedOp::dump(utime_t, ceph::Formatter*) const+0x238) [0x6e8278]

#15 Updated by David Zafman about 1 year ago

  • Status changed from New to In Progress

I have been able to reproduce and found an additional locking issue. Still trying to find and fix the issue described here.

#16 Updated by David Zafman about 1 year ago

#0 0x00007f5353fe3cc9 in _GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f5353fe70d8 in __GI_abort () at abort.c:89
#2 0x00007f53548ee535 in __gnu_cxx::
_verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f53548ec6d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f53548ec703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f53548ed1bf in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f535821525d in TrackedOp::dump (this=0x7f52fc1aafe0, now=..., now@entry=..., f=f@entry=0x7f5344000e50) at common/TrackedOp.cc:319
#7 0x00007f5358216410 in OpTracker::dump_ops_in_flight (this=0x7f5361c68638, f=0x7f5344000e50, print_only_blocked=false) at common/TrackedOp.cc:110
#8 0x00007f5357c36811 in OSD::asok_command (this=0x7f5361c675d0, command="dump_ops_in_flight", cmdmap=std::map with 1 elements = {...}, format="json-pretty", ss=...) at osd/OSD.cc:1715
#9 0x00007f5357ca50e8 in OSDSocketHook::call (this=0x7f5361cba1d0, command="dump_ops_in_flight", cmdmap=std::map with 1 elements = {...}, format=..., out=...) at osd/OSD.cc:1683
#10 0x00007f535833f254 in AdminSocket::do_accept (this=0x7f5361bdc040) at common/admin_socket.cc:399
#11 0x00007f535833e358 in AdminSocket::entry (this=0x7f5361bdc040) at common/admin_socket.cc:280
#12 0x00007f5358329096 in Thread::entry_wrapper (this=0x7f5361bdc040) at common/Thread.cc:84
#13 0x00007f5358328fe4 in Thread::_entry_func (arg=0x7f5361bdc040) at common/Thread.cc:66
#14 0x00007f5355f79182 in start_thread (arg=0x7f5351ff6700) at pthread_create.c:312
#15 0x00007f53540a747d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

This shows that a racing unregister for 0x7f52fc1aafe0 which is waiting correctly behind a lock. However, at this point the Message print() method can't be called because presumably the destructor has been called.

#0 lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f5355f7b657 in L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f5355f7b480 in __GI
_pthread_mutex_lock (mutex=0x7f5361c6f4a0) at ../nptl/pthread_mutex_lock.c:79
#3 0x00007f53582f3bb8 in Mutex::Lock (this=0x7f5361c6f490, no_lockdep=false) at common/Mutex.cc:110
#4 0x00007f5358212335 in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:115
#5 OpTracker::unregister_inflight_op (this=0x7f5361c68638, i=i@entry=0x7f52fc1aafe0) at common/TrackedOp.cc:151
#6 0x00007f53582127c5 in OpTracker::RemoveOnDelete::operator() (this=0x7f52fc16e978, op=0x7f52fc1aafe0) at common/TrackedOp.cc:297
#7 0x00007f5357c20ce9 in std::_Sp_counted_base<(
_gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f52fc16e960) at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#8 0x00007f5357c91946 in ~_shared_count (this=0x7f52b8ff7968, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:546
#9 ~
_shared_ptr (this=0x7f52b8ff7960, _in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr_base.h:781
#10 ~shared_ptr (this=0x7f52b8ff7960, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/shared_ptr.h:93
#11 internal_visit<std::shared_ptr<OpRequest> > (this=<optimized out>, operand=std::shared_ptr (expired, weak 1) 0x7f52fc1aafe0) at /usr/include/boost/variant/variant.hpp:315
#12 visitation_impl_invoke_impl<boost::detail::variant::destroyer, void*, std::shared_ptr<OpRequest> > (visitor=..., storage=0x7f52b8ff7960) at /usr/include/boost/variant/detail/visitation_impl.hpp:130
#13 visitation_impl_invoke<boost::detail::variant::destroyer, void*, std::shared_ptr<OpRequest>, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (internal_which=<optimized out>, visitor=..., t=0x0, storage=0x7f52b8ff7960) at /usr/include/boost/variant/detail/visitation_impl.hpp:173
#14 visitation_impl<mpl
::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, std::shared_ptr<OpRequest>, boost::mpl::l_item<mpl_::long_<2l>, PGSnapTrim, boost::mpl::l_item<mpl_::long_<1l>, PGScrub, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::destroyer, void*, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (no_backup_flag=..., storage=0x7f52b8ff7960, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/detail/visitation_impl.hpp:256
#15 internal_apply_visitor_impl<boost::detail::variant::destroyer, void*> (storage=0x7f52b8ff7960, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/variant.hpp:2326
#16 internal_apply_visitor<boost::detail::variant::destroyer> (visitor=<synthetic pointer>, this=0x7f52b8ff7958) at /usr/include/boost/variant/variant.hpp:2337
#17 destroy_content (this=0x7f52b8ff7958) at /usr/include/boost/variant/variant.hpp:1373
#18 ~variant (this=0x7f52b8ff7958, __in_chrg=<optimized out>) at /usr/include/boost/variant/variant.hpp:1380
#19 ~PGQueueable (this=0x7f52b8ff7958, __in_chrg=<optimized out>) at osd/OSD.h:347
#20 std::pair<boost::intrusive_ptr<PG>, PGQueueable>::~pair (this=0x7f52b8ff7950, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/stl_pair.h:96
#21 0x00007f5357c57288 in OSD::ShardedOpWQ::_process (this=0x7f5361c687b0, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8529
#22 0x00007f53583380dd in ShardedThreadPool::shardedthreadpool_worker (this=0x7f5361c67d38, thread_index=7) at common/WorkQueue.cc:340
#23 0x00007f5358339a9f in ShardedThreadPool::WorkThreadSharded::entry (this=0x7f5361c962b0) at common/WorkQueue.h:679
#24 0x00007f5358329096 in Thread::entry_wrapper (this=0x7f5361c962b0) at common/Thread.cc:84
#25 0x00007f5358328fe4 in Thread::_entry_func (arg=0x7f5361c962b0) at common/Thread.cc:66
#26 0x00007f5355f79182 in start_thread (arg=0x7f52b8ff9700) at pthread_create.c:312
#27 0x00007f53540a747d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

#17 Updated by David Zafman about 1 year ago

Found it! It isn't a destructor race, but rather a register_inflight_op() being called out of the OpRequest constructor. After the register_inflight_op() completes, but before the constructor OpRequest::OpRequest() finishes, a dump won't be able to call the virtual function OpRequest:: _dump_op_descriptor_unlocked().

This is a breakpoint on OpTracker::register_inflight_op() just to show the how things are getting constructed.

#0 OpTracker::register_inflight_op (this=0x7f7bd8831d98, i=i@entry=0x7f7b7402cac8) at common/TrackedOp.cc:125
#1 0x00007f7bce3bcb3f in TrackedOp (initiated=..., _tracker=0x7f7bd8831d98, this=0x7f7b7402cac0) at ./common/TrackedOp.h:173
#2 OpRequest::OpRequest (this=0x7f7b7402cac0, req=0x7f7b74354090, tracker=0x7f7bd8831d98) at osd/OpRequest.cc:31
#3 0x00007f7bce34744f in create_request<OpRequest, Message*> (params=0x7f7b74354090, this=0x7f7bd8831d98) at ./common/TrackedOp.h:141
#4 OSD::ms_fast_dispatch (this=0x7f7bd8830d30, m=0x7f7b74354090) at osd/OSD.cc:5776
#5 0x00007f7bcea3d226 in ms_fast_dispatch (m=0x7f7b74354090, this=0x7f7bd87d66f0) at ./msg/Messenger.h:555
#6 DispatchQueue::fast_dispatch (this=0x7f7bd87d68b8, m=0x7f7b74354090) at msg/simple/DispatchQueue.cc:71
#7 0x00007f7bcea6cf59 in Pipe::reader (this=0x7f7bbc00a140) at msg/simple/Pipe.cc:1650
#8 0x00007f7bcea7460d in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:52
#9 0x00007f7bccec9182 in start_thread (arg=0x7f7b28aed700) at pthread_create.c:312
#10 0x00007f7bcaff747d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

#18 Updated by David Zafman about 1 year ago

  • Status changed from In Progress to Need Review

#19 Updated by Sage Weil about 1 year ago

  • Related to Bug #14598: osd: pure virtual method called, abrt in TrackedOp::dump added

#20 Updated by Sage Weil about 1 year ago

  • Status changed from Need Review to Resolved

#21 Updated by David Zafman 12 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to hammer

#22 Updated by David Zafman 12 months ago

  • Copied to Backport #16225: hammer: SIGABRT in TrackedOp::dump() via dump_ops_in_flight() added

#25 Updated by David Zafman 6 months ago

  • Needs Doc set to No

bc3dcd22bb2c6ce0370632baf99f644636afb05b

https://github.com/ceph/ceph/pull/8044

#26 Updated by Nathan Cutler 5 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF