Bug #49963

closed

Crash in OSD::ms_fast_dispatch due to call to null vtable function

Added by Brad Hubbard about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: Yes
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2021-03-24_06:13:24-upgrade:octopus-x-wip-sage-testing-2021-03-23-2309-distro-basic-smithi/5993446

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x000055e42b5b53ca in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:87
#2  handle_fatal_signal (signum=11) at ./src/global/signal_handler.cc:332
#3  <signal handler called>
#4  0x0000000000000000 in ?? ()
#5  0x000055e42af9f587 in OSD::ms_fast_dispatch (this=0x55e42fefa000, m=0x55e43026cb00) at ./src/osd/OSD.cc:7186
#6  0x000055e42b954323 in Dispatcher::ms_fast_dispatch2 (m=..., this=0x55e42fefa000) at ./src/msg/Dispatcher.h:84
#7  Messenger::ms_fast_dispatch (m=..., this=<optimized out>) at ./src/msg/Messenger.h:685
#8  DispatchQueue::fast_dispatch (this=0x55e42fd72c28, m=...) at ./src/msg/DispatchQueue.cc:74
#9  0x000055e42b98758b in DispatchQueue::fast_dispatch (m=0x55e43026cb00, this=<optimized out>) at ./src/msg/DispatchQueue.h:203
#10 ProtocolV2::handle_message (this=this@entry=0x55e43257c500) at ./src/msg/async/ProtocolV2.cc:1482
#11 0x000055e42b998d90 in ProtocolV2::handle_read_frame_dispatch (this=this@entry=0x55e43257c500) at ./src/msg/async/ProtocolV2.cc:1140
#12 0x000055e42b998ee9 in ProtocolV2::_handle_read_frame_epilogue_main (this=this@entry=0x55e43257c500) at ./src/msg/async/ProtocolV2.cc:1328
#13 0x000055e42b99a77e in ProtocolV2::handle_read_frame_epilogue_main (this=0x55e43257c500, buffer=..., r=0) at ./src/msg/async/ProtocolV2.cc:1303
#14 0x000055e42b982a74 in ProtocolV2::run_continuation (this=0x55e43257c500, continuation=...) at ./src/msg/async/ProtocolV2.cc:47
#15 0x000055e42b95bda8 in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=<optimized out>, __args#0=<optimized out>, this=0x55e4324457a0) at /usr/include/c++/7/bits/std_function.h:706
#16 AsyncConnection::process (this=0x55e432445400) at ./src/msg/async/AsyncConnection.cc:454
#17 0x000055e42b7a29ed in EventCenter::process_events (this=this@entry=0x55e42f0cf4c0, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f40f17de668)
    at ./src/msg/async/Event.cc:422
#18 0x000055e42b7a7b80 in NetworkStack::<lambda()>::operator() (__closure=0x55e42f1a3238) at ./src/msg/async/Stack.cc:52
#19 std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
#20 0x00007f40f53536df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#21 0x00007f40f5c706db in start_thread (arg=0x7f40f17e1700) at pthread_create.c:463
#22 0x00007f40f4a1071f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
#4  0x0000000000000000 in ?? ()

That's a telltale sign that we called through a null function pointer.

7185      // note sender epoch, min req's epoch
7186      op->sent_epoch = static_cast<MOSDFastDispatchOp*>(m)->get_map_epoch();
   0x000055e42af9f57e <+670>:   mov    (%rbx),%rax
   0x000055e42af9f581 <+673>:   mov    %rbx,%rdi
   0x000055e42af9f584 <+676>:   callq  *0x48(%rax) <--------------- HERE

./obj-x86_64-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp:
199     ./obj-x86_64-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp: No such file or directory.
=> 0x000055e42af9f587 <+679>:   mov    -0xa8(%rbp),%rdx
   0x000055e42af9f58e <+686>:   test   %rdx,%rdx
   0x000055e42af9f591 <+689>:   je     0x55e42af9f93d <OSD::ms_fast_dispatch(Message*)+1629>

The faulting instruction is the call at +676, immediately before the instruction pointer shown for frame #5 (+679), which is common in my experience.
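The relationship between the frame #5 address and the call at +676 can be checked with a little arithmetic. This is a minimal sketch that uses only the addresses printed in the backtrace and disassembly; the constant and function names are made up for illustration, and it assumes the pc reported for frame #5 is the return address of that call.

```cpp
#include <cstdint>

// Addresses copied from the backtrace and disassembly in this report.
constexpr std::uint64_t kFrame5Pc   = 0x000055e42af9f587ULL; // pc reported for frame #5 (<+679>)
constexpr std::uint64_t kCallInsn   = 0x000055e42af9f584ULL; // callq *0x48(%rax) (<+676>)
constexpr std::uint64_t kCallOffset = 676;                   // the <+676> annotation

// Start address of the function, derived from an instruction address
// and its <+offset> annotation.
constexpr std::uint64_t function_start(std::uint64_t addr, std::uint64_t offset) {
    return addr - offset;
}

// Size of the call instruction, assuming (hypothetically) that the frame pc
// is the return address pushed by that call.
constexpr std::uint64_t call_length(std::uint64_t return_addr, std::uint64_t call_addr) {
    return return_addr - call_addr;
}
```

Under that assumption the call is 3 bytes long, and the derived function start is consistent with the other offsets in the listing (<+679> for the frame #5 pc and <+1629> for the `je` target).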

(gdb) info reg rax
rax            0x55e42c879cf8   94438487989496
(gdb) p/x 0x55e42c879cf8+0x48
$10 = 0x55e42c879d40
(gdb) x/a 0x55e42c879d40
0x55e42c879d40 <_ZTV12MOSDPGCreate>:    0x0
$ c++filt _ZTV12MOSDPGCreate
vtable for MOSDPGCreate

So at least that section of the vtable for the message, which was an MOSDPGCreate, was zeroed out.

It's curious that $rax hints at MOSDPGInfo.

(gdb) x/a $rax
0x55e42c879cf8 <_ZTV10MOSDPGInfo+16>:   0x55e42b747710 <MOSDPGInfo::~MOSDPGInfo()>
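The address arithmetic from the gdb session can be double-checked in code. This is a minimal sketch that uses only values printed above; the 8-byte slot size assumes x86-64 pointers, and all names are hypothetical.

```cpp
#include <cstdint>

// Values copied from the gdb session in this report.
constexpr std::uint64_t kRax             = 0x55e42c879cf8ULL; // from `info reg rax`
constexpr std::uint64_t kDisp            = 0x48;              // from `callq *0x48(%rax)`
constexpr std::uint64_t kRaxSymbolOffset = 16;                // rax is <_ZTV10MOSDPGInfo+16>
constexpr std::uint64_t kSlotSize        = 8;                 // pointer size, assuming x86-64

// Address of the slot the indirect call loads its target from.
constexpr std::uint64_t slot_address(std::uint64_t vptr, std::uint64_t disp) {
    return vptr + disp;
}

// Zero-based index of that slot, counted from the address held in rax.
constexpr std::uint64_t slot_index(std::uint64_t disp, std::uint64_t slot_size) {
    return disp / slot_size;
}

// Byte offset of that slot from the start of the symbol rax points into.
constexpr std::uint64_t offset_from_symbol(std::uint64_t symbol_offset, std::uint64_t disp) {
    return symbol_offset + disp;
}
```

With these inputs the slot is index 9 from the address in rax, at 0x55e42c879d40, which is 0x58 (88) bytes past the start of the _ZTV10MOSDPGInfo symbol and is the address gdb labels as the start of _ZTV12MOSDPGCreate.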