Bug #16908

closed

InvalidRead in OSD

Added by John Spray over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen on master today:

<error>
  <unique>0x0</unique>
  <tid>49</tid>
  <threadname>tp_osd_tp</threadname>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0x8EF764</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_op(std::shared_ptr&lt;OpRequest&gt;&amp;)</fn>
    </frame>
    <frame>
      <ip>0x8AB73B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_request(std::shared_ptr&lt;OpRequest&gt;&amp;, ThreadPool::TPHandle&amp;)</fn>
    </frame>
    <frame>
      <ip>0x763FF4</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::dequeue_op(boost::intrusive_ptr&lt;PG&gt;, std::shared_ptr&lt;OpRequest&gt;, ThreadPool::TPHandle&amp;)</fn>
    </frame>
    <frame>
      <ip>0x76421C</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PGQueueable::RunVis::operator()(std::shared_ptr&lt;OpRequest&gt; const&amp;)</fn>
    </frame>
    <frame>
      <ip>0x78523B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)</fn>
    </frame>
    <frame>
      <ip>0xD5DD64</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::shardedthreadpool_worker(unsigned int)</fn>
    </frame>
    <frame>
      <ip>0xD5FEBF</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::WorkThreadSharded::entry()</fn>
    </frame>
    <frame>
      <ip>0xBD6F181</ip>
      <obj>/lib/x86_64-linux-gnu/libpthread-2.19.so</obj>
      <fn>start_thread</fn>
      <dir>/build/buildd/eglibc-2.19/nptl</dir>
      <file>pthread_create.c</file>
      <line>312</line>
    </frame>
    <frame>
      <ip>0xD22247C</ip>
      <obj>/lib/x86_64-linux-gnu/libc-2.19.so</obj>
      <fn>clone</fn>
      <dir>/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64</dir>
      <file>clone.S</file>
      <line>111</line>
    </frame>
  </stack>
  <auxwhat>Address 0x353022b8 is 1,464 bytes inside a block of size 1,632 free'd</auxwhat>
  <stack>
    <frame>
      <ip>0xA22C2BC</ip>
      <obj>/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator delete(void*)</fn>
    </frame>
    <frame>
      <ip>0x8EB9E6</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)</fn>
    </frame>
    <frame>
      <ip>0x8EF6FE</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_op(std::shared_ptr&lt;OpRequest&gt;&amp;)</fn>
    </frame>
    <frame>
      <ip>0x8AB73B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_request(std::shared_ptr&lt;OpRequest&gt;&amp;, ThreadPool::TPHandle&amp;)</fn>
    </frame>
    <frame>
      <ip>0x763FF4</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::dequeue_op(boost::intrusive_ptr&lt;PG&gt;, std::shared_ptr&lt;OpRequest&gt;, ThreadPool::TPHandle&amp;)</fn>
    </frame>
    <frame>
      <ip>0x76421C</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PGQueueable::RunVis::operator()(std::shared_ptr&lt;OpRequest&gt; const&amp;)</fn>
    </frame>
    <frame>
      <ip>0x78523B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)</fn>
    </frame>
    <frame>
      <ip>0xD5DD64</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::shardedthreadpool_worker(unsigned int)</fn>
    </frame>
    <frame>
      <ip>0xD5FEBF</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::WorkThreadSharded::entry()</fn>
    </frame>
    <frame>
      <ip>0xBD6F181</ip>
      <obj>/lib/x86_64-linux-gnu/libpthread-2.19.so</obj>
      <fn>start_thread</fn>
      <dir>/build/buildd/eglibc-2.19/nptl</dir>
      <file>pthread_create.c</file>
      <line>312</line>
    </frame>
    <frame>
      <ip>0xD22247C</ip>
      <obj>/lib/x86_64-linux-gnu/libc-2.19.so</obj>
      <fn>clone</fn>
      <dir>/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64</dir>
      <file>clone.S</file>
      <line>111</line>
    </frame>
  </stack>
</error>

http://qa-proxy.ceph.com/teuthology/jspray-2016-08-03_04:47:50-fs:verify-master-distro-basic-mira/347556/

Actions #1

Updated by Samuel Just over 7 years ago

sjust@teuthology:/a/dzafman-2016-08-03_16:16:17-rados-wip-zafman-testing-distro-basic-smithi/348187/remote

Actions #2

Updated by Samuel Just over 7 years ago

  • Priority changed from High to Urgent

Actions #3

Updated by Josh Durgin over 7 years ago

Still occurring - with a longer backtrace in /a/yuriw-2016-08-20_15:36:43-rados-master_2016_08_19-distro-basic-smithi/375963/remote/smithi026/log/valgrind/osd.0.log.gz:


<error>
  <unique>0x1</unique>
  <tid>55</tid>
  <threadname>tp_osd_tp</threadname>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0x6607A2</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_op(std::shared_ptr&lt;OpRequest&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>2145</line>
    </frame>
    <frame>
      <ip>0x61BD26</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_request(std::shared_ptr&lt;OpRequest&gt;&amp;, ThreadPool::TPHandle&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>1480</line>
    </frame>
    <frame>
      <ip>0x4CADFC</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::dequeue_op(boost::intrusive_ptr&lt;PG&gt;, std::shared_ptr&lt;OpRequest&gt;, ThreadPool::TPHandle&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.cc</file>
      <line>8867</line>
    </frame>
    <frame>
      <ip>0x4CB04C</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PGQueueable::RunVis::operator()(std::shared_ptr&lt;OpRequest&gt; const&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.cc</file>
      <line>163</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>internal_visit&lt;std::shared_ptr&lt;OpRequest&gt; &gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>1017</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>visitation_impl_invoke_impl&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*, std::shared_ptr&lt;OpRequest&gt; &gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>visitation_impl.hpp</file>
      <line>130</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>visitation_impl_invoke&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*, std::shared_ptr&lt;OpRequest&gt;, boost::variant&lt;std::shared_ptr&lt;OpRequest&gt;, PGSnapTrim, PGScrub, PGRecovery&gt;::has_fallback_type_&gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>visitation_impl.hpp</file>
      <line>173</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>visitation_impl&lt;mpl_::int_&lt;0&gt;, boost::detail::variant::visitation_impl_step&lt;boost::mpl::l_iter&lt;boost::mpl::l_item&lt;mpl_::long_&lt;4l&gt;, std::shared_ptr&lt;OpRequest&gt;, boost::mpl::l_item&lt;mpl_::long_&lt;3l&gt;, PGSnapTrim, boost::mpl::l_item&lt;mpl_::long_&lt;2l&gt;, PGScrub, boost::mpl::l_item&lt;mpl_::long_&lt;1l&gt;, PGRecovery, boost::mpl::l_end&gt; &gt; &gt; &gt; &gt;, boost::mpl::l_iter&lt;boost::mpl::l_end&gt; &gt;, boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*, boost::variant&lt;std::shared_ptr&lt;OpRequest&gt;, PGSnapTrim, PGScrub, PGRecovery&gt;::has_fallback_type_&gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>visitation_impl.hpp</file>
      <line>256</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>internal_apply_visitor_impl&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*&gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>2326</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>internal_apply_visitor&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt; &gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>2337</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>apply_visitor&lt;PGQueueable::RunVis&gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>2360</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>apply_visitor&lt;PGQueueable::RunVis, boost::variant&lt;std::shared_ptr&lt;OpRequest&gt;, PGSnapTrim, PGScrub, PGRecovery&gt; &gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>apply_visitor_unary.hpp</file>
      <line>60</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>run</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.h</file>
      <line>410</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.cc</file>
      <line>8748</line>
    </frame>
    <frame>
      <ip>0xAF94E6</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::shardedthreadpool_worker(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/common</dir>
      <file>WorkQueue.cc</file>
      <line>356</line>
    </frame>
    <frame>
      <ip>0xAFB63F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::WorkThreadSharded::entry()</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/common</dir>
      <file>WorkQueue.h</file>
      <line>685</line>
    </frame>
    <frame>
      <ip>0xD0A7DC4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0xE1F821C</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
  <auxwhat>Address 0x3ab72588 is 1,464 bytes inside a block of size 1,632 free'd</auxwhat>
  <stack>
    <frame>
      <ip>0x9FD9131</ip>
      <obj>/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator delete(void*)</fn>
    </frame>
    <frame>
      <ip>0x65D9C3</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>3015</line>
    </frame>
    <frame>
      <ip>0x66073F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_op(std::shared_ptr&lt;OpRequest&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>2139</line>
    </frame>
    <frame>
      <ip>0x61BD26</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ReplicatedPG::do_request(std::shared_ptr&lt;OpRequest&gt;&amp;, ThreadPool::TPHandle&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>ReplicatedPG.cc</file>
      <line>1480</line>
    </frame>
    <frame>
      <ip>0x4CADFC</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::dequeue_op(boost::intrusive_ptr&lt;PG&gt;, std::shared_ptr&lt;OpRequest&gt;, ThreadPool::TPHandle&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.cc</file>
      <line>8867</line>
    </frame>
    <frame>
      <ip>0x4CB04C</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PGQueueable::RunVis::operator()(std::shared_ptr&lt;OpRequest&gt; const&amp;)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.cc</file>
      <line>163</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>internal_visit&lt;std::shared_ptr&lt;OpRequest&gt; &gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>1017</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>visitation_impl_invoke_impl&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*, std::shared_ptr&lt;OpRequest&gt; &gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>visitation_impl.hpp</file>
      <line>130</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>visitation_impl_invoke&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*, std::shared_ptr&lt;OpRequest&gt;, boost::variant&lt;std::shared_ptr&lt;OpRequest&gt;, PGSnapTrim, PGScrub, PGRecovery&gt;::has_fallback_type_&gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>visitation_impl.hpp</file>
      <line>173</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>visitation_impl&lt;mpl_::int_&lt;0&gt;, boost::detail::variant::visitation_impl_step&lt;boost::mpl::l_iter&lt;boost::mpl::l_item&lt;mpl_::long_&lt;4l&gt;, std::shared_ptr&lt;OpRequest&gt;, boost::mpl::l_item&lt;mpl_::long_&lt;3l&gt;, PGSnapTrim, boost::mpl::l_item&lt;mpl_::long_&lt;2l&gt;, PGScrub, boost::mpl::l_item&lt;mpl_::long_&lt;1l&gt;, PGRecovery, boost::mpl::l_end&gt; &gt; &gt; &gt; &gt;, boost::mpl::l_iter&lt;boost::mpl::l_end&gt; &gt;, boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*, boost::variant&lt;std::shared_ptr&lt;OpRequest&gt;, PGSnapTrim, PGScrub, PGRecovery&gt;::has_fallback_type_&gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>visitation_impl.hpp</file>
      <line>256</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>internal_apply_visitor_impl&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt;, void*&gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>2326</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>internal_apply_visitor&lt;boost::detail::variant::invoke_visitor&lt;PGQueueable::RunVis&gt; &gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>2337</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>apply_visitor&lt;PGQueueable::RunVis&gt;</fn>
      <dir>/usr/include/boost/variant</dir>
      <file>variant.hpp</file>
      <line>2360</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>apply_visitor&lt;PGQueueable::RunVis, boost::variant&lt;std::shared_ptr&lt;OpRequest&gt;, PGSnapTrim, PGScrub, PGRecovery&gt; &gt;</fn>
      <dir>/usr/include/boost/variant/detail</dir>
      <file>apply_visitor_unary.hpp</file>
      <line>60</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>run</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.h</file>
      <line>410</line>
    </frame>
    <frame>
      <ip>0x4EC66B</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/osd</dir>
      <file>OSD.cc</file>
      <line>8748</line>
    </frame>
    <frame>
      <ip>0xAF94E6</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::shardedthreadpool_worker(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/common</dir>
      <file>WorkQueue.cc</file>
      <line>356</line>
    </frame>
    <frame>
      <ip>0xAFB63F</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::WorkThreadSharded::entry()</fn>
      <dir>/usr/src/debug/ceph-11.0.0/src/common</dir>
      <file>WorkQueue.h</file>
      <line>685</line>
    </frame>
    <frame>
      <ip>0xD0A7DC4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0xE1F821C</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>

Actions #4

Updated by Samuel Just over 7 years ago

Josh's report seems to indicate that the issue is the pending_reads access introduced in commit 52be772788d9d96accaa7af9eaf9f29a3792df49.

Actions #5

Updated by Samuel Just over 7 years ago

Yeah, in the error path, execute_ctx closes and deletes the OpContext, so the later pending_reads access in do_op is invalid.
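
For readers unfamiliar with the pattern, here is a minimal sketch of the ownership problem described above. It is not the Ceph code and not the actual fix: OpContext is reduced to a hypothetical two-field struct, execute_ctx is given a hypothetical bool return value to signal whether the context survived, and make_ctx, do_op_safe and live_contexts are names invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for ReplicatedPG::OpContext (illustration only).
struct OpContext {
  bool will_fail;     // assumption: forces the error path in execute_ctx
  int pending_reads;  // the field that is read after the call
};

// Hypothetical counter so the sketch can show that nothing leaks.
static int live_contexts = 0;

OpContext* make_ctx(bool will_fail, int pending_reads) {
  ++live_contexts;
  return new OpContext{will_fail, pending_reads};
}

// Mimics the behaviour described in the comment above: on the error path
// the context is deleted before returning. The bool return value is an
// assumption of this sketch; it tells the caller whether ctx is still valid.
bool execute_ctx(OpContext* ctx) {
  if (ctx->will_fail) {
    delete ctx;
    --live_contexts;
    return false;  // ctx is gone, the caller must not touch it
  }
  return true;
}

// Buggy shape, shown only as a comment because running it is undefined
// behaviour:
//
//   execute_ctx(ctx);
//   if (ctx->pending_reads > 0) { ... }  // use after free on the error path
//
// One possible safe shape: only read ctx when execute_ctx reports that it
// still exists. Returns -1 when the context was already destroyed.
int do_op_safe(OpContext* ctx) {
  assert(ctx != nullptr);
  if (!execute_ctx(ctx)) {
    return -1;
  }
  int reads = ctx->pending_reads;
  delete ctx;
  --live_contexts;
  return reads;
}
```

Calling do_op_safe(make_ctx(true, 3)) takes the error path and never dereferences the freed context. Calling do_op_safe(make_ctx(false, 3)) reads pending_reads while the context is still alive. Copying the needed field into a local before calling execute_ctx would be another way to avoid the invalid read.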

Actions #6

Updated by Samuel Just over 7 years ago

  • Status changed from New to 7
  • Assignee set to Samuel Just

Actions #8

Updated by Samuel Just over 7 years ago

  • Status changed from 7 to Resolved