Bug #52124

Invalid read of size 8 in handle_recovery_delete()

Added by Neha Ojha over 2 years ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
-
% Done:
100%

Source:
Community (dev)
Tags:
medium-hanging-fruit backport_processed
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

<error>
  <unique>0x55a2b5</unique>
  <tid>54</tid>
  <threadname>tp_osd_tp</threadname>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0x156A06D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::common::RefCountedObject::put() const</fn>
      <dir>/usr/src/debug/ceph-17.0.0-6641.g626e0d0d.el8.x86_64/src/common</dir>
      <file>RefCountedObj.cc</file>
      <line>18</line>
    </frame>
    <frame>
      <ip>0x1000B60</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>intrusive_ptr_release</fn>
      <dir>/usr/src/debug/ceph-17.0.0-6641.g626e0d0d.el8.x86_64/src/common</dir>
      <file>RefCountedObj.h</file>
      <line>194</line>
    </frame>
    <frame>
      <ip>0x1000B60</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~intrusive_ptr</fn>
      <dir>/usr/src/debug/ceph-17.0.0-6641.g626e0d0d.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/smart_ptr</dir>
      <file>intrusive_ptr.hpp</file>
      <line>98</line>
    </frame>
    <frame>
      <ip>0x1000B60</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~&lt;lambda&gt;</fn>
      <dir>/usr/src/debug/ceph-17.0.0-6641.g626e0d0d.el8.x86_64/src/osd</dir>
      <file>PGBackend.cc</file>
      <line>156</line>
...
  <auxwhat>Address 0x17592060 is 16 bytes inside a block of size 376 free'd</auxwhat>
  <stack>
    <frame>
      <ip>0x4C3210C</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>free</fn>
      <dir>/builddir/build/BUILD/valgrind-3.16.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>538</line>
    </frame>
    <frame>
      <ip>0x100F93E</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>MOSDPGRecoveryDeleteReply::~MOSDPGRecoveryDeleteReply()</fn>
      <dir>/usr/src/debug/ceph-17.0.0-6641.g626e0d0d.el8.x86_64/src/messages</dir>
      <file>MOSDPGRecoveryDeleteReply.h</file>
      <line>9</line>

/a/nojha-2021-08-09_20:10:33-rados-wip_gbenhano_ncbz-distro-basic-smithi/6328976/remote/smithi049/log/valgrind


Related issues

Copied to RADOS - Backport #57076: pacific: Invalid read of size 8 in handle_recovery_delete() Resolved
Copied to RADOS - Backport #57496: quincy: Invalid read of size 8 in handle_recovery_delete() Resolved

History

#1 Updated by Neha Ojha over 2 years ago

/a/yuriw-2021-08-26_18:40:53-rados-wip-yuri7-testing-2021-08-26-0841-distro-basic-smithi/6360450/remote/smithi052/log/valgrind

#2 Updated by Neha Ojha over 2 years ago

  • Backport set to pacific

/a/yuriw-2021-08-31_22:30:47-rados-wip-yuri8-testing-2021-08-30-0930-pacific-distro-basic-smithi/6369129/remote/smithi133/log/valgrind

#3 Updated by Neha Ojha over 2 years ago

/a/yuriw-2021-10-21_13:40:38-rados-wip-yuri2-testing-2021-10-20-1700-pacific-distro-basic-smithi/6454961/remote/smithi191/log/valgrind

#4 Updated by Sridhar Seshasayee over 2 years ago

/a/yuriw-2021-12-07_16:02:55-rados-wip-yuri11-testing-2021-12-06-1619-distro-default-smithi/6550873

#5 Updated by Laura Flores over 2 years ago

/a/yuriw-2021-12-22_22:11:35-rados-wip-yuri3-testing-2021-12-22-1047-distro-default-smithi/6580436

#6 Updated by Laura Flores over 2 years ago

/a/yuriw-2021-12-22_22:11:35-rados-wip-yuri3-testing-2021-12-22-1047-distro-default-smithi/6580187

#7 Updated by Sridhar Seshasayee about 2 years ago

/a/yuriw-2022-01-08_17:57:43-rados-wip-yuri8-testing-2022-01-07-1541-distro-default-smithi/6603232

#8 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-01-11_19:17:55-rados-wip-yuri5-testing-2022-01-11-0843-distro-default-smithi/6608445/

#9 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-02-08_17:00:23-rados-wip-yuri5-testing-2022-02-08-0733-pacific-distro-default-smithi/6670360

#10 Updated by Neha Ojha about 2 years ago

  • Backport changed from pacific to pacific,quincy

#11 Updated by Neha Ojha about 2 years ago

/a/yuriw-2022-02-15_22:40:39-rados-wip-yuri7-testing-2022-02-15-1102-quincy-distro-default-smithi/6686655/remote/smithi062/log/valgrind

#12 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-02-16_15:53:49-rados-wip-yuri11-testing-2022-02-15-1643-distro-default-smithi/6688846

#13 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-02-21_18:20:15-rados-wip-yuri11-testing-2022-02-21-0831-quincy-distro-default-smithi/6699270

Happened this time on osd.1, which exited after osd.2:

lflores@teuthology:/a/yuriw-2022-02-21_18:20:15-rados-wip-yuri11-testing-2022-02-21-0831-quincy-distro-default-smithi/6699270$ cat teuthology.log | grep "Exit program on first error" 
2022-02-22T00:25:28.903 INFO:tasks.ceph.osd.2.smithi039.stderr:==00:00:20:56.306 37217== Exit program on first error (--exit-on-first-error=yes)
2022-02-22T00:29:11.442 INFO:tasks.ceph.osd.1.smithi039.stderr:==00:00:19:15.980 122658== Exit program on first error (--exit-on-first-error=yes)
2022-02-22T00:33:38.817 INFO:tasks.ceph.osd.0.smithi039.stderr:==00:00:29:06.358 37214== Exit program on first error (--exit-on-first-error=yes)
2022-02-22T00:33:39.373 INFO:tasks.ceph.osd.6.smithi065.stderr:==00:00:29:06.876 37038== Exit program on first error (--exit-on-first-error=yes)
2022-02-22T00:33:39.436 INFO:tasks.ceph.osd.5.smithi065.stderr:==00:00:29:06.918 37040== Exit program on first error (--exit-on-first-error=yes)

#14 Updated by Laura Flores about 2 years ago

Happened in a dead job.
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698528
/a/yuriw-2022-02-22_16:14:07-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6700753

#15 Updated by Radoslaw Zarzynski about 2 years ago

  • Tags set to low-hanging-fruit

#16 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-02-15_16:22:25-rados-wip-yuri6-testing-2022-02-14-1456-distro-default-smithi/6685226

#17 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-03-01_22:42:19-rados-wip-yuri4-testing-2022-03-01-1206-distro-default-smithi/6715365

#18 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-03-19_14:37:23-rados-wip-yuri6-testing-2022-03-18-1104-distro-default-smithi/6746705

#19 Updated by Laura Flores about 2 years ago

/a/yuriw-2022-03-25_18:42:52-rados-wip-yuri7-testing-2022-03-24-1341-pacific-distro-default-smithi/6761328

#20 Updated by Aishwarya Mathuria almost 2 years ago

/a/yuriw-2022-03-31_21:45:19-rados-wip-yuri5-testing-2022-03-31-1158-quincy-distro-default-smithi/6770388

#21 Updated by Nitzan Mordechai almost 2 years ago

  • Status changed from New to In Progress
  • Assignee set to Nitzan Mordechai

#22 Updated by Aishwarya Mathuria almost 2 years ago

/a/yuriw-2022-04-06_16:35:43-rados-wip-yuri5-testing-2022-04-05-1720-distro-default-smithi/6779876

#23 Updated by Laura Flores almost 2 years ago

/a/yuriw-2022-06-10_03:10:47-rados-wip-yuri4-testing-2022-06-09-1510-quincy-distro-default-smithi/6872050

#24 Updated by Sridhar Seshasayee almost 2 years ago

/a/yuriw-2022-06-15_18:29:33-rados-wip-yuri4-testing-2022-06-15-1000-pacific-distro-default-smithi/6881215

#25 Updated by Aishwarya Mathuria over 1 year ago

/a/yuriw-2022-07-13_19:41:18-rados-wip-yuri7-testing-2022-07-11-1631-distro-default-smithi/6929396/remote/smithi204/log/valgrind/osd.5.log.gz

#26 Updated by Radoslaw Zarzynski over 1 year ago

  • Tags changed from low-hanging-fruit to medium-hanging-fruit

Looks like a race condition. Does one of our Contexts depend on a RefCountedObject (e.g. a TrackedOp) but forget to extend its lifetime?

<error>
  <unique>0xe9deb</unique>
  <tid>60</tid>
  <threadname>tp_osd_tp</threadname>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0xF53F93</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ceph::common::RefCountedObject::put() const</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/common</dir>
      <file>RefCountedObj.cc</file>
      <line>18</line>
    </frame>
    <frame>
      <ip>0xA2112D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/common</dir>
      <file>RefCountedObj.h</file>
      <line>194</line>
    </frame>
    <frame>
      <ip>0xA2112D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~intrusive_ptr</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/smart_ptr</dir>
      <file>intrusive_ptr.hpp</file>
      <line>98</line>
    </frame>
    <frame>
      <ip>0xA2112D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~&lt;lambda&gt;</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/osd</dir>
      <file>PGBackend.cc</file>
      <line>157</line>
    </frame>
    <frame>
      <ip>0xA2112D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~LambdaContext</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>161</line>
    </frame>
    <frame>
      <ip>0xA20ACA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>delete_me</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>343</line>
    </frame>
    <frame>
      <ip>0xA20ACA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>sub_finish</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>338</line>
    </frame>
    <frame>
      <ip>0xA20ACA</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>C_GatherBase&lt;Context, Context&gt;::sub_finish(Context*, int)</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>319</line>
    </frame>
    <frame>
      <ip>0xA20F04</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>finish</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>361</line>
    </frame>
    <frame>
      <ip>0xA20F04</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>99</line>
    </frame>
    <frame>
      <ip>0xA20F04</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>C_GatherBase&lt;Context, Context&gt;::C_GatherSub::complete(int)</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>358</line>
    </frame>
    <frame>
      <ip>0x93AD90</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>PrimaryLogPG::remove_missing_object(hobject_t const&amp;, eversion_t, Context*)::{lambda(int)#2}::operator()(int) const [clone .isra.6767]</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/osd</dir>
      <file>PrimaryLogPG.cc</file>
      <line>12416</line>
    </frame>
    <frame>
      <ip>0x8701BC</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>Context::complete(int)</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>99</line>
    </frame>
    <frame>
      <ip>0x9CD558</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>155</line>
    </frame>
    <frame>
      <ip>0x9CD558</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>destroy&lt;RunOnDelete&gt;</fn>
      <dir>/usr/include/c++/8/ext</dir>
      <file>new_allocator.h</file>
      <line>140</line>
    </frame>
    <frame>
      <ip>0x9CD558</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>destroy&lt;RunOnDelete&gt;</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>alloc_traits.h</file>
      <line>487</line>
    </frame>
    <frame>
      <ip>0x9CD558</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>std::_Sp_counted_ptr_inplace&lt;RunOnDelete, std::allocator&lt;RunOnDelete&gt;, (__gnu_cxx::_Lock_policy)2&gt;::_M_dispose()</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr_base.h</file>
      <line>554</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr_base.h</file>
      <line>155</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr_base.h</file>
      <line>148</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~__shared_count</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr_base.h</file>
      <line>728</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~__shared_ptr</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr_base.h</file>
      <line>1167</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~shared_ptr</fn>
      <dir>/usr/include/c++/8/bits</dir>
      <file>shared_ptr.h</file>
      <line>103</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>~ContainerContext</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>129</line>
    </frame>
    <frame>
      <ip>0x9D3F86</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ContainerContext&lt;std::shared_ptr&lt;RunOnDelete&gt; &gt;::~ContainerContext()</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/include</dir>
      <file>Context.h</file>
      <line>129</line>
    </frame>
    <frame>
      <ip>0x851C23</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>handle_oncommits</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/osd</dir>
      <file>OSD.h</file>
      <line>1671</line>
    </frame>
    <frame>
      <ip>0x851C23</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/osd</dir>
      <file>OSD.cc</file>
      <line>10897</line>
    </frame>
    <frame>
      <ip>0xF701A3</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::shardedthreadpool_worker(unsigned int)</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/common</dir>
      <file>WorkQueue.cc</file>
      <line>313</line>
    </frame>
    <frame>
      <ip>0xF71543</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>ShardedThreadPool::WorkThreadSharded::entry()</fn>
      <dir>/usr/src/debug/ceph-17.0.0-13509.g5b6cadda.el8.x86_64/src/common</dir>
      <file>WorkQueue.h</file>
      <line>643</line>
    </frame>
    <frame>
      <ip>0x6D6D1C9</ip>
      <obj>/usr/lib64/libpthread-2.28.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x7FBFDD2</ip>
      <obj>/usr/lib64/libc-2.28.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
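
To illustrate the question above: the trace shows a lambda captured in a LambdaContext being destroyed, its intrusive_ptr member calling put(), and put() reading memory that was already freed. The following is only a minimal sketch of the lifetime rule in question, not the Ceph code and not a diagnosis of the root cause. `Counted`, `Ref` and `callback_outlives_owner` are hypothetical names invented for this example; they stand in for a reference-counted message and a smart pointer to it. The sketch shows the safe pattern: the callback takes its own counted reference, so the object stays alive until the callback itself is destroyed, even after the original owner has dropped its reference.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical stand-in for a reference-counted message.
// This is NOT Ceph's RefCountedObject, only an illustration.
class Counted {
  std::atomic<int> nref{1};   // the creator owns the initial reference
  bool* destroyed;            // set to true when the object is freed
  ~Counted() = default;       // private: only put() may free the object
public:
  explicit Counted(bool* flag) : destroyed(flag) {}
  void get() { nref.fetch_add(1); }
  void put() {
    if (nref.fetch_sub(1) == 1) {
      bool* flag = destroyed;
      delete this;
      *flag = true;
    }
  }
};

// Hypothetical minimal counted handle: takes a reference on
// construction and copy, drops it on destruction.
class Ref {
  Counted* p;
public:
  explicit Ref(Counted* q) : p(q) { if (p) p->get(); }
  Ref(const Ref& o) : p(o.p) { if (p) p->get(); }
  Ref(Ref&& o) noexcept : p(o.p) { o.p = nullptr; }
  Ref& operator=(const Ref&) = delete;
  ~Ref() { if (p) p->put(); }
  Counted* get() const { return p; }
};

// Returns true if the object survived the owner's put() and was freed
// only once the pending callback was destroyed.
bool callback_outlives_owner() {
  bool destroyed = false;
  Counted* msg = new Counted(&destroyed);

  std::function<void()> on_commit;
  {
    Ref keep(msg);                               // counted reference
    on_commit = [keep]() { (void)keep.get(); };  // lambda copies the handle
  }

  msg->put();          // the original owner drops its reference
  assert(!destroyed);  // the callback's reference keeps the object alive

  on_commit = nullptr; // destroying the callback releases the last reference
  return destroyed;
}
```

If the lambda captured the raw `msg` pointer instead, or if some path released one reference too many, the object could be freed while the callback was still pending, and the eventual put() from the callback's destructor would touch freed memory. That is the kind of report Valgrind gives above, though whether either of these is what happens here is exactly what needs to be confirmed.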

#27 Updated by Radoslaw Zarzynski over 1 year ago

Moving to next week's bug scrub.

#28 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943721/remote/smithi042/log/valgrind/osd.3.log.gz

#29 Updated by Nitzan Mordechai over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 47379

#30 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2022-08-04_11:58:29-rados-wip-yuri3-testing-2022-08-03-0828-pacific-distro-default-smithi/6958376

#31 Updated by Kefu Chai over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#32 Updated by Backport Bot over 1 year ago

  • Copied to Backport #57076: pacific: Invalid read of size 8 in handle_recovery_delete() added

#33 Updated by Backport Bot over 1 year ago

  • Tags changed from medium-hanging-fruit to medium-hanging-fruit backport_processed

#34 Updated by Laura Flores over 1 year ago

/a/yuriw-2022-09-05_13:59:13-rados-wip-yuri10-testing-2022-09-04-0811-quincy-distro-default-smithi/7012481

Needs a Quincy backport.

#35 Updated by Nitzan Mordechai over 1 year ago

  • Copied to Backport #57496: quincy: Invalid read of size 8 in handle_recovery_delete() added

#36 Updated by Konstantin Shalygin 3 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Source set to Community (dev)
