Bug #64935

closed

crimson: heap use after free during ~OSD()

Added by Samuel Just about 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Samuel Just
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/sjust-2024-03-14_07:56:00-crimson-rados-wip-sjust-cb2b48d7-2024-03-12-distro-default-smithi/7599925/ (osd 1)

DEBUG 2024-03-14 08:26:01,444 [shard 1:main] osd - pg_epoch 303 pg[2.3( empty local-lis/les=13/14 n=0 ec=13/13 lis/c=13/13 les/c/f=14/14/0 sis=13) [1,0] r=0 lpr=13 crt=0'0 mlcod 0'0 active+clean ScrubState::~ScrubState: exiting state ScrubMachine/PrimaryActive/AwaitScrub
DEBUG 2024-03-14 08:26:01,444 [shard 1:main] osd - pg_epoch 303 pg[2.3( empty local-lis/les=13/14 n=0 ec=13/13 lis/c=13/13 les/c/f=14/14/0 sis=13) [1,0] r=0 lpr=13 crt=0'0 mlcod 0'0 active+clean ScrubState::~ScrubState: exiting state ScrubMachine/PrimaryActive
DEBUG 2024-03-14 08:26:01,444 [shard 1:main] osd - pg_epoch 303 pg[2.3( empty local-lis/les=13/14 n=0 ec=13/13 lis/c=13/13 les/c/f=14/14/0 sis=13) [1,0] r=0 lpr=13 crt=0'0 mlcod 0'0 active+clean ScrubState::~ScrubState: exiting state ScrubMachine/PrimaryActive
DEBUG 2024-03-14 08:26:01,444 [shard 2:main] osd - pg_epoch 303 pg[2.0( empty local-lis/les=13/14 n=0 ec=13/13 lis/c=13/13 les/c/f=14/14/0 sis=13) [0,1] r=1 lpr=13 crt=0'0 mlcod 0'0 active ScrubState::~ScrubState: exiting state ScrubMachine/ReplicaActive/ReplicaIdle
DEBUG 2024-03-14 08:26:01,444 [shard 2:main] osd - pg_epoch 303 pg[2.0( empty local-lis/les=13/14 n=0 ec=13/13 lis/c=13/13 les/c/f=14/14/0 sis=13) [0,1] r=1 lpr=13 crt=0'0 mlcod 0'0 active ScrubState::~ScrubState: exiting state ScrubMachine/ReplicaActive
INFO 2024-03-14 08:26:01,446 [shard 0:main] ms - [0x611000114800 osd.1(cluster) v2:172.21.15.121:6804/2154232500 >> osd.2 v2:172.21.15.184:6804/2232779362@52965] closing: reset no, replace no
INFO 2024-03-14 08:26:01,446 [shard 0:main] ms - [0x6110000f16c0 osd.1(cluster) v2:172.21.15.121:6804/2154232500 >> osd.0 v2:172.21.15.121:6801/3108143794@58917] closing: reset no, replace no
INFO 2024-03-14 08:26:01,446 [shard 0:main] ms - [0x6110000f1300 osd.1(cluster) v2:172.21.15.121:6804/2154232500@51309 >> osd.3 v2:172.21.15.184:6809/979979926] closing: reset no, replace no
INFO 2024-03-14 08:26:01,446 [shard 0:main] ms - [0x6110000f1300 osd.1(cluster) v2:172.21.15.121:6804/2154232500@51309 >> osd.3 v2:172.21.15.184:6809/979979926] do_out_dispatch: stop(dropped) at drop, no out_exit_dispatching
INFO 2024-03-14 08:26:01,446 [shard 0:main] ms - [0x6110000f1300 osd.1(cluster) v2:172.21.15.121:6804/2154232500@51309 >> osd.3 v2:172.21.15.184:6809/979979926] execute_wait(): protocol aborted at CLOSING -- protocol aborted
INFO 2024-03-14 08:26:01,447 [shard 0:main] ms - [0x6110000f16c0 osd.1(cluster) v2:172.21.15.121:6804/2154232500 >> osd.0 v2:172.21.15.121:6801/3108143794@58917] do_in_dispatch(): fault at drop, io_stat(io_state=drop, in_seq=1816, out_seq=1753, out_pending_msgs_size=0, out_sent_msgs_size=0, need_ack=0, need_keepalive=0, need_keepalive_ack=0) -- read eof
INFO 2024-03-14 08:26:01,447 [shard 0:main] ms - [0x611000114800 osd.1(cluster) v2:172.21.15.121:6804/2154232500 >> osd.2 v2:172.21.15.184:6804/2232779362@52965] do_in_dispatch(): fault at drop, io_stat(io_state=drop, in_seq=1389, out_seq=1449, out_pending_msgs_size=0, out_sent_msgs_size=0, need_ack=0, need_keepalive=0, need_keepalive_ack=0) -- read eof
INFO 2024-03-14 08:26:01,447 [shard 0:main] osd - Heartbeat::Peer: osd.2 removed
INFO 2024-03-14 08:26:01,447 [shard 0:main] osd - Heartbeat::Peer: osd.0 removed
=================================================================
==28703==ERROR: AddressSanitizer: heap-use-after-free on address 0x61b000001648 at pc 0x562e111c2cbb bp 0x7f60df2263c0 sp 0x7f60df2263b0
READ of size 8 at 0x61b000001648 thread T0
Reactor stalled for 65 ms on shard 0. Backtrace: 0x45d5d 0x2f511b12 0x2f2eed00 0x2f4bd104 0x2f2286b2 0x2f2fef70 0x2f437a76 0x54daf 0xf6cf4 0xf978a 0xfa410 0x1962d3 0x2eefec35 0xfa684 0xe9be9 0xe9cb5 0xdb165 0xdcfea 0xd6280 0x32402 0xbd907 0xbd194 0xbdfda 0x21909cba 0x2197aa7d 0x21b36883 0x216f4db1 0x1fda62f8 0x1fde7acc 0x202a7e92 0x1ff422ce 0x20063c03 0x2f7e329b
kernel callstack:
Reactor stalled for 122 ms on shard 0. Backtrace: 0x45d5d 0x2f511b12 0x2f2eed00 0x2f4bd104 0x2f2286b2 0x2f2fef70 0x2f437a76 0x54daf 0xf5dcf 0xf6c5b 0xf978a 0xfa410 0x1962d3 0x2eefec35 0xfa684 0xe9be9 0xe9cb5 0xdb165 0xdcfea 0xd6280 0x32402 0xbd907 0xbd194 0xbdfda 0x21909cba 0x2197aa7d 0x21b36883 0x216f4db1 0x1fda62f8 0x1fde7acc 0x202a7e92 0x1ff422ce 0x20063c03 0x2f7e329b
kernel callstack:
#0 0x562e111c2cba in std::_Rb_tree<unsigned int, std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> >, std::_Select1st<std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> > >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> > > >::equal_range(unsigned int const&) (/usr/bin/ceph-osd+0x21909cba)
#1 0x562e11233a7d in std::_Rb_tree<unsigned int, std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> >, std::_Select1st<std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> > >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> > > >::erase(unsigned int const&) (/usr/bin/ceph-osd+0x2197aa7d)
#2 0x562e113ef883 in boost::detail::sp_counted_impl_pd<OSDMap*, boost::detail::local_sp_deleter<SharedLRU<unsigned int, OSDMap>::Deleter> >::dispose() (/usr/bin/ceph-osd+0x21b36883)
#3 0x562e10faddb1 in boost::detail::local_counted_impl_em::local_cb_destroy() (/usr/bin/ceph-osd+0x216f4db1)
#4 0x562e0f65f2f8 in boost::detail::local_counted_base::release() (/usr/bin/ceph-osd+0x1fda62f8)
#5 0x562e0f6a0acc in seastar::foreign_ptr<boost::local_shared_ptr<OSDMap const> >::~foreign_ptr() (/usr/bin/ceph-osd+0x1fde7acc)
#6 0x562e0fb60e92 in crimson::osd::OSD::~OSD (/usr/bin/ceph-osd+0x202a7e92)
#7 0x562e0f7fb2ce in main::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const [clone .lto_priv.0] (/usr/bin/ceph-osd+0x1ff422ce)
#8 0x562e0f91cc03 in seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<main::{lambda()#1}::operator()() const::{lambda()#1}>(seastar::thread_attributes, main::{lambda()#1}::operator()() const::{lambda()#1}&&)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) (/usr/bin/ceph-osd+0x20063c03)
#9 0x562e1f09c29b in seastar::thread_context::main() (/usr/bin/ceph-osd+0x2f7e329b)

0x61b000001648 is located 200 bytes inside of 1584-byte region [0x61b000001580,0x61b000001bb0)
freed by thread T0 here:
#0 0x7f60ef6b73cf in operator delete(void*, unsigned long) (/lib64/libasan.so.6+0xb73cf)
#1 0x562e10172e76 in seastar::shared_ptr_count_for<crimson::osd::OSDSingletonState>::~shared_ptr_count_for() (/usr/bin/ceph-osd+0x208b9e76)

previously allocated by thread T0 here:
#0 0x7f60ef6b6367 in operator new(unsigned long) (/lib64/libasan.so.6+0xb6367)
#1 0x562e0fc51403 in seastar::sharded<crimson::osd::OSDSingletonState>::start_single<int const&, std::reference_wrapper<crimson::net::Messenger>, std::reference_wrapper<crimson::net::Messenger>, std::reference_wrapper<crimson::mon::Client>, std::reference_wrapper<crimson::mgr::Client> >(int const&, std::reference_wrapper<crimson::net::Messenger>&&, std::reference_wrapper<crimson::net::Messenger>&&, std::reference_wrapper<crimson::mon::Client>&&, std::reference_wrapper<crimson::mgr::Client>&&)::{lambda()#1}::operator()() (/usr/bin/ceph-osd+0x20398403)

SUMMARY: AddressSanitizer: heap-use-after-free (/usr/bin/ceph-osd+0x21909cba) in std::_Rb_tree<unsigned int, std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> >, std::_Select1st<std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> > >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::pair<boost::weak_ptr<OSDMap>, OSDMap*> > > >::equal_range(unsigned int const&)
Shadow bytes around the buggy address:
0x0c367fff8270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c367fff8280: 00 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c367fff8290: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c367fff82a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c367fff82b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c367fff82c0: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
0x0c367fff82d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c367fff82e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c367fff82f0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c367fff8300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c367fff8310: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
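
The backtrace suggests a destruction-order problem: the OSDMap cache (SharedLRU<unsigned int, OSDMap>) lives inside OSDSingletonState and was freed when the sharded service stopped, but ~OSD() still holds a cached foreign_ptr<local_shared_ptr<OSDMap const>> whose SharedLRU deleter erases from that already-freed map. Below is a minimal sketch of the same hazard, using hypothetical Cache/Map stand-ins rather than the actual Crimson types; it only illustrates the ordering issue, not the actual fix.

#include <map>
#include <memory>

// Hypothetical stand-ins, not the actual Crimson types: Cache plays the role of
// the SharedLRU owned by OSDSingletonState, Map plays the role of OSDMap.
struct Map { int epoch = 0; };

struct Cache {
  // weak entries owned by the cache; each value's deleter erases its entry
  std::map<int, std::weak_ptr<Map>> weak;

  std::shared_ptr<Map> insert(int epoch, Map* raw) {
    // the deleter captures `this`, so it must not outlive the Cache
    std::shared_ptr<Map> p(raw, [this, epoch](Map* m) {
      weak.erase(epoch);   // heap-use-after-free if the Cache is already gone
      delete m;
    });
    weak[epoch] = p;
    return p;
  }
};

int main() {
  auto cache = std::make_unique<Cache>();
  // comparable to the OSD caching the current map via foreign_ptr/local_shared_ptr
  std::shared_ptr<Map> held = cache->insert(1, new Map{1});

  cache.reset();  // the singleton (and its map cache) is torn down first
  held.reset();   // the deleter erases from the freed map: the UAF seen in ~OSD()
}
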

#1 Updated by Samuel Just about 2 months ago

  • Description updated (diff)
#2 Updated by Samuel Just about 2 months ago

  • Assignee set to Samuel Just
  • Priority changed from Normal to Urgent
#3 Updated by Samuel Just about 1 month ago

  • Status changed from New to Fix Under Review
#4 Updated by Samuel Just about 1 month ago

  • Status changed from Fix Under Review to Resolved