Bug #52400

closed

[pwl ssd] memory corruption (shared_ptr related?)

Added by jianpeng ma over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While running xfstests against librbd with the pwl ssd cache, we hit crashes such as the following:
#0 0x00007fd2457747b1 in std::__atomic_base<unsigned int>::fetch_add (__m=std::memory_order_seq_cst, __i=1, this=0x51)
at /usr/include/c++/9/bits/atomic_base.h:539
#1 std::__atomic_base<unsigned int>::operator++ (this=0x51) at /usr/include/c++/9/bits/atomic_base.h:303
#2 ceph::buffer::v15_2_0::ptr::ptr (this=this@entry=0x7fd2142b7448, p=...) at ../src/common/buffer.cc:386
#3 0x00007fd245776630 in ceph::buffer::v15_2_0::ptr_node::ptr_node (this=0x7fd2142b7440) at ../src/include/buffer.h:397
#4 ceph::buffer::v15_2_0::ptr_node::cloner::operator() (this=<optimized out>, clone_this=...) at ../src/common/buffer.cc:2240
#5 0x00007fd23c18f023 in __gnu_cxx::__atomic_add_single (__val=1, __mem=0x7fcf) at /usr/include/c++/9/ext/atomicity.h:98
#6 __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x7fcf) at /usr/include/c++/9/ext/atomicity.h:98
#7 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x51) at /usr/include/c++/9/bits/shared_ptr_base.h:139
#8 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (__r=..., this=0x7fd233ffd0e8)
at /usr/include/c++/9/bits/shared_ptr_base.h:737
#9 std::__shared_ptr<librbd::cache::pwl::ssd::WriteLogEntry, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<librbd::cache::pwl::GenericWriteLogEntry> (__p=0x7fcfc40333a0, __r=..., this=0x7fd233ffd0e0) at /usr/include/c++/9/bits/shared_ptr_base.h:1164
#10 std::shared_ptr<librbd::cache::pwl::ssd::WriteLogEntry>::shared_ptr<librbd::cache::pwl::GenericWriteLogEntry> (
__p=0x7fcfc40333a0, __r=std::shared_ptr<librbd::cache::pwl::GenericWriteLogEntry> (empty) = {...}, this=0x7fd233ffd0e0)
at /usr/include/c++/9/bits/shared_ptr.h:235
#11 std::static_pointer_cast<librbd::cache::pwl::ssd::WriteLogEntry, librbd::cache::pwl::GenericWriteLogEntry> (
__r=std::shared_ptr<librbd::cache::pwl::GenericWriteLogEntry> (empty) = {...}) at /usr/include/c++/9/bits/shared_ptr.h:494
#12 librbd::cache::pwl::ssd::WriteLog<librbd::ImageCtx>::collect_read_extents (this=0x7fd2200153e0, read_buffer_offset=12288,
map_entry=..., log_entries_to_read=std::vector of length 0, capacity 0, bls_to_read=std::vector of length 0, capacity 0,
entry_hit_length=<optimized out>, hit_extent={...}, read_ctx=0x7fd21423ab80) at ../src/librbd/cache/pwl/ssd/WriteLog.cc:80
#13 0x00007fd23c14f9a2 in librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::read (this=<optimized out>, image_extents=...,
bl=<optimized out>, fadvise_flags=fadvise_flags@entry=0, on_finish=<optimized out>) at /usr/include/c++/9/ext/atomicity.h:96
#14 0x00007fd23c135daa in librbd::cache::WriteLogImageDispatch<librbd::ImageCtx>::read (this=0x7fd224033000,
aio_comp=0x55c098c26a50, image_extents=..., read_result=...,
io_context=std::shared_ptr<neorados::IOContext> (use count 8, weak count 0) = {...}, op_flags=0, read_flags=0,
parent_trace=..., tid=4572348, image_dispatch_flags=0x55c0982a874c, dispatch_result=0x55c0982a8750, on_finish=0x55c098c26ba8,
on_dispatched=0x55c0982a8730) at /usr/include/c++/9/optional:963
#15 0x00007fd245bef334 in librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor::operator() (read=..., this=0x7fd233ffd480)
at /usr/include/c++/9/ext/atomicity.h:96
#16 boost::detail::variant::invoke_visitor<librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor const, false>::internal_visit<librbd::io::ImageDispatchSpec::Read&> (operand=..., this=<synthetic pointer>) at boost/include/boost/variant/variant.hpp:1028
#17 boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor const, false>, void*, librbd::io::ImageDispatchSpec::Read> (storage=<optimized out>,
visitor=<synthetic pointer>...) at boost/include/boost/variant/detail/visitation_impl.hpp:119
#18 boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor const, false>, void*, librbd::io::ImageDispatchSpec::Read, boost::variant<librbd::io::ImageDispatchSpec::Read, librbd::io::ImageDispatchSpec::Discard, librbd::io::ImageDispatchSpec::Write, librbd::io::ImageDispatchSpec::WriteSame, librbd::io::ImageDispatchSpec::CompareAndWrite, librbd::io::ImageDispatchSpec::Flush, librbd::io::ImageDispatchSpec::ListSnaps>::has_fallback_type_> (
t=0x0, storage=<optimized out>, visitor=<synthetic pointer>..., internal_which=<optimized out>)
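
For readers less familiar with the shared_ptr internals in frames #7 to #11: std::static_pointer_cast builds a new shared_ptr that shares the source pointer's control block and increments its reference count, which is the step that faulted here with the control block at the implausible address 0x51. The following is only a minimal sketch of that mechanism on a healthy pointer; GenericEntry, SsdEntry and use_count_after_cast are hypothetical stand-ins, not the librbd types or the code from this bug.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for GenericWriteLogEntry and ssd::WriteLogEntry.
struct GenericEntry {
  virtual ~GenericEntry() = default;
};

struct SsdEntry : GenericEntry {
  int payload = 42;
};

// Casts `generic` to the derived type and returns the use count seen while
// both the caller's pointer and the cast result are alive.
long use_count_after_cast(const std::shared_ptr<GenericEntry>& generic) {
  // The cast result shares the control block of `generic`, so the shared
  // reference count is incremented here (the _M_add_ref_copy step).
  std::shared_ptr<SsdEntry> ssd = std::static_pointer_cast<SsdEntry>(generic);
  // An empty source yields an empty result with no control block.
  return ssd ? ssd.use_count() : 0;
}
```

With one owner, calling use_count_after_cast reports one more reference than the caller holds, and the count drops back when the function returns. If the source shared_ptr has been overwritten or freed, the same increment touches whatever address the stale control block pointer contains, which is consistent with the fault address in the trace.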


Related issues 1 (0 open, 1 closed)

Copied to rbd - Backport #52570: pacific: [pwl ssd] memory corruption (shared_ptr related?) (Resolved, Deepika Upadhyay)
Actions #1

Updated by jianpeng ma over 2 years ago

Actions #2

Updated by jianpeng ma over 2 years ago

We also found another structure corrupted:
#0 0x00007f68129d37b1 in std::__atomic_base<unsigned int>::fetch_add (__m=std::memory_order_seq_cst, __i=1, this=0x51)
at /usr/include/c++/9/bits/atomic_base.h:539
#1 std::__atomic_base<unsigned int>::operator++ (this=0x51) at /usr/include/c++/9/bits/atomic_base.h:303
#2 ceph::buffer::v15_2_0::ptr::ptr (this=this@entry=0x7f67e4a39ba8, p=...) at ../src/common/buffer.cc:386
#3 0x00007f68129d5630 in ceph::buffer::v15_2_0::ptr_node::ptr_node (this=0x7f67e4a39ba0) at ../src/include/buffer.h:397
#4 ceph::buffer::v15_2_0::ptr_node::cloner::operator() (this=this@entry=0x7f68057f8130, clone_this=...)
at ../src/common/buffer.cc:2240
#5 0x00007f68043ee087 in ceph::buffer::v15_2_0::list::buffers_t::clone_from (other=..., this=0x7f68057f8110)
at ../src/include/buffer.h:592
#6 ceph::buffer::v15_2_0::list::operator= (other=..., this=0x7f68057f8110) at ../src/include/buffer.h:963
#7 librbd::cache::pwl::ssd::WriteLog<librbd::ImageCtx>::collect_read_extents (this=0x7f67ec015300, read_buffer_offset=0,
map_entry=..., log_entries_to_read=std::vector of length 0, capacity 0, bls_to_read=std::vector of length 0, capacity 0,
entry_hit_length=<optimized out>, hit_extent={...}, read_ctx=0x7f67e43de5e0) at ../src/librbd/cache/pwl/ssd/WriteLog.cc:82
#8 0x00007f68043ae9ca in librbd::cache::pwl::AbstractWriteLog<librbd::ImageCtx>::read (this=<optimized out>, image_extents=...,
bl=<optimized out>, fadvise_flags=fadvise_flags@entry=0, on_finish=<optimized out>) at /usr/include/c++/9/ext/atomicity.h:96
#9 0x00007f6804394daa in librbd::cache::WriteLogImageDispatch<librbd::ImageCtx>::read (this=0x7f67f8034000,
aio_comp=0x562fdc6db540, image_extents=..., read_result=..., io_context=
std::shared_ptr<neorados::IOContext> (use count 14, weak count 0) = {...}, op_flags=0, read_flags=0, parent_trace=...,
tid=1803047, image_dispatch_flags=0x562fdc5d70ac, dispatch_result=0x562fdc5d70b0, on_finish=0x562fdc6db698,
on_dispatched=0x562fdc5d7090) at /usr/include/c++/9/optional:963
#10 0x00007f6812e4e334 in librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor::operator() (read=..., this=0x7f68057f8480)
at /usr/include/c++/9/ext/atomicity.h:96
#11 boost::detail::variant::invoke_visitor<librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor const, false>::internal_visit<librbd::io::ImageDispatchSpec::Read&> (operand=..., this=<synthetic pointer>) at boost/include/boost/variant/variant.hpp:1028
#12 boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor const, false>, void*, librbd::io::ImageDispatchSpec::Read> (storage=<optimized out>,
visitor=<synthetic pointer>...) at boost/include/boost/variant/detail/visitation_impl.hpp:119
#13 boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<librbd::io::ImageDispatcher<librbd::ImageCtx>::SendVisitor const, false>, void*, librbd::io::ImageDispatchSpec::Read, boost::variant<librbd::io::ImageDispatchSpec::Read, librbd::io::ImageDispatchSpec::Discard, librbd::io::ImageDispatchSpec::Write, librbd::io::ImageDispatchSpec::WriteSame, librbd::io::ImageDispatchSpec::CompareAndWrite, librbd::io::ImageDispatchSpec::Flush, librbd::io::ImageDispatchSpec::ListSnaps>::has_fallback_type_> (
t=0x0, storage=<optimized out>, visitor=<synthetic pointer>..., internal_which=<optimized out>)
at boost/include/boost/variant/detail/visitation_impl.hpp:157

Actions #3

Updated by jianpeng ma over 2 years ago

(gdb) bt
#0 0x00007f68129d37b1 in std::__atomic_base<unsigned int>::fetch_add (__m=std::memory_order_seq_cst, __i=1, this=0x51)
at /usr/include/c++/9/bits/atomic_base.h:539
#1 std::__atomic_base<unsigned int>::operator++ (this=0x51) at /usr/include/c++/9/bits/atomic_base.h:303
#2 ceph::buffer::v15_2_0::ptr::ptr (this=this@entry=0x7f67e4a39ba8, p=...) at ../src/common/buffer.cc:386
#3 0x00007f68129d5630 in ceph::buffer::v15_2_0::ptr_node::ptr_node (this=0x7f67e4a39ba0) at ../src/include/buffer.h:397
#4 ceph::buffer::v15_2_0::ptr_node::cloner::operator() (this=<optimized out>, clone_this=...) at ../src/common/buffer.cc:2240
#5 0x00007f68043ee087 in librbd::cache::pwl::ssd::WriteLog<librbd::ImageCtx>::append_op_log_entries(std::__cxx11::list<std::shared_ptr<librbd::cache::pwl::GenericLogOperation>, std::allocator<std::shared_ptr<librbd::cache::pwl::GenericLogOperation> > >&)::{lambda(int)#2}::operator()(int) const (
r=-1945487352, this=<optimized out>) at ../src/librbd/cache/pwl/ssd/WriteLog.cc:423
#6 LambdaContext<librbd::cache::pwl::ssd::WriteLog<librbd::ImageCtx>::append_op_log_entries(std::__cxx11::list<std::shared_ptr<librbd::cache::pwl::GenericLogOperation>, std::allocator<std::shared_ptr<librbd::cache::pwl::GenericLogOperation> > >&)::{lambda(int)#2}>::finish(int) (this=0x7f67e4a39ba8,
r=-1945487352) at ../src/include/Context.h:166
#7 0x0000000000000000 in ?? ()

Actions #4

Updated by Ilya Dryomov over 2 years ago

  • Subject changed from librbd/pwl/ssd: memory corupt to [pwl ssd] memory corruption (shared_ptr related?)
Actions #5

Updated by Ilya Dryomov over 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to jianpeng ma
  • Pull request ID set to 42984
Actions #6

Updated by Ilya Dryomov over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Ilya Dryomov over 2 years ago

  • Backport set to pacific
Actions #8

Updated by Backport Bot over 2 years ago

  • Copied to Backport #52570: pacific: [pwl ssd] memory corruption (shared_ptr related?) added
Actions #9

Updated by Ilya Dryomov about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
