Bug #51418
[pwl] segment fault on syncpoint stack
Description
Segmentation fault in the tp_pwl thread; the backtrace is extremely long.
Thread 137 "tp_pwl" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff9f7fe700 (LWP 807360)]
0x00007ffff50d7f38 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff74dbc108, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
730 _M_pi->_M_release();
(gdb) bt
#0 0x00007ffff50d7f38 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff74dbc108, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#1 0x00007fffb2133118 in std::__shared_ptr<librbd::cache::pwl::SyncPointLogEntry, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff74dbc100, __in_chrg=<optimized out>)
at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#2 0x00007fffb2133138 in std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>::~shared_ptr (this=0x7fff74dbc100, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#3 0x00007fffb217dd6c in librbd::cache::pwl::SyncPointLogEntry::~SyncPointLogEntry (this=0x7fff74dbc080, __in_chrg=<optimized out>) at ../src/librbd/cache/pwl/LogEntry.h:97
#4 0x00007fffb216dbf5 in __gnu_cxx::new_allocator<librbd::cache::pwl::SyncPointLogEntry>::destroy<librbd::cache::pwl::SyncPointLogEntry> (this=0x7fff74dbc080, __p=0x7fff74dbc080)
at /usr/include/c++/9/ext/new_allocator.h:153
#5 0x00007fffb216da0d in std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPointLogEntry> >::destroy<librbd::cache::pwl::SyncPointLogEntry> (__a=..., __p=0x7fff74dbc080)
at /usr/include/c++/9/bits/alloc_traits.h:497
#6 0x00007fffb216ca59 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPointLogEntry, std::allocator<librbd::cache::pwl::SyncPointLogEntry>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
this=0x7fff74dbc070) at /usr/include/c++/9/bits/shared_ptr_base.h:557
#7 0x00007ffff50d8cd0 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff74dbc070) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#8 0x00007ffff50d7f3d in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff72397b28, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#9 0x00007fffb2133118 in std::__shared_ptr<librbd::cache::pwl::SyncPointLogEntry, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff72397b20, __in_chrg=<optimized out>)
at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#10 0x00007fffb2133138 in std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>::~shared_ptr (this=0x7fff72397b20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#11 0x00007fffb217dd6c in librbd::cache::pwl::SyncPointLogEntry::~SyncPointLogEntry (this=0x7fff72397aa0, __in_chrg=<optimized out>) at ../src/librbd/cache/pwl/LogEntry.h:97
#12 0x00007fffb216dbf5 in __gnu_cxx::new_allocator<librbd::cache::pwl::SyncPointLogEntry>::destroy<librbd::cache::pwl::SyncPointLogEntry> (this=0x7fff72397aa0, __p=0x7fff72397aa0)
at /usr/include/c++/9/ext/new_allocator.h:153
#13 0x00007fffb216da0d in std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPointLogEntry> >::destroy<librbd::cache::pwl::SyncPointLogEntry> (__a=..., __p=0x7fff72397aa0)
at /usr/include/c++/9/bits/alloc_traits.h:497
#14 0x00007fffb216ca59 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPointLogEntry, std::allocator<librbd::cache::pwl::SyncPointLogEntry>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
this=0x7fff72397a90) at /usr/include/c++/9/bits/shared_ptr_base.h:557
#15 0x00007ffff50d8cd0 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff72397a90) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#16 0x00007ffff50d7f3d in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff724b8398, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#17 0x00007fffb2133118 in std::__shared_ptr<librbd::cache::pwl::SyncPointLogEntry, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff724b8390, __in_chrg=<optimized out>)
at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#18 0x00007fffb2133138 in std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>::~shared_ptr (this=0x7fff724b8390, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#19 0x00007fffb217dd6c in librbd::cache::pwl::SyncPointLogEntry::~SyncPointLogEntry (this=0x7fff724b8310, __in_chrg=<optimized out>) at ../src/librbd/cache/pwl/LogEntry.h:97
#20 0x00007fffb216dbf5 in __gnu_cxx::new_allocator<librbd::cache::pwl::SyncPointLogEntry>::destroy<librbd::cache::pwl::SyncPointLogEntry> (this=0x7fff724b8310, __p=0x7fff724b8310)
at /usr/include/c++/9/ext/new_allocator.h:153
#21 0x00007fffb216da0d in std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPointLogEntry> >::destroy<librbd::cache::pwl::SyncPointLogEntry> (__a=..., __p=0x7fff724b8310)
at /usr/include/c++/9/bits/alloc_traits.h:497
#22 0x00007fffb216ca59 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPointLogEntry, std::allocator<librbd::cache::pwl::SyncPointLogEntry>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
this=0x7fff724b8300) at /usr/include/c++/9/bits/shared_ptr_base.h:557
#23 0x00007ffff50d8cd0 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fff724b8300) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#24 0x00007ffff50d7f3d in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffaff28ca8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#25 0x00007fffb2133118 in std::__shared_ptr<librbd::cache::pwl::SyncPointLogEntry, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffaff28ca0, __in_chrg=<optimized out>)
at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#26 0x00007fffb2133138 in std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>::~shared_ptr (this=0x7fffaff28ca0, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#27 0x00007fffb217dd6c in librbd::cache::pwl::SyncPointLogEntry::~SyncPointLogEntry (this=0x7fffaff28c20, __in_chrg=<optimized out>) at ../src/librbd/cache/pwl/LogEntry.h:97
#28 0x00007fffb216dbf5 in __gnu_cxx::new_allocator<librbd::cache::pwl::SyncPointLogEntry>::destroy<librbd::cache::pwl::SyncPointLogEntry> (this=0x7fffaff28c20, __p=0x7fffaff28c20)
at /usr/include/c++/9/ext/new_allocator.h:153
#29 0x00007fffb216da0d in std::allocator_traits<std::allocator<librbd::cache::pwl::SyncPointLogEntry> >::destroy<librbd::cache::pwl::SyncPointLogEntry> (__a=..., __p=0x7fffaff28c20)
at /usr/include/c++/9/bits/alloc_traits.h:497
#30 0x00007fffb216ca59 in std::_Sp_counted_ptr_inplace<librbd::cache::pwl::SyncPointLogEntry, std::allocator<librbd::cache::pwl::SyncPointLogEntry>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
this=0x7fffaff28c10) at /usr/include/c++/9/bits/shared_ptr_base.h:557
#31 0x00007ffff50d8cd0 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7fffaff28c10) at /usr/include/c++/9/bits/shared_ptr_base.h:155
#32 0x00007ffff50d7f3d in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fff7208ce18, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#33 0x00007fffb2133118 in std::__shared_ptr<librbd::cache::pwl::SyncPointLogEntry, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fff7208ce10, __in_chrg=<optimized out>)
at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#34 0x00007fffb2133138 in std::shared_ptr<librbd::cache::pwl::SyncPointLogEntry>::~shared_ptr (this=0x7fff7208ce10, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
--Type <RET> for more, q to quit, c to continue without paging--
Updated by Ilya Dryomov almost 3 years ago
- Status changed from New to Fix Under Review
- Assignee set to Hualong Feng
- Pull request ID set to 42149
Updated by Ilya Dryomov almost 3 years ago
- Subject changed from [pwl ssd] segment fault on syncpoint stack to [pwl] segment fault on syncpoint stack
Updated by CONGMIN YIN almost 3 years ago
https://github.com/ceph/ceph/pull/42149 adds the missing cleanup of the sync point. We still don't understand the exact mechanism, but the test results are clear: without the patch the workload segfaults, and with the patch it runs fine for an extended period of time.
Updated by CONGMIN YIN almost 3 years ago
To reproduce, run fio under gdb:
gdb fio
set args test.conf
run
# cat test.conf
[global]
ioengine=rbd
clientname=admin
rw=randwrite
#bs=1m
bs=16k
time_based=1
runtime=3h
iodepth=16
group_reporting
[volumes]
pool=test
rbdname=image10
When fio hits the segmentation fault, run 'bt' to show the stack. The stack is very long: more than 220,000 frames.
Note: the bug does not reproduce on every run, but it is likely to occur. If it has not occurred after half an hour, restart the test.
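The backtrace repeats the same group of eight frames (~__shared_count, ~__shared_ptr, ~shared_ptr, ~SyncPointLogEntry, destroy, destroy, _M_dispose, _M_release) once per entry. That pattern is consistent with a long chain of objects, each holding a shared_ptr to the next, being destroyed recursively until the thread stack runs out; with 220,000+ frames the chain would be on the order of tens of thousands of entries. The following is only a minimal sketch of that general pattern and of an iterative teardown that avoids the recursion. It is not the code from the pull request: Entry, prev, g_live, build_chain and release_chain are hypothetical names, and the sketch assumes the caller is the sole owner of every node in the chain.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical counter of live entries, used only to observe the teardown.
std::size_t g_live = 0;

// Hypothetical stand-in for a log entry that keeps an earlier entry alive.
struct Entry {
  std::shared_ptr<Entry> prev;
  Entry() { ++g_live; }
  ~Entry() { --g_live; }
};

// Build a chain of n entries; each new entry owns the previous head.
std::shared_ptr<Entry> build_chain(std::size_t n) {
  std::shared_ptr<Entry> head;
  for (std::size_t i = 0; i < n; ++i) {
    auto e = std::make_shared<Entry>();
    e->prev = std::move(head);
    head = std::move(e);
  }
  return head;
}

// Release the chain one node at a time. Detaching `prev` before the node
// is released means each destructor sees an empty `prev`, so destruction
// never nests and stack depth stays constant regardless of chain length.
// Assumption: no other owner holds a node in the chain; otherwise that
// node would survive with its `prev` link cleared.
void release_chain(std::shared_ptr<Entry> head) {
  while (head) {
    std::shared_ptr<Entry> next = std::move(head->prev);
    head = std::move(next);  // old node destroyed here, with prev == nullptr
  }
  assert(!head);
}
```

Letting the head of such a chain simply go out of scope instead would destroy the nodes recursively, one nested group of destructor frames per node, which is the shape seen in the backtrace above.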
Updated by Ilya Dryomov almost 3 years ago
How big is the cache (rbd_persistent_cache_size)?
Updated by Ilya Dryomov over 2 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to pacific
Updated by Backport Bot over 2 years ago
- Copied to Backport #52092: pacific: [pwl] segment fault on syncpoint stack added
Updated by Deepika Upadhyay over 2 years ago
- Related to Bug #52258: [pwl] The write back time of cache is too long added
Updated by Ilya Dryomov over 2 years ago
- Related to Bug #52465: [pwl ssd] assert in AbstractWriteLog::handle_flushed_sync_point() added
Updated by Ilya Dryomov about 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".