Actions
Bug #22833
closedrgw: crash in RGWSyncTraceManager::finish_node()
Status:
Resolved
Priority:
Urgent
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Reported by mkogan:
This abort occurred on master branch after sync of 2692 objects. cosbench is writing s3 , 4mb obj workload to c1, c2 is syncing $ ../src/mrun c2 radosgw-admin sync status 2018-01-29 13:53:26.205 7fc7075243c0 -1 WARNING: all dangerous and experimental features are enabled. 2018-01-29 13:53:26.205 7fc7075243c0 -1 WARNING: all dangerous and experimental features are enabled. 2018-01-29 13:53:26.207 7fc7075243c0 -1 WARNING: all dangerous and experimental features are enabled. realm a720ab67-014d-47f7-b1df-2a39c14e4aca (gold) zonegroup 2b112a20-53b1-407e-b732-2e962c11d7a5 (us) zone c5488b04-0cd7-4682-8067-f5f26ce64dff (us-west) metadata sync syncing full sync: 0/64 shards incremental sync: 64/64 shards metadata is caught up with master data sync source: 1dce771c-47f1-42c7-8deb-de1798034732 (us-east) syncing full sync: 0/128 shards incremental sync: 128/128 shards data is behind on 1 shards oldest incremental change not applied: 2018-01-29 12:23:36.0.715214 from radosgw.8001.log of c2 : (I will rebuild with debug info tomorrow and reproduce with better debug info) -10> 2018-01-29 12:23:26.628 7f42a08b3700 20 RGWDataChangesLog::add_entry() bucket.name=s3testqwer011 shard_id=-1 now=2018-01-29 12:23:26.0.629662s cur_expiration=2018-01-29 12:23:36.0 -9> 2018-01-29 12:23:26.629 7f42998a5700 20 cr:s=0x558091170f00:op=0x5580948b8d00:19RGWFetchRemoteObjCR: operate() -8> 2018-01-29 12:23:26.629 7f42998a5700 20 cr:s=0x558091170f00:op=0x5580948b8d00:19RGWFetchRemoteObjCR: operate() -7> 2018-01-29 12:23:26.629 7f42998a5700 20 cr:s=0x558091170f00:op=0x5580948b8d00:19RGWFetchRemoteObjCR: operate() -6> 2018-01-29 12:23:26.629 7f42998a5700 20 cr:s=0x558091170f00:op=0x5580948b8d00:19RGWFetchRemoteObjCR: operate() -5> 2018-01-29 12:23:26.629 7f42998a5700 20 cr:s=0x558091170f00:op=0x55809541e400:26RGWBucketSyncSingleEntryCRINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE11rgw_obj_keyE: oper -4> 2018-01-29 12:23:26.629 7f42998a5700 10 RGW-SYNC:data:sync:shard[33]:entry[s3testqwer011:1dce771c-47f1-42c7-8deb-de1798034732.4146.1]:bucket[s3testqwer011:1dce771c-47f1-42c7-8deb- -3> 2018-01-29 12:23:26.629 7f42998a5700 5 data sync: Sync:1dce771c:data:Object:s3testqwer011:1dce771c-47f1-42c7-8deb-de1798034732.4146.1/myobjects30273[0]:done -2> 2018-01-29 12:23:26.629 7f42998a5700 20 cr:s=0x558091170f00:op=0x55809541e400:26RGWBucketSyncSingleEntryCRINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE11rgw_obj_keyE: oper -1> 2018-01-29 12:23:26.630 7f42b7af3700 1 -- 192.168.39.252:0/3791001179 <== osd.0 192.168.39.252:6806/4684 33033 ==== osd_op_reply(31213 .dir.1dce771c-47f1-42c7-8deb-de1798034732.4 0> 2018-01-29 12:23:26.634 7f42998a5700 -1 *** Caught signal (Aborted) ** in thread 7f42998a5700 thread_name:data-sync ceph version 13.0.0-5206-g5f01bcddff (5f01bcddffa327334652afb0d55ae26009e11d5c) mimic (dev) 1: (()+0x1f0090) [0x55808d29f090] 2: (()+0x12af0) [0x7f42c92d9af0] 3: (gsignal()+0xcb) [0x7f42bdf4166b] 4: (abort()+0x141) [0x7f42bdf43381] 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f42be8eb025] 6: (()+0x8fc16) [0x7f42be8e8c16] 7: (()+0x8fc61) [0x7f42be8e8c61] 8: (()+0x8fea4) [0x7f42be8e8ea4] 9: (std::__throw_system_error(int)+0x8e) [0x7f42be914f7e] 10: (()+0x52e286) [0x55808d5dd286] 11: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x46) [0x55808d24bdb6] 12: (std::_Sp_counted_ptr<RGWSyncTraceNode*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x106) [0x55808d5e0226] 13: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x46) [0x55808d24bdb6] 14: (RGWSyncTraceManager::finish_node(RGWSyncTraceNode*)+0x17b) [0x55808d5dd20b] 15: (RGWBucketSyncSingleEntryCR<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, rgw_obj_key>::~RGWBucketSyncSingleEntryCR()+0x23a) [0x55808d59c8da] 16: (RefCountedObject::put() const+0x27a) [0x55808d2c8faa] 17: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x1e9) [0x55808d355999] 18: (RGWCoroutinesManager::run(std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x1ef) [0x55808d35910f] 19: (RGWCoroutinesManager::run(RGWCoroutine*)+0x9b) [0x55808d35a0ab] 20: (RGWRemoteDataLog::run_sync(int)+0x209) [0x55808d58caa9] 21: (RGWDataSyncProcessorThread::process()+0x46) [0x55808d40ebb6] 22: (RGWRadosThread::Worker::entry()+0x230) [0x55808d3acaf0] 23: (()+0x761b) [0x7f42c92ce61b] 24: (clone()+0x3f) [0x7f42be02198f] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. .... #12 0x00007f9810525e97 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x55623dde4620, tinfo=tinfo@entry=0x7f9810810ba0 <typeinfo for std::system_error>, dest=dest@entry=0x7f9810551c50 <std::system_error::~system_error()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:88 #13 0x00007f9810551f7e in std::__throw_system_error (__i=__i@entry=35) at ../../../../../libstdc++-v3/src/c++11/system_error.cc:81 #14 0x000055623b592196 in std::__shared_mutex_pthread::lock (this=0x55623d35ee08) at /usr/include/c++/7/shared_mutex:107 #15 std::shared_timed_mutex::lock (this=0x55623d35ee08) at /usr/include/c++/7/shared_mutex:378 #16 ceph::shunique_lock<std::shared_timed_mutex>::lock (this=<synthetic pointer>) at /mnt/vdb1/src-git/ceph--ceph-master/src/common/shunique_lock.h:157 #17 ceph::shunique_lock<std::shared_timed_mutex>::shunique_lock (m=..., this=<synthetic pointer>) at /mnt/vdb1/src-git/ceph--ceph-master/src/common/shunique_lock.h:65 #18 RGWSyncTraceManager::finish_node (this=0x55623d35ee00, node=0x55623d9d8d80) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_sync_trace.cc:260 #19 0x000055623b200db6 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55623ea437d0) at /usr/include/c++/7/bits/shared_ptr_base.h:154 #20 0x000055623b595106 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x55623d9d8f18, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684 #21 std::__shared_ptr<RGWSyncTraceNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x55623d9d8f10, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123 #22 std::shared_ptr<RGWSyncTraceNode>::~shared_ptr (this=0x55623d9d8f10, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:93 #23 RGWSyncTraceNode::~RGWSyncTraceNode (this=<optimized out>, __in_chrg=<optimized out>) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_sync_trace.h:32 #24 std::_Sp_counted_ptr<RGWSyncTraceNode*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:376 #25 0x000055623b200db6 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55623d8cafc0) at /usr/include/c++/7/bits/shared_ptr_base.h:154 #26 0x000055623b59211b in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator= (__r=..., this=0x55623d4680a8) at /usr/include/c++/7/bits/shared_ptr_base.h:703 #27 std::__shared_ptr<RGWSyncTraceNode, (__gnu_cxx::_Lock_policy)2>::operator= (this=0x55623d4680a0) at /usr/include/c++/7/bits/shared_ptr_base.h:1034 #28 std::shared_ptr<RGWSyncTraceNode>::operator= (this=0x55623d4680a0) at /usr/include/c++/7/bits/shared_ptr.h:93 #29 boost::circular_buffer<std::shared_ptr<RGWSyncTraceNode>, std::allocator<std::shared_ptr<RGWSyncTraceNode> > >::replace (this=0x55623d35ee80, item=..., pos=0x55623d4680a0) at /mnt/vdb1/src-git/ceph--ceph-master/build/boost/include/boost/circular_buffer/base.hpp:2396 #30 boost::circular_buffer<std::shared_ptr<RGWSyncTraceNode>, std::allocator<std::shared_ptr<RGWSyncTraceNode> > >::push_back_impl<std::shared_ptr<RGWSyncTraceNode> const&> ( item=std::shared_ptr (count 3, weak 0) 0x55623ed51380, this=0x55623d35ee80) at /mnt/vdb1/src-git/ceph--ceph-master/build/boost/include/boost/circular_buffer/base.hpp:1421 #31 boost::circular_buffer<std::shared_ptr<RGWSyncTraceNode>, std::allocator<std::shared_ptr<RGWSyncTraceNode> > >::push_back (item=..., this=0x55623d35ee80) at /mnt/vdb1/src-git/ceph--ceph-master/build/boost/include/boost/circular_buffer/base.hpp:1471 #32 RGWSyncTraceManager::finish_node (this=0x55623d35ee00, node=<optimized out>) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_sync_trace.cc:270 #33 0x000055623b551f3a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5562408f3d40) at /usr/include/c++/7/bits/shared_ptr_base.h:154 #34 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x55623e768c68, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684 #35 std::__shared_ptr<RGWSyncTraceNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x55623e768c60, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123 #36 std::shared_ptr<RGWSyncTraceNode>::~shared_ptr (this=0x55623e768c60, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:93 #37 RGWRunBucketSyncCoroutine::~RGWRunBucketSyncCoroutine (this=0x55623e768000, __in_chrg=<optimized out>) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_data_sync.cc:966 #38 RGWRunBucketSyncCoroutine::~RGWRunBucketSyncCoroutine (this=0x55623e768000, __in_chrg=<optimized out>) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_data_sync.cc:970 #39 0x000055623b27df2a in RefCountedObject::put (this=0x55623e768000) at /mnt/vdb1/src-git/ceph--ceph-master/src/common/RefCountedObj.h:58 #40 0x000055623b30a8e9 in RGWCoroutinesStack::operate (this=0x55623ddbc2d0, _env=_env@entry=0x7f97eb4ddac0) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_coroutine.cc:205 #41 0x000055623b30e05f in RGWCoroutinesManager::run (this=this@entry=0x55623d406b18, stacks=std::__cxx11::list = {...}) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_coroutine.cc:484 #42 0x000055623b30effb in RGWCoroutinesManager::run (this=this@entry=0x55623d406b18, op=0x55623d3a3800) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_coroutine.cc:623 #43 0x000055623b5419b9 in RGWRemoteDataLog::run_sync (this=this@entry=0x55623d406b18, num_shards=<optimized out>) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_data_sync.cc:1750 #44 0x000055623b3c3af6 in RGWDataSyncStatusManager::run (this=0x55623d406a48) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_data_sync.h:347 #45 RGWDataSyncProcessorThread::process (this=0x55623d406a00) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_rados.cc:3237 #46 0x000055623b361a30 in RGWRadosThread::Worker::entry (this=0x55623d4b52c0) at /mnt/vdb1/src-git/ceph--ceph-master/src/rgw/rgw_rados.cc:3043 #47 0x00007f981af0b61b in start_thread (arg=0x7f97eb4e2700) at pthread_create.c:465 #48 0x00007f980fc5e98f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Updated by Yehuda Sadeh about 6 years ago
- Priority changed from Normal to Urgent
- Backport set to luminous
Updated by Yehuda Sadeh about 6 years ago
- Status changed from New to Fix Under Review
Updated by Casey Bodley about 6 years ago
- Status changed from Fix Under Review to 7
Updated by Casey Bodley about 6 years ago
- Related to Backport #23018: rgw multisite sync tracing feature added
Updated by Nathan Cutler about 6 years ago
- Backport deleted (
luminous)
Luminous backport to be done in #23018
Actions