Bug #23661 (closed): RGWAsyncGetSystemObj failed assertion on shutdown/realm reload

Added by Casey Bodley about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: High
Assignee: Casey Bodley
Target version: -
% Done: 0%
Source: -
Tags: multisite
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

 -2698> 2018-04-11 16:51:59.592 6f0b8700  1 rgw realm reloader: Frontends paused
...
   -73> 2018-04-11 16:52:00.492 34fea700 20 clearing stack on run() exit: stack=0x7b392d60 nref=2
   -72> 2018-04-11 16:52:00.492 34fea700 20 run(stacks) returned r=-125
...
    -6> 2018-04-11 16:52:00.513 16540700  1 -- 172.21.15.164:0/2127361047 <== osd.0 172.21.15.164:6805/30344 523 ==== osd_op_reply(1339 datalog.sync-status.shard.234e7cf5-a39c-4ebf-8a3b-7002cda4fa64.74 [read 0~40] v0'0 uv2586 ondisk = 0) v9 ==== 210+0+40 (2369085131 0 1514302540) 0x3452ba60 con 0x3291f060
    -5> 2018-04-11 16:52:00.514 24574700 20 rados->read r=0 bl.length=40
    -4> 2018-04-11 16:52:00.514 24574700 10 cache put: name=test-zone2.rgw.log++datalog.sync-status.shard.234e7cf5-a39c-4ebf-8a3b-7002cda4fa64.74 info.flags=0x1
    -3> 2018-04-11 16:52:00.514 24574700 10 adding test-zone2.rgw.log++datalog.sync-status.shard.234e7cf5-a39c-4ebf-8a3b-7002cda4fa64.74 to cache LRU end
...
2018-04-11 16:52:00.485 2ed89700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.2-994-g7c21f2e/rpm/el7/BUILD/ceph-13.0.2-994-g7c21f2e/src/common/buffer.cc: In function 'char* ceph::buffer::ptr::c_str()' thread 2ed89700 time 2018-04-11 16:52:00.400455
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.2-994-g7c21f2e/rpm/el7/BUILD/ceph-13.0.2-994-g7c21f2e/src/common/buffer.cc: 988: FAILED assert(_raw)

 ceph version 13.0.2-994-g7c21f2e (7c21f2edad61886351873068f6803446618fc2e4) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x612f80f]
 2: (()+0x2809f7) [0x612f9f7]
 3: (()+0xc2b0a) [0x4ef8b0a]
 4: (ceph::buffer::list::iterator_impl<false>::copy_all(ceph::buffer::list&)+0x2b) [0x4f00f5b]
 5: (RGWCache<RGWRados>::get_system_obj(RGWObjectCtx&, RGWRados::SystemObject::Read::GetObjState&, RGWObjVersionTracker*, rgw_raw_obj&, ceph::buffer::list&, long, long, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >*, rgw_cache_entry_info*, boost::optional<obj_version>)+0x3b6) [0x4f49d6]
 6: (RGWAsyncGetSystemObj::_send_request()+0x6d) [0x4242ed]
 7: (RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x22) [0x425532]
 8: (RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0xd) [0x4255fd]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x903) [0x6134c93]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x6136140]
 11: (()+0x7e25) [0x5c9ae25]
 12: (clone()+0x6d) [0x1178434d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

http://qa-proxy.ceph.com/teuthology/cbodley-2018-04-11_16:15:50-rgw-wip-cbodley-testing-distro-basic-smithi/2386406/teuthology.log
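
The backtrace shows the assert firing in ceph::buffer::ptr::c_str() while RGWAsyncGetSystemObj copies a read result, just as the realm reloader cancels coroutine stacks (run(stacks) returned r=-125, i.e. ECANCELED). One plausible reading is an async completion racing against teardown of the coroutine stack that owns the destination bufferlist. Below is a minimal, self-contained C++ sketch of that lifetime hazard and one common guard (a weak reference, so late completions are dropped); the names are hypothetical stand-ins, not RGW's actual types:

 #include <iostream>
 #include <memory>
 #include <string>

 // Stands in for the coroutine stack that owns the result buffer.
 struct CallerStack {
   std::string result;
 };

 // Stands in for the async completion path: it only writes into the
 // caller's buffer if the owning stack is still alive.
 void complete(std::weak_ptr<CallerStack> caller, const std::string& data) {
   if (auto stack = caller.lock()) {
     stack->result = data;
     std::cout << "delivered: " << stack->result << "\n";
   } else {
     std::cout << "caller already destroyed; completion dropped\n";
   }
 }

 int main() {
   auto stack = std::make_shared<CallerStack>();
   std::weak_ptr<CallerStack> handle = stack;

   complete(handle, "shard 74 sync status");  // normal path: delivered

   stack.reset();  // realm reload tears down the owning stack first
   complete(handle, "late shard 74 read");    // late completion: dropped
 }

This only illustrates the suspected failure mode; the actual resolution is tracked in #35543 below.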


Related issues 1 (0 open, 1 closed)

Related to rgw - Bug #35543: multisite: segfault on shutdown/realm reload (Resolved, 09/04/2018)

#1 Updated by Orit Wasserman about 6 years ago

  • Backport set to luminous, jewel

#2 Updated by Matt Benjamin about 6 years ago

  • Status changed from New to Triaged
  • Assignee set to Casey Bodley

#3 Updated by Yehuda Sadeh almost 6 years ago

It looks like this teuthology run preceded the cloud sync merge. Unless the run was testing the cloud sync work, I think we can close this for now (until we see it happen again), because the cloud sync work touched and fixed issues in coroutine stack shutdown that would have looked like this specific one. Casey, can we close this one?

#5 Updated by Casey Bodley over 5 years ago

  • Related to Bug #35543: multisite: segfault on shutdown/realm reload added

#6 Updated by Casey Bodley over 5 years ago

  • Status changed from Triaged to Resolved
  • Backport deleted (luminous, jewel)

Resolved in http://tracker.ceph.com/issues/35543; backports are tracked there.
