Bug #54130: OpsLogRados::log segfaults in rgw/multisite suite - rgw - Ceph

Added by Casey Bodley about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee: Cory Snyder
% Done: 100%
Tags: opslog
Backport: octopus pacific quincy
Severity: 3 - minor
Pull request ID: 44893

Description

2022-01-29T00:31:31.693 INFO:tasks.rgw_multisite_tests:running rgw multisite tests on '/home/teuthworker/src/github.com_ceph_ceph-c_7c1eb0cd5f8e9dc658d4fb8519b7c8bccf11fbd6/qa/../src/test/rgw/rgw_multi' with args=['tests.py']
2022-01-29T00:31:31.700 INFO:rgw_multi.tests:create bucket zone=a1 name=swdzrg-1
2022-01-29T00:31:35.011 INFO:rgw_multi.tests:create bucket zone=a2 name=swdzrg-2
*** Caught signal (Segmentation fault) **
 in thread 7fa8e16e2700 thread_name:radosgw
 ceph version 17.0.0-10459-g7c1eb0cd (7c1eb0cd5f8e9dc658d4fb8519b7c8bccf11fbd6) quincy (dev)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7faa8867d3c0]
 2: (OpsLogRados::log(req_state*, rgw_log_entry&)+0x123) [0x7faa88b7d9a3]
 3: (OpsLogManifold::log(req_state*, rgw_log_entry&)+0x3e) [0x7faa88b7ab9e]
 4: (rgw_log_op(RGWREST*, req_state*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, OpsLogSink*)+0xd62) [0x7faa88b7e942]
 5: (process_request(rgw::sal::Store*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSink*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, std::shared_ptr<RateLimiter>, int*)+0x1433) [0x7faa88b9c853]
 6: /lib/libradosgw.so.2(+0x473384) [0x7faa88afe384]
 7: /lib/libradosgw.so.2(+0x47469a) [0x7faa88aff69a]
 8: /lib/libradosgw.so.2(+0x47481c) [0x7faa88aff81c]
 9: make_fcontext()

from a recent master baseline, two rgw/multisite jobs failed this way:
http://qa-proxy.ceph.com/teuthology/yuriw-2022-01-28_16:15:58-rgw-wip-master-1.27.22-distro-default-smithi/6646932/teuthology.log
http://qa-proxy.ceph.com/teuthology/yuriw-2022-01-28_16:15:58-rgw-wip-master-1.27.22-distro-default-smithi/6646954/teuthology.log

Updated by Cory Snyder about 2 years ago

Seems to be caused by the Store being deallocated when a realm is reloaded. OpsLogRados is not reinitialized to use the new Store created by the reload, so it keeps pointing at the old one. We can fix this by using the same pattern that is used for the UsageLogger: make the OpsLogRados instance a static variable within rgw_log.cc and create init/finalize methods to manage its lifecycle. The realm reloader can then call these methods to refresh the logger when it reloads.
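
For illustration only, here is a rough sketch of the init/finalize pattern described above. It is not the actual Ceph code: FakeStore, FakeOpsLog, ops_log_init, ops_log_finalize and reload_realm are hypothetical stand-ins invented for this example, and the real classes have different interfaces. The point it tries to show is that the logger is torn down before the old store is freed and rebuilt against the new one.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the store; NOT the real rgw::sal::Store.
struct FakeStore {
  std::string name;
  std::vector<std::string> entries;
};

// Hypothetical stand-in for OpsLogRados: keeps a non-owning pointer
// to the store, which dangles if the store is freed first.
class FakeOpsLog {
  FakeStore* store;
public:
  explicit FakeOpsLog(FakeStore* s) : store(s) {}
  void log(const std::string& entry) { store->entries.push_back(entry); }
  const std::string& store_name() const { return store->name; }
};

// File-scope logger instance, managed only through init/finalize
// (assumed to mirror the UsageLogger pattern mentioned above).
static std::unique_ptr<FakeOpsLog> ops_log;

void ops_log_init(FakeStore* store) {
  ops_log = std::make_unique<FakeOpsLog>(store);
}

void ops_log_finalize() {
  ops_log.reset();
}

// Hypothetical reload step: finalize the logger, free the old store,
// create the new store, then re-initialize the logger against it.
std::unique_ptr<FakeStore> reload_realm(std::unique_ptr<FakeStore> old_store,
                                        const std::string& new_name) {
  ops_log_finalize();   // logger must go away before its store does
  old_store.reset();    // old store is deallocated here
  auto fresh = std::make_unique<FakeStore>();
  fresh->name = new_name;
  ops_log_init(fresh.get());
  return fresh;
}
```

Without the ops_log_finalize/ops_log_init pair in the reload step, the logger would keep the pointer to the freed store, which matches the crash in OpsLogRados::log above.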

Updated by Casey Bodley about 2 years ago

thanks for taking a look! another option to consider is letting RGWRados own OpsLogRados and handle its init/shutdown. that way, OpsLogRados never has a dangling pointer and RGWRealmReloader doesn't need another special case
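
A minimal sketch of this ownership alternative, again with made-up names (OwningStore and OwnedOpsLog are hypothetical, not the real RGWRados or OpsLogRados): the store creates and destroys its own logger, so the logger's lifetime is always nested inside the store's and the reload code needs no extra step.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class OwningStore;

// Hypothetical logger that refers back to the store that owns it.
class OwnedOpsLog {
  OwningStore& store;
public:
  explicit OwnedOpsLog(OwningStore& s) : store(s) {}
  void log(const std::string& entry);
};

// Hypothetical store that owns its logger and handles its init/shutdown.
class OwningStore {
  std::vector<std::string> entries;
  // Declared last, so it is destroyed first, while the store is still valid.
  std::unique_ptr<OwnedOpsLog> ops_log;
public:
  OwningStore() : ops_log(std::make_unique<OwnedOpsLog>(*this)) {}
  // The logger holds a reference to this object, so forbid copying
  // (this also suppresses the implicit move operations).
  OwningStore(const OwningStore&) = delete;
  OwningStore& operator=(const OwningStore&) = delete;

  void append(const std::string& e) { entries.push_back(e); }
  OwnedOpsLog& get_ops_log() { return *ops_log; }
  std::size_t entry_count() const { return entries.size(); }
};

inline void OwnedOpsLog::log(const std::string& entry) {
  store.append(entry);
}
```

In this sketch, replacing the store during a reload replaces the logger with it, and callers fetch the logger through the current store rather than holding a long-lived pointer of their own.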
