Bug #54130
closed
OpsLogRados::log segfaults in rgw/multisite suite
100%
Description
2022-01-29T00:31:31.693 INFO:tasks.rgw_multisite_tests:running rgw multisite tests on '/home/teuthworker/src/github.com_ceph_ceph-c_7c1eb0cd5f8e9dc658d4fb8519b7c8bccf11fbd6/qa/../src/test/rgw/rgw_multi' with args=['tests.py']
2022-01-29T00:31:31.700 INFO:rgw_multi.tests:create bucket zone=a1 name=swdzrg-1
2022-01-29T00:31:35.011 INFO:rgw_multi.tests:create bucket zone=a2 name=swdzrg-2

*** Caught signal (Segmentation fault) **
 in thread 7fa8e16e2700 thread_name:radosgw

 ceph version 17.0.0-10459-g7c1eb0cd (7c1eb0cd5f8e9dc658d4fb8519b7c8bccf11fbd6) quincy (dev)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7faa8867d3c0]
 2: (OpsLogRados::log(req_state*, rgw_log_entry&)+0x123) [0x7faa88b7d9a3]
 3: (OpsLogManifold::log(req_state*, rgw_log_entry&)+0x3e) [0x7faa88b7ab9e]
 4: (rgw_log_op(RGWREST*, req_state*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, OpsLogSink*)+0xd62) [0x7faa88b7e942]
 5: (process_request(rgw::sal::Store*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSink*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, std::shared_ptr<RateLimiter>, int*)+0x1433) [0x7faa88b9c853]
 6: /lib/libradosgw.so.2(+0x473384) [0x7faa88afe384]
 7: /lib/libradosgw.so.2(+0x47469a) [0x7faa88aff69a]
 8: /lib/libradosgw.so.2(+0x47481c) [0x7faa88aff81c]
 9: make_fcontext()
from a recent master baseline, two rgw/multisite jobs failed this way:
http://qa-proxy.ceph.com/teuthology/yuriw-2022-01-28_16:15:58-rgw-wip-master-1.27.22-distro-default-smithi/6646932/teuthology.log
http://qa-proxy.ceph.com/teuthology/yuriw-2022-01-28_16:15:58-rgw-wip-master-1.27.22-distro-default-smithi/6646954/teuthology.log
Updated by Cory Snyder about 2 years ago
Seems to be caused by the Store being de-allocated when a realm is reloaded. OpsLogRados is not reinitialized to use the new Store created by the reload, so it keeps logging through a pointer to the freed Store. We can fix this by using the same pattern that is used for the UsageLogger: make the OpsLogRados instance a static variable within rgw_log.cc and create init/finalize methods to manage its lifecycle. The realm reloader can then call these methods to refresh the logger when it reloads.
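For illustration only, the static-instance pattern proposed above could look roughly like the sketch below. This is not the actual rgw_log.cc code: `Store`, `OpsLogger`, `ops_log_init`, `ops_log_finalize`, `ops_log` and `reload` are hypothetical stand-ins, and the real Ceph types and signatures are assumed to differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the store that a realm reload destroys and recreates.
struct Store {
  std::string name;
  std::vector<std::string> written;  // entries "persisted" through this store
};

// Hypothetical logger that holds a non-owning pointer to the store.
// If the store is freed while the logger lives on, log() would dereference
// a dangling pointer, which is the kind of crash seen in the backtrace.
class OpsLogger {
  Store* store;
public:
  explicit OpsLogger(Store* s) : store(s) { assert(s != nullptr); }
  void log(const std::string& entry) { store->written.push_back(entry); }
};

// File-scope instance, only ever touched through init/finalize.
static std::unique_ptr<OpsLogger> ops_logger;

void ops_log_init(Store* store) { ops_logger = std::make_unique<OpsLogger>(store); }
void ops_log_finalize() { ops_logger.reset(); }

// Returns false when no logger is installed instead of touching a stale store.
bool ops_log(const std::string& entry) {
  if (!ops_logger) return false;
  ops_logger->log(entry);
  return true;
}

// Sketch of a reload: drop the logger before the old store goes away,
// then rebind it to the replacement store.
std::unique_ptr<Store> reload(std::unique_ptr<Store> old_store) {
  ops_log_finalize();
  old_store.reset();
  auto fresh = std::make_unique<Store>();
  fresh->name = "reloaded";
  ops_log_init(fresh.get());
  return fresh;
}
```

The point of the sketch is the ordering inside `reload`: the logger is finalized before the old store is destroyed and re-initialized only once the new store exists, so the logger never outlives the store it points at.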
Updated by Casey Bodley about 2 years ago
thanks for taking a look! another option to consider is letting RGWRados own OpsLogRados and handle its init/shutdown. that way, OpsLogRados never has a dangling pointer and RGWRealmReloader doesn't need another special case
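A rough sketch of this ownership-based alternative, under the assumption that the store creates the logger in its own init and destroys it in its own shutdown. `Store`, `OpsLogger`, `init`, `shutdown` and `ops_logger` are hypothetical names chosen for illustration, not the RGWRados interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class Store;

// Hypothetical logger bound to the store that owns it.
class OpsLogger {
  Store& store;
public:
  explicit OpsLogger(Store& s) : store(s) {}
  void log(const std::string& entry);
};

// Hypothetical store that owns its logger, so the logger's lifetime
// is always contained within the store's lifetime.
class Store {
  std::vector<std::string> written;
  std::unique_ptr<OpsLogger> logger;
public:
  void init() { logger = std::make_unique<OpsLogger>(*this); }
  void shutdown() { logger.reset(); }
  // Non-owning access; nullptr before init() and after shutdown().
  OpsLogger* ops_logger() { return logger.get(); }
  void write(const std::string& entry) { written.push_back(entry); }
  std::size_t count() const { return written.size(); }
};

void OpsLogger::log(const std::string& entry) { store.write(entry); }
```

With this shape a reload only has to replace the store; callers fetch the logger from whichever store is current, so the reloader needs no logger-specific step.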
Updated by Casey Bodley about 2 years ago
- Status changed from New to Fix Under Review
- Backport set to quincy
- Pull request ID set to 44893
Updated by Casey Bodley about 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 2 years ago
- Copied to Backport #54162: quincy: OpsLogRados::log segfaults in rgw/multisite suite added
Updated by Casey Bodley about 2 years ago
- Backport changed from quincy to octopus pacific quincy
Updated by Casey Bodley about 2 years ago
it looks like the OpsLogManifold stuff was backported further than quincy, and i'm seeing the crashes in pacific testing too
Updated by Backport Bot about 2 years ago
- Copied to Backport #54536: octopus: OpsLogRados::log segfaults in rgw/multisite suite added
Updated by Backport Bot about 2 years ago
- Copied to Backport #54537: pacific: OpsLogRados::log segfaults in rgw/multisite suite added
Updated by Cory Snyder about 2 years ago
Note that I closed the Octopus backport tracker since the offending ops log changes were never backported to that release.
Updated by Backport Bot over 1 year ago
- Tags changed from opslog to opslog backport_processed
Updated by Konstantin Shalygin over 1 year ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100
- Tags changed from opslog backport_processed to opslog