Bug #52530
segfault in rgw_log_op() (Closed)
Description
2021-09-06T13:29:33.806 INFO:tasks.rgw.client.0.smithi080.stdout:*** Caught signal (Segmentation fault) **
2021-09-06T13:29:33.807 INFO:tasks.rgw.client.0.smithi080.stdout: in thread 7fbf42c36700 thread_name:radosgw
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: ceph version 17.0.0-7472-g0eb1a794 (0eb1a7943dd70e2a0b3086ea680284137a187e73) quincy (dev)
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: 1: /lib64/libpthread.so.0(+0x12b20) [0x7fbf98735b20]
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: 2: (rgw_log_op(rgw::sal::Store*, RGWREST*, req_state*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, OpsLogSocket*)+0x5be) [0x7fbf9b33276e]
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: 3: (process_request(rgw::sal::Store*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x1864) [0x7fbf9b3504d4]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 4: /lib64/libradosgw.so.2(+0x533c3d) [0x7fbf9b283c3d]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 5: /lib64/libradosgw.so.2(+0x535220) [0x7fbf9b285220]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 6: /lib64/libradosgw.so.2(+0x53537c) [0x7fbf9b28537c]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 7: make_fcontext()
Updated by Casey Bodley over 2 years ago
possibly caused by https://github.com/ceph/ceph/pull/39933?
Updated by Soumya Koduri over 2 years ago
Casey Bodley wrote:
possibly caused by https://github.com/ceph/ceph/pull/39933?
I suspected the same and ran tests on the latest master with the above patch reverted. Below are the results:
https://pulpito.ceph.com/soumyakoduri-2021-09-07_17:44:21-rgw-wip-skoduri-testing-distro-basic-smithi/
branch - https://github.com/soumyakoduri/ceph/commits/wip-skoduri-testing
There are still failures, mainly in the multisite, multifs, and verify tests, but at least the crash in rgw_log_op is no longer reported.
Updated by J. Eric Ivancich over 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 43119
I've created a PR to revert the commit. It's just a band-aid, since the issue that PR was meant to address still needs to be fixed.
Updated by Casey Bodley over 2 years ago
https://github.com/ceph/ceph/pull/43071 may be the real fix here?
Updated by Pritha Srivastava over 2 years ago
Casey Bodley wrote:
https://github.com/ceph/ceph/pull/43071 may be the real fix here?
I had observed a segfault when calling AssumeRoleWithWebIdentity fails for any reason: the identity type is never set, because this op is not authenticated by RGW. A similar crash would occur for any other op whose identity type pointer is null (though I am not sure such scenarios exist).
Updated by J. Eric Ivancich over 2 years ago
Casey Bodley wrote:
https://github.com/ceph/ceph/pull/43071 may be the real fix here?
Should we switch the pull request ID over? If so, I'll close my PR.
Updated by Matt Benjamin over 2 years ago
We agreed to run Pritha's PR (plus another change) through Teuthology first. If it doesn't address the segfault, we should do the revert, so let's keep your PR open.
Matt
Updated by Matt Benjamin over 2 years ago
Pritha's PR https://github.com/ceph/ceph/pull/43071 fixes this crash!
Updated by J. Eric Ivancich over 2 years ago
Matt Benjamin wrote:
Pritha's PR https://github.com/ceph/ceph/pull/43071 fixes this crash!
Since the revert PR has been closed, I'm changing the PR for this tracker to Pritha's. I'll also back-link if necessary.
Updated by J. Eric Ivancich over 2 years ago
- Pull request ID changed from 43119 to 43071
Updated by J. Eric Ivancich over 2 years ago
- Status changed from Fix Under Review to Resolved
Updated by Pritha Srivastava over 2 years ago
- Status changed from Resolved to Pending Backport
- Backport set to pacific
This has to be backported after https://github.com/ceph/ceph/pull/41735
Updated by Backport Bot over 2 years ago
- Copied to Backport #52787: pacific: segfault in rgw_log_op() added
Updated by Cory Snyder over 2 years ago
I don't believe that this fix needs to be backported to Pacific because rgw_log_entry does not have the identity_type field in that release series. The offending line (https://github.com/ceph/ceph/pull/43071/files#diff-310d9fbebe2238d31ebae638b48a4843c657c30dac1ebc5ac28452c83819e858L438) does not exist there.
Updated by Casey Bodley over 2 years ago
- Status changed from Pending Backport to Resolved
- Backport deleted (pacific)
Cory Snyder wrote:
I don't believe that this fix needs to be backported to Pacific because rgw_log_entry does not have the identity_type field in that release series. The offending line (https://github.com/ceph/ceph/pull/43071/files#diff-310d9fbebe2238d31ebae638b48a4843c657c30dac1ebc5ac28452c83819e858L438) does not exist there.
great, thanks for looking into it
Updated by Pritha Srivastava over 2 years ago
Just reiterating that this fix needs to be backported once https://github.com/ceph/ceph/pull/43956 gets merged to pacific.