Bug #52530

closed

segfault in rgw_log_op()

Added by Casey Bodley over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-09-06T13:29:33.806 INFO:tasks.rgw.client.0.smithi080.stdout:*** Caught signal (Segmentation fault) **
2021-09-06T13:29:33.807 INFO:tasks.rgw.client.0.smithi080.stdout: in thread 7fbf42c36700 thread_name:radosgw
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: ceph version 17.0.0-7472-g0eb1a794 (0eb1a7943dd70e2a0b3086ea680284137a187e73) quincy (dev)
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: 1: /lib64/libpthread.so.0(+0x12b20) [0x7fbf98735b20]
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: 2: (rgw_log_op(rgw::sal::Store*, RGWREST*, req_state*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, OpsLogSocket*)+0x5be) [0x7fbf9b33276e]
2021-09-06T13:29:33.808 INFO:tasks.rgw.client.0.smithi080.stdout: 3: (process_request(rgw::sal::Store*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x1864) [0x7fbf9b3504d4]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 4: /lib64/libradosgw.so.2(+0x533c3d) [0x7fbf9b283c3d]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 5: /lib64/libradosgw.so.2(+0x535220) [0x7fbf9b285220]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 6: /lib64/libradosgw.so.2(+0x53537c) [0x7fbf9b28537c]
2021-09-06T13:29:33.809 INFO:tasks.rgw.client.0.smithi080.stdout: 7: make_fcontext()

ex. http://qa-proxy.ceph.com/teuthology/soumyakoduri-2021-09-06_10:30:05-rgw-wip-skoduri-lua-distro-basic-smithi/6376864/teuthology.log


Related issues 1 (0 open, 1 closed)

Copied to rgw - Backport #52787: pacific: segfault in rgw_log_op() (Resolved)
Actions #2

Updated by Soumya Koduri over 2 years ago

Casey Bodley wrote:

possibly caused by https://github.com/ceph/ceph/pull/39933?

I suspected the same and ran the tests on the latest master with the above patch reverted. The results are below:
https://pulpito.ceph.com/soumyakoduri-2021-09-07_17:44:21-rgw-wip-skoduri-testing-distro-basic-smithi/
branch - https://github.com/soumyakoduri/ceph/commits/wip-skoduri-testing

There are still failures, mainly in the multisite, multifs, and verify tests, but at least the crash in rgw_log_op is no longer reported.

Actions #3

Updated by J. Eric Ivancich over 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 43119

I've created a PR to revert the commit. It's only a band-aid, since the issue the reverted change was meant to address still needs a fix.

Actions #4

Updated by J. Eric Ivancich over 2 years ago

  • Assignee set to J. Eric Ivancich
Actions #5

Updated by Casey Bodley over 2 years ago

https://github.com/ceph/ceph/pull/43071 may be the real fix here?

Actions #6

Updated by Pritha Srivastava over 2 years ago

Casey Bodley wrote:

https://github.com/ceph/ceph/pull/43071 may be the real fix here?

I observed a segfault when calling AssumeRoleWithWebIdentity and the call fails for any reason: the identity type is not set to anything, since this op is not authenticated by rgw. We would see a similar crash for any other op for which the identity type pointer is null (I am not sure which scenarios those would be, though).
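
To illustrate the failure mode described above, here is a minimal sketch of guarding a possibly-null identity pointer before reading its type while building a log entry. This is not the code from https://github.com/ceph/ceph/pull/43071; every type and function name below (`IdentityType`, `Identity`, `AuthState`, `LogEntry`, `make_log_entry`) is a hypothetical stand-in, chosen only to show the pattern.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the real RGW types; names are illustrative only.
enum class IdentityType { None = 0, User = 1, Role = 2 };

struct Identity {
  IdentityType type;
  IdentityType get_identity_type() const { return type; }
};

struct AuthState {
  // Stays null when the op was never authenticated by the gateway.
  std::unique_ptr<Identity> identity;
};

struct LogEntry {
  std::string op;
  IdentityType identity_type = IdentityType::None;
};

// Build a log entry. The identity pointer is checked before it is
// dereferenced; an unauthenticated op keeps the default IdentityType::None.
LogEntry make_log_entry(const AuthState& auth, const std::string& op) {
  LogEntry entry;
  entry.op = op;
  if (auth.identity) {
    entry.identity_type = auth.identity->get_identity_type();
  }
  return entry;
}
```

Without the `if (auth.identity)` check, an op that fails before authentication would dereference a null pointer in the logging path, which matches the kind of segfault seen in the backtrace.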

Actions #7

Updated by J. Eric Ivancich over 2 years ago

Casey Bodley wrote:

https://github.com/ceph/ceph/pull/43071 may be the real fix here?

Should we switch the Pull Request ID over and I'll close my PR?

Actions #8

Updated by Matt Benjamin over 2 years ago

We agreed to run Pritha's PR (plus another change) through Teuthology first--if it doesn't address the segfault, we should do the revert--so let's keep your PR open.

Matt

Actions #9

Updated by Matt Benjamin over 2 years ago

Pritha's PR https://github.com/ceph/ceph/pull/43071 fixes this crash!

Actions #10

Updated by J. Eric Ivancich over 2 years ago

Matt Benjamin wrote:

Pritha's PR https://github.com/ceph/ceph/pull/43071 fixes this crash!

Since the revert PR has been closed, I'm changing the PR for this tracker to Pritha's. I'll also back-link if necessary.

Actions #11

Updated by J. Eric Ivancich over 2 years ago

  • Pull request ID changed from 43119 to 43071
Actions #12

Updated by J. Eric Ivancich over 2 years ago

  • Status changed from Fix Under Review to Resolved
Actions #13

Updated by Pritha Srivastava over 2 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to pacific

This has to be backported after https://github.com/ceph/ceph/pull/41735

Actions #14

Updated by Backport Bot over 2 years ago

Actions #15

Updated by Cory Snyder over 2 years ago

I don't believe that this fix needs to be backported to Pacific because rgw_log_entry does not have the identity_type field in that release series. The offending line (https://github.com/ceph/ceph/pull/43071/files#diff-310d9fbebe2238d31ebae638b48a4843c657c30dac1ebc5ac28452c83819e858L438) does not exist there.

Actions #16

Updated by Casey Bodley over 2 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (pacific)

Cory Snyder wrote:

I don't believe that this fix needs to be backported to Pacific because rgw_log_entry does not have the identity_type field in that release series. The offending line (https://github.com/ceph/ceph/pull/43071/files#diff-310d9fbebe2238d31ebae638b48a4843c657c30dac1ebc5ac28452c83819e858L438) does not exist there.

great, thanks for looking into it

Actions #17

Updated by Pritha Srivastava over 2 years ago

Just reiterating that this fix needs to be backported, once https://github.com/ceph/ceph/pull/43956 gets merged to pacific.
