Bug #53913

closed

rgw: s3website crashes after upgrade from octopus to pacific

Added by Hubert Niedlich over 2 years ago. Updated almost 2 years ago.

Status: Duplicate
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags: website
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After updating from octopus v15.2.15 to pacific v16.2.7, rgw with only the s3website API enabled crashes randomly:

-8> 2022-01-18T10:28:09.309+0100 7f2b088a2700  1 ====== starting new request req=0x7f2af77fe620 =====
-7> 2022-01-18T10:28:09.309+0100 7f2b088a2700 2 req 15814323700875663596 0.000000000s initializing for trans_id = tx00000db77c39e39bc18ec-0061e68829-1e443996-default
-6> 2022-01-18T10:28:09.311+0100 7f2b088a2700 -1 res_query() failed
-5> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s getting op 0
-4> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj verifying requester
-3> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj normalizing buckets and tenants
-2> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj init permissions
-1> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj recalculating target
0> 2022-01-18T10:28:09.313+0100 7f2b088a2700 -1 *** Caught signal (Segmentation fault) **
in thread 7f2b088a2700 thread_name:radosgw
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
1: /lib64/libpthread.so.0(+0x12c20) [0x7f2c2ca32c20]
2: (RGWHandler_REST_S3Website::retarget(RGWOp*, RGWOp**, optional_yield)+0x174) [0x7f2c37aabc24]
3: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, bool)+0xf0a) [0x7f2c3764a35a]
4: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSink*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, int*)+0x2891) [0x7f2c3764de21]
5: /lib64/libradosgw.so.2(+0x4b1b63) [0x7f2c3759db63]
6: /lib64/libradosgw.so.2(+0x4b3604) [0x7f2c3759f604]
7: /lib64/libradosgw.so.2(+0x4b386e) [0x7f2c3759f86e]
8: make_fcontext()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I tried to reproduce the error, but without luck.


Related issues 2 (0 open, 2 closed)

Related to rgw - Bug #51491: "ERROR: s3tests.functional.test_s3_website.test_website_xredirect_private_relative" in rgw (Resolved)

Is duplicate of rgw - Bug #56281: crash: RGWHandler_REST_S3Website::retarget(RGWOp*, RGWOp**, optional_yield) (Resolved)

Actions #1

Updated by Casey Bodley over 2 years ago

  • Related to Bug #51491: "ERROR: s3tests.functional.test_s3_website.test_website_xredirect_private_relative" in rgw added
Actions #2

Updated by Casey Bodley over 2 years ago

After updating from octopus v15.2.15 to pacific v16.2.7, rgw with only the s3website API enabled crashes randomly:

Strange, this looks like the same crash as https://tracker.ceph.com/issues/51491, and its pacific backport https://tracker.ceph.com/issues/52468 says it was fixed in 16.2.7.

Actions #3

Updated by Hubert Niedlich about 2 years ago

I ran some tests, using curl to request known, already fixed cases such as double trailing slashes, with no rgw crash. Freshly installed rgw instances are crashing too, so this issue was not caused by the upgrade. Crashes occur about 20 times per day, at random hours, every time producing the same output as posted before. I'm open to further testing if you tell me what to check.
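
To put a number on "about 20 times per day, at random hours", the crash markers in the radosgw log can be tallied per day and per hour of day. A minimal Python sketch, assuming the log keeps the timestamp format shown in the description; `count_crashes` is a made-up helper name and the sample lines are shortened copies of lines from this report:

```python
import re
from collections import Counter

# Matches any log line that carries a timestamp followed by the segfault marker.
CRASH_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})T(?P<hour>\d{2}):\d{2}:\d{2}\.\d+\S*\s.*"
    r"Caught signal \(Segmentation fault\)"
)


def count_crashes(lines):
    """Return (crashes per day, crashes per hour of day) as Counters."""
    per_day, per_hour = Counter(), Counter()
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            per_day[m.group("date")] += 1
            per_hour[m.group("hour")] += 1
    return per_day, per_hour


sample = [
    "-1> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj recalculating target",
    "0> 2022-01-18T10:28:09.313+0100 7f2b088a2700 -1 *** Caught signal (Segmentation fault) **",
    "0> 2022-02-04T08:36:56.603+0100 7f58c422e700 -1 *** Caught signal (Segmentation fault) **",
]
per_day, per_hour = count_crashes(sample)
print(dict(per_day))
print(dict(per_hour))
```

For a real log, the `sample` list would be replaced by the lines of the radosgw log file; the path of that file depends on the deployment.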

Actions #4

Updated by Casey Bodley about 2 years ago

  • Status changed from New to Need More Info

In the bug description, you shared 8 lines of the radosgw log leading up to the crash. Are you able to capture a log of the full request (with debug_rgw=20 and debug_ms=1) that triggers this crash?
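
Once a log at that debug level exists, the lines belonging to the crashing request can be separated from unrelated output by filtering on the thread id of the crash line. A minimal Python sketch, assuming the dump format shown in the description, where the thread id is the field after the timestamp; `crashing_thread_lines` is a made-up helper name, and the first sample line is invented to stand in for another thread:

```python
import re

# "<index>> <timestamp> <thread id> <rest>", as in the dump in the description.
LINE_RE = re.compile(r"^\s*-?\d+>\s+(?P<ts>\S+)\s+(?P<thread>[0-9a-f]+)\s+(?P<rest>.*)$")
CRASH_MARKER = "Caught signal (Segmentation fault)"


def crashing_thread_lines(lines):
    """Return, in order, the lines logged by the thread that caught the segfault."""
    parsed = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            parsed.append((m.group("thread"), line))
    crashed = [thread for thread, line in parsed if CRASH_MARKER in line]
    if not crashed:
        return []
    thread = crashed[-1]  # use the last crash in the input
    return [line for t, line in parsed if t == thread]


sample = [
    # Invented line from an unrelated thread, for illustration only.
    "-9> 2022-01-18T10:28:09.300+0100 7f2b00000000 10 unrelated output from another thread",
    "-2> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj init permissions",
    "-1> 2022-01-18T10:28:09.311+0100 7f2b088a2700 2 req 15814323700875663596 0.002000030s s3:get_obj recalculating target",
    "0> 2022-01-18T10:28:09.313+0100 7f2b088a2700 -1 *** Caught signal (Segmentation fault) **",
]
for line in crashing_thread_lines(sample):
    print(line)
```

Filtering by thread id rather than by request id keeps the lines logged before the request id is assigned, such as the HTTP environment of the incoming request.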

Actions #5

Updated by Hubert Niedlich about 2 years ago

Is that enough? The previous lines are just OSD spam.

-33> 2022-02-04T08:36:56.009+0100 7f5968b77700 10 monclient: tick
-32> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 HTTP_ACCEPT=*/*
-31> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 HTTP_ACCEPT_ENCODING=deflate, gzip, br
-30> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 HTTP_HOST=s3web.exea.pl
-29> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 HTTP_USER_AGENT=curl/7.74.0
-28> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 HTTP_VERSION=1.1
-27> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 REMOTE_ADDR=10.5.11.142
-26> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 REQUEST_METHOD=GET
-25> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 REQUEST_URI=///installer.php
-24> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 SCRIPT_URI=///installer.php
-23> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 SERVER_PORT=8080
-22> 2022-02-04T08:36:56.600+0100 7f58c422e700 1 ====== starting new request req=0x7f5859f55620 =====
-21> 2022-02-04T08:36:56.600+0100 7f58c422e700 2 req 11648163593991034750 0.000000000s initializing for trans_id = tx00000a1a69712f979177e-0061fcd798-1ef828a1-default
-20> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s rgw api priority: s3=-1 s3website=1
-19> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s host=s3web.exea.pl
-18> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s subdomain= domain=s3web.exea.pl in_hosted_domain=1 in_hosted_domain_s3website=1
-17> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s final domain/bucket subdomain= domain=s3web.exea.pl in_hosted_domain=1 in_hosted_domain_s3website=1 s->info.domain=s3web.exea.pl s->info.request_uri=///installer.php
-16> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s get_handler handler=33RGWHandler_REST_Service_S3Website
-15> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s handler=33RGWHandler_REST_Service_S3Website
-14> 2022-02-04T08:36:56.600+0100 7f58c422e700 2 req 11648163593991034750 0.000000000s getting op 0
-13> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s s3:get_obj scheduling with throttler client=2 cost=1
-12> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s s3:get_obj op=28RGWGetObj_ObjStore_S3Website
-11> 2022-02-04T08:36:56.600+0100 7f58c422e700 2 req 11648163593991034750 0.000000000s s3:get_obj verifying requester
-10> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s s3:get_obj rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
-9> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s s3:get_obj rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
-8> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s s3:get_obj rgw::auth::s3::S3AnonymousEngine granted access
-7> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s s3:get_obj rgw::auth::s3::AWSAuthStrategy granted access
-6> 2022-02-04T08:36:56.600+0100 7f58c422e700 2 req 11648163593991034750 0.000000000s s3:get_obj normalizing buckets and tenants
-5> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s s->object=/installer.php s->bucket=
-4> 2022-02-04T08:36:56.600+0100 7f58c422e700 2 req 11648163593991034750 0.000000000s s3:get_obj init permissions
-3> 2022-02-04T08:36:56.600+0100 7f58c422e700 20 req 11648163593991034750 0.000000000s s3:get_obj RGWSI_User_RADOS::read_user_info(): anonymous user
-2> 2022-02-04T08:36:56.600+0100 7f58c422e700 2 req 11648163593991034750 0.000000000s s3:get_obj recalculating target
-1> 2022-02-04T08:36:56.600+0100 7f58c422e700 10 req 11648163593991034750 0.000000000s retarget Starting retarget
0> 2022-02-04T08:36:56.603+0100 7f58c422e700 -1 *** Caught signal (Segmentation fault) **
in thread 7f58c422e700 thread_name:radosgw
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
1: /lib64/libpthread.so.0(+0x12c20) [0x7f598f30cc20]
2: (RGWHandler_REST_S3Website::retarget(RGWOp*, RGWOp**, optional_yield)+0x174) [0x7f599a385c24]
3: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, bool)+0xf0a) [0x7f5999f2435a]
4: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSink*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, int*)+0x2891) [0x7f5999f27e21]
5: /lib64/libradosgw.so.2(+0x4b1b63) [0x7f5999e77b63]
6: /lib64/libradosgw.so.2(+0x4b3604) [0x7f5999e79604]
7: /lib64/libradosgw.so.2(+0x4b386e) [0x7f5999e7986e]
8: make_fcontext()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
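
The request captured in this log (GET ///installer.php with Host s3web.exea.pl, which leaves s->bucket empty) can be replayed against a test gateway to check whether it triggers the crash on its own. A minimal Python sketch, assuming a gateway reachable on port 8080 as in the log; it writes the request by hand over a socket so that the leading `///` reaches radosgw unchanged. The function names and the 127.0.0.1 address are made up, and nothing in this report confirms that this request alone reproduces the segfault:

```python
import socket


def build_request(host, uri):
    """Build a raw HTTP/1.1 GET so the path is sent exactly as given."""
    return (
        f"GET {uri} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def replay(addr, port, host, uri, timeout=5.0):
    """Send the request; return the status line, or None if no response arrived."""
    with socket.create_connection((addr, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, uri))
        try:
            data = sock.recv(4096)
        except OSError:
            return None  # reset or timeout: the gateway may have gone away
    if not data:
        return None
    return data.split(b"\r\n", 1)[0].decode("latin-1")


# Host header and URI are taken from the log; the address is an assumption.
# print(replay("127.0.0.1", 8080, "s3web.exea.pl", "///installer.php"))
```

A return value of None only means that no response was read; the radosgw log still has to be checked to see whether the process actually crashed.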
Actions #6

Updated by Daniel Gryniewicz about 2 years ago

  • Assignee set to Daniel Gryniewicz
Actions #7

Updated by Casey Bodley almost 2 years ago

  • Is duplicate of Bug #56281: crash: RGWHandler_REST_S3Website::retarget(RGWOp*, RGWOp**, optional_yield) added
Actions #8

Updated by Casey Bodley almost 2 years ago

  • Status changed from Need More Info to Duplicate