Bug #51574

open

Segfault when uploading file

Added by Jan Graichen almost 3 years ago. Updated 11 months ago.

Status: Fix Under Review
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We recently upgraded our cluster to 16.2.4, but radosgw now crashes with segmentation faults when files are uploaded.

At first, I thought we were hit by https://tracker.ceph.com/issues/50556, since only very few uploads succeeded and we use bucket policies. However, I was able to reproduce the issue with the following devel versions as well, which, as far as I know, should already include the backport for 50556:

16.2.4-568-g2e1902f3
16.2.4-670-g468a1be6

I ran radosgw via Docker to reproduce the issue:

docker run --rm -it --net=host --user 64045:64045 -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ --name rgw.compute3 ceph/daemon-base:latest-pacific-devel@sha256:ce85def02b46df732434a553f0f343edd51ddbf67c1e0dc0a5b1ed19f32923ae radosgw -d --id rgw.test --keyring /etc/ceph/ceph.client.rgw.test.keyring --debug 255
2021-07-07T15:20:26.618+0000 7ff5f64e3440  0 ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable), process radosgw, pid 1
2021-07-07T15:20:26.618+0000 7ff5f64e3440  0 framework: civetweb
2021-07-07T15:20:26.618+0000 7ff5f64e3440  0 framework conf key: port, val: 127.0.0.1:6080
2021-07-07T15:20:26.618+0000 7ff5f64e3440  1 radosgw_Main not setting numa affinity
2021-07-07T15:20:26.618+0000 7ff5f64e3440 -1 asok(0x55ba6c6e4000) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.rgw.test.1.94259171456320.asok': (13) Permission denied
2021-07-07T15:20:26.910+0000 7ff5f64e3440  0 framework: beast
2021-07-07T15:20:26.910+0000 7ff5f64e3440  0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
2021-07-07T15:20:26.910+0000 7ff5f64e3440  0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
2021-07-07T15:20:26.910+0000 7ff5f64e3440  0 starting handler: civetweb
2021-07-07T15:20:27.002+0000 7ff5bd0e8700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
2021-07-07T15:20:27.018+0000 7ff5f64e3440  1 mgrc service_daemon_register rgw.52645456 metadata {arch=x86_64,ceph_release=pacific,ceph_version=ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable),ceph_version_short=16.2.4-568-g2e1902f3,cpu=AMD EPYC 7302P 16-Core Processor,distro=centos,distro_description=CentOS Linux 8,distro_version=8,frontend_config#0=civetweb port=127.0.0.1:6080,frontend_type#0=civetweb,hostname=core-a,id=test,kernel_description=#52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020,kernel_version=5.4.0-48-generic,mem_swap_kb=16759804,mem_total_kb=131448768,num_handles=1,os=Linux,pid=1,zone_id=5d41157e-dd10-42a1-99c7-542bf1fc6645,zone_name=default,zonegroup_id=99c1add5-41f3-4b7a-b2bd-32a84919c2db,zonegroup_name=default}
2021-07-07T15:20:27.022+0000 7ff5b90e0700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.5, sleep 5, try again
2021-07-07T15:21:02.215+0000 7ff5b48d7700  1 ====== starting new request req=0x7ff5b48ced10 =====
2021-07-07T15:21:02.223+0000 7ff5b48d7700  1 ====== req done req=0x7ff5b48ced10 op status=0 http_status=200 latency=0.008000219s ======
2021-07-07T15:21:02.223+0000 7ff5b48d7700  1 civetweb: 0x55ba6d864000: 127.0.0.1 - - [07/Jul/2021:15:21:02 +0000] "OPTIONS /uploads HTTP/1.0" 200 354 https://example.org/ Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0
2021-07-07T15:21:02.271+0000 7ff5b48d7700  1 ====== starting new request req=0x7ff5b48ced10 =====
2021-07-07T15:21:02.307+0000 7ff5b48d7700  0 req 2 0.036000986s s3:post_obj Signature verification algorithm AWS v4 (AWS4-HMAC-SHA256)
2021-07-07T15:21:02.307+0000 7ff5b48d7700  0 req 2 0.036000986s Signature verification algorithm AWS v4 (AWS4-HMAC-SHA256)
2021-07-07T15:21:02.311+0000 7ff5b48d7700  1 policy condition check $key [uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/13fec8c30338d94b6767ac4c6f54df14215b1d241e9719deaa5cc74608f43398_1.jpg] uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/ [uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/]
2021-07-07T15:21:02.311+0000 7ff5b48d7700  1 policy condition check $Content-Type [image/jpeg]  []
*** Caught signal (Segmentation fault) **
 in thread 7ff5b48d7700 thread_name:civetweb-worker
 ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7ff5ea8beb20]
 2: (rgw_bucket::rgw_bucket(rgw_bucket const&)+0x23) [0x7ff5f5717123]
 3: (rgw::sal::RGWObject::get_obj() const+0x20) [0x7ff5f5747320]
 4: (RGWPostObj::execute(optional_yield)+0xb0) [0x7ff5f5a76250]
 5: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, bool)+0xb12) [0x7ff5f56f5a82]
 6: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2851) [0x7ff5f56f98d1]
 7: (RGWCivetWebFrontend::process(mg_connection*)+0x29b) [0x7ff5f562fa8b]
 8: /lib64/libradosgw.so.2(+0x62a8f6) [0x7ff5f57c88f6]
 9: /lib64/libradosgw.so.2(+0x62c567) [0x7ff5f57ca567]
 10: /lib64/libradosgw.so.2(+0x62ca28) [0x7ff5f57caa28]
 11: /lib64/libpthread.so.0(+0x814a) [0x7ff5ea8b414a]
 12: clone()
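The crashing request in the log is an S3 browser-based POST upload (`s3:post_obj`) signed with AWS Signature Version 4, whose policy checks a `$key` prefix and a `$Content-Type` condition just before the segfault. The following is a minimal, stdlib-only Python sketch that builds such a form, which may help reproduce the crash without a browser. It is an illustration under assumptions, not the reporter's application: the bucket name, region, credentials, file name, and the exact condition list are hypothetical (the log only shows the `$key` and `$Content-Type` checks).

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key: kDate -> kRegion -> kService -> kSigning."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def build_post_form(bucket, key_prefix, filename, content_type,
                    access_key, secret_key, region, now, expires_in=3600):
    """Return the form fields for a SigV4-signed browser-based POST upload.

    The policy mirrors the two conditions visible in the radosgw log:
    a prefix match on $key and an (empty) prefix match on $Content-Type.
    Bucket, region and credentials are hypothetical placeholders.
    """
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    expiration = (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%S.000Z")

    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            ["starts-with", "$Content-Type", ""],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    # For POST uploads, the string to sign is the base64-encoded policy itself.
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    signature = hmac.new(signing_key(secret_key, date_stamp, region),
                         policy_b64.encode("ascii"), hashlib.sha256).hexdigest()

    return {
        "key": key_prefix + filename,
        "Content-Type": content_type,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "policy": policy_b64,
        "x-amz-signature": signature,
    }


if __name__ == "__main__":
    form = build_post_form(
        bucket="uploads",  # hypothetical: the log only shows "OPTIONS /uploads"
        key_prefix="uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/",
        filename="example_1.jpg",  # hypothetical file name
        content_type="image/jpeg",
        access_key="ACCESSKEY",  # placeholder credentials
        secret_key="SECRETKEY",
        region="us-east-1",  # assumption: depends on the client/zonegroup setup
        now=datetime.now(timezone.utc),
    )
    for name, value in form.items():
        print(f"{name}: {value}")
```

The returned fields would then be sent as `multipart/form-data` to the bucket endpoint (in the setup above, the civetweb frontend on 127.0.0.1:6080), with the file part placed last, as the S3 POST upload format requires.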

This completely blocks us from upgrading radosgw, as most buckets and uploads in our cloud are affected. We are currently running all other components on 16.2.4 (via Debian packages), while radosgw remains on v15.2 (via Docker).


Related issues 1 (0 open, 1 closed)

Is duplicate of rgw - Bug #50556: Reproducible crash on multipart upload to bucket with policy (Resolved, Or Friedmann)
