Bug #51574
Segfault when uploading file (status: open)
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We recently upgraded our cluster to 16.2.4 but got segmentation faults in radosgw when uploading files.
At first, I thought we were hit by https://tracker.ceph.com/issues/50556, as only very few uploads worked and we are using bucket policies, but I was able to reproduce the issue with the following devel versions too. As far as I know, they should already include the backport for 50556.
16.2.4-568-g2e1902f3 16.2.4-670-g468a1be6
I ran radosgw via Docker to reproduce the issue:
docker run --rm -it --net=host --user 64045:64045 -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ --name rgw.compute3 ceph/daemon-base:latest-pacific-devel@sha256:ce85def02b46df732434a553f0f343edd51ddbf67c1e0dc0a5b1ed19f32923ae radosgw -d --id rgw.test --keyring /etc/ceph/ceph.client.rgw.test.keyring --debug 255

2021-07-07T15:20:26.618+0000 7ff5f64e3440 0 ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable), process radosgw, pid 1
2021-07-07T15:20:26.618+0000 7ff5f64e3440 0 framework: civetweb
2021-07-07T15:20:26.618+0000 7ff5f64e3440 0 framework conf key: port, val: 127.0.0.1:6080
2021-07-07T15:20:26.618+0000 7ff5f64e3440 1 radosgw_Main not setting numa affinity
2021-07-07T15:20:26.618+0000 7ff5f64e3440 -1 asok(0x55ba6c6e4000) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.rgw.test.1.94259171456320.asok': (13) Permission denied
2021-07-07T15:20:26.910+0000 7ff5f64e3440 0 framework: beast
2021-07-07T15:20:26.910+0000 7ff5f64e3440 0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
2021-07-07T15:20:26.910+0000 7ff5f64e3440 0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
2021-07-07T15:20:26.910+0000 7ff5f64e3440 0 starting handler: civetweb
2021-07-07T15:20:27.002+0000 7ff5bd0e8700 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
2021-07-07T15:20:27.018+0000 7ff5f64e3440 1 mgrc service_daemon_register rgw.52645456 metadata {arch=x86_64,ceph_release=pacific,ceph_version=ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable),ceph_version_short=16.2.4-568-g2e1902f3,cpu=AMD EPYC 7302P 16-Core Processor,distro=centos,distro_description=CentOS Linux 8,distro_version=8,frontend_config#0=civetweb port=127.0.0.1:6080,frontend_type#0=civetweb,hostname=core-a,id=test,kernel_description=#52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020,kernel_version=5.4.0-48-generic,mem_swap_kb=16759804,mem_total_kb=131448768,num_handles=1,os=Linux,pid=1,zone_id=5d41157e-dd10-42a1-99c7-542bf1fc6645,zone_name=default,zonegroup_id=99c1add5-41f3-4b7a-b2bd-32a84919c2db,zonegroup_name=default}
2021-07-07T15:20:27.022+0000 7ff5b90e0700 0 lifecycle: RGWLC::process() failed to acquire lock on lc.5, sleep 5, try again
2021-07-07T15:21:02.215+0000 7ff5b48d7700 1 ====== starting new request req=0x7ff5b48ced10 =====
2021-07-07T15:21:02.223+0000 7ff5b48d7700 1 ====== req done req=0x7ff5b48ced10 op status=0 http_status=200 latency=0.008000219s ======
2021-07-07T15:21:02.223+0000 7ff5b48d7700 1 civetweb: 0x55ba6d864000: 127.0.0.1 - - [07/Jul/2021:15:21:02 +0000] "OPTIONS /uploads HTTP/1.0" 200 354 https://example.org/ Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0
2021-07-07T15:21:02.271+0000 7ff5b48d7700 1 ====== starting new request req=0x7ff5b48ced10 =====
2021-07-07T15:21:02.307+0000 7ff5b48d7700 0 req 2 0.036000986s s3:post_obj Signature verification algorithm AWS v4 (AWS4-HMAC-SHA256)
2021-07-07T15:21:02.307+0000 7ff5b48d7700 0 req 2 0.036000986s Signature verification algorithm AWS v4 (AWS4-HMAC-SHA256)
2021-07-07T15:21:02.311+0000 7ff5b48d7700 1 policy condition check $key [uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/13fec8c30338d94b6767ac4c6f54df14215b1d241e9719deaa5cc74608f43398_1.jpg] uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/ [uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/]
2021-07-07T15:21:02.311+0000 7ff5b48d7700 1 policy condition check $Content-Type [image/jpeg] []
*** Caught signal (Segmentation fault) **
 in thread 7ff5b48d7700 thread_name:civetweb-worker
 ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7ff5ea8beb20]
 2: (rgw_bucket::rgw_bucket(rgw_bucket const&)+0x23) [0x7ff5f5717123]
 3: (rgw::sal::RGWObject::get_obj() const+0x20) [0x7ff5f5747320]
 4: (RGWPostObj::execute(optional_yield)+0xb0) [0x7ff5f5a76250]
 5: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, bool)+0xb12) [0x7ff5f56f5a82]
 6: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2851) [0x7ff5f56f98d1]
 7: (RGWCivetWebFrontend::process(mg_connection*)+0x29b) [0x7ff5f562fa8b]
 8: /lib64/libradosgw.so.2(+0x62a8f6) [0x7ff5f57c88f6]
 9: /lib64/libradosgw.so.2(+0x62c567) [0x7ff5f57ca567]
 10: /lib64/libradosgw.so.2(+0x62ca28) [0x7ff5f57caa28]
 11: /lib64/libpthread.so.0(+0x814a) [0x7ff5ea8b414a]
 12: clone()
2021-07-07T15:21:02.315+0000 7ff5b48d7700 -1 *** Caught signal (Segmentation fault) **
 in thread 7ff5b48d7700 thread_name:civetweb-worker
 ceph version 16.2.4-568-g2e1902f3 (2e1902f3a43860da461e68ebea5ef8dd48418278) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7ff5ea8beb20]
 2: (rgw_bucket::rgw_bucket(rgw_bucket const&)+0x23) [0x7ff5f5717123]
 3: (rgw::sal::RGWObject::get_obj() const+0x20) [0x7ff5f5747320]
 4: (RGWPostObj::execute(optional_yield)+0xb0) [0x7ff5f5a76250]
 5: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, bool)+0xb12) [0x7ff5f56f5a82]
 6: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x2851) [0x7ff5f56f98d1]
 7: (RGWCivetWebFrontend::process(mg_connection*)+0x29b) [0x7ff5f562fa8b]
 8: /lib64/libradosgw.so.2(+0x62a8f6) [0x7ff5f57c88f6]
 9: /lib64/libradosgw.so.2(+0x62c567) [0x7ff5f57ca567]
 10: /lib64/libradosgw.so.2(+0x62ca28) [0x7ff5f57caa28]
 11: /lib64/libpthread.so.0(+0x814a) [0x7ff5ea8b414a]
 12: clone()
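For anyone trying to reproduce this: the request that crashes is a browser-based POST upload (s3:post_obj), and the two "policy condition check" lines in the log suggest a POST policy with prefix conditions on $key and $Content-Type. Below is a minimal sketch of such a policy document. The bucket name, the helper names and the use of "starts-with" for both conditions are assumptions inferred from the log, not taken from our actual application, and the AWS v4 signing fields (x-amz-credential, x-amz-date, x-amz-signature and so on) are left out.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket, key_prefix, expires_in=3600, now=None):
    """Build an S3 POST policy dict with prefix conditions on key and Content-Type.

    Hypothetical helper: the conditions mirror what the log appears to check,
    i.e. $key must start with key_prefix and $Content-Type may be anything.
    """
    now = now or datetime.now(timezone.utc)
    expiration = now + timedelta(seconds=expires_in)
    return {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            # Empty prefix: any Content-Type value is accepted.
            ["starts-with", "$Content-Type", ""],
        ],
    }


def encode_policy(policy):
    """Base64-encode the policy JSON, as sent in the 'policy' form field."""
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # "example-bucket" is a placeholder, not the bucket from this report.
    policy = build_post_policy(
        "example-bucket", "uploads/0f45545d-09ac-4040-a744-93aa3ddc4c47/"
    )
    print(json.dumps(policy, indent=2))
    print(encode_policy(policy))
```

The encoded policy would go into the "policy" field of the multipart form, together with the key, Content-Type, signature fields and the file itself.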
This completely blocks us from upgrading radosgw, as most buckets and uploads in our cloud are affected. We are currently running all components on 16.2.4 (via Debian packages), with only radosgw on v15.2 (via Docker).