Bug #44570

open

rgw segfault when uploading files with BEAST

Added by Patrick McLean about 4 years ago. Updated almost 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This happens 100% of the time with a command like this:

aws --endpoint-url http://<host>:7480 s3 cp <big file> s3://<user>

This is with ceph-12.2.8 using the bundled version of boost, compiled with gcc 9.2.0.

I have attached a full backtrace ("thread apply all bt"). Please let me know if there is any other information I can provide.


Files

core.msgr-worker-1.14950.1583968363.gdb.xz (23.8 KB) core.msgr-worker-1.14950.1583968363.gdb.xz Patrick McLean, 03/11/2020 11:38 PM
ceph-conf-redacted.txt (3.88 KB) ceph-conf-redacted.txt ceph.conf with redactions JuanJose Galvez, 03/12/2020 08:28 PM
redacted-small-and-large-file-upload.zip (145 KB) redacted-small-and-large-file-upload.zip log showing seg fault for both small and large file uploads JuanJose Galvez, 03/12/2020 10:47 PM
Actions #1

Updated by Patrick McLean about 4 years ago

More readable version of command:

aws --endpoint-url http://[host]:7480 s3 cp [big file] s3://[user]

Actions #2

Updated by Mark Kogan about 4 years ago

  • Status changed from New to Need More Info

For reproduction purposes, a few questions please:

- What is the size of the large file?
- Were the default multipart settings of the aws cli command changed?
- Please attach ceph.conf and the radosgw log file with the segfault.

Thanks
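(For reference, the multipart defaults in question live in the awscli S3 configuration and can be overridden per profile. A sketch of the relevant settings, with example values only, not recommendations:)

```ini
# ~/.aws/config -- example values only, not recommendations
[default]
s3 =
    multipart_threshold = 64MB
    multipart_chunksize = 16MB
```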

Actions #3

Updated by Mark Kogan about 4 years ago

  • Affected Versions v12.2.8 added
Actions #4

Updated by Casey Bodley about 4 years ago

this is luminous? does this reproduce in later (supported) releases?

Actions #5

Updated by Patrick McLean about 4 years ago

Argh, I had an inadvertent typo; this is actually 14.2.8. The attached backtraces and everything are from 14.2.8 (nautilus). We had the same issue on 12 with civetweb; we were hoping it would be fixed with nautilus.

Actions #6

Updated by JuanJose Galvez about 4 years ago

  • Affected Versions v14.2.8 added
  • Affected Versions deleted (v12.2.8)

Updated affected version. This is happening on v14.2.8 with the bundled boost packages.

Actions #7

Updated by JuanJose Galvez about 4 years ago

Mark Kogan wrote:

For reproduction purposes, a few questions please:

- What is the size of the large file?

The file is a few GB in size; however, the crash happens at seemingly random points early on. I can also reproduce this error with a small 1M file.

- Were the default multipart settings of the aws cli command changed?

No changes to the aws cli multipart settings.

- Please attach ceph.conf and the radosgw log file with the segfault.

I've attached a redacted ceph.conf and a log where the crash was reproduced with both a large file and a small file upload. debug_rgw = 20 and debug_ms = 1

Thanks

Please let us know if there is anything more we can provide to help identify and resolve this.
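(For anyone else reproducing: those debug levels go in the rgw client section of ceph.conf. A minimal sketch; the section name below is an example and should match your rgw instance name:)

```ini
# ceph.conf -- section name is an example; match your rgw instance name
[client.rgw.example]
debug_rgw = 20
debug_ms = 1
```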

Actions #8

Updated by Mark Kogan about 4 years ago

  • Assignee set to Mark Kogan
Actions #10

Updated by Mark Kogan about 4 years ago

Posting the callstacks from the attached logs:

ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)
 1: (()+0x149f0) [0x715520b039f0]
 2: (()+0x56f6d) [0x715529d60f6d]
 3: (ceph::buffer::v14_2_0::ptr_node::cloner::operator()(ceph::buffer::v14_2_0::ptr_node const&)+0x4c) [0x715529d7d36c]
 4: (()+0xded8c) [0x715529de8d8c]
 5: (()+0xae6d4) [0x715529db86d4]
 6: (()+0xb8371) [0x715529dc2371]
 7: (()+0xb9c1b) [0x715529dc3c1b]
 8: (()+0xc8b5b) [0x715529dd2b5b]
 9: (()+0xc8d7f) [0x715529dd2d7f]
 10: (()+0x8978a) [0x715529d9378a]
 11: (librados::v14_2_0::IoCtx::operate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::v14_2_0::ObjectReadOperation*, ceph::buffer::v14_2_0::list*)+0x5f) [0x715529d62fef]
 12: (rgw_rados_operate(librados::v14_2_0::IoCtx&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::v14_2_0::ObjectReadOperation*, ceph::buffer::v14_2_0::list*, optional_yield)+0x2cc) [0x13e39434290c]
 13: (RGWSI_SysObj_Core::read(RGWSysObjectCtxBase&, RGWSI_SysObj_Core::GetObjState&, RGWObjVersionTracker*, rgw_raw_obj const&, ceph::buffer::v14_2_0::list*, long, long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > >*, bool, rgw_cache_entry_info*, boost::optional<obj_version>)+0x2e2) [0x13e393fba362]
 14: (RGWSI_SysObj_Cache::read(RGWSysObjectCtxBase&, RGWSI_SysObj_Core::GetObjState&, RGWObjVersionTracker*, rgw_raw_obj const&, ceph::buffer::v14_2_0::list*, long, long, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > >*, bool, rgw_cache_entry_info*, boost::optional<obj_version>)+0x486) [0x13e3943bbdd6]
 15: (RGWSI_SysObj::Obj::ROp::read(long, long, ceph::buffer::v14_2_0::list*)+0x9b) [0x13e393fb6d5b]
 16: (rgw_get_system_obj(RGWRados*, RGWSysObjectCtx&, rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list&, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > >*, rgw_cache_entry_info*, boost::optional<obj_version>)+0x312) [0x13e39433fd92]
 17: (rgw_get_user_info_by_uid(RGWRados*, rgw_user const&, RGWUserInfo&, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, rgw_cache_entry_info*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > >*)+0x18a) [0x13e39434f1ba]
 18: (rgw_get_user_info_from_index(RGWRados*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw_pool const&, RGWUserInfo&, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*)+0x565) [0x13e39434fec5]
 19: (rgw::auth::s3::LocalEngine::authenticate(DoutPrefixProvider const*, boost::basic_string_view<char, std::char_traits<char> > const&, boost::basic_string_view<char, std::char_traits<char> > const&, boost::basic_string_view<char, std::char_traits<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<basic_sstring<char, unsigned short, (unsigned short)65> (CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, std::function<std::shared_ptr<rgw::auth::Completer> (boost::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&)> const&, req_state const*) const+0x361) [0x13e394321631]
 20: (rgw::auth::s3::AWSEngine::authenticate(DoutPrefixProvider const*, req_state const*) const+0x141) [0x13e3942ff181]
 21: (rgw::auth::Strategy::authenticate(DoutPrefixProvider const*, req_state const*) const+0x565) [0x13e393fe98f5]
 22: (rgw::auth::Strategy::authenticate(DoutPrefixProvider const*, req_state const*) const+0x565) [0x13e393fe98f5]
 23: (rgw::auth::Strategy::apply(DoutPrefixProvider const*, rgw::auth::Strategy const&, req_state*)+0x3f) [0x13e393fea40f]
 24: (RGW_Auth_S3::authorize(DoutPrefixProvider const*, RGWRados*, rgw::auth::StrategyRegistry const&, req_state*)+0x50) [0x13e39431d5f0]
 25: (RGWHandler_REST_S3::authorize(DoutPrefixProvider const*)+0x9a) [0x13e39431d82a]
 26: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)+0x1042) [0x13e393f390d2]
 27: (()+0x31de2f) [0x13e393e7de2f]
 28: (()+0x31f16c) [0x13e393e7f16c]
 29: (make_fcontext()+0x2f) [0x13e39446334f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Thread 1 (Thread 0x64a6e4bba700 (LWP 14952)):
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000013f502628651 in reraise_fatal (signum=11) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/global/signal_handler.cc:326
#2  handle_fatal_signal (signum=11) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x000064a6ef114963 in ceph::buffer::v14_2_0::list::buffers_t::clear_and_dispose (this=0x13f5082457b8) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/include/buffer.h:648
#5  ceph::buffer::v14_2_0::list::buffers_t::clone_from (other=..., this=0x13f5082457b8) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/include/buffer.h:639
#6  ceph::buffer::v14_2_0::list::operator= (other=..., this=0x13f5082457b8) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/include/buffer.h:1003
#7  ceph::buffer::v14_2_0::list::operator= (other=..., this=0x13f5082457b8) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/include/buffer.h:1000
#8  Objecter::handle_osd_op_reply (this=0x13f507c75700, m=0x13f50820eb00) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/osdc/Objecter.cc:3502
#9  0x000064a6ef115cab in Objecter::ms_dispatch (this=0x13f507c75700, m=0x13f50820eb00) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/osdc/Objecter.cc:966
#10 0x000064a6ef1347d7 in non-virtual thunk to Objecter::ms_fast_dispatch(Message*) () at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/osdc/Objecter.h:2110
#11 0x000064a6ef0e5440 in Dispatcher::ms_fast_dispatch2 (this=0x13f507c75708, m=...) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/common/RefCountedObj.h:41
#12 0x000064a6e67d530f in Messenger::ms_fast_dispatch (m=..., this=<optimized out>) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8_build/boost/include/boost/container/new_allocator.hpp:165
#13 DispatchQueue::fast_dispatch (this=0x13f506fcf558, m=...) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/DispatchQueue.cc:72
#14 0x000064a6e68fa75e in DispatchQueue::fast_dispatch (m=0x13f50820eb00, this=<optimized out>) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8_build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:67
#15 ProtocolV2::handle_message (this=<optimized out>) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/async/ProtocolV2.cc:1501
#16 0x000064a6e690cec8 in ProtocolV2::handle_read_frame_dispatch (this=0x13f5080fb700) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/async/ProtocolV2.cc:1128
#17 0x000064a6e690d245 in ProtocolV2::handle_read_frame_epilogue_main (this=0x13f5080fb700, buffer=..., r=<optimized out>) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/async/ProtocolV2.cc:1352
#18 0x000064a6e68f203f in ProtocolV2::run_continuation (this=0x13f5080fb700, continuation=...) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/async/ProtocolV2.cc:45
#19 0x000064a6e68b898e in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=<optimized out>, __args#0=<optimized out>, this=0x13f508165610) at /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/include/g++-v9/bits/std_function.h:685
#20 AsyncConnection::process (this=0x13f508165200) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/async/AsyncConnection.cc:450
#21 0x000064a6e69181b8 in EventCenter::process_events (this=this@entry=0x13f506f53700, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x64a6e4bb7d28) at /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/include/g++-v9/bits/basic_ios.h:282
#22 0x000064a6e691e25c in NetworkStack::<lambda()>::operator() (__closure=0x13f506f93808, __closure=0x13f506f93808) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8/src/msg/async/Stack.cc:53
#23 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/include/g++-v9/bits/std_function.h:300
#24 0x000064a6e5ca2130 in std::execute_native_thread_routine (__p=0x13f506f93800) at /var/tmp/portage/sys-devel/gcc-9.2.0-r2/work/gcc-9.2.0/libstdc++-v3/src/c++11/thread.cc:80
#25 0x000064a6e5e2c482 in start_thread (arg=<optimized out>) at pthread_create.c:486
#26 0x000064a6e5abc5cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Running a reproducer load on v14.2.8 consisting of 10 GB object PUTs, with a ceph.conf similar to the one provided above, the segmentation fault does not reproduce.

The difference between the civetweb and beast frontends under load is that the beast frontend's memory footprint is larger.

A clue that this is related to memory pressure is the following line from the segmentation fault backtrace:

#12 0x000064a6e67d530f in Messenger::ms_fast_dispatch (m=..., this=<optimized out>) at /var/tmp/portage/sys-cluster/ceph-14.2.8/work/ceph-14.2.8_build/boost/include/boost/container/new_allocator.hpp:165

If it is possible to proceed with investigating whether the segmentation fault was a result of memory pressure, there are debug steps that can assist in advancing the investigation:
provide a sosreport with information about the system spec, such as total memory and available memory during usage,
and/or provide atop or sar system load logs (from /var/log/sa/ or /var/log/atop/) from the time surrounding the segfault reproduction.
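Where those archives are unavailable, memory headroom can also be sampled directly while re-running the reproducer. A minimal sketch (Linux-only; reads the MemAvailable field of /proc/meminfo, intended to be run in a loop alongside the upload):

```shell
# Print an epoch timestamp and MemAvailable (kB); running this every few
# seconds during the reproducer makes a memory dip around the segfault visible.
echo "$(date +%s) $(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
```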

Actions #11

Updated by JuanJose Galvez about 4 years ago

Mark Kogan wrote:

Posting the callstacks from the attached logs:
[...]

[...]

Running a reproducer load on v14.2.8 consisting of 10 GB object PUTs, with a ceph.conf similar to the one provided above, the segmentation fault does not reproduce.

The difference between the civetweb and beast frontends under load is that the beast frontend's memory footprint is larger.

A clue that this is related to memory pressure is the following line from the segmentation fault:
[...]

If it is possible to proceed with investigating whether the segmentation fault was a result of memory pressure, there are debug steps that can assist in advancing the investigation:
provide a sosreport with information about the system spec, such as total memory and available memory during usage,
and/or provide atop or sar system load logs (from /var/log/sa/ or /var/log/atop/) from the time surrounding the segfault reproduction.

Unfortunately the sa and atop logs don't go back that far for us; however, we do have Prometheus data. I can confirm that these nodes all have 128 GB of RAM and only run two instances of radosgw each (s3api and s3website). The lowest dip in memory over the last month was 117 GB available. I suppose a large spike could have occurred in the time between Prometheus scrapes, but it does seem unlikely.

Is there a combination of debug options that can provide greater detail of the activity going on in the RGW leading to the crash?

Also, I've noted a few grsec messages in our dmesg logs; they are unrelated (ntpdate), but do you know if there is anything RGW-related we should watch for when grsecurity is enabled?

Actions #12

Updated by Mark Kogan about 4 years ago

Thank you.
A couple of questions, please, to try to narrow down the source of the problem:

- Under which OS and OS version is it running?
- Is selinux or an equivalent active?
- Is radosgw installed from packages (rpm/deb; which one?) or compiled on the machine?
- tcmalloc package name & version?
- Is it running in a docker containerized environment?
- Is the radosgw process constrained by cgroups (systemd unit configuration)?
- Please paste the content of 'cat /proc/$(pidof radosgw)/limits'

Actions #13

Updated by Patrick McLean almost 4 years ago

- Under which OS and OS version is it running?

It's running under Gentoo, which is a rolling distro with no specific version.

- Is selinux or an equivalent active?

No selinux or equivalent. It normally runs under grsecurity, but we can reproduce this exactly the same without it enabled.

- Is radosgw installed from packages (rpm/deb; which one?) or compiled on the machine?

It's installed via the Gentoo package; it's compiled on a local build server.

- tcmalloc package name & version

The Gentoo google-perftools package version google-perftools-2.7-r1
The source of the code this package uses is https://github.com/gperftools/gperftools

- Is it running in a docker containerized environment?

No.

- Is the radosgw process constrained by cgroups (systemd unit configuration)?

No, it's running under OpenRC with zero control group constraints.

- Please paste the content of 'cat /proc/$(pidof radosgw)/limits'

Limit                     Soft Limit  Hard Limit  Units
Max cpu time              unlimited   unlimited   seconds
Max file size             unlimited   unlimited   bytes
Max data size             unlimited   unlimited   bytes
Max stack size            8388608     unlimited   bytes
Max core file size        0           unlimited   bytes
Max resident set          unlimited   unlimited   bytes
Max processes             2097152     2097152     processes
Max open files            1048576     1048576     files
Max locked memory         65536       65536       bytes
Max address space         unlimited   unlimited   bytes
Max file locks            unlimited   unlimited   locks
Max pending signals       513634      513634      signals
Max msgqueue size         819200      819200      bytes
Max nice priority         0           0
Max realtime priority     0           0
Max realtime timeout      unlimited   unlimited   us

Actions #14

Updated by Mark Kogan almost 4 years ago

Patrick McLean wrote:

- Is radosgw installed from packages (rpm/deb; which one?) or compiled on the machine?

It's installed via the Gentoo package; it's compiled on a local build server.

This may be related to the Gentoo compilation flags.
Could you please provide the optimization flags used in the Gentoo build?
(We will try to compile with the same flags. Was it compiled with gcc or clang?)

PS: found this, but it does not contain the compilation flags:
https://wiki.gentoo.org/wiki/Ceph/Installation#Installation --> https://packages.gentoo.org/packages/sys-cluster/ceph --> https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-cluster/ceph/ceph-15.2.3.ebuild

Actions #15

Updated by Patrick McLean almost 4 years ago

The compilation flags on Gentoo are user-selected.
These hosts are being compiled with gcc-9.3.0 with the flags "-O2 -march=znver1 -mindirect-branch=thunk -frecord-gcc-switches -fno-omit-frame-pointer -g -pipe"

We have tried removing "-mindirect-branch=thunk" which had no effect on this issue.

Actions #16

Updated by Patrick McLean almost 4 years ago

I will also note that I am the maintainer of the package in Gentoo.

If there are compile flags that should be avoided, I can have the ebuild filter them out automatically

Actions #17

Updated by Mark Kogan almost 4 years ago

Patrick McLean wrote:

The compilation flags on Gentoo are user selected.
These hosts are being compiled with gcc-9.3.0 with the flags "-O2 -march=znver1 -mindirect-branch=thunk -frecord-gcc-switches -fno-omit-frame-pointer -g -pipe"

We have tried removing "-mindirect-branch=thunk" which had no effect on this issue.

Update: in the process of looking for a machine with an AMD EPYC processor to work on reproducing with "-march=znver1".

Actions #18

Updated by Mark Kogan almost 4 years ago

Update: the segfault did not reproduce on

lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD Ryzen 7 1700 Eight-Core Processor
Stepping:            1
CPU MHz:             3000.047
BogoMIPS:            6000.09
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K

and a vstart environment built with

-DCMAKE_CXX_FLAGS="-O2 -march=znver1 -mindirect-branch=thunk -frecord-gcc-switches -fno-omit-frame-pointer -g -pipe" 

and

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic
Actions #19

Updated by Mark Kogan almost 4 years ago

Could you please check whether this reproduces with version 15.x or master in the reproducing environment?

Actions #20

Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to rgw