Bug #47910

radosgw crash on objecter operations

Added by Rafal Wadolowski over 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After upgrading from 12.2.12 to 14.2.11, radosgw can't handle any requests.

The application runs stably without client traffic. When we test it with a simple curl IP:PORT, it crashes.

It fails with several different backtraces:

Thread 21 "radosgw" received signal SIGBUS, Bus error.
[Switching to Thread 0x7fffdfd58700 (LWP 972959)]
0x00007ffff7b86bb3 in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7b86bb3 in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
#1  0x00007ffff7b86e8a in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
#2  0x00007ffff7b86f3f in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.4
#3  0x00007ffff7b89f2a in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.4
#4  0x00007ffff7b98a9b in tc_malloc () from /usr/lib/libtcmalloc.so.4
#5  0x00007fffeea642d8 in operator new(unsigned long) () from /usr/lib/ceph/libceph-common.so.0
#6  0x00007fffee76955d in __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > > >::allocate (this=<optimized out>, __n=1) at /usr/include/c++/7/ext/new_allocator.h:111
#7  std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > > > >::allocate (__a=..., __n=1) at /usr/include/c++/7/bits/alloc_traits.h:436
#8  std::_Rb_tree<int, std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > >, std::_Select1st<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > > >::_M_get_node (this=0x273c860) at /usr/include/c++/7/bits/stl_tree.h:588
#9  std::_Rb_tree<int, std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > >, std::_Select1st<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > > >::_M_create_node<std::piecewise_construct_t const&, std::tuple<int&&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple<int&&>&&, std::tuple<>&&) (
    this=0x273c860) at /usr/include/c++/7/bits/stl_tree.h:642
#10 std::_Rb_tree<int, std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > >, std::_Select1st<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<int&&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > >, std::piecewise_construct_t const&, std::tuple<int&&>&&, std::tuple<>&&) (this=this@entry=0x273c860, __pos=__pos@entry=...) at /usr/include/c++/7/bits/stl_tree.h:2398
#11 0x00007fffee76a20f in std::map<int, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::list<std::pair<ceph::buffer::v14_2_0::list, Message*>, std::allocator<std::pair<ceph::buffer::v14_2_0::list, Message*> > > > > >::operator[](int&&) (__k=<unknown type in /usr/lib/debug/.build-id/f2/965c588303d06cdaeac3a5daaba7266bdd311b.debug, CU 0x385479c, DIE 0x397fe18>, this=0x273c860)
    at /usr/include/c++/7/bits/stl_map.h:512
#12 ProtocolV1::send_message (this=0x273c800, m=0x40f9b00) at /build/ceph-14.2.11/src/msg/async/ProtocolV1.cc:247
#13 0x00007fffee759fcd in AsyncConnection::send_message (this=0x270df80, m=0x40f9b00) at /build/ceph-14.2.11/src/msg/async/AsyncConnection.cc:548
#14 0x00007ffff78c9ec5 in Objecter::_send_op (this=this@entry=0x2551080, op=op@entry=0x40f9800) at /build/ceph-14.2.11/src/osdc/Objecter.cc:3274
#15 0x00007ffff78cab34 in Objecter::_send_linger_ping (this=this@entry=0x2551080, info=info@entry=0x263ba00) at /build/ceph-14.2.11/src/osdc/Objecter.cc:693
#16 0x00007ffff78cb56b in Objecter::tick (this=0x2551080) at /build/ceph-14.2.11/src/osdc/Objecter.cc:2153
#17 0x00007ffff78b3f80 in std::function<void ()>::operator()() const (this=0x1887570) at /usr/include/c++/7/bits/std_function.h:706
#18 ceph::timer_detail::timer<ceph::time_detail::coarse_mono_clock>::timer_thread (this=0x25511a0) at /build/ceph-14.2.11/src/common/ceph_timer.h:132
#19 0x00007fffeea91e8f in execute_native_thread_routine () from /usr/lib/ceph/libceph-common.so.0
#20 0x00007fffecceb6ba in start_thread (arg=0x7fffdfd58700) at pthread_create.c:333
#21 0x00007fffec7184dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 42 "rgw_gc" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd7547700 (LWP 984488)]
0x00007ffff7b86bb3 in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7b86bb3 in tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
#1  0x00007ffff7b86fa6 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.4
#2  0x00007ffff7b89f2a in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.4
#3  0x00007ffff7b9c2c8 in tc_newarray () from /usr/lib/libtcmalloc.so.4
#4  0x00007ffff786c0e2 in __gnu_cxx::new_allocator<int*>::allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/7/ext/new_allocator.h:111
#5  std::allocator_traits<std::allocator<int*> >::allocate (__a=..., __n=<optimized out>) at /usr/include/c++/7/bits/alloc_traits.h:436
#6  std::_Vector_base<int*, std::allocator<int*> >::_M_allocate (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/7/bits/stl_vector.h:172
#7  std::vector<int*, std::allocator<int*> >::_M_default_append (this=this@entry=0x375a240, __n=1) at /usr/include/c++/7/bits/vector.tcc:571
#8  0x00007ffff78a6657 in std::vector<int*, std::allocator<int*> >::resize (__new_size=<optimized out>, this=0x375a240) at /usr/include/c++/7/bits/stl_vector.h:692
#9  Objecter::Op::Op (this=0x375a000, o=..., ol=..., op=std::vector of length 0, capacity 0, f=<optimized out>, fin=0x36021a0, ov=0x3eef6b0, offset=0x0,
    parent_trace=0x7fffd7544860) at /build/ceph-14.2.11/src/osdc/Objecter.h:1414
#10 0x00007ffff789bb2e in Objecter::prepare_mutate_op (parent_trace=0x7fffd7544860, reqid=..., objver=0x3eef6b0, oncommit=0x36021a0, flags=0, mtime=..., snapc=..., op=...,
    oloc=..., oid=..., this=<optimized out>) at /build/ceph-14.2.11/src/osdc/Objecter.h:2249
#11 librados::IoCtxImpl::aio_operate (this=0x269bee0, oid=..., o=0x28ac300, c=0x3eef600, snap_context=..., flags=flags@entry=0, trace_info=0x0)
    at /build/ceph-14.2.11/src/librados/IoCtxImpl.cc:800
#12 0x00007ffff78742dc in librados::v14_2_0::IoCtx::aio_operate (this=this@entry=0x1983a78, oid="gc.641", c=c@entry=0x32eac70, o=o@entry=0x7fffd7544980)
    at /build/ceph-14.2.11/src/librados/librados_cxx.cc:1431
#13 0x00000000009ccf0b in RGWRados::gc_aio_operate (this=0x1983800, oid="gc.641", op=op@entry=0x7fffd7544980, pc=pc@entry=0x7fffd7544a38)
    at /build/ceph-14.2.11/src/rgw/rgw_rados.cc:9034
#14 0x0000000000b24b98 in RGWGC::remove (this=0x25a52c0, index=index@entry=641, tags=std::vector of length 0, capacity 0, pc=pc@entry=0x7fffd7544a38)
    at /build/ceph-14.2.11/src/rgw/rgw_gc.cc:92
#15 0x0000000000b2983c in RGWGCIOManager::flush_remove_tags (this=this@entry=0x7fffd7544b00, index=index@entry=641, rt=std::vector of length 0, capacity 0)
    at /build/ceph-14.2.11/src/rgw/rgw_gc.cc:291
#16 0x0000000000b27ac5 in RGWGCIOManager::flush_remove_tags (this=0x7fffd7544b00) at /build/ceph-14.2.11/src/rgw/rgw_gc.cc:310
#17 RGWGCIOManager::drain (this=<optimized out>) at /build/ceph-14.2.11/src/rgw/rgw_gc.cc:270
#18 RGWGC::process (this=<optimized out>, expired_only=expired_only@entry=true) at /build/ceph-14.2.11/src/rgw/rgw_gc.cc:456
#19 0x0000000000b28172 in RGWGC::GCWorker::entry (this=0x18980e0) at /build/ceph-14.2.11/src/rgw/rgw_gc.cc:498
#20 0x00007fffecceb6ba in start_thread (arg=0x7fffd7547700) at pthread_create.c:333
#21 0x00007fffec7184dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


Thread 3 "msgr-worker-1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe566b700 (LWP 999197)]
0x00007ffff7b8a0b3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7b8a0b3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so.4
#1  0x00007ffff7b8a16b in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /usr/lib/libtcmalloc.so.4
#2  0x00007ffff7b9c818 in tc_deletearray () from /usr/lib/libtcmalloc.so.4
#3  0x00007fffee7b2755 in ceph::buffer::v14_2_0::ptr_node::disposer::operator() (this=<optimized out>, delete_this=0x37419c0) at /build/ceph-14.2.11/src/include/buffer.h:404
#4  ceph::buffer::v14_2_0::list::buffers_t::clear_and_dispose (this=0x26d6640) at /build/ceph-14.2.11/src/include/buffer.h:649
#5  ceph::buffer::v14_2_0::list::clear (this=0x26d6640) at /build/ceph-14.2.11/src/include/buffer.h:1068
#6  PosixConnectedSocketImpl::send (this=0x37b02c0, bl=..., more=false) at /build/ceph-14.2.11/src/msg/async/PosixStack.cc:150
#7  0x00007fffee757e59 in ConnectedSocket::send (more=false, bl=..., this=0x26d6600) at /build/ceph-14.2.11/src/msg/async/Stack.h:108
#8  AsyncConnection::_try_send (this=0x26d6400, more=more@entry=false) at /build/ceph-14.2.11/src/msg/async/AsyncConnection.cc:323
#9  0x00007fffee770779 in ProtocolV1::write_message (this=this@entry=0x277a000, m=m@entry=0x3e98600, bl=..., more=more@entry=false)
    at /build/ceph-14.2.11/src/msg/async/ProtocolV1.cc:1160
#10 0x00007fffee771254 in ProtocolV1::write_event (this=0x277a000) at /build/ceph-14.2.11/src/msg/async/ProtocolV1.cc:346
#11 0x00007fffee75a9f3 in AsyncConnection::handle_write (this=0x26d6400) at /build/ceph-14.2.11/src/msg/async/AsyncConnection.cc:692
#12 0x00007fffee7af6f7 in EventCenter::process_events (this=this@entry=0x18ac980, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000,
    working_dur=working_dur@entry=0x7fffe5668c08) at /build/ceph-14.2.11/src/msg/async/Event.cc:441
#13 0x00007fffee7b3de8 in NetworkStack::<lambda()>::operator() (__closure=0x1936408) at /build/ceph-14.2.11/src/msg/async/Stack.cc:53
#14 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/7/bits/std_function.h:316
#15 0x00007fffeea91e8f in execute_native_thread_routine () from /usr/lib/ceph/libceph-common.so.0
#16 0x00007fffecceb6ba in start_thread (arg=0x7fffe566b700) at pthread_create.c:333
#17 0x00007fffec7184dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 2 "msgr-worker-0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe5e6c700 (LWP 1009698)]
0x00007ffff7b8a0b3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7b8a0b3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so.4
#1  0x00007ffff7b8a16b in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /usr/lib/libtcmalloc.so.4
#2  0x00007ffff7b9c818 in tc_deletearray () from /usr/lib/libtcmalloc.so.4
#3  0x00007fffee4eb527 in RefCountedObject::put (this=0x4058c00) at /build/ceph-14.2.11/src/common/RefCountedObj.h:64
#4  0x00007fffee77079a in ProtocolV1::write_message (this=this@entry=0x2e53800, m=m@entry=0x4058c00, bl=..., more=more@entry=false)
    at /build/ceph-14.2.11/src/msg/async/ProtocolV1.cc:1174
#5  0x00007fffee771254 in ProtocolV1::write_event (this=0x2e53800) at /build/ceph-14.2.11/src/msg/async/ProtocolV1.cc:346
#6  0x00007fffee75a9f3 in AsyncConnection::handle_write (this=0x272ed80) at /build/ceph-14.2.11/src/msg/async/AsyncConnection.cc:692
#7  0x00007fffee7af6f7 in EventCenter::process_events (this=this@entry=0x18acbc0, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000,
    working_dur=working_dur@entry=0x7fffe5e69c08) at /build/ceph-14.2.11/src/msg/async/Event.cc:441
#8  0x00007fffee7b3de8 in NetworkStack::<lambda()>::operator() (__closure=0x1936438) at /build/ceph-14.2.11/src/msg/async/Stack.cc:53
#9  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/7/bits/std_function.h:316
#10 0x00007fffeea91e8f in execute_native_thread_routine () from /usr/lib/ceph/libceph-common.so.0
#11 0x00007fffecceb6ba in start_thread (arg=0x7fffe5e6c700) at pthread_create.c:333
#12 0x00007fffec7184dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

We tested it on Ubuntu 16.04 and 18.04 with the same effect.
The cluster has mons, mgrs, and osds on 14.2.11; that part works stably.

log_with_msgr_worker_crash (25.7 KB) Rafal Wadolowski, 10/22/2020 03:30 PM

log_with_radosgw_crash (38.4 KB) Rafal Wadolowski, 10/22/2020 03:34 PM


Related issues

Duplicates rgw - Bug #43739: radosgw abort caused by beast frontend coroutine stack overflow Resolved

History

#1 Updated by Rafal Wadolowski over 3 years ago

When I changed to civetweb, it became stable.
So it looks like a problem with beast; are there any requirements?
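
For context, the frontend is selected by a single line in the RGW client section of ceph.conf. The beast line below is the one shown later in this ticket; the civetweb line is an assumed equivalent on the same port, not a value taken from this report:

[client.rgw.rgw1.rgw0]
# beast frontend, as configured later in this ticket (crashes):
#rgw frontends = beast endpoint=0.0.0.0:8080
# civetweb frontend on the same port (reported stable):
rgw frontends = civetweb port=8080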

#2 Updated by Or Friedmann over 3 years ago

  • Status changed from New to Need More Info

1) If you switch to civetweb and then switch back to beast, does it happen again?

2) Can you please provide a full log file with debug_rgw 20 and debug_ms 1? (One way to set those is sketched after this list.)

3) What is the tcmalloc version installed in your environment?
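
For reference, a minimal sketch of one common way to raise those debug levels for a single RGW instance; the section name is the one that appears later in this ticket, and the same levels can also be set through the usual command-line or runtime config overrides:

[client.rgw.rgw1.rgw0]
# verbose RGW request handling and messenger traffic
debug rgw = 20
debug ms = 1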

#3 Updated by Rafal Wadolowski over 3 years ago

1) Yes. We now have msgr2 enabled and 9 radosgw instances running civetweb. When I start a radosgw with the beast frontend, it crashes.
2) Attached.
3) Tested with two versions:

ii  libtcmalloc-minimal4                        2.5-2.2ubuntu3                                  amd64        efficient thread-caching malloc
ii  libtcmalloc-minimal4                  2.4-0ubuntu5.16.04.1                            amd64        efficient thread-caching malloc

#4 Updated by Rafal Wadolowski over 3 years ago

And a log with the radosgw SIGSEGV is attached.

#5 Updated by Or Friedmann over 3 years ago

Thank you Rafal,

Can you please provide the output of this command, along with your ceph.conf:

ceph versions

#6 Updated by Rafal Wadolowski over 3 years ago

Here you are:

{
    "mon": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 5037
    },
    "mds": {},
    "rgw": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 9
    },
    "rgw-nfs": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 7
    },
    "overall": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 5059
    }
}

[client.rgw.rgw1.rgw0]
host = rgw1
keyring = /var/lib/ceph/radosgw/ceph-rgw.rgw1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-rgw1.rgw0.log
rgw frontends = beast endpoint=0.0.0.0:8080
rgw thread pool size = 512

# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
bluestore_rocksdb_options = "compression=kSnappyCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=5,max_bytes_for_level_base=1610612736,max_bytes_for_level_multiplier=10" 
cluster network = xxx
err_to_syslog = true
fsid = 414b0b49-0f21-4f7d-82da-7586ce0f95db
log_to_syslog = true
mon host = xxx
mon initial members = xxx
osd pool default crush rule = -1
osd_command_thread_suicide_timeout = 7200
osd_command_thread_timeout = 4800
osd_crush_update_on_start = true
osd_deep_scrub_interval = 5184000
osd_max_backfills = 1
osd_max_pg_per_osd_hard_ratio = 32
osd_max_scrubs = 1
osd_memory_target = 3221225472
osd_op_thread_suicide_timeout = 600
osd_op_thread_timeout = 300
osd_scrub_chunk_max = 1
osd_scrub_chunk_min = 1
osd_scrub_during_recovery = false
osd_scrub_max_interval = 2592000
osd_scrub_min-interval = 864000
osd_scrub_priority = 1
public network = xxx
rgw_cache_lru_size = 1000000
rgw_dynamic_resharding = False
rgw_gc_max_objs = 2647
rgw_gc_obj_min_wait = 30
rgw_gc_processor_max_time = 600
rgw_gc_processor_period = 600
rgw_lc_max_objs = 2647
rgw_lifecycle_work_time = 00:01-23:59
rgw_nfs_fhcache_partitions = 1
rgw_nfs_fhcache_size = 0
rgw_nfs_lru_lane_hiwat = 524288
rgw_nfs_lru_lanes = 10
rgw_num_rados_handles = 8

#7 Updated by Or Friedmann over 3 years ago

Thanks for the fast response,

Is it possible to install a newer version of tcmalloc (https://launchpad.net/ubuntu/focal/+package/libtcmalloc-minimal4) on the specific RGW and test?
Looking at the logs, it seems that curl / works twice and crashes on the third time.
Can you please run this command against the beast frontend and check whether it causes a failure too:
curl http://127.0.0.1:8080/swift/healthcheck

#8 Updated by Rafal Wadolowski over 3 years ago

Or,
focal is not supported for nautilus; I tested with the bionic version and got the same effect.
I can't understand why RGW crashes on the third or the first request. I was able to crash it on the first request.

Okay, good idea. That curl works without a crash, so it looks like a problem with S3 requests...

Changing rgw_num_rados_handles doesn't have any effect.

#9 Updated by Or Friedmann over 3 years ago

The swift healthcheck doesn't make a RADOS call, so I guess that is the reason.
Can you please update this rgw to a newer tcmalloc (from the focal repo) so we can be sure this is not a tcmalloc bug?

Thank you

#10 Updated by Rafal Wadolowski over 3 years ago

The one from focal, 2.7? Right now I'm testing on bionic with 2.5-2.2ubuntu3.

#11 Updated by Or Friedmann over 3 years ago

Right.
I want to make sure this is not a tcmalloc bug (on one of the nodes you are using for testing this error).

#12 Updated by Mauricio Oliveira over 3 years ago

Hi Rafal and Or,

Rafal,

Could you please confirm if you are using ceph packages from Ubuntu?
(say, Ubuntu Cloud Archive, version 14.2.11-0ubuntu0.19.10.1~cloud4)

And if so, would you be able to test a patched package to check if
this issue happens with increased coroutine stack sizes? (from [1])

Or,

I've been debugging a bug report that is apparently this very same
issue (rgw segfaults with beast, not civetweb, after an L -> N upgrade,
and tcmalloc in the stack trace), but it isn't reproducible anymore
after an N -> O upgrade.
(Thus unfortunately the reproducer is lost; it was not my decision.)

I'll post more details later, but essentially, from the 6 coredumps,
4/6 have a similar signature to this (with tcmalloc), and the other
2/6 hit exceptions, which seem to be due to stack corruption.
(all 6 happen in the same circumstance, a bit after starting radosgw;
so I think despite different signatures, cause may be similar/same):

- first, a tcmalloc large alloc of ~86 TiB, because the request size
was actually a pointer to a stack address (that converts to 86 TiB);

- second, the stack trace shows that a function return address was
not from the text section, but rather an address in the stack.

So, that apparently suggests that stack corruption is happening, and
this post [2] from dev@ceph.io reported stack overflow in coroutines
stack/beast in Nautilus. They increased the stack sizes from 128k to
1MiB to confirm whether the overflow/problem went away. That is the
patch in the test packages in the PPA above [1]. (That's not the final fix, of course.)

I couldn't identify/prove that stack corruption happened w/ GDB and
the core-dumps yet, but if we could test that to confirm whether or
not this is the issue here, it would be great.

Regarding tcmalloc, I guess we cannot rule out that it has an issue,
particularly because the coroutine stacks are allocated from the heap
IIUIC, which is managed by tcmalloc; so both issues might be related.
(or turn out that there's no stack corruption, but something else.)

Thanks,
Mauricio

[1] https://launchpad.net/~mfo/+archive/ubuntu/ceph47910
[2] https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/6LBFZIFUPTJQ3SNTLVKSQMVITJWVWTZ6/#6LBFZIFUPTJQ3SNTLVKSQMVITJWVWTZ6
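
To make the debug-patch idea concrete, here is a minimal, hedged sketch (not the actual Ceph code, and written against the Boost versions of that era) of how a Boost.Asio stackful coroutine, like the ones the beast frontend uses per request, can be spawned with an enlarged stack via boost::coroutines::attributes. A too-small default stack that overflows into adjacent heap memory would explain crashes later surfacing inside the allocator (tcmalloc), as in the backtraces above:

#include <boost/asio/io_context.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/coroutine/attributes.hpp>
#include <cstddef>
#include <iostream>

int main() {
  boost::asio::io_context ioc;

  // 1 MiB instead of the small default stack; this mirrors the idea of the
  // 128k -> 1MiB debug patch, not its exact code.
  const std::size_t stack_size = 1024 * 1024;

  boost::asio::spawn(
      ioc,
      [](boost::asio::yield_context yield) {
        // A request handler would run here, suspending on async I/O via
        // 'yield'; every local variable and nested call lives on the
        // coroutine's private stack, so deep call chains can overflow it.
        std::cout << "handler ran on an enlarged coroutine stack\n";
      },
      boost::coroutines::attributes(stack_size));

  ioc.run();
  return 0;
}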

#13 Updated by Rafal Wadolowski over 3 years ago

Hi Mauricio,
I am using package from download.ceph.com.

I can test the fix, I will back with the answers until end of the week.

#14 Updated by Mauricio Oliveira over 3 years ago

Hey Rafal,

Thanks!

By the way, does your ceph cluster have:
1) bucket(s) with a large number of objects?
2) Erasure-Coded pool(s)?

#15 Updated by Rafal Wadolowski over 3 years ago

Both of them.

I checked your links. I should implement https://github.com/cbodley/ceph/commit/d23507bd1295a29ccae3ae36187194d44c9a6438, am I correct?

#16 Updated by Mauricio Oliveira over 3 years ago

Thanks for confirming on large buckets/EC pools.

Correct; that patch is the only change in the ceph packages in the PPA.

So, if you'd like to use already-built packages to test whether that patch helps, you can try this:
(e.g., in a VM or container to connect to cluster, so not to change original packages/environment)

1) Confirm the issue happens with the original ceph packages from the Ubuntu Cloud Archive:

$ sudo add-apt-repository cloud-archive:train

$ apt-cache policy radosgw | grep Candidate:
  Candidate: 14.2.11-0ubuntu0.19.10.1~cloud4

$ sudo apt install radosgw # and others if needed.

2) Check whether the issue happens with the patched ceph packages:

$ sudo add-apt-repository ppa:mfo/ceph47910

$ apt-cache policy radosgw | grep Candidate:
  Candidate: 14.2.11-0ubuntu0.19.10.1~cloud4+costack1mb.1

$ sudo apt install radosgw # and others if needed.

Hope this helps,
Mauricio

#17 Updated by Rafal Wadolowski over 3 years ago

Or, Mauricio,

I tested radosgw with the coroutine stack size increased to 1 MiB. Everything is working fine.
I think we should merge this into master + backports.

#18 Updated by Mauricio Oliveira over 3 years ago

Hi Rafal,

Thanks for testing!

So, the issue indeed does seem to be stack corruption/overflow.

Unfortunately this is just a debug patch (to confirm the issue)
and cannot be merged upstream for scalability reasons (see [1]).

The bug report on my end moved from Nautilus to Octopus before
that could be tested, but they do not hit it on Octopus.

Thus the 'fix', which should be related to or imply a smaller stack
footprint in the path triggering the overflow (unknown, right now),
is already present in Octopus.

I'll try and search for N -> O related changes, as a beginner.

Or, maybe you have some closer insight about such changes?

(In my bug report, the segfaults usually happen immediately
after starting radosgw, 1 sec after, or 60 secs after IIRC,
if that helps.)

Thanks,
Mauricio

[1] https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/6LBFZIFUPTJQ3SNTLVKSQMVITJWVWTZ6/#6LBFZIFUPTJQ3SNTLVKSQMVITJWVWTZ6

""" ... since these coroutines are how we expect
the beast frontend to scale to thousands of connections, we really don't
want to raise the stack size permanently and limit how many we can fit
in memory. """

#19 Updated by Casey Bodley over 3 years ago

  • Duplicates Bug #43739: radosgw abort caused by beast frontend coroutine stack overflow added

#20 Updated by Casey Bodley over 3 years ago

This was fixed for octopus as part of https://tracker.ceph.com/issues/43739

It looks like the nautilus backport https://tracker.ceph.com/issues/43921 never went through. Any volunteers?

#21 Updated by Mauricio Oliveira over 3 years ago

Casey, thanks for the pointers!

Not sure how I missed that one.
So, the stack size was increased from 128k to 512k.

I'll assess the backport, but can't commit right now.

#22 Updated by Mauricio Oliveira over 3 years ago

I completed a first pass of the backport,
and should look at build/test next week.

#23 Updated by Mauricio Oliveira over 3 years ago

Hi Rafal, Casey,

I've completed the backport on the nautilus branch, and tested with the steps from 43739 [1] using the development cluster ('vstart.sh').

The patched code didn't hit any issues, but neither did the original code (both running radosgw with beast), so this can't confirm it's fixed.

Rafal,

Could you please verify again with the new patched packages in the PPA? [2]
This would confirm that it fixes the issue you observed/reported.

It's currently building and should be available in a few hours.
You can check for it with `apt-cache policy` (version below.)

Thanks,
Mauricio

$ sudo add-apt-repository cloud-archive:train # for dependency packages
$ sudo add-apt-repository ppa:mfo/ceph47910   # for patched packages

$ apt-cache policy radosgw | grep Candidate:
  Candidate: 14.2.11-0ubuntu0.19.10.1~cloud4+test.ceph43921.1 

$ sudo apt install radosgw # and others if needed.

[1] https://tracker.ceph.com/issues/43739
[2] https://launchpad.net/~mfo/+archive/ubuntu/ceph47910

#24 Updated by Mauricio Oliveira over 3 years ago

Hi Rafal,

Please excuse this message if you're already out for vacation/holiday season.

Just following up if you had a chance to verify the test packages with the proper fix/backport.

Thanks,
Mauricio

#25 Updated by Rafal Wadolowski over 3 years ago

Hi Mauricio,
I plan to test the backport this week; I will be back with the results.

Rafal

#26 Updated by Mauricio Oliveira about 3 years ago

Hi Rafal,

Happy New Year!

Following up on our conversation on IRC, please let us know once you have test results w/ patched packages.
(If I recall correctly, you tested with a dev cluster, and were going to test with a pre-production cluster.)

cheers,
Mauricio

#27 Updated by Rafal Wadolowski about 3 years ago

Currently, with and without this patch, we hit some GC tracebacks, so we are trying to fix that. IMO it's not coroutine-related.

#28 Updated by Rafal Wadolowski about 3 years ago

Hi Mauricio!

We successfully started radosgw on our preproduction cluster.
I confirm that your changes are working. We should merge them into master.

Rafal

#29 Updated by Mauricio Oliveira about 3 years ago

Hi Rafal,

That's great news; thanks for testing!

Can you please confirm that on this cluster the issue happens w/out the patchset, and is fixed w/ the patchset?

#30 Updated by Rafal Wadolowski about 3 years ago

Yes, these changes fix the issue with radosgw crashing.

#31 Updated by Mauricio Oliveira about 3 years ago

Rafal,

Great, thanks for confirming!

I have to update the commit messages for the backport submission process, and I should be able to send those next week.

cheers,
Mauricio

#33 Updated by Loïc Dachary about 3 years ago

  • Status changed from Need More Info to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#34 Updated by Mauricio Oliveira about 3 years ago

This should be fixed on version v14.2.19 per
https://tracker.ceph.com/issues/43921#note-7

#35 Updated by Mauricio Oliveira over 2 years ago

The fix has actually been released in v14.2.22: [1]

rgw: beast frontend uses 512k mprotected coroutine stacks (pr#39947, Yaakov Selkowitz, Mauricio Faria de Oliveira, Daniel Gryniewicz, Casey Bodley)

[1] https://docs.ceph.com/en/latest/releases/nautilus/#v14-2-22-nautilus
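
The "mprotected" part of that release note refers to coroutine stacks carrying a guard page, so an overflow faults immediately instead of silently scribbling over heap memory. Below is a minimal, hedged illustration of that idea using Boost.Coroutine2's protected_fixedsize_stack; it is not Ceph's actual implementation, which spawns its request handlers through Boost.Asio:

#include <boost/coroutine2/coroutine.hpp>
#include <boost/coroutine2/protected_fixedsize_stack.hpp>
#include <iostream>

int main() {
  using coro_t = boost::coroutines2::coroutine<void>;

  // 512 KiB stack backed by mmap with an mprotect()'ed guard page below it:
  // running off the end of the stack now raises SIGSEGV at the guard page
  // rather than corrupting whatever the allocator had placed next to it.
  coro_t::pull_type handler(
      boost::coroutines2::protected_fixedsize_stack(512 * 1024),
      [](coro_t::push_type& yield) {
        std::cout << "running on a 512 KiB protected stack\n";
        yield();  // suspend, as a request handler would while waiting on I/O
        std::cout << "resumed\n";
      });

  handler();  // resume past the first suspension point
  return 0;
}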
