Bug #43739

closed

radosgw abort caused by beast frontend coroutine stack overflow

Added by Mark Kogan over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source: Support
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Reproducer flow:

Compile radosgw as a Debug build:

./do_cmake.sh -DCMAKE_BUILD_TYPE=Debug

Block the RADOS TCP port range on the loopback interface:

sudo iptables -I INPUT -i lo -m multiport -p tcp --dports 6800:7300 -j DROP

Generate a large-object PUT load:

for i in {1..800}; do  s3cmd --access_key=b2345678901234567890 --secret_key=b234567890123456789012345678901234567890 --signature-v2 put ./ubuntu-13.04-server-amd64.iso s3://bkt &  done

Wait for all s3cmd instances to start, then unblock the RADOS connection:

sudo iptables -D INPUT 1
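
The block and unblock steps above can be wrapped in a small helper, sketched here under stated assumptions. The names `DRY_RUN`, `RULE`, `run`, `block_rados` and `unblock_rados` are hypothetical and not part of the report; the iptables arguments are the ones used above. The one deliberate difference is that the rule is deleted by its specification instead of by position (`-D INPUT 1`), so an unrelated rule that ends up first in the INPUT chain is not removed by mistake.

```shell
# Sketch only: helper names and DRY_RUN are assumptions, not from the report.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Rule specification taken from the reproducer: drop loopback TCP traffic
# to the port range 6800:7300.
RULE="INPUT -i lo -m multiport -p tcp --dports 6800:7300 -j DROP"

run() {
    # Always echo the command; run it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# $RULE is intentionally unquoted so it splits into separate arguments.
block_rados()   { run sudo iptables -I $RULE; }

# Delete by rule specification rather than by rule number.
unblock_rados() { run sudo iptables -D $RULE; }
```

With `DRY_RUN=0`, the intended order is `block_rados`, then starting the s3cmd load, then `unblock_rados` once all s3cmd instances are running.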

The radosgw log then shows the RADOS TCP disconnection, and radosgw aborts.
Log excerpt:

2020-01-09T17:11:28.716+0200 7f70336b6700 -1 rgw realm watcher: RGWRealmWatcher::handle_error oid=realms.996e28b9-ae01-440d-befd-a082a1623f70.control err=-107
2020-01-09T17:11:29.115+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 58961024 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.116+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 58965312 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.118+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 58953360 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.119+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 58960224 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.121+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 140131361122240 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.123+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 58952224 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.125+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 140128542709104 err (107) Transport endpoint is not connected
2020-01-09T17:11:29.126+0200 7f753b7fe700 -1 RGWWatcher::handle_error cookie 140128543179216 err (107) Transport endpoint is not connected

free(): invalid size
*** Caught signal (Aborted) **
 in thread 7f70e67fc700 thread_name:radosgw
 ceph version 14.0.0-18987-g2ca2221ddb (2ca2221ddbd600c0a0213b81a95c30a6c4f2163d) octopus (dev)
 1: ./bin/radosgw() [0x121998f]
 2: (()+0x14b20) [0x7f75702c8b20]
 3: (gsignal()+0x145) [0x7f756fa55625]
 4: (abort()+0x12b) [0x7f756fa3e8d9]
 5: (()+0x804af) [0x7f756fa994af]
 6: (()+0x87a9c) [0x7f756faa0a9c]
 7: (()+0x894ac) [0x7f756faa24ac]
 8: (boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits>::deallocate(boost::coroutines::stack_context&)+0x11f) [0x11b021f]
 9: ./bin/radosgw() [0x102a655]
 10: ./bin/radosgw() [0x1022208]
 11: (boost::coroutines::push_coroutine<void>::~push_coroutine()+0x5c) [0x11b088c]
 12: (std::_Sp_counted_ptr<boost::coroutines::push_coroutine<void>*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x53) [0x11b0a43]
 13: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x74) [0xf2b924]
 14: (std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x52) [0xf2b872]
 15: (std::__shared_ptr<boost::coroutines::push_coroutine<void>, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()+0x4f) [0x10933af]
 16: (std::shared_ptr<boost::coroutines::push_coroutine<void> >::~shared_ptr()+0x48) [0x106db78]
 17: (boost::asio::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::executor>, unsigned long>::~coro_handler()+0x5f) [0x1081b9f]
 18: (boost::beast::async_base<boost::asio::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::executor>, unsigned long>, boost::asio::executor, std::allocator<void> >
 19: (boost::beast::detail::dynamic_read_ops::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::executor>, boost::beast::flat_static_buffer<65536ul>, boost::beast::>
 20: (boost::asio::detail::binder2<boost::beast::detail::dynamic_read_ops::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::executor>, boost::beast::flat_static_bu>
 21: (boost::asio::detail::executor_function<boost::asio::detail::binder2<boost::beast::detail::dynamic_read_ops::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::>
 22: (boost::asio::detail::executor_function_base::complete()+0x50) [0x105fda0]
 23: (boost::asio::executor::function::operator()()+0x61) [0x105fd11]
 24: (void boost::asio::asio_handler_invoke<boost::asio::executor::function>(boost::asio::executor::function&, ...)+0x48) [0x105fbf8]
 25: (void boost_asio_handler_invoke_helpers::invoke<boost::asio::executor::function, boost::asio::executor::function>(boost::asio::executor::function&, boost::asio::executor::function&)+0x7>
 26: (boost::asio::detail::executor_op<boost::asio::executor::function, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_ope>
 27: (boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long)+0x85) [0x103e1d5]
 28: (boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>::operator()()+0xd8) [0x118f6e8]
 29: (void boost::asio::asio_handler_invoke<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const> >(boost::asio::detail::strand_executor_service
 30: (void boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>, boost::asio::detail::strand_executor>
 31: (void boost::asio::io_context::executor_type::dispatch<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>, std::allocator<void> >(boost:>
 32: (void boost::asio::detail::strand_executor_service::dispatch<boost::asio::io_context::executor_type const, boost::asio::executor::function, std::allocator<void> >(std::shared_ptr<boost:>
 33: (void boost::asio::strand<boost::asio::io_context::executor_type>::dispatch<boost::asio::executor::function, std::allocator<void> >(boost::asio::executor::function&&, std::allocator<voi>
 34: (boost::asio::executor::impl<boost::asio::strand<boost::asio::io_context::executor_type>, std::allocator<void> >::dispatch(boost::asio::executor::function&&)+0x6e) [0x118df1e]
 35: (void boost::asio::executor::dispatch<boost::asio::detail::binder2<boost::beast::detail::dynamic_read_ops::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::ex>
 36: (void boost::asio::detail::handler_work<boost::beast::detail::dynamic_read_ops::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::executor>, boost::beast::flat>
 37: (boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffer, boost::beast::detail::dynamic_read_ops::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::>
 38: (boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long)+0x85) [0x103e1d5]
 39: (boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&>
 40: (boost::asio::detail::scheduler::run(boost::system::error_code&)+0x130) [0x103c910]
 41: (boost::asio::io_context::run(boost::system::error_code&)+0x5c) [0x11d825c]
 42: ./bin/radosgw() [0x102cdd6]
 43: ./bin/radosgw() [0x102cd30]
 44: ./bin/radosgw() [0x102cba0]
 45: ./bin/radosgw() [0x102cb18]
 46: ./bin/radosgw() [0x102ca88]
 47: ./bin/radosgw() [0x102c72f]
 48: (()+0xd76f4) [0x7f756fe1b6f4]
 49: (()+0x94e2) [0x7f75702bd4e2]
 50: (clone()+0x43) [0x7f756fb1a693]
2020-01-09T17:47:19.451+0200 7f70e67fc700 -1 *** Caught signal (Aborted) **
 in thread 7f70e67fc700 thread_name:radosgw

Related issues 2 (0 open, 2 closed)

Has duplicate rgw - Bug #47910: radosgw crash on objecter operations (Resolved)
Copied to rgw - Backport #43921: nautilus: radosgw abort caused by beast frontend coroutine stack overflow (Resolved)
Actions #1

Updated by Ken Dreyer about 4 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to nautilus
Actions #2

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #43921: nautilus: radosgw abort caused by beast frontend coroutine stack overflow added
Actions #3

Updated by Nathan Cutler almost 4 years ago

  • Project changed from Ceph to rgw
Actions #4

Updated by Casey Bodley over 3 years ago

  • Has duplicate Bug #47910: radosgw crash on objecter operations added
Actions #5

Updated by Loïc Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
