Bug #65664

open

Crash observed in boost::asio module related to stream.async_shutdown()

Added by Mark Kogan 23 days ago. Updated 9 days ago.

Status: Pending Backport
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: Q/A
Tags: beast ssl backport_processed
Backport: quincy reef squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

continuing from downstream BZ#2275284

call stack:

completing the missing callstack symbols using addr2line:

"backtrace": ['
"/lib64/libc.so.6(+0x54db0) [0x7fd314053db0]",'

"/usr/bin/radosgw(+0x33b8ea) [0x55783b4ae8ea]",'
0x000000000033b8ea: boost::asio::detail::epoll_reactor::start_op(int, int, boost::asio::detail::epoll_reactor::descriptor_state*&, boost::asio::detail::reactor_op*, bool, bool) at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:246:3

"/usr/bin/radosgw(+0x35ba27) [0x55783b4cea27]",'
0x000000000035ba27: boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> >, boost::asio::ssl::detail::shutdown_op, spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> > >, void> >::operator()(boost::system::error_code, unsigned long, int) at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/asio/detail/reactive_socket_service_base.hpp:419:13

"(boost::asio::detail::executor_op<boost::asio::detail::binder2<boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> >, boost::asio::ssl::detail::shutdown_op, spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> > >, void> >, boost::system::error_code, unsigned long>, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x1d2) [0x55783b4ef882]",'

"/usr/bin/radosgw(+0x3807de) [0x55783b4f37de]",'
0x00000000003807de: boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>, 4ul> const, void>::operator()() at /usr/include/c++/11/bits/shared_ptr_base.h:1296:16

"/usr/bin/radosgw(+0x379910) [0x55783b4ec910]",'
0x0000000000379910: void boost::asio::io_context::basic_executor_type<std::allocator<void>, 4ul>::execute<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>, 4ul> const, void> >(boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::basic_executor_type<std::allocator<void>, 4ul> const, void>&&) const at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.hpp:300:3

"(boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> >, boost::asio::ssl::detail::shutdown_op, spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> > >, void> >, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x6a6) [0x55783b4dda06]",'

"/usr/bin/radosgw(+0xb8534e) [0x55783bcf834e]",'
0x0000000000b8534e: boost::asio::detail::thread_info_base::rethrow_pending_exception() at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/asio/detail/thread_info_base.hpp:228:5
 (inlined by) boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:493:46
 (inlined by) boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] [clone .isra.0] at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:210:20

"/usr/bin/radosgw(+0x3cf04d) [0x55783b54204d]",'
0x00000000003cf04d: std::thread::_State_impl<std::thread::_Invoker<std::tuple<(anonymous namespace)::AsioFrontend::run()::{lambda()#2}> > >::_M_run() [clone .lto_priv.0] at /usr/src/debug/ceph-18.2.0-189.el9cp.x86_64/redhat-linux-build/boost/include/boost/system/detail/error_code.hpp:305:13

"/lib64/libstdc++.so.6(+0xdb924) [0x7fd3143db924]",'
"/lib64/libc.so.6(+0x9f802) [0x7fd31409e802]",'
"/lib64/libc.so.6(+0x3f450) [0x7fd31403e450]"'

continuing from the last BZ comment:

(In reply to Mark Kogan from comment #15)

i think you have this part backwards. the call had previously been wrapped in an `if (!ec) {` block, which means it only ran when there was no error

errors here are common because of http keepalive. the server keeps trying to read more requests from the client until the client hangs up, at which point the server sees errors like ECONNRESET

Thanks Casey,
I suggest checking exactly which error is the 'normal' one (ECONNRESET or other) and adding back the `if` so that async_shutdown() is performed only when there is no error or when the error is that normal one, for example:
if (!ec || ec == boost::asio::error::connection_reset) { ...
stream.async_shutdown() ...
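
The proposed condition could be factored into a small predicate, sketched below. This is only an illustration: the helper name `should_attempt_ssl_shutdown` is hypothetical, `std::error_code` and `std::errc::connection_reset` stand in for `boost::system::error_code` and `boost::asio::error::connection_reset` so that the snippet builds without Boost, and treating ECONNRESET as the 'normal' error is still an unverified assumption.

```cpp
#include <system_error>

// Hypothetical helper, not taken from the rgw sources.
// Returns true when the proposed guard would let async_shutdown() run.
inline bool should_attempt_ssl_shutdown(const std::error_code& ec) {
  if (!ec) {
    return true;  // no error: same behaviour as the original `if (!ec) {`
  }
  // Assumption: ECONNRESET is the 'normal' error seen when a keepalive
  // client hangs up. std::errc::connection_reset stands in for
  // boost::asio::error::connection_reset.
  return ec == std::errc::connection_reset;
}
```

In the frontend the guard would then read `if (should_attempt_ssl_shutdown(ec)) { stream.async_shutdown(...); }`, with the predicate adjusted once the actual 'normal' error code has been confirmed.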

still interested in finding a root cause for the crash. are there really no rgw logs from qe? the dump leading up to the crash would be really valuable. @Tejas?

a note from https://www.openssl.org/docs/man1.1.1/man3/SSL_shutdown.html:

Note that SSL_shutdown() must not be called if a previous fatal error has occurred on a connection i.e. if SSL_get_error() has returned SSL_ERROR_SYSCALL or SSL_ERROR_SSL.

unrelated to the crash, but this is probably why it had the `if (!ec) {` condition. not all errors here would be fatal to the connection, though. for example, boost::asio::error::operation_aborted would indicate a read/write timeout on our end, but the connection would remain intact

ultimately we want to allow for ssl session reuse in all possible cases, but it would be useful to categorize which cases are really possible. part of the responsibility lies with the client to allow for clean shutdown before closing their end of the socket
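
One way to start categorizing the cases is a small classifier over the error code that ends the request loop. The sketch below is hedged: the enum, its names, and both functions are hypothetical, `std::errc::operation_canceled` stands in for `boost::asio::error::operation_aborted` so the snippet builds without Boost, and every error that has not been investigated yet is deliberately left unclassified rather than guessed at.

```cpp
#include <system_error>

// Hypothetical categories for the error that ends the request loop.
enum class ShutdownCase {
  clean,          // no error: TLS shutdown can be attempted
  local_timeout,  // read/write timeout on our end; connection still intact
  unclassified    // anything else, pending investigation (may be fatal)
};

inline ShutdownCase classify(const std::error_code& ec) {
  if (!ec) {
    return ShutdownCase::clean;
  }
  // std::errc::operation_canceled stands in for
  // boost::asio::error::operation_aborted.
  if (ec == std::errc::operation_canceled) {
    return ShutdownCase::local_timeout;
  }
  return ShutdownCase::unclassified;
}

// Conservative policy: only attempt shutdown in cases known to leave the
// connection usable, since SSL_shutdown() must not follow a fatal error.
inline bool may_call_async_shutdown(ShutdownCase c) {
  return c == ShutdownCase::clean || c == ShutdownCase::local_timeout;
}
```

Errors such as ECONNRESET land in `unclassified` here on purpose; whether they can safely be moved into a shutdown-capable category is exactly the open question in this thread.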

for your `s_client --reconnect` reproducer in https://github.com/ceph/ceph/pull/55967, what error code leads to our call to async_shutdown()?


Related issues 4 (3 open, 1 closed)

Related to rgw - Bug #65742: beast: revert changes to ssl async_shutdown() (Rejected, Casey Bodley)
Copied to rgw - Backport #65886: reef: Crash observed in boost::asio module related to stream.async_shutdown() (In Progress, Mark Kogan)
Copied to rgw - Backport #65887: quincy: Crash observed in boost::asio module related to stream.async_shutdown() (In Progress, Mark Kogan)
Copied to rgw - Backport #65888: squid: Crash observed in boost::asio module related to stream.async_shutdown() (In Progress, Casey Bodley)
