Bug #61763

RGW issue sync loop restart master zone

Added by Guillaume Morin 11 months ago. Updated 9 months ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Regression:
No
Severity:
1 - critical

Description

Hello, I have a bug with a multisite configuration.
When I enable sync with a secondary zone, the radosgw on my master zone restarts in a loop every 30s. Please see the attached logs of the restart loop.
(Sync itself works; I have data in my bucket.)

master zone (cluster ceph pacific 16.2.9)
secondary zone (cluster ceph quincy 17.2.5)
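
For context, sync health between the zones can be inspected with the standard radosgw-admin commands (a sketch; run against each cluster, and the --bucket name is a placeholder):

```shell
# On either zone: show the replication state relative to the other zones
radosgw-admin sync status

# Confirm both zones agree on the current period/epoch
radosgw-admin period get-current

# Per-bucket sync state (bucket name is a placeholder)
radosgw-admin bucket sync status --bucket=mybucket
```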

Regards
Guillaume


Files

dump_rgw_loop_restart.txt (84.9 KB) - Guillaume Morin, 06/22/2023 08:06 AM
log (16.5 KB) - xiaobao wen, 08/18/2023 06:07 AM
Actions #1

Updated by Guillaume Morin 11 months ago

A part of the logs with the error:
-9> 2023-06-21T15:21:40.020+0200 7f167fa02840 1 librados: init done
-8> 2023-06-21T15:21:40.028+0200 7f14cc5d8700 4 mgrc handle_mgr_map Got map version 111
-7> 2023-06-21T15:21:40.028+0200 7f14cc5d8700 4 mgrc handle_mgr_map Active mgr is now [v2:10.17.66.11:6810/2110232,v1:10.17.66.11:6811/2110232]
-6> 2023-06-21T15:21:40.028+0200 7f14cc5d8700 4 mgrc reconnect Starting new session with [v2:10.17.66.11:6810/2110232,v1:10.17.66.11:6811/2110232]
-5> 2023-06-21T15:21:40.028+0200 7f167d620700 10 monclient: get_auth_request con 0x7f15fc0a20d0 auth_method 0
-4> 2023-06-21T15:21:40.028+0200 7f167ce1f700 10 monclient: get_auth_request con 0x557e66374330 auth_method 0
-3> 2023-06-21T15:21:42.820+0200 7f1664ff9700 10 monclient: tick
-2> 2023-06-21T15:21:42.912+0200 7f15e77fe700 10 monclient: tick
-1> 2023-06-21T15:21:43.020+0200 7f14cb5d6700 10 monclient: tick
0> 2023-06-21T15:21:43.712+0200 7f157df3b700 -1 *** Caught signal (Aborted) **
in thread 7f157df3b700 thread_name:radosgw

ceph version 16.2.9 (a569859f5e07da0c4c39da81d5fb5675cd95da49) pacific (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bd60) [0x7f168bc06d60]
2: gsignal()
3: abort()
4: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9a7ec) [0x7f168223b7ec]
5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5966) [0x7f1682246966]
6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa59d1) [0x7f16822469d1]
7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa595b) [0x7f168224695b]
8: (spawn::detail::continuation_context::resume()+0x87) [0x7f168c1b3917]
9: (boost::asio::detail::executor_op<ceph::async::ForwardingHandler<ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x2ba) [0x7f168c1c0e6a]
10: (boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>::operator()()+0x85) [0x7f168c1c3005]
11: (boost::asio::detail::executor_op<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>, boost::asio::detail::recycling_allocator<void, boost::asio::detail::thread_info_base::default_tag>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x80) [0x7f168c1c3340]
12: (boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&)+0x403) [0x7f168c1b28d3]
13: /lib/libradosgw.so.2(+0x408ba9) [0x7f168c198ba9]
14: /lib/libradosgw.so.2(+0x408d52) [0x7f168c198d52]
15: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0) [0x7f168226fed0]
16: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f1682376ea7]
17: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
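
For what it's worth, a symbolized backtrace can usually be recovered from a core file once debug symbols are installed (a sketch; the dbg package names and the core file name below are assumptions, and vary by distro):

```shell
# Debian/Ubuntu-style host (matching the /lib/x86_64-linux-gnu paths above);
# on EL systems the equivalent is: yum debuginfo-install ceph-radosgw
apt-get install radosgw-dbg            # package name is an assumption

# Disassemble with interleaved source, as the NOTE suggests:
objdump -rdS /usr/bin/radosgw > radosgw.objdump

# Or symbolize the core directly:
gdb /usr/bin/radosgw core.radosgw -batch -ex 'thread apply all bt'
```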
Actions #2

Updated by Casey Bodley 11 months ago

  • Status changed from New to Need More Info

It's hard to tell what the problem is when all of the rgw logging is disabled by 0/ 0 rgw. Can you please raise debug_rgw to 0/20 and share the new log?
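
Raising debug_rgw can be done via the mon config database or on the running daemon through its admin socket (a sketch; the daemon name is a placeholder):

```shell
# Persistently raise rgw debug logging for all rgw daemons:
ceph config set client.rgw debug_rgw 0/20

# Or on a single running daemon via its admin socket (name is a placeholder):
ceph daemon client.rgw.myzone.host1 config set debug_rgw 0/20

# Revert once the crash has been captured:
ceph config set client.rgw debug_rgw 0/0
```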

Actions #3

Updated by xiaobao wen 9 months ago

Casey Bodley wrote:

It's hard to tell what the problem is when all of the rgw logging is disabled by 0/ 0 rgw. Can you please raise debug_rgw to 0/20 and share the new log?

We don't use the sync function and hit a similar problem. The following is the information from the core file.

~~
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `radosgw --fsid=e0e613ce-827c-40ea-b38e-169fc4d1d948 --keyring=/etc/ceph/keyring'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f79da065b7f in raise () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x7f7916ef3700 (LWP 409))]
Missing separate debuginfos, use: yum debuginfo-install ceph-radosgw-16.2.10-0.el8.x86_64
(gdb) bt
#0 0x00007f79da065b7f in raise () from /lib64/libpthread.so.0
#1 0x00007f79e5303563 in reraise_fatal (signum=6) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/src/global/signal_handler.cc:332
#2 handle_fatal_signal (signum=6) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/src/global/signal_handler.cc:332
#3 <signal handler called>
#4 0x00007f79d86b6a9f in raise () from /lib64/libc.so.6
#5 0x00007f79d8689e05 in abort () from /lib64/libc.so.6
#6 0x00007f79d905809b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
#7 0x00007f79d905e53c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#8 0x00007f79d905e597 in std::terminate() () from /lib64/libstdc++.so.6
#9 0x00007f79d905e52e in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /lib64/libstdc++.so.6
#10 0x00007f79e4b70a17 in spawn::detail::continuation_context::resume (this=<optimized out>) at /usr/include/c++/8/bits/exception_ptr.h:107
#11 0x00007f79e4b77b6a in spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >::operator() (value=..., ec=..., this=0x7f7916eeedd0) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
#12 std::__invoke_impl<void, spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#13 std::__invoke<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#14 std::__apply_impl<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, 0ul, 1ul> (__t=..., __f=...) at /usr/include/c++/8/tuple:1678
#15 std::apply<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > (__t=..., __f=...) at /usr/include/c++/8/tuple:1687
#16 ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > >::operator()() && (this=0x7f7916eeedd0)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/src/common/async/bind_handler.h:52
#17 ceph::async::ForwardingHandler<ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > >::operator()<>() (this=0x7f7916eeedd0)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/src/common/async/forward_handler.h:47
#18 boost::asio::asio_handler_invoke<ceph::async::ForwardingHandler<ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > > > (function=...)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/handler_invoke_hook.hpp:69
#19 boost_asio_handler_invoke_helpers::invoke<ceph::async::ForwardingHandler<ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > >, ceph::async::ForwardingHandler<ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > > > (context=..., function=...)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#20 boost::asio::detail::executor_op<ceph::async::ForwardingHandler<ceph::async::CompletionHandler<spawn::detail::coro_handler<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::executor_type> >, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > >, std::tuple<boost::system::error_code, std::shared_lock<ceph::async::SharedMutex<boost::asio::io_context::executor_type> > > > >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete (owner=0x55fd64378910, base=<optimized out>) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/executor_op.hpp:70
#21 0x00007f79e4b7f93d in boost::asio::detail::scheduler_operation::complete (bytes_transferred=0, ec=..., owner=<optimized out>, this=<optimized out>)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#22 boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>::operator() (this=this@entry=0x7f7916eeeed0)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/impl/strand_executor_service.hpp:90
#23 0x00007f79e4b7fca6 in boost::asio::asio_handler_invoke<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const> > (function=...)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/handler_invoke_hook.hpp:67
#24 boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>, boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const> > (context=..., function=...) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#25 boost::asio::detail::executor_op<boost::asio::detail::strand_executor_service::invoker<boost::asio::io_context::executor_type const>, boost::asio::detail::recycling_allocator<void, boost::asio::detail::thread_info_base::default_tag>, boost::asio::detail::scheduler_operation>::do_complete (owner=0x55fd63db8300, base=<optimized out>) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/executor_op.hpp:70
#26 0x00007f79e4b73ad2 in boost::asio::detail::scheduler_operation::complete (bytes_transferred=0, ec=..., owner=0x55fd63db8300, this=<optimized out>)
at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#27 boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=..., this=0x55fd63db8300) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/impl/scheduler.ipp:447
#28 boost::asio::detail::scheduler::run (this=0x55fd63db8300, ec=...) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/detail/impl/scheduler.ipp:200
#29 0x00007f79e4b56376 in boost::asio::io_context::run (this=<optimized out>, ec=...) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/build/boost/include/boost/asio/impl/io_context.ipp:71
#30 (anonymous namespace)::AsioFrontend::<lambda()>::operator() (__closure=0x55fd640cf148) at /usr/src/debug/ceph-16.2.10-0.el8.x86_64/src/rgw/rgw_asio_frontend.cc:1025
#31 std::__invoke_impl<void, (anonymous namespace)::AsioFrontend::run()::<lambda()> > (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#32 std::__invoke<(anonymous namespace)::AsioFrontend::run()::<lambda()> > (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#33 std::thread::_Invoker<std::tuple<(anonymous namespace)::AsioFrontend::run()::<lambda()> > >::_M_invoke<0> (this=0x55fd640cf148) at /usr/include/c++/8/thread:244
#34 std::thread::_Invoker<std::tuple<(anonymous namespace)::AsioFrontend::run()::<lambda()> > >::operator() (this=0x55fd640cf148) at /usr/include/c++/8/thread:253
#35 std::thread::_State_impl<std::thread::_Invoker<std::tuple<(anonymous namespace)::AsioFrontend::run()::<lambda()> > > >::_M_run(void) (this=0x55fd640cf140) at /usr/include/c++/8/thread:196
#36 0x00007f79d908aba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#37 0x00007f79da05b1ca in start_thread () from /lib64/libpthread.so.0
#38 0x00007f79d86a1dd3 in clone () from /lib64/libc.so.6
~~

version info:
~~
[root@node01 deeproute]# uname -a
Linux smd-node01 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@node01 deeproute]# ceph version
ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
~~

Actions #4

Updated by xiaobao wen 9 months ago

The last ~100 lines of the log are attached above.
