Bug #56589
closed
perf-crimson-msgr: segmentation fault happens on shutdown
Description
INFO 2022-07-18 11:00:36,609 [shard 1] ms - client#1 shutdown...
INFO 2022-07-18 11:00:36,609 [shard 1] ms - [osd.1(client#1) 127.0.0.1:0/1@57149 >> osd.0 v2:127.0.0.1:9010/0] closing: reset no, replace no
Thread 2 "reactor-1" received signal SIGSEGV, Segmentation fault.
0x00000000006f4fbb in crimson::net::SocketConnection::close_clean(bool) ()
(gdb) bt
#0 0x00000000006f4fbb in crimson::net::SocketConnection::close_clean(bool) ()
#1 0x00000000006cf415 in crimson::net::SocketMessenger::shutdown() ()
#2 0x0000000000692d7e in std::_Function_handler<seastar::future<void> ((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&), seastar::sharded<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client>::invoke_on_all<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client::shutdown()::{lambda(auto:1&)#1}>(seastar::smp_submit_to_options, (anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client::shutdown()::{lambda(auto:1&)#1})::{lambda((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&)#1}>::_M_invoke(std::_Any_data const&, (anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&) ()
#3 0x0000000000689453 in seastar::smp_message_queue::async_work_item<seastar::sharded<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client>::invoke_on_all(seastar::smp_submit_to_options, std::function<seastar::future<void> ((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&)>)::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::run_and_dispose() ()
#4 0x0000000000c5e99e in seastar::reactor::run_tasks(seastar::reactor::task_queue&) ()
#5 0x0000000000c5eda2 in seastar::reactor::run_some_tasks() [clone .part.0] ()
#6 0x0000000000c8abbe in seastar::reactor::do_run() ()
#7 0x0000000000c97ad9 in seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::{lambda()#3}::operator()() const ()
#8 0x0000000000c46e3e in seastar::posix_thread::start_routine(void*) ()
#9 0x00007ffff557d1ca in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff46add83 in clone () from /lib64/libc.so.6
Updated by Rixin Luo almost 2 years ago
It happens on both aarch64 and x86 platforms.
Updated by Kefu Chai almost 2 years ago
See the discussion at https://github.com/scylladb/seastar/pull/1138#discussion_r923401951.
This is more likely to happen when we test the Release build of Seastar: the first continuation called is inlined, so trigger_close() gets called immediately. trigger_close() in turn removes the first connection being closed from the container, which invalidates the iterator. That is why we segfault when trying to advance the iterator after calling the func.
A simple workaround could be to use seastar::yield() to defer the call to close_clean(), so that parallel_for_each() gets a chance to walk through the whole container and transform all of its elements into futures by applying the specified func to them.
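The hazard and the idea behind the workaround can be sketched in plain C++, without Seastar. This is only an analogy under stated assumptions, not the code from the pull request: Messenger, close() and shutdown() are hypothetical names, close() stands in for trigger_close() removing a connection from the container, and the queue of deferred tasks plays the role that seastar::yield() plays in the comment above (nothing is erased until the walk over the container has finished).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for the messenger: it owns a set of connection ids.
struct Messenger {
    std::set<int> connections;

    // Stand-in for trigger_close(): removes the connection from the container.
    void close(int id) { connections.erase(id); }

    // Unsafe pattern, shown only as a comment: erasing *it inside the loop
    // body invalidates `it`, so the following ++it is undefined behaviour.
    //
    //   for (auto it = connections.begin(); it != connections.end(); ++it)
    //       close(*it);

    // Deferred pattern: the first pass only records the work to do, so the
    // iterator stays valid for the whole walk. The erasures run afterwards.
    // Returns the number of connections that were closed.
    std::size_t shutdown() {
        std::vector<std::function<void()>> deferred;
        deferred.reserve(connections.size());
        for (int id : connections) {
            deferred.push_back([this, id] { close(id); });
        }
        for (auto& task : deferred) {
            task();  // safe: no iteration over `connections` is in progress
        }
        assert(connections.empty());
        return deferred.size();
    }
};
```

Calling shutdown() on a Messenger holding three connections closes all three and leaves the container empty. Each task captures the id by value rather than a reference or iterator, so erasing one element cannot affect the tasks still waiting to run.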
Updated by Kefu Chai almost 2 years ago
- Status changed from New to Fix Under Review
- Assignee set to Kefu Chai
- Pull request ID set to 47152
Updated by Samuel Just over 1 year ago
- Project changed from Ceph to crimson
- Category deleted (msgr)
Updated by Yingxin Cheng over 1 year ago
- Status changed from Fix Under Review to Resolved
The fix was merged.