Bug #56589: perf-crimson-msgr: segmentation fault happens when shutdown - crimson - Ceph


Bug #56589

closed

perf-crimson-msgr: segmentation fault happens when shutdown

Added by Rixin Luo almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: Kefu Chai
Severity: 3 - minor
Pull request ID: 47152

Description

INFO  2022-07-18 11:00:36,609 [shard 1] ms - client#1 shutdown...
INFO  2022-07-18 11:00:36,609 [shard 1] ms - [osd.1(client#1) 127.0.0.1:0/1@57149 >> osd.0 v2:127.0.0.1:9010/0] closing: reset no, replace no

Thread 2 "reactor-1" received signal SIGSEGV, Segmentation fault.
0x00000000006f4fbb in crimson::net::SocketConnection::close_clean(bool) ()
(gdb) bt
#0  0x00000000006f4fbb in crimson::net::SocketConnection::close_clean(bool) ()
#1  0x00000000006cf415 in crimson::net::SocketMessenger::shutdown() ()
#2  0x0000000000692d7e in std::_Function_handler<seastar::future<void> ((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&), seastar::sharded<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client>::invoke_on_all<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client::shutdown()::{lambda(auto:1&)#1}>(seastar::smp_submit_to_options, (anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client::shutdown()::{lambda(auto:1&)#1})::{lambda((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&)#1}>::_M_invoke(std::_Any_data const&, (anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&) ()
#3  0x0000000000689453 in seastar::smp_message_queue::async_work_item<seastar::sharded<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client>::invoke_on_all(seastar::smp_submit_to_options, std::function<seastar::future<void> ((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&)>)::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::run_and_dispose() ()
#4  0x0000000000c5e99e in seastar::reactor::run_tasks(seastar::reactor::task_queue&) ()
#5  0x0000000000c5eda2 in seastar::reactor::run_some_tasks() [clone .part.0] ()
#6  0x0000000000c8abbe in seastar::reactor::do_run() ()
#7  0x0000000000c97ad9 in seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::{lambda()#3}::operator()() const ()
#8  0x0000000000c46e3e in seastar::posix_thread::start_routine(void*) ()
#9  0x00007ffff557d1ca in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff46add83 in clone () from /lib64/libc.so.6


Updated by Rixin Luo almost 2 years ago

It happens on both aarch64 and x86 platforms.


Updated by Kefu Chai almost 2 years ago

See the discussion at https://github.com/scylladb/seastar/pull/1138#discussion_r923401951.

This is more likely to happen when we test a Release build of Seastar: there the first continuation called is inlined, so trigger_close() gets called immediately. trigger_close() in turn removes the first connection being closed from the container, which invalidates the iterator. That is why we segfault when parallel_for_each() tries to advance the iterator after calling the func.
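To make the failure mode concrete, here is a simplified model in standard C++. It is not the crimson or Seastar code: `Registry`, `Conn` and the `shutdown_*()` functions are hypothetical, and `close_clean()` erases the connection synchronously to mimic the inlined continuation described above. The unsafe loop has the same shape as the crash (call the func, then advance the iterator); the safe variant walks a snapshot instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical, simplified model: these are NOT the crimson classes, just
// enough structure to show the iterator-invalidation hazard.
struct Registry;

struct Conn {
  Registry* owner;
  explicit Conn(Registry* r) : owner(r) {}
  // Models a close_clean() whose continuation runs inline (as described for
  // a Release build) and therefore unregisters the connection immediately.
  void close_clean();
};

struct Registry {
  std::vector<std::unique_ptr<Conn>> owned;  // keeps Conn objects alive
  std::set<Conn*> conns;                     // the container being walked

  Conn* add() {
    owned.push_back(std::make_unique<Conn>(this));
    conns.insert(owned.back().get());
    return owned.back().get();
  }

  // UNSAFE, shown for contrast only: close_clean() erases *it, so the
  // following ++it uses an invalidated iterator (undefined behavior).
  void shutdown_unsafe() {
    for (auto it = conns.begin(); it != conns.end(); ++it) {
      (*it)->close_clean();
    }
  }

  // Safe: copy the pointers first, then close each one. Erasing from
  // `conns` no longer affects the sequence being iterated.
  std::size_t shutdown_safe() {
    std::vector<Conn*> snapshot(conns.begin(), conns.end());
    for (Conn* c : snapshot) {
      c->close_clean();
    }
    return snapshot.size();
  }
};

void Conn::close_clean() { owner->conns.erase(this); }
```

Advancing the iterator before invoking the callback (`auto cur = it++;`) also avoids this particular crash, but only as long as the callee never erases the element the loop visits next; the snapshot does not rely on that.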

A simple workaround could be to use seastar::yield() to defer the call to close_clean(), so that parallel_for_each() gets a chance to walk through the whole container and turn every element into a future by applying the specified func to it.
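The deferral idea can be modeled the same way. In this sketch, again hypothetical and not Seastar's API, `TaskQueue::defer()` stands in for scheduling the work behind seastar::yield(): `close_clean()` only queues the removal, so the loop in `shutdown()` finishes walking the container before anything is erased, and the queued removals run afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <utility>

// Hypothetical single-threaded task queue standing in for the Seastar
// reactor. defer() plays the role of yielding: the work is queued and runs
// only after the current task (the shutdown loop) has finished.
struct TaskQueue {
  std::deque<std::function<void()>> tasks;

  void defer(std::function<void()> fn) { tasks.push_back(std::move(fn)); }

  void run_all() {
    while (!tasks.empty()) {
      std::function<void()> fn = std::move(tasks.front());
      tasks.pop_front();
      fn();
    }
  }
};

struct Messenger {
  TaskQueue& q;
  std::set<int> conns;        // hypothetical connection ids
  std::size_t scheduled = 0;  // how many closes were queued

  // Models close_clean() with the workaround applied: removal from `conns`
  // is deferred instead of happening inline.
  void close_clean(int id) {
    q.defer([this, id] { conns.erase(id); });
  }

  // Each call only schedules work, so no element is erased while the
  // iterator is live and ++it stays valid.
  void shutdown() {
    for (auto it = conns.begin(); it != conns.end(); ++it) {
      close_clean(*it);
      ++scheduled;
    }
  }
};
```

Right after `shutdown()` returns, `conns` is unchanged; it is emptied only when the queued tasks run, which is the ordering that lets the iteration complete.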
