Bug #56589

closed

perf-crimson-msgr: segmentation fault happens during shutdown

Added by Rixin Luo almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

INFO  2022-07-18 11:00:36,609 [shard 1] ms - client#1 shutdown...
INFO  2022-07-18 11:00:36,609 [shard 1] ms - [osd.1(client#1) 127.0.0.1:0/1@57149 >> osd.0 v2:127.0.0.1:9010/0] closing: reset no, replace no

Thread 2 "reactor-1" received signal SIGSEGV, Segmentation fault.
0x00000000006f4fbb in crimson::net::SocketConnection::close_clean(bool) ()
(gdb) bt
#0  0x00000000006f4fbb in crimson::net::SocketConnection::close_clean(bool) ()
#1  0x00000000006cf415 in crimson::net::SocketMessenger::shutdown() ()
#2  0x0000000000692d7e in std::_Function_handler<seastar::future<void> ((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&), seastar::sharded<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client>::invoke_on_all<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client::shutdown()::{lambda(auto:1&)#1}>(seastar::smp_submit_to_options, (anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client::shutdown()::{lambda(auto:1&)#1})::{lambda((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&)#1}>::_M_invoke(std::_Any_data const&, (anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&) ()
#3  0x0000000000689453 in seastar::smp_message_queue::async_work_item<seastar::sharded<(anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client>::invoke_on_all(seastar::smp_submit_to_options, std::function<seastar::future<void> ((anonymous namespace)::run((anonymous namespace)::perf_mode_t, (anonymous namespace)::client_config const&, (anonymous namespace)::server_config const&)::test_state::Client&)>)::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::run_and_dispose() ()
#4  0x0000000000c5e99e in seastar::reactor::run_tasks(seastar::reactor::task_queue&) ()
#5  0x0000000000c5eda2 in seastar::reactor::run_some_tasks() [clone .part.0] ()
#6  0x0000000000c8abbe in seastar::reactor::do_run() ()
#7  0x0000000000c97ad9 in seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::{lambda()#3}::operator()() const ()
#8  0x0000000000c46e3e in seastar::posix_thread::start_routine(void*) ()
#9  0x00007ffff557d1ca in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff46add83 in clone () from /lib64/libc.so.6
Actions #1

Updated by Rixin Luo almost 2 years ago

It happens on both aarch64 and x86 platforms.

Actions #2

Updated by Kefu Chai almost 2 years ago

See the discussion at https://github.com/scylladb/seastar/pull/1138#discussion_r923401951.

This is more likely to happen when we test the Release build of Seastar, where the first continuation is inlined and trigger_close() gets called immediately. trigger_close() in turn removes the connection being closed from the container, which invalidates the iterator. That is why we get a segfault when trying to advance the iterator after calling the func.

A simple workaround could be to use seastar::yield() to defer the call to close_clean(), so that parallel_for_each() gets a chance to walk through the whole container and turn every element into a future by applying the specified func to it.
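
To illustrate the idea outside of Seastar, here is a minimal sketch in plain C++. It is not the actual crimson or Seastar code: `Registry`, `close()`, `shutdown_deferred()` and `remaining_after_shutdown()` are hypothetical names, a `std::set<int>` stands in for the connection container, and a plain vector of callbacks stands in for the reactor's task queue. It only shows why deferring the close until the walk has finished keeps the iterator valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for the messenger's connection registry.
struct Registry {
    std::set<int> conns;                          // connection ids
    std::vector<std::function<void()>> deferred;  // stand-in for a task queue

    // Plays the role of trigger_close(): removes the connection
    // from the container.
    void close(int id) { conns.erase(id); }

    // Unsafe variant (not implemented here): calling close(*it) directly
    // inside the loop erases the element `it` points to, so the following
    // ++it is undefined behaviour.
    //
    // Deferred variant: walk the whole container first and only queue the
    // close calls, so nothing is erased while the iterator is live.
    void shutdown_deferred() {
        for (auto it = conns.begin(); it != conns.end(); ++it) {
            const int id = *it;
            deferred.push_back([this, id] { close(id); });
        }
        // Every connection has a queued close; none has run yet.
        assert(deferred.size() == conns.size());

        // The "reactor" runs the queued tasks after the walk has finished.
        for (auto& task : deferred) {
            task();
        }
        deferred.clear();
    }
};

// Helper: registers n connections, shuts down, and reports how many remain.
std::size_t remaining_after_shutdown(int n) {
    Registry r;
    for (int i = 0; i < n; ++i) {
        r.conns.insert(i);
    }
    r.shutdown_deferred();
    return r.conns.size();
}
```

In this sketch, queueing the callbacks is the counterpart of yielding before close_clean(): the loop finishes visiting every element before any removal happens.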

Actions #3

Updated by Kefu Chai almost 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Kefu Chai
  • Pull request ID set to 47152
Actions #4

Updated by Samuel Just over 1 year ago

  • Project changed from Ceph to crimson
  • Category deleted (msgr)
Actions #5

Updated by Yingxin Cheng over 1 year ago

  • Status changed from Fix Under Review to Resolved

The fix was merged.
