Bug #24851
msg/async: segv in C_clean_handler::do_request during shutdown
Status:
New
Priority:
Normal
Assignee:
-
Category:
AsyncMessenger
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
(gdb) bt
#0 0x00007f56c7e5a59b in raise () from /lib64/libpthread.so.0
#1 0x0000559073e58495 in reraise_fatal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:80
#2 handle_fatal_signal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:290
#3 <signal handler called>
#4 0x0000559075242770 in ?? ()
#5 0x00007f56cb425558 in cleanup (this=0x5590756aca00) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.h:394
#6 C_clean_handler::do_request (this=0x559075242710, id=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.cc:88
#7 0x00007f56cb432ef7 in EventCenter::process_events (this=this@entry=0x559075258c80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56bf510490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:439
#8 0x00007f56cb435b0c in operator() (__closure=0x5590755ab2f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#9 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#10 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#11 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f56c6d1abad in clone () from /lib64/libc.so.6
(gdb) info th
  Id Target Id Frame
  21 Thread 0x7f56d3eb5140 (LWP 30698) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20 Thread 0x7f56bc50f700 (LWP 30832) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19 Thread 0x7f56b9d0a700 (LWP 30837) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18 Thread 0x7f56bad0c700 (LWP 30835) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17 Thread 0x7f56b9509700 (LWP 30838) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16 Thread 0x7f56c0517700 (LWP 30802) 0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
  15 Thread 0x7f56c293d700 (LWP 30739) 0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
  14 Thread 0x7f56c40bb700 (LWP 30704) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7f56bbd0e700 (LWP 30833) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x7f56bfd16700 (LWP 30803) 0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
  11 Thread 0x7f56bd511700 (LWP 30830) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7f56c18d4700 (LWP 30764) 0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
  9 Thread 0x7f56c0f19700 (LWP 30800) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x7f56ba50b700 (LWP 30836) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f56c313e700 (LWP 30736) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7f56be513700 (LWP 30828) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f56bcd10700 (LWP 30831) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f56bb50d700 (LWP 30834) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f56bdd12700 (LWP 30829) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f56bed14700 (LWP 30805) 0x00007f56c6ce156d in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f56bf515700 (LWP 30804) 0x00007f56c7e5a59b in raise () from /lib64/libpthread.so.0
/a/sage-2018-07-10_00:40:55-rados-wip-sage3-testing-2018-07-09-1727-distro-basic-smithi/2760645
-118> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 shutdown_connections
-117> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: verify_authorizer global_id=0
-116> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: cephx_verify_authorizer adding server_challenge 7443361343173355738
-115> 2018-07-10 08:19:24.593 7f56c0517700 0 mon.l@11(probing) e1 ms_verify_authorizer bad authorizer from mon 172.21.15.189:6790/0
-114> 2018-07-10 08:19:24.593 7f56c0517700 0 -- 172.21.15.189:6792/0 >> 172.21.15.189:6790/0 conn(0x5590756ad800 legacy :6792 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
-113> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aae00 172.21.15.82:6794/0
-112> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756ab500 -
-111> 2018-07-10 08:19:24.593 7f56bf515700 10 mon.l@11(probing) e1 ms_verify_authorizer 172.21.15.189:6793/0 mon protocol 2
-110> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615
-109> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aca00 172.21.15.189:6793/0
-108> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer global_id=0
-107> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: cephx_verify_authorizer adding server_challenge 96238160679238695
-106> 2018-07-10 08:19:24.593 7f56bf515700 0 mon.l@11(probing) e1 ms_verify_authorizer bad authorizer from mon 172.21.15.189:6793/0
-105> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756ad800 172.21.15.189:6790/0
-104> 2018-07-10 08:19:24.593 7f56bf515700 0 -- 172.21.15.189:6792/0 >> 172.21.15.189:6793/0 conn(0x5590756aca00 legacy :6792 s=STATE_CLOSED pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
-103> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 _reset
-102> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 cancel_probe_timeout (none scheduled)
-101> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 timecheck_finish
-100> 2018-07-10 08:19:24.593 7f56d3eb5140 15 mon.l@11(probing) e1 health_tick_stop
-99> 2018-07-10 08:19:24.593 7f56d3eb5140 15 mon.l@11(probing) e1 health_interval_stop
-98> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 scrub_event_cancel
-97> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 scrub_reset
-96> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxos(paxos recovering c 1..220) restart -- canceling timeouts
-95> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(mdsmap 1..1) restart
-94> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(osdmap 1..10) restart
-93> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(logm 1..71) restart
-92> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(monmap 1..1) restart
-91> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(auth 1..24) restart
-90> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(mgr 1..3) restart
-89> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(mgrstat 1..46) restart
-88> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(health 1..67) restart
-87> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(config 1..1) restart
-86> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 cancel_probe_timeout (none scheduled)
-85> 2018-07-10 08:19:24.593 7f56bf515700 10 mon.l@11(probing) e1 ms_verify_authorizer 172.21.15.189:6793/0 mon protocol 2
-84> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615
-83> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 reset_probe_timeout 0x559075956570 after 2 seconds
-82> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 probing other monitors
-81> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer global_id=0
-80> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: cephx_verify_authorizer got server_challenge+1 96238160679238696 expecting 96238160679238696
-79> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.0 172.21.15.82:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907527b900
-78> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer ok nonce e09e68b4c761c83 reply_bl.length()=36
-77> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907527b900 con 0
-76> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.1 172.21.15.184:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907527bb80
-75> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907527bb80 con 0
-74> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.2 172.21.15.189:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a000
-73> 2018-07-10 08:19:24.593 7f56bf515700 10 In get_auth_session_handler for protocol 2
-72> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a000 con 0
-71> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.3 172.21.15.82:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a280
-70> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a280 con 0
-69> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.4 172.21.15.184:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a500
-68> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a500 con 0
-67> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.5 172.21.15.189:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a780
-66> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a780 con 0
-65> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.6 172.21.15.82:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597aa00
-64> 2018-07-10 08:19:24.593 7f56c0517700 1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6789/0 conn(0x5590756ab500 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
-63> 2018-07-10 08:19:24.593 7f56bfd16700 1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6790/0 conn(0x559075814300 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
-62> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x5590756ab500 172.21.15.82:6789/0
-61> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x559075814300 172.21.15.82:6790/0
-60> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597aa00 con 0
-59> 2018-07-10 08:19:24.593 7f56bfd16700 1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6791/0 conn(0x5590756ac300 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
-58> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x5590756ac300 172.21.15.82:6791/0
-57> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.7 172.21.15.184:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597ac80
-56> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597ac80 con 0
-55> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.8 172.21.15.189:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597af00
-54> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597af00 con 0
-53> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.9 172.21.15.82:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b180
-52> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b180 con 0
-51> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.10 172.21.15.184:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b400
-50> 2018-07-10 08:19:24.593 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-49> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b400 con 0
-48> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.12 172.21.15.82:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b680
-47> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b680 con 0
-46> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-45> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.13 172.21.15.184:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b900
-44> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b900 con 0
-43> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.14 172.21.15.189:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597bb80
-42> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597bb80 con 0
-41> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.15 172.21.15.82:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6000
-40> 2018-07-10 08:19:24.593 7f56bf515700 1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6793/0 conn(0x5590756aca00 legacy :6792 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=13 cs=1 l=0).handle_connect_msg existing race replacing process for addr=172.21.15.189:6793/0 just fail later one(this)
-39> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6000 con 0
-38> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.16 172.21.15.184:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6280
-37> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6280 con 0
-36> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.17 172.21.15.189:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6500
-35> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6500 con 0
-34> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aca00 172.21.15.189:6793/0
-33> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.18 172.21.15.82:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6780
-32> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6780 con 0
-31> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.19 172.21.15.184:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6a00
-30> 2018-07-10 08:19:24.593 7f56bf515700 1 -- 172.21.15.189:37330/30698 learned_addr learned my addr 172.21.15.189:37330/30698 (peer_addr_for_me 172.21.15.189:37330/0)
-29> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6a00 con 0
-28> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 _send_message--> mon.20 172.21.15.189:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6c80
-27> 2018-07-10 08:19:24.593 7f56bf515700 10 mon.l@11(probing) e1 ms_get_authorizer for mgr
-26> 2018-07-10 08:19:24.593 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-25> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx keyserverdata: get_caps: name=mon.
-24> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: build_service_ticket service mgr secret_id 2 ticket_info.ticket.name=mon.
-23> 2018-07-10 08:19:24.593 7f56d3eb5140 1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6c80 con 0
-22> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-21> 2018-07-10 08:19:24.594 7f56bfd16700 1 -- 172.21.15.189:6792/0 >> 172.21.15.184:6793/0 conn(0x559075990300 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
-20> 2018-07-10 08:19:24.594 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x559075990300 172.21.15.184:6793/0
-19> 2018-07-10 08:19:24.594 7f56bfd16700 1 -- 172.21.15.189:6792/0 >> 172.21.15.184:6794/0 conn(0x559075991800 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
-18> 2018-07-10 08:19:24.594 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x559075991800 172.21.15.184:6794/0
-17> 2018-07-10 08:19:24.594 7f56bfd16700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-16> 2018-07-10 08:19:24.594 7f56bfd16700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-15> 2018-07-10 08:19:24.594 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-14> 2018-07-10 08:19:24.594 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-13> 2018-07-10 08:19:24.594 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-12> 2018-07-10 08:19:24.594 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-11> 2018-07-10 08:19:24.594 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-10> 2018-07-10 08:19:24.594 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-9> 2018-07-10 08:19:24.594 7f56c0517700 1 -- 172.21.15.189:6792/0 >> 172.21.15.184:6795/0 conn(0x5590759b0e00 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed
-8> 2018-07-10 08:19:24.594 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x5590759b0e00 172.21.15.184:6795/0
-7> 2018-07-10 08:19:24.594 7f56c0517700 1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6790/0 conn(0x559075813c00 legacy :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got WAIT (connection race)
-6> 2018-07-10 08:19:24.594 7f56c0517700 1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6790/0 conn(0x559075813c00 legacy :-1 s=STATE_WAIT pgs=0 cs=0 l=0).process enter wait state, failing
-5> 2018-07-10 08:19:24.594 7f56bfd16700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
-4> 2018-07-10 08:19:24.594 7f56bfd16700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
-3> 2018-07-10 08:19:24.594 7f56bfd16700 1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6792/0 conn(0x55907598ee00 legacy :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got WAIT (connection race)
-2> 2018-07-10 08:19:24.594 7f56bfd16700 1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6792/0 conn(0x55907598ee00 legacy :-1 s=STATE_WAIT pgs=0 cs=0 l=0).process enter wait state, failing
-1> 2018-07-10 08:19:24.594 7f56bf515700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f56bf515700 thread_name:msgr-worker-2
 ceph version 14.0.0-1224-g0087bf8 (0087bf835dfd0670408c4d798f595d99ff3cbe26) nautilus (dev)
 1: (()+0xf6d0) [0x7f56c7e5a6d0]
 2: [0x559075242770]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil almost 6 years ago
- Description updated (diff)
- Status changed from New to 12
Updated by Sage Weil almost 6 years ago
Thread 21 (Thread 0x7f56d3eb5140 (LWP 30698)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb42d687 in Wait (mutex=..., this=0x559075815e30) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 AsyncMessenger::wait (this=0x559075815800) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncMessenger.cc:520
#3 0x0000559073b2a432 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/ceph_mon.cc:802

Thread 20 (Thread 0x7f56bc50f700 (LWP 30832)):
#0 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2 WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3 ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4 0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f56b9d0a700 (LWP 30837)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb36aa6c in Wait (mutex=..., this=0x5590756aa8d8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 DispatchQueue::entry (this=0x5590756aa870) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:212
#3 0x00007f56cb40e42d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:102
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f56bad0c700 (LWP 30835)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb36a370 in Wait (mutex=..., this=0x559075815b88) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 DispatchQueue::run_local_delivery (this=0x559075815970) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:117
#3 0x00007f56cb40e53d in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:116
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f56b9509700 (LWP 30838)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb36a370 in Wait (mutex=..., this=0x5590756aaa88) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 DispatchQueue::run_local_delivery (this=0x5590756aa870) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:117
#3 0x00007f56cb40e53d in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:116
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f56c0517700 (LWP 30802)):
#0 0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f56cb687959 in EpollDriver::event_wait (this=0x5590755ab4d0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/EventEpoll.cc:113
#2 0x00007f56cb432587 in EventCenter::process_events (this=this@entry=0x559075259280, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56c0512490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:399
#3 0x00007f56cb435b0c in operator() (__closure=0x5590755ab2c8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#5 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#6 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f56c293d700 (LWP 30739)):
#0 0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
#1 0x00007f56cb2cf3f7 in poll (__timeout=-1, __nfds=2, __fds=0x7f56c29384b0) at /usr/include/bits/poll2.h:46
#2 AdminSocket::entry (this=0x55907524aea0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/admin_socket.cc:234
#3 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f56c40bb700 (LWP 30704)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb31498d in ceph::logging::Log::entry (this=0x559075280680) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/log/Log.cc:542
#2 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f56bbd0e700 (LWP 30833)):
#0 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2 WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3 ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4 0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f56bfd16700 (LWP 30803)):
#0 0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f56cb687959 in EpollDriver::event_wait (this=0x5590755ab110, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/EventEpoll.cc:113
#2 0x00007f56cb432587 in EventCenter::process_events (this=this@entry=0x559075258a80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56bfd11490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:399
#3 0x00007f56cb435b0c in operator() (__closure=0x5590755ab358) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#5 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#6 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f56bd511700 (LWP 30830)):
#0 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2 WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3 ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4 0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f56c18d4700 (LWP 30764)):
#0 0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
#1 0x0000559073e58de2 in poll (__timeout=-1, __nfds=4, __fds=0x7f56c18cf260) at /usr/include/bits/poll2.h:41
#2 SignalHandler::entry (this=0x5590752809c0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:453
#3 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f56c0f19700 (LWP 30800)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2f0bc5 in Wait (mutex=..., this=0x5590756aa580) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 Finisher::finisher_thread_entry (this=0x5590756aa520) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Finisher.cc:87
#3 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f56ba50b700 (LWP 30836)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2eef19 in Wait (mutex=..., this=0x5590756b89f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 SafeTimer::timer_thread (this=0x5590756b89e0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:108
#3 0x00007f56cb2f037d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:30
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f56c313e700 (LWP 30736)):
#0 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb4fde60 in WaitUntil (when=..., mutex=..., this=0x55907524aff0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2 WaitInterval (interval=..., mutex=..., this=0x55907524aff0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3 CephContextServiceThread::entry (this=0x55907524af70) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/ceph_context.cc:149
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f56be513700 (LWP 30828)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2f0bc5 in Wait (mutex=..., this=0x5590756b81c0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 Finisher::finisher_thread_entry (this=0x5590756b8160) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Finisher.cc:87
#3 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f56bcd10700 (LWP 30831)):
#0 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2 WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3 ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4 0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f56bb50d700 (LWP 30834)):
#0 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb36aa6c in Wait (mutex=..., this=0x5590758159d8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2 DispatchQueue::entry (this=0x559075815970) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:212
#3 0x00007f56cb40e42d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:102
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f56bdd12700 (LWP 30829)):
#0 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f56cb2ef155 in WaitUntil (when=..., mutex=..., this=0x5590756b80a8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2 SafeTimer::timer_thread (this=0x5590756b8090) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:110
#3 0x00007f56cb2f037d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:30
#4 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f56bed14700 (LWP 30805)):
#0 0x00007f56c6ce156d in nanosleep () from /lib64/libc.so.6
#1 0x00007f56c6d12404 in usleep () from /lib64/libc.so.6
#2 0x00007f56cb30b24b in OpHistoryServiceThread::entry (this=0x5590756b91f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/TrackedOp.cc:44
#3 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f56bf515700 (LWP 30804)):
#0 0x00007f56c7e5a59b in raise () from /lib64/libpthread.so.0
#1 0x0000559073e58495 in reraise_fatal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:80
#2 handle_fatal_signal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:290
#3 <signal handler called>
#4 0x0000559075242770 in ?? ()
#5 0x00007f56cb425558 in cleanup (this=0x5590756aca00) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.h:394
#6 C_clean_handler::do_request (this=0x559075242710, id=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.cc:88
#7 0x00007f56cb432ef7 in EventCenter::process_events (this=this@entry=0x559075258c80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56bf510490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:439
#8 0x00007f56cb435b0c in operator() (__closure=0x5590755ab2f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#9 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316 #10 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0 #11 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0 ---Type <return> to continue, or q <return> to quit--- #12 0x00007f56c6d1abad in clone () from /lib64/libc.so.6
Updated by Sage Weil almost 6 years ago
(gdb) p this
$1 = (AsyncConnection * const) 0x5590756aca00
which recently logged:
-40> 2018-07-10 08:19:24.593 7f56bf515700 1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6793/0 conn(0x5590756aca00 legacy :6792 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=13 cs=1 l=0).handle_connect_msg existing race replacing process for addr=172.21.15.189:6793/0 just fail later one(this)
-34> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aca00 172.21.15.189:6793/0
Updated by Kefu Chai over 5 years ago
ceph version 14.0.1-1145-g9379360 (937936047bc4fe0667467fcde9a8630519d3c4b5) nautilus (dev)
1: (()+0x12890) [0x7f3ebfc90890]
2: (AsyncConnection::cleanup()+0x57) [0x55ee12a23137]
3: (C_clean_handler::do_request(unsigned long)+0x12) [0x55ee12a29692]
4: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x1fdf) [0x55ee128ac70f]
5: (()+0xdcd4ea) [0x55ee128b44ea]
6: (()+0xbe733) [0x7f3ebf364733]
7: (()+0x76db) [0x7f3ebfc856db]
8: (clone()+0x3f) [0x7f3ebea2088f]
-2> 2018-11-27 12:25:58.723 7f3ebb5f2700 0 -- 172.21.15.196:6803/1011850 >> 172.21.15.196:44288/11851 conn(0x55ee1ddc2880 legacy :6803 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2 accept replacing existing (lossy) channel (new one lossy= 1)
-1> 2018-11-27 12:25:58.723 7f3ebb5f2700 1 -- 172.21.15.196:6803/1011850 >> 172.21.15.196:44288/11851 conn(0x55ee1ddc2880 legacy :6803 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).replace replacing on lossy channel, failing existing
0> 2018-11-27 12:25:58.723 7f3ebb5f2700 -1 *** Caught signal (Segmentation fault) **
I think the connection in the backtrace was the one being replaced.
/a/kchai-2018-11-27_11:44:27-rados-wip-kefu2-testing-2018-11-27-1724-distro-basic-smithi/3285107
Updated by Greg Farnum about 5 years ago
- Project changed from RADOS to Messengers
Updated by Kefu Chai over 4 years ago
in thread 7f4b9502cec0 thread_name:ceph-osd
ceph version 15.0.0-9471-g65938ef (65938ef1914f3a62020dd13164a810ab60ec5e77) octopus (dev)
1: (()+0x12d80) [0x7f4b92ff5d80]
2: (pthread_cond_wait()+0x1fc) [0x7f4b92ff148c]
3: (std::condition_variable::wait(std::unique_lock<std::mutex>&)+0x10) [0x7f4b926b3780]
4: (AsyncMessenger::wait()+0x1ff) [0x55f7dda274bf]
5: (main()+0x49ad) [0x55f7dd1ceead]
6: (__libc_start_main()+0xf3) [0x7f4b91cbc813]
7: (_start()+0x2e) [0x55f7dd20a5ee]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/a/kchai-2020-01-19_03:33:01-rados-wip-kefu-testing-2020-01-18-2208-distro-basic-smithi/4682422