Bug #24851

open

msg/async: segv in C_clean_handler::do_request during shutdown

Added by Sage Weil almost 6 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
AsyncMessenger
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(gdb) bt
#0  0x00007f56c7e5a59b in raise () from /lib64/libpthread.so.0
#1  0x0000559073e58495 in reraise_fatal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:80
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:290
#3  <signal handler called>
#4  0x0000559075242770 in ?? ()
#5  0x00007f56cb425558 in cleanup (this=0x5590756aca00) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.h:394
#6  C_clean_handler::do_request (this=0x559075242710, id=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.cc:88
#7  0x00007f56cb432ef7 in EventCenter::process_events (this=this@entry=0x559075258c80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56bf510490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:439
#8  0x00007f56cb435b0c in operator() (__closure=0x5590755ab2f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#9  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#10 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#11 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f56c6d1abad in clone () from /lib64/libc.so.6
(gdb) info th
  Id   Target Id         Frame 
  21   Thread 0x7f56d3eb5140 (LWP 30698) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20   Thread 0x7f56bc50f700 (LWP 30832) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19   Thread 0x7f56b9d0a700 (LWP 30837) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18   Thread 0x7f56bad0c700 (LWP 30835) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17   Thread 0x7f56b9509700 (LWP 30838) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16   Thread 0x7f56c0517700 (LWP 30802) 0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
  15   Thread 0x7f56c293d700 (LWP 30739) 0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
  14   Thread 0x7f56c40bb700 (LWP 30704) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13   Thread 0x7f56bbd0e700 (LWP 30833) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12   Thread 0x7f56bfd16700 (LWP 30803) 0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
  11   Thread 0x7f56bd511700 (LWP 30830) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7f56c18d4700 (LWP 30764) 0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
  9    Thread 0x7f56c0f19700 (LWP 30800) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7f56ba50b700 (LWP 30836) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7f56c313e700 (LWP 30736) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7f56be513700 (LWP 30828) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7f56bcd10700 (LWP 30831) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7f56bb50d700 (LWP 30834) 0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7f56bdd12700 (LWP 30829) 0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7f56bed14700 (LWP 30805) 0x00007f56c6ce156d in nanosleep () from /lib64/libc.so.6
* 1    Thread 0x7f56bf515700 (LWP 30804) 0x00007f56c7e5a59b in raise () from /lib64/libpthread.so.0

/a/sage-2018-07-10_00:40:55-rados-wip-sage3-testing-2018-07-09-1727-distro-basic-smithi/2760645
  -118> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 shutdown_connections 
  -117> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: verify_authorizer global_id=0
  -116> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: cephx_verify_authorizer adding server_challenge 7443361343173355738
  -115> 2018-07-10 08:19:24.593 7f56c0517700  0 mon.l@11(probing) e1 ms_verify_authorizer bad authorizer from mon 172.21.15.189:6790/0
  -114> 2018-07-10 08:19:24.593 7f56c0517700  0 -- 172.21.15.189:6792/0 >> 172.21.15.189:6790/0 conn(0x5590756ad800 legacy :6792 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
  -113> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aae00 172.21.15.82:6794/0
  -112> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756ab500 -
  -111> 2018-07-10 08:19:24.593 7f56bf515700 10 mon.l@11(probing) e1 ms_verify_authorizer 172.21.15.189:6793/0 mon protocol 2
  -110> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615
  -109> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aca00 172.21.15.189:6793/0
  -108> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer global_id=0
  -107> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: cephx_verify_authorizer adding server_challenge 96238160679238695
  -106> 2018-07-10 08:19:24.593 7f56bf515700  0 mon.l@11(probing) e1 ms_verify_authorizer bad authorizer from mon 172.21.15.189:6793/0
  -105> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756ad800 172.21.15.189:6790/0
  -104> 2018-07-10 08:19:24.593 7f56bf515700  0 -- 172.21.15.189:6792/0 >> 172.21.15.189:6793/0 conn(0x5590756aca00 legacy :6792 s=STATE_CLOSED pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
  -103> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 _reset
  -102> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 cancel_probe_timeout (none scheduled)
  -101> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 timecheck_finish
  -100> 2018-07-10 08:19:24.593 7f56d3eb5140 15 mon.l@11(probing) e1 health_tick_stop
   -99> 2018-07-10 08:19:24.593 7f56d3eb5140 15 mon.l@11(probing) e1 health_interval_stop
   -98> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 scrub_event_cancel
   -97> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 scrub_reset
   -96> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxos(paxos recovering c 1..220) restart -- canceling timeouts
   -95> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(mdsmap 1..1) restart
   -94> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(osdmap 1..10) restart
   -93> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(logm 1..71) restart
   -92> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(monmap 1..1) restart
   -91> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(auth 1..24) restart
   -90> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(mgr 1..3) restart
   -89> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(mgrstat 1..46) restart
   -88> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(health 1..67) restart
   -87> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing).paxosservice(config 1..1) restart
   -86> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 cancel_probe_timeout (none scheduled)
   -85> 2018-07-10 08:19:24.593 7f56bf515700 10 mon.l@11(probing) e1 ms_verify_authorizer 172.21.15.189:6793/0 mon protocol 2
   -84> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615
   -83> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 reset_probe_timeout 0x559075956570 after 2 seconds
   -82> 2018-07-10 08:19:24.593 7f56d3eb5140 10 mon.l@11(probing) e1 probing other monitors
   -81> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer global_id=0
   -80> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: cephx_verify_authorizer got server_challenge+1 96238160679238696 expecting 96238160679238696
   -79> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.0 172.21.15.82:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907527b900
   -78> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: verify_authorizer ok nonce e09e68b4c761c83 reply_bl.length()=36
   -77> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907527b900 con 0
   -76> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.1 172.21.15.184:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907527bb80
   -75> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907527bb80 con 0
   -74> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.2 172.21.15.189:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a000
   -73> 2018-07-10 08:19:24.593 7f56bf515700 10 In get_auth_session_handler for protocol 2
   -72> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6789/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a000 con 0
   -71> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.3 172.21.15.82:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a280
   -70> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a280 con 0
   -69> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.4 172.21.15.184:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a500
   -68> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a500 con 0
   -67> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.5 172.21.15.189:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597a780
   -66> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6790/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597a780 con 0
   -65> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.6 172.21.15.82:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597aa00
   -64> 2018-07-10 08:19:24.593 7f56c0517700  1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6789/0 conn(0x5590756ab500 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed 
   -63> 2018-07-10 08:19:24.593 7f56bfd16700  1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6790/0 conn(0x559075814300 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed 
   -62> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x5590756ab500 172.21.15.82:6789/0
   -61> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x559075814300 172.21.15.82:6790/0
   -60> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597aa00 con 0
   -59> 2018-07-10 08:19:24.593 7f56bfd16700  1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6791/0 conn(0x5590756ac300 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed 
   -58> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x5590756ac300 172.21.15.82:6791/0
   -57> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.7 172.21.15.184:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597ac80
   -56> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597ac80 con 0
   -55> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.8 172.21.15.189:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597af00
   -54> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6791/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597af00 con 0
   -53> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.9 172.21.15.82:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b180
   -52> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b180 con 0
   -51> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.10 172.21.15.184:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b400
   -50> 2018-07-10 08:19:24.593 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
   -49> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6792/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b400 con 0
   -48> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.12 172.21.15.82:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b680
   -47> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b680 con 0
   -46> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
   -45> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.13 172.21.15.184:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597b900
   -44> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597b900 con 0
   -43> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.14 172.21.15.189:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x55907597bb80
   -42> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6793/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x55907597bb80 con 0
   -41> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.15 172.21.15.82:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6000
   -40> 2018-07-10 08:19:24.593 7f56bf515700  1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6793/0 conn(0x5590756aca00 legacy :6792 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=13 cs=1 l=0).handle_connect_msg existing race replacing process for addr=172.21.15.189:6793/0 just fail later one(this)
   -39> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6000 con 0
   -38> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.16 172.21.15.184:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6280
   -37> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6280 con 0
   -36> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.17 172.21.15.189:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6500
   -35> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6794/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6500 con 0
   -34> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aca00 172.21.15.189:6793/0
   -33> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.18 172.21.15.82:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6780
   -32> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.82:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6780 con 0
   -31> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.19 172.21.15.184:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6a00
   -30> 2018-07-10 08:19:24.593 7f56bf515700  1 -- 172.21.15.189:37330/30698 learned_addr learned my addr 172.21.15.189:37330/30698 (peer_addr_for_me 172.21.15.189:37330/0)
   -29> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.184:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6a00 con 0
   -28> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 _send_message--> mon.20 172.21.15.189:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- ?+0 0x5590759a6c80
   -27> 2018-07-10 08:19:24.593 7f56bf515700 10 mon.l@11(probing) e1 ms_get_authorizer for mgr
   -26> 2018-07-10 08:19:24.593 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
   -25> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx keyserverdata: get_caps: name=mon.
   -24> 2018-07-10 08:19:24.593 7f56bf515700 10 cephx: build_service_ticket service mgr secret_id 2 ticket_info.ticket.name=mon.
   -23> 2018-07-10 08:19:24.593 7f56d3eb5140  1 -- 172.21.15.189:6792/0 --> 172.21.15.189:6795/0 -- mon_probe(probe 7d2eacfb-d334-4194-975b-4ac1d8cfe47a name l) v6 -- 0x5590759a6c80 con 0
   -22> 2018-07-10 08:19:24.593 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
   -21> 2018-07-10 08:19:24.594 7f56bfd16700  1 -- 172.21.15.189:6792/0 >> 172.21.15.184:6793/0 conn(0x559075990300 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed 
   -20> 2018-07-10 08:19:24.594 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x559075990300 172.21.15.184:6793/0
   -19> 2018-07-10 08:19:24.594 7f56bfd16700  1 -- 172.21.15.189:6792/0 >> 172.21.15.184:6794/0 conn(0x559075991800 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed 
   -18> 2018-07-10 08:19:24.594 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x559075991800 172.21.15.184:6794/0
   -17> 2018-07-10 08:19:24.594 7f56bfd16700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
   -16> 2018-07-10 08:19:24.594 7f56bfd16700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
   -15> 2018-07-10 08:19:24.594 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
   -14> 2018-07-10 08:19:24.594 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
   -13> 2018-07-10 08:19:24.594 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
   -12> 2018-07-10 08:19:24.594 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
   -11> 2018-07-10 08:19:24.594 7f56c0517700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
   -10> 2018-07-10 08:19:24.594 7f56c0517700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
    -9> 2018-07-10 08:19:24.594 7f56c0517700  1 -- 172.21.15.189:6792/0 >> 172.21.15.184:6795/0 conn(0x5590759b0e00 legacy :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection reconnect failed 
    -8> 2018-07-10 08:19:24.594 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_refused 0x5590759b0e00 172.21.15.184:6795/0
    -7> 2018-07-10 08:19:24.594 7f56c0517700  1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6790/0 conn(0x559075813c00 legacy :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got WAIT (connection race)
    -6> 2018-07-10 08:19:24.594 7f56c0517700  1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6790/0 conn(0x559075813c00 legacy :-1 s=STATE_WAIT pgs=0 cs=0 l=0).process enter wait state, failing
    -5> 2018-07-10 08:19:24.594 7f56bfd16700 10 mon.l@11(probing) e1 ms_get_authorizer for mon
    -4> 2018-07-10 08:19:24.594 7f56bfd16700 10 cephx: build_service_ticket service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
    -3> 2018-07-10 08:19:24.594 7f56bfd16700  1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6792/0 conn(0x55907598ee00 legacy :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got WAIT (connection race)
    -2> 2018-07-10 08:19:24.594 7f56bfd16700  1 -- 172.21.15.189:6792/0 >> 172.21.15.82:6792/0 conn(0x55907598ee00 legacy :-1 s=STATE_WAIT pgs=0 cs=0 l=0).process enter wait state, failing
    -1> 2018-07-10 08:19:24.594 7f56bf515700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f56bf515700 thread_name:msgr-worker-2

 ceph version 14.0.0-1224-g0087bf8 (0087bf835dfd0670408c4d798f595d99ff3cbe26) nautilus (dev)
 1: (()+0xf6d0) [0x7f56c7e5a6d0]
 2: [0x559075242770]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#1

Updated by Sage Weil almost 6 years ago

  • Description updated (diff)
  • Status changed from New to 12
#2

Updated by Sage Weil almost 6 years ago

Thread 21 (Thread 0x7f56d3eb5140 (LWP 30698)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb42d687 in Wait (mutex=..., this=0x559075815e30) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  AsyncMessenger::wait (this=0x559075815800) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncMessenger.cc:520
#3  0x0000559073b2a432 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/ceph_mon.cc:802

Thread 20 (Thread 0x7f56bc50f700 (LWP 30832)):
#0  0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2  WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3  ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4  0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f56b9d0a700 (LWP 30837)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb36aa6c in Wait (mutex=..., this=0x5590756aa8d8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  DispatchQueue::entry (this=0x5590756aa870) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:212
#3  0x00007f56cb40e42d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:102
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f56bad0c700 (LWP 30835)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb36a370 in Wait (mutex=..., this=0x559075815b88) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  DispatchQueue::run_local_delivery (this=0x559075815970) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:117
#3  0x00007f56cb40e53d in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:116
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f56b9509700 (LWP 30838)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb36a370 in Wait (mutex=..., this=0x5590756aaa88) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  DispatchQueue::run_local_delivery (this=0x5590756aa870) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:117
#3  0x00007f56cb40e53d in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:116
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f56c0517700 (LWP 30802)):
#0  0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f56cb687959 in EpollDriver::event_wait (this=0x5590755ab4d0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/EventEpoll.cc:113
#2  0x00007f56cb432587 in EventCenter::process_events (this=this@entry=0x559075259280, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56c0512490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:399
#3  0x00007f56cb435b0c in operator() (__closure=0x5590755ab2c8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#5  0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#6  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f56c293d700 (LWP 30739)):
#0  0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
#1  0x00007f56cb2cf3f7 in poll (__timeout=-1, __nfds=2, __fds=0x7f56c29384b0) at /usr/include/bits/poll2.h:46
#2  AdminSocket::entry (this=0x55907524aea0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/admin_socket.cc:234
#3  0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f56c40bb700 (LWP 30704)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb31498d in ceph::logging::Log::entry (this=0x559075280680) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/log/Log.cc:542
#2  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f56bbd0e700 (LWP 30833)):
#0  0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2  WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3  ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4  0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f56bfd16700 (LWP 30803)):
#0  0x00007f56c6d1b183 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f56cb687959 in EpollDriver::event_wait (this=0x5590755ab110, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/EventEpoll.cc:113
#2  0x00007f56cb432587 in EventCenter::process_events (this=this@entry=0x559075258a80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56bfd11490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:399
#3  0x00007f56cb435b0c in operator() (__closure=0x5590755ab358) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#5  0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#6  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f56bd511700 (LWP 30830)):
#0  0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2  WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3  ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4  0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f56c18d4700 (LWP 30764)):
#0  0x00007f56c6d0ff0d in poll () from /lib64/libc.so.6
#1  0x0000559073e58de2 in poll (__timeout=-1, __nfds=4, __fds=0x7f56c18cf260) at /usr/include/bits/poll2.h:41
#2  SignalHandler::entry (this=0x5590752809c0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:453
#3  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f56c0f19700 (LWP 30800)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2f0bc5 in Wait (mutex=..., this=0x5590756aa580) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  Finisher::finisher_thread_entry (this=0x5590756aa520) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Finisher.cc:87
#3  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f56ba50b700 (LWP 30836)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2eef19 in Wait (mutex=..., this=0x5590756b89f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  SafeTimer::timer_thread (this=0x5590756b89e0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:108
#3  0x00007f56cb2f037d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:30
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f56c313e700 (LWP 30736)):
#0  0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb4fde60 in WaitUntil (when=..., mutex=..., this=0x55907524aff0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2  WaitInterval (interval=..., mutex=..., this=0x55907524aff0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3  CephContextServiceThread::entry (this=0x55907524af70) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/ceph_context.cc:149
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f56be513700 (LWP 30828)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2f0bc5 in Wait (mutex=..., this=0x5590756b81c0) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  Finisher::finisher_thread_entry (this=0x5590756b8160) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Finisher.cc:87
#3  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f56bcd10700 (LWP 30831)):
#0  0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2f75c9 in WaitUntil (when=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2  WaitInterval (interval=..., mutex=..., this=0x5590756b8320) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:73
#3  ThreadPool::worker (this=0x5590756b82a0, wt=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.cc:141
#4  0x00007f56cb2f8eb0 in ThreadPool::WorkThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/WorkQueue.h:449
#5  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f56bb50d700 (LWP 30834)):
#0  0x00007f56c7e56995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb36aa6c in Wait (mutex=..., this=0x5590758159d8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:48
#2  DispatchQueue::entry (this=0x559075815970) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.cc:212
#3  0x00007f56cb40e42d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/DispatchQueue.h:102
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f56bdd12700 (LWP 30829)):
#0  0x00007f56c7e56d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f56cb2ef155 in WaitUntil (when=..., mutex=..., this=0x5590756b80a8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Cond.h:64
#2  SafeTimer::timer_thread (this=0x5590756b8090) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:110
#3  0x00007f56cb2f037d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/Timer.cc:30
#4  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f56bed14700 (LWP 30805)):
#0  0x00007f56c6ce156d in nanosleep () from /lib64/libc.so.6
#1  0x00007f56c6d12404 in usleep () from /lib64/libc.so.6
#2  0x00007f56cb30b24b in OpHistoryServiceThread::entry (this=0x5590756b91f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/common/TrackedOp.cc:44
#3  0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f56c6d1abad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f56bf515700 (LWP 30804)):
#0  0x00007f56c7e5a59b in raise () from /lib64/libpthread.so.0
#1  0x0000559073e58495 in reraise_fatal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:80
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/global/signal_handler.cc:290
#3  <signal handler called>
#4  0x0000559075242770 in ?? ()
#5  0x00007f56cb425558 in cleanup (this=0x5590756aca00) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.h:394
#6  C_clean_handler::do_request (this=0x559075242710, id=<optimized out>) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/AsyncConnection.cc:88
#7  0x00007f56cb432ef7 in EventCenter::process_events (this=this@entry=0x559075258c80, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f56bf510490) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Event.cc:439
#8  0x00007f56cb435b0c in operator() (__closure=0x5590755ab2f8) at /usr/src/debug/ceph-14.0.0-1224-g0087bf8/src/msg/async/Stack.cc:53
#9  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#10 0x00007f56cb7266cf in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0
#11 0x00007f56c7e52e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f56c6d1abad in clone () from /lib64/libc.so.6
#3

Updated by Sage Weil almost 6 years ago

(gdb) p this
$1 = (AsyncConnection * const) 0x5590756aca00

which recently appeared in the log:

   -40> 2018-07-10 08:19:24.593 7f56bf515700  1 -- 172.21.15.189:6792/0 >> 172.21.15.189:6793/0 conn(0x5590756aca00 legacy :6792 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=13 cs=1 l=0).handle_connect_msg existing race replacing process for addr=172.21.15.189:6793/0 just fail later one(this)
   -34> 2018-07-10 08:19:24.593 7f56bb50d700 10 mon.l@11(probing) e1 ms_handle_reset 0x5590756aca00 172.21.15.189:6793/0
#4

Updated by Kefu Chai over 5 years ago

 ceph version 14.0.1-1145-g9379360 (937936047bc4fe0667467fcde9a8630519d3c4b5) nautilus (dev)
 1: (()+0x12890) [0x7f3ebfc90890]
 2: (AsyncConnection::cleanup()+0x57) [0x55ee12a23137]
 3: (C_clean_handler::do_request(unsigned long)+0x12) [0x55ee12a29692]
 4: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x1fdf) [0x55ee128ac70f]
 5: (()+0xdcd4ea) [0x55ee128b44ea]
 6: (()+0xbe733) [0x7f3ebf364733]
 7: (()+0x76db) [0x7f3ebfc856db]
 8: (clone()+0x3f) [0x7f3ebea2088f]
    -2> 2018-11-27 12:25:58.723 7f3ebb5f2700  0 -- 172.21.15.196:6803/1011850 >> 172.21.15.196:44288/11851 conn(0x55ee1ddc2880 legacy :6803 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2 accept replacing existing (lossy) channel (new one lossy=1)
    -1> 2018-11-27 12:25:58.723 7f3ebb5f2700  1 -- 172.21.15.196:6803/1011850 >> 172.21.15.196:44288/11851 conn(0x55ee1ddc2880 legacy :6803 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).replace replacing on lossy channel, failing existing
     0> 2018-11-27 12:25:58.723 7f3ebb5f2700 -1 *** Caught signal (Segmentation fault) **

I think the connection in the backtrace was the one being replaced.

/a/kchai-2018-11-27_11:44:27-rados-wip-kefu2-testing-2018-11-27-1724-distro-basic-smithi/3285107

#5

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers
#6

Updated by Greg Farnum about 5 years ago

  • Category set to AsyncMessenger
#7

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
#8

Updated by Kefu Chai over 4 years ago

 in thread 7f4b9502cec0 thread_name:ceph-osd

 ceph version 15.0.0-9471-g65938ef (65938ef1914f3a62020dd13164a810ab60ec5e77) octopus (dev)
 1: (()+0x12d80) [0x7f4b92ff5d80]
 2: (pthread_cond_wait()+0x1fc) [0x7f4b92ff148c]
 3: (std::condition_variable::wait(std::unique_lock<std::mutex>&)+0x10) [0x7f4b926b3780]
 4: (AsyncMessenger::wait()+0x1ff) [0x55f7dda274bf]
 5: (main()+0x49ad) [0x55f7dd1ceead]
 6: (__libc_start_main()+0xf3) [0x7f4b91cbc813]
 7: (_start()+0x2e) [0x55f7dd20a5ee]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/a/kchai-2020-01-19_03:33:01-rados-wip-kefu-testing-2020-01-18-2208-distro-basic-smithi/4682422

