Bug #43808
segv in MonClient::handle_auth_done
Status: Closed
Description
-157> 2020-01-24T10:35:17.317+0000 1200a700 -1 *** Caught signal (Segmentation fault) **
 in thread 1200a700 thread_name:msgr-worker-1

 ceph version 15.0.0-9704-gb1192ba (b1192ba8ca76fda4688de6ceec97751d65ca697f) octopus (dev)
 1: (()+0x12dd0) [0xc253dd0]
 2: (MonClient::handle_auth_done(Connection*, AuthConnectionMeta*, unsigned long, unsigned int, ceph::buffer::v14_2_0::list const&, CryptoKey*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x78) [0xf244c8]
 3: (ProtocolV2::handle_auth_done(ceph::buffer::v14_2_0::list&)+0x2f6) [0x10b6b06]
 4: (ProtocolV2::handle_frame_payload()+0x1e3) [0x10c4e23]
 5: (ProtocolV2::handle_read_frame_dispatch()+0x160) [0x10c50d0]
 6: (ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer>&&, int)+0x475) [0x10c56a5]
 7: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x10abe5c]
 8: (AsyncConnection::process()+0x8a9) [0x10826b9]
 9: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0xed4447]
 10: (()+0xdd1a2c) [0xed9a2c]
 11: (()+0xc2a33) [0xcb7ba33]
 12: (()+0x82de) [0xc2492de]
 13: (clone()+0x43) [0xd4e44b3]
/a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4698040
Updated by Sage Weil about 4 years ago
no core, no real clue in the log.
trying to reproduce here: http://pulpito.front.sepia.ceph.com:80/sage-2020-01-24_17:17:32-rados:verify-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/
from: teuthology-suite -s rados/verify --filter validater/valgrind.yaml -c wip-sage4-testing-2020-01-23-1347 -k distro -m smithi -p 90 --subset 1/9 -N 3
Updated by Sage Weil about 4 years ago
- Status changed from New to Need More Info
Updated by Sage Weil about 4 years ago
/a/sage-2020-01-29_20:14:58-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/4718136
ceph-osd.4.log
description: rados/verify/{centos_latest.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml}
d-thrash/default/{default.yaml thrashosds-health.yaml} msgr-failures/few.yaml msgr/async-v1only.yaml
objectstore/bluestore-stupid.yaml rados.yaml tasks/rados_api_tests.yaml validater/valgrind.yaml}
Updated by Sage Weil about 4 years ago
/a/sage-2020-02-05_03:10:48-rados-wip-sage2-testing-2020-02-04-1448-distro-basic-smithi/4733235
ceph-osd.2.log
description: rados/verify/{centos_latest.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml}
d-thrash/default/{default.yaml thrashosds-health.yaml} msgr-failures/few.yaml msgr/async-v1only.yaml
objectstore/filestore-xfs.yaml rados.yaml tasks/rados_api_tests.yaml validater/valgrind.yaml}
Updated by Brad Hubbard about 4 years ago
- Related to Bug #44062: LibRadosWatchNotify.WatchNotify failure added
Updated by Brad Hubbard about 4 years ago
- Crash signature (v1) updated (diff)
- Affected Versions v0.80.11 added
Updated by Brad Hubbard about 4 years ago
- Affected Versions v15.0.0 added
- Affected Versions deleted (v0.80.11)
Updated by Brad Hubbard about 4 years ago
See https://tracker.ceph.com/issues/44062#note-9
{
  "crash_id": "2020-02-13T01:23:06.403017Z_b067bba7-fd83-4557-ae3d-a7b04d2d6c17",
  "timestamp": "2020-02-13T01:23:06.403017Z",
  "process_name": "memcheck-amd64-",
  "entity_name": "osd.1",
  "ceph_version": "15.1.0-441-g39dbd5e",
  "utsname_hostname": "smithi103",
  "utsname_sysname": "Linux",
  "utsname_release": "4.18.0-147.el8.x86_64",
  "utsname_version": "#1 SMP Wed Dec 4 21:51:45 UTC 2019",
  "utsname_machine": "x86_64",
  "os_name": "CentOS Linux",
  "os_id": "centos",
  "os_version_id": "8",
  "os_version": "8 (Core)",
  "backtrace": [
    "(()+0x12dc0) [0xc23fdc0]",
    "(MonClient::handle_auth_done(Connection*, AuthConnectionMeta*, unsigned long, unsigned int, ceph::buffer::v14_2_0::list const&, CryptoKey*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x78) [0xf159b8]",
    "(ProtocolV2::handle_auth_done(ceph::buffer::v14_2_0::list&)+0x350) [0x10a7d40]",
    "(ProtocolV2::handle_frame_payload()+0x1e3) [0x10b6013]",
    "(ProtocolV2::handle_read_frame_dispatch()+0x160) [0x10b62c0]",
    "(ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer>&&, int)+0x37d) [0x10b679d]",
    "(ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x109cdfc]",
    "(AsyncConnection::process()+0x8a9) [0x1073499]",
    "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0xec5d37]",
    "(()+0xdc331c) [0xecb31c]",
    "(()+0xc2b23) [0xcb67b23]",
    "(()+0x82de) [0xc2352de]",
    "(clone()+0x43) [0xd4d0133]"
  ]
}
The only entries in the log that stand out to me are the following, although they are more than a second before the crash and probably the result of an injected error.
2020-02-13T01:23:05.310+0000 127f7700 1 -- 172.21.15.103:0/20065 >> [v2:172.21.15.109:6814/20141,v1:172.21.15.109:6815/20141] conn(0x14d683b0 msgr2=0x3dfa8d80 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send send error: (32) Broken pipe
2020-02-13T01:23:05.310+0000 127f7700 1 --2- 172.21.15.103:0/20065 >> [v2:172.21.15.109:6814/20141,v1:172.21.15.109:6815/20141] conn(0x14d683b0 0x3dfa8d80 crc :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0).write client ident write failed r=-32 ((32) Broken pipe)
From the valgrind log.
<error>
  <unique>0x0</unique>
  <tid>5</tid>
  <threadname>msgr-worker-2</threadname>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 8</what>
  <stack>
    <frame>
      <ip>0xF159B8</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>MonClient::handle_auth_done(Connection*, AuthConnectionMeta*, unsigned long, unsigned int, ceph::buffer::v14_2_0::list const&, CryptoKey*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)</fn>
      <dir>/usr/src/debug/ceph-15.1.0-441.g39dbd5e.el8.x86_64/src/mon</dir>
      <file>MonClient.cc</file>
      <line>1559</line>
...
  <auxwhat>Address 0x0 is not stack'd, malloc'd or (recently) free'd</auxwhat>
So this looks like a null pointer read.
That's here.
1512 int MonClient::handle_auth_done(
1513 Connection *con,
1514 AuthConnectionMeta *auth_meta,
1515 uint64_t global_id,
1516 uint32_t con_mode,
1517 const ceph::buffer::list& bl,
1518 CryptoKey *session_key,
1519 std::string *connection_secret)
1520 {
...
1556 } else {
1557 // verify authorizer reply
1558 auto p = bl.begin();
1559 if (!auth_meta->authorizer->verify_reply(p, &auth_meta->connection_secret)) {
1560 ldout(cct, 0) << __func__ << " failed verifying authorizer reply"
Bad bufferlist maybe?
Looking at the disassembled ceph-osd (objdump -rdS).
This is our function. According to the stack trace we segfaulted at offset +0x78 into it, so the faulting instruction is at 0xe0d940 + 0x78 = 0xe0d9b8.
0000000000e0d940 <_ZN9MonClient16handle_auth_doneEP10ConnectionP18AuthConnectionMetamjRKN4ceph6buffer7v14_2_04listEP9CryptoKeyPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@Base>:
...
  e0d950: 41 54                 push   %r12
  e0d952: 49 89 d4              mov    %rdx,%r12
...
  e0d9a6: e8 d5 ed fc ff        callq  ddc780 <_ZN4ceph6buffer7v14_2_04list13iterator_implILb1EEC1EPKS2_j@@Base>
  e0d9ab: 49 8b 7c 24 78        mov    0x78(%r12),%rdi
  e0d9b0: 49 8d 54 24 58        lea    0x58(%r12),%rdx
  e0d9b5: 48 89 de              mov    %rbx,%rsi
  e0d9b8: 48 8b 07              mov    (%rdi),%rax   <===
To be continued.
Updated by Brad Hubbard about 4 years ago
Using gdb to break on MonClient::handle_auth_done in a local vstart cluster I see the following.
(gdb) p ((AuthConnectionMeta *)0x0)->connection_secret
Cannot access memory at address 0x58
(gdb) p ((AuthConnectionMeta *)0x0)->authorizer
Cannot access memory at address 0x78
They match the offsets we observed in the objdump assembly output, which leads me to believe the cause was that the authorizer field was NULL. A ceph_assert checking for that should catch it, so that's the next strategy.
Updated by Neha Ojha about 4 years ago
/a/yuriw-2020-03-03_20:34:30-rados-master-distro-basic-smithi/4822427/ - trying to capture logs before the job goes dead
Updated by Sage Weil about 4 years ago
/a/sage-2020-03-03_19:20:19-rados-wip-sage2-testing-2020-03-03-0959-distro-basic-smithi/4821464
got the osd log.
<kind>InvalidRead</kind>
<what>Invalid read of size 8</what>
<stack>
  <frame>
    <ip>0xF1E698</ip>
    <obj>/usr/bin/ceph-osd</obj>
    <fn>MonClient::handle_auth_done(Connection*, AuthConnectionMeta*, unsigned long, unsigned int, ceph::buffer::v15_2_0::list const&, CryptoKey*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)</fn>
    <dir>/usr/src/debug/ceph-15.1.0-1606.gecd8670.el8.x86_64/src/mon</dir>
    <file>MonClient.cc</file>
    <line>1559</line>
  </frame>
  <frame>
    <ip>0x10B4CBF</ip>
    <obj>/usr/bin/ceph-osd</obj>
    <fn>ProtocolV2::handle_auth_done(ceph::buffer::v15_2_0::list&)</fn>
    <dir>/usr/src/debug/ceph-15.1.0-1606.gecd8670.el8.x86_64/src/msg/async</dir>
    <file>ProtocolV2.cc</file>
    <line>1846</line>
  </frame>
  <frame>
    <ip>0x10C2E42</ip>
    <obj>/usr/bin/ceph-osd</obj>
    <fn>ProtocolV2::handle_frame_payload()</fn>
    <dir>/usr/src/debug/ceph-15.1.0-1606.gecd8670.el8.x86_64/src/msg/async</dir>
    <file>ProtocolV2.cc</file>
    <line>1246</line>
  </frame>
Updated by Sage Weil about 4 years ago
ok, in one thread, we are faulting:
2020-03-04T18:46:48.246+0000 12807700 1 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2020-03-04T18:46:48.247+0000 12807700 1 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=SESSION_RECONNECTING pgs=863 cs=41 l=0 rx=0 tx=0).write reconnect write failed r=-32 ((32) Broken pipe)
2020-03-04T18:46:48.247+0000 12807700 10 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=SESSION_RECONNECTING pgs=863 cs=41 l=0 rx=0 tx=0)._fault
2020-03-04T18:46:48.275+0000 12807700 5 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=SESSION_RECONNECTING pgs=863 cs=41 l=0 rx=0 tx=0).reset_recv_state
2020-03-04T18:46:48.275+0000 12807700 1 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=START_CONNECT pgs=863 cs=42 l=0 rx=0 tx=0)._fault waiting 0.200000
...
which submits the reset_recv_state work to the event center,
connection->center->submit_to(connection->center->get_id(), [this] {
  ldout(cct, 5) << "reset_recv_state reseting crypto handlers" << dendl;
but the submitting thread goes on to do new and interesting things,
2020-03-04T18:46:48.829+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTING l=0).process
2020-03-04T18:46:48.940+0000 12807700 10 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTING_RE l=0).process connect successfully, ready to send banner
2020-03-04T18:46:48.941+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=START_CONNECT pgs=863 cs=42 l=0 rx=0 tx=0).read_event
2020-03-04T18:46:48.941+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=START_CONNECT pgs=863 cs=42 l=0 rx=0 tx=0).start_client_banner_exchange
2020-03-04T18:46:48.941+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=BANNER_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0)._banner_exchange
2020-03-04T18:46:48.962+0000 12807700 10 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 26 remaining bytes 0
2020-03-04T18:46:48.962+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=10
2020-03-04T18:46:49.025+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=BANNER_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0)._handle_peer_banner r=0
2020-03-04T18:46:49.025+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=16
2020-03-04T18:46:49.025+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=BANNER_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0)._handle_peer_banner_payload r=0
2020-03-04T18:46:49.025+0000 12807700 1 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=BANNER_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-03-04T18:46:49.032+0000 12807700 10 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 85 remaining bytes 0
2020-03-04T18:46:49.032+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).read_frame
2020-03-04T18:46:49.033+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32
2020-03-04T18:46:49.033+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_preamble_main r=0
2020-03-04T18:46:49.033+0000 12807700 10 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_preamble_main got new segment: len=36 align=8
2020-03-04T18:46:49.033+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).read_frame_segment
2020-03-04T18:46:49.033+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=36
2020-03-04T18:46:49.033+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_segment r=0
2020-03-04T18:46:49.033+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=17
2020-03-04T18:46:49.034+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_epilogue_main r=0
2020-03-04T18:46:49.034+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_epilogue_main message integrity check success: expected_crc=855695412 calculated_crc=855695412
2020-03-04T18:46:49.034+0000 12807700 10 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_dispatch tag=1
2020-03-04T18:46:49.034+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_hello payload.length()=36
2020-03-04T18:46:49.034+0000 12807700 5 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_hello received hello: peer_type=4 peer_addr_for_me=v2:172.21.15.180:47886/0
2020-03-04T18:46:49.034+0000 12807700 5 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_hello getsockname says I am 172.21.15.180:47886 when talking to v2:172.21.15.173:6809/534448
2020-03-04T18:46:49.035+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=HELLO_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).post_client_banner_exchange
2020-03-04T18:46:49.035+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).send_auth_request peer_type 4 auth_client 0x1ffefff190
2020-03-04T18:46:49.035+0000 12807700 10 monclient: get_auth_request con 0xfa60910 auth_method 2
2020-03-04T18:46:49.047+0000 12807700 10 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 243 remaining bytes 0
2020-03-04T18:46:49.047+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).read_frame
2020-03-04T18:46:49.047+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32
like sending a new auth request.
then the reset_recv_state event actually happens in the event thread,
2020-03-04T18:46:49.651+0000 12807700 5 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 crc :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).reset_recv_state reseting crypto handlers
which clears auth_meta,
auth_meta.reset(new AuthConnectionMeta);
and then we do more interesting things like process the auth response
2020-03-04T18:46:50.783+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).process
2020-03-04T18:46:50.783+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=32
2020-03-04T18:46:50.797+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_preamble_main r=0
2020-03-04T18:46:50.797+0000 12807700 10 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_preamble_main got new segment: len=52 align=8
2020-03-04T18:46:50.798+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).read_frame_segment
2020-03-04T18:46:50.798+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=52
2020-03-04T18:46:50.798+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_segment r=0
2020-03-04T18:46:50.798+0000 12807700 20 -- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 msgr2=0x14ccac20 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=17
2020-03-04T18:46:50.798+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_epilogue_main r=0
2020-03-04T18:46:50.798+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_epilogue_main message integrity check success: expected_crc=2822581327 calculated_crc=2822581327
2020-03-04T18:46:50.798+0000 12807700 10 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_read_frame_dispatch tag=6
2020-03-04T18:46:50.798+0000 12807700 20 --2- v2:172.21.15.180:6816/3161509 >> v2:172.21.15.173:6809/534448 conn(0xfa60910 0x14ccac20 unknown :-1 s=AUTH_CONNECTING pgs=863 cs=42 l=0 rx=0 tx=0).handle_auth_done payload.length()=52
...and crash because the auth_meta was pulled out from underneath us.
Updated by Sage Weil about 4 years ago
- Status changed from Need More Info to In Progress
- Assignee set to Radoslaw Zarzynski
This is related to 8a86719b062cd3b4b6c64a1c027564c092600117, which made this reset work async.
Updated by Radoslaw Zarzynski about 4 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil about 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to nautilus
I'm guessing this should go to nautilus?
Updated by Radoslaw Zarzynski about 4 years ago
- Copied to Backport #44516: nautilus: segv in MonClient::handle_auth_done added
Updated by Nathan Cutler about 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".