Project

General

Profile

Bug #36497

FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace()

Added by Neha Ojha 10 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
Start date:
10/17/2018
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
luminous,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

2018-10-17T01:21:01.616 INFO:tasks.ceph.mon.h.smithi061.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-4233-g8e2ed77/rpm/el7/BUILD/ceph-14.0.0-4233-g8e2ed77/src/msg/async/Protocol.cc: In function 'Ct<ProtocolV1>* ProtocolV1::replace(AsyncConnectionRef, ceph_msg_connect_reply&, ceph::bufferlist&)' thread 7f7dbc6a1700 time 2018-10-17 01:21:01.612883
2018-10-17T01:21:01.617 INFO:tasks.ceph.mon.h.smithi061.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-4233-g8e2ed77/rpm/el7/BUILD/ceph-14.0.0-4233-g8e2ed77/src/msg/async/Protocol.cc: 2173: FAILED ceph_assert(can_write == WriteStatus::NOWRITE)
2018-10-17T01:21:01.617 INFO:tasks.ceph.mon.h.smithi061.stderr: ceph version 14.0.0-4233-g8e2ed77 (8e2ed7796c88aec03322f396b75446e4435088e0) nautilus (dev)
2018-10-17T01:21:01.617 INFO:tasks.ceph.mon.h.smithi061.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x153) [0x7f7dcd40cf69]
2018-10-17T01:21:01.617 INFO:tasks.ceph.mon.h.smithi061.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f7dcd40d130]
2018-10-17T01:21:01.617 INFO:tasks.ceph.mon.h.smithi061.stderr: 3: (ProtocolV1::replace(boost::intrusive_ptr<AsyncConnection>, ceph_msg_connect_reply&, ceph::buffer::list&)+0x9ec) [0x7f7dcd6f64bc]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 4: (ProtocolV1::handle_connect_message_2()+0x2177) [0x7f7dcd6f8657]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 5: (ProtocolV1::handle_connect_message_auth(char*, int)+0x138) [0x7f7dcd6f9498]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 6: (()+0x4f8c3d) [0x7f7dcd6dfc3d]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 7: (AsyncConnection::process()+0x186) [0x7f7dcd6d26b6]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x1585) [0x7f7dcd6ffd95]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 9: (()+0x524807) [0x7f7dcd70b807]
2018-10-17T01:21:01.618 INFO:tasks.ceph.mon.h.smithi061.stderr: 10: (()+0x788fdf) [0x7f7dcd96ffdf]
2018-10-17T01:21:01.619 INFO:tasks.ceph.mon.h.smithi061.stderr: 11: (()+0x7e25) [0x7f7dca032e25]
2018-10-17T01:21:01.619 INFO:tasks.ceph.mon.h.smithi061.stderr: 12: (clone()+0x6d) [0x7f7dc9111bad]
2018-10-17T01:21:01.619 INFO:tasks.ceph.mon.h.smithi061.stderr:*** Caught signal (Aborted) **

/a/nojha-2018-10-16_22:54:08-rados-master-distro-basic-smithi/3149185/


Related issues

Copied to RADOS - Backport #37904: mimic: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace() Resolved
Copied to RADOS - Backport #37905: luminous: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace() Resolved

History

#1 Updated by Patrick Donnelly 10 months ago

  • Status changed from New to Verified
  • Priority changed from Normal to Urgent
  • Target version set to v14.0.0
  • Source set to Q/A

/ceph/teuthology-archive/pdonnell-2018-10-17_19:54:38-multimds-wip-pdonnell-testing-20181017.175152-distro-basic-smithi/3153740/teuthology.log

#2 Updated by Josh Durgin 8 months ago

/a/dzafman-2018-12-14_11:02:20-rados-wip-zafman-testing-distro-basic-smithi/3362409

#3 Updated by Sage Weil 8 months ago

/a/sage-2019-01-02_20:26:30-fs-wip-sage-testing-2019-01-02-1155-distro-basic-smithi/3416049

incoming connection...

   -44> 2019-01-03 16:17:58.709 21352700  1 -- 172.21.15.25:6790/0 >> - conn(0x1b070d50 legacy :6790 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=31 172.21.15.25:41154/0

but we mark_down_all...
   -43> 2019-01-03 16:17:58.728 405af00 10 mon.c@-1(probing) e0 bootstrap
   -42> 2019-01-03 16:17:58.730 405af00 10 mon.c@-1(probing) e0 sync_reset_requester
   -41> 2019-01-03 16:17:58.731 405af00 10 mon.c@-1(probing) e0 unregister_cluster_logger - not registered
   -40> 2019-01-03 16:17:58.732 405af00 10 mon.c@-1(probing) e0 cancel_probe_timeout (none scheduled)
   -39> 2019-01-03 16:17:58.732 405af00 10 mon.c@-1(probing) e0 reverting to legacy ranks for seed monmap (epoch 0)
   -38> 2019-01-03 16:17:58.735 405af00 10 mon.c@-1(probing) e0 monmap e0: 3 mons at {a=172.21.15.25:6789/0,b=172.21.15.205:6789/0,c=172.21.15.25:6790/0}
   -37> 2019-01-03 16:17:58.738 405af00  0 mon.c@-1(probing) e0  my rank is now 2 (was -1)
   -36> 2019-01-03 16:17:58.739 405af00  1 -- 172.21.15.25:6790/0 shutdown_connections 
...

which sets state=CLOSED and can_write=NOWRITE, but we continue with the connection a bit later,
    -9> 2019-01-03 16:17:58.836 1eb4d700 10 mon.c@2(probing) e0 ms_handle_reset 0x1b070d50 172.21.15.25:6789/0

and
    -8> 2019-01-03 16:17:58.866 21352700 10 cephx: verify_authorizer global_id=0
    -7> 2019-01-03 16:17:58.876 21352700 10 cephx: cephx_verify_authorizer adding server_challenge 1561489704070921529
    -6> 2019-01-03 16:17:58.903 21352700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615
    -5> 2019-01-03 16:17:58.903 21352700 10 cephx: verify_authorizer global_id=0
    -4> 2019-01-03 16:17:58.904 21352700 10 cephx: cephx_verify_authorizer got server_challenge+1 1561489704070921530 expecting 1561489704070921530
    -3> 2019-01-03 16:17:58.909 21352700 10 cephx: verify_authorizer ok nonce 3df5bc1ff15cc0de reply_bl.length()=36
    -2> 2019-01-03 16:17:58.921 21352700  1 -- 172.21.15.25:6790/0 >> 172.21.15.25:6789/0 conn(0x1b070d50 legacy :6790 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
    -1> 2019-01-03 16:17:59.014 21352700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-2081-gecb01ef/rpm/el7/BUILD/ceph-14.0.1-2081-gecb01ef/src/msg/async/Protocol.cc: In function 'Ct<ProtocolV1>* ProtocolV1::replace(AsyncConnectionRef, ceph_msg_connect_reply&, ceph::bufferlist&)' thread 21352700 time 2019-01-03 16:17:58.930675
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-2081-gecb01ef/rpm/el7/BUILD/ceph-14.0.1-2081-gecb01ef/src/msg/async/Protocol.cc: 2182: FAILED ceph_assert(can_write == WriteStatus::NOWRITE)

 ceph version 14.0.1-2081-gecb01ef (ecb01ef429436cee37869ae48091019784855910) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x50652fc]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x50654ca]
 3: (ProtocolV1::replace(boost::intrusive_ptr<AsyncConnection>, ceph_msg_connect_reply&, ceph::buffer::list&)+0x97c) [0x536a92c]
 4: (ProtocolV1::handle_connect_message_2()+0x2161) [0x536cbd1]
 5: (ProtocolV1::handle_connect_message_auth(char*, int)+0x138) [0x536da18]
 6: (()+0x51d7bd) [0x53537bd]
 7: (AsyncConnection::process()+0x18c) [0x534615c]
 8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x1585) [0x5376575]
 9: (()+0x54a127) [0x5380127]

not sure why the state is s=ACCEPTING_WAIT_CONNECT_MSG_AUTH. seems like once it was CLOSED it should have stayed that way. i suspect there is just a missing check somewhere after taking the lock...

#4 Updated by Ricardo Dias 8 months ago

  • Status changed from Verified to In Progress
  • Assignee set to Ricardo Dias

#5 Updated by Sage Weil 7 months ago

  • Status changed from In Progress to Need Review
  • Assignee changed from Ricardo Dias to Sage Weil

#6 Updated by Sage Weil 7 months ago

  -919> 2019-01-07 21:14:20.108 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth
  -918> 2019-01-07 21:14:20.108 7f6343646700 20 -- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=158
  -914> 2019-01-07 21:14:20.108 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0
  -913> 2019-01-07 21:14:20.108 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2
  -912> 2019-01-07 21:14:20.108 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 0 global_seq 17
  -911> 2019-01-07 21:14:20.108 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 1, policy.lossy=0 policy.server=0 poli
cy.standby=1 policy.resetcheck=1
  -910> 2019-01-07 21:14:20.108 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 13, their proto 13
  -909> 2019-01-07 21:14:20.108 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 158
  -908> 2019-01-07 21:14:20.108 7f635cba6f00 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop
  -907> 2019-01-07 21:14:20.108 7f635cba6f00  2 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop
  -906> 2019-01-07 21:14:20.108 7f635cba6f00 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).discard_out_queue started
  -903> 2019-01-07 21:14:20.108 7f634664c700 10 mon.g@7(probing) e0 ms_handle_reset 0x3719a80 v2:172.21.15.181:3303/0
  -884> 2019-01-07 21:14:20.108 7f635cba6f00  5 -- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] shutdown_connections delete 0x3719a80
  -559> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=CLOSED pgs=0 cs=0 l=0).handle_connect_message_2: challenging authorizer
  -556> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=CLOSED pgs=0 cs=0 l=0).send_connect_message_reply
  -553> 2019-01-07 21:14:20.116 7f6343646700 10 -- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=STATE_CLOSED l=0)._try_send sent bytes 58 remaining bytes 0
  -552> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=CLOSED pgs=0 cs=0 l=0).handle_connect_message_reply_write r=0
  -551> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=CLOSED pgs=0 cs=0 l=0).wait_connect_message
  -548> 2019-01-07 21:14:20.116 7f6343646700 20 -- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=STATE_CLOSED l=0).read start len=33
  -547> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=CLOSED pgs=0 cs=0 l=0).handle_connect_message_1 r=0
  -544> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth
  -543> 2019-01-07 21:14:20.116 7f6343646700 20 -- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=STATE_CLOSED l=0).read start len=158
  -540> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0
  -539> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2
  -538> 2019-01-07 21:14:20.116 7f6343646700 20 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 0 global_seq 17
  -537> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 1, policy.lossy=0 policy.server=0 poli
cy.standby=1 policy.resetcheck=1
  -536> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 13, their proto 13
  -533> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 158
  -523> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept setting up session_security.
  -498> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 existing 0x3719180 on [v2:172.21.15.181:3303/0,v1:172.21.1
5.181:6792/0]
  -497> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept existing 0x3719180.gseq 0 <= 17, looks ok
  -494> 2019-01-07 21:14:20.116 7f6343646700  1 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connect_seq 0 vs existing csq=0 existing_state=STAT
E_CONNECTION_ESTABLISHED
  -492> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connection race, existing 0x3719180.cseq 0 == 0, or
 we are server, replacing my attempt
  -490> 2019-01-07 21:14:20.116 7f6343646700 10 --2- [v2:172.21.15.74:3302/0,v1:172.21.15.74:6791/0] >> [v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0] conn(0x3719a80 msgr2 :45244 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).replace accept replacing 0x3719180
  -290> 2019-01-07 21:14:20.120 7f6343646700 -1 /build/ceph-14.0.1-2341-ge5f8cd8/src/msg/async/ProtocolV2.cc: In function 'Ct<ProtocolV2>* ProtocolV2::replace(AsyncConnectionRef, ceph_msg_connect_reply&, ceph::bufferlist&)' thread 7f6343646700 time 2019-01-07 21:14:20.123772
/build/ceph-14.0.1-2341-ge5f8cd8/src/msg/async/ProtocolV2.cc: 2242: FAILED ceph_assert(can_write == WriteStatus::NOWRITE)

 ceph version 14.0.1-2341-ge5f8cd8 (e5f8cd86d383cb496625e659e768ddbc7a748819) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6353d33079]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6353d33254]
 3: (ProtocolV2::replace(boost::intrusive_ptr<AsyncConnection>, ceph_msg_connect_reply&, ceph::buffer::list&)+0xa60) [0x7f6354079bb0]
 4: (ProtocolV2::handle_connect_message_2()+0x241b) [0x7f635407c0ab]
 5: (ProtocolV2::handle_connect_message_auth(char*, int)+0x138) [0x7f635407ccc8]
 6: (ProtocolV2::read_event()+0x10c) [0x7f63540653ac]
 7: (AsyncConnection::process()+0x464) [0x7f6354038924]
 8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x1fdf) [0x7f6354083bff]
 9: (()+0x61cbaa) [0x7f635408ebaa]
 10: (()+0x8f003f) [0x7f635436203f]
 11: (()+0x76ba) [0x7f63529056ba]
 12: (clone()+0x6d) [0x7f635212e41d]

/a/sage-2019-01-07_20:29:40-rados:multimon-wip-sage3-testing-2019-01-07-1253-distro-basic-smithi/3434750

#7 Updated by Sage Weil 7 months ago

  • Status changed from Need Review to Pending Backport
  • Backport set to luminous,mimic

#8 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #37904: mimic: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace() added

#9 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #37905: luminous: FAILED ceph_assert(can_write == WriteStatus::NOWRITE) in ProtocolV1::replace() added

#10 Updated by Nathan Cutler 6 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF