Bug #25027

closed

mon: src/msg/async/AsyncConnection.cc: 1710: FAILED assert(can_write == WriteStatus::NOWRITE)

Added by Patrick Donnelly almost 6 years ago. Updated about 5 years ago.

Status: Duplicate
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-07-20 10:07:05.664 1fa41700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615
2018-07-20 10:07:05.664 1fa41700 10 cephx: verify_authorizer global_id=0
2018-07-20 10:07:05.664 1fa41700 10 cephx: cephx_verify_authorizer got server_challenge+1 5357087913821005252 expecting 5357087913821005252
2018-07-20 10:07:05.666 1fa41700 10 cephx: verify_authorizer ok nonce 656256e15b9d0386 reply_bl.length()=36
2018-07-20 10:07:05.669 1fa41700  1 -- 172.21.15.203:6790/0 >> 172.21.15.132:6789/0 conn(0x173d72c0 legacy :6790 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
2018-07-20 10:07:05.684 1d63d700 10 mon.c@2(probing) e0 ms_handle_reset 0x173d72c0 172.21.15.132:6789/0
2018-07-20 10:07:05.686 1d63d700 10 mon.c@2(probing) e0 ms_handle_reset 0x173dcee0 -
2018-07-20 10:07:05.721 1fa41700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-1465-g87df86c/rpm/el7/BUILD/ceph-14.0.0-1465-g87df86c/src/msg/async/AsyncConnection.cc: In function 'ssize_t AsyncConnection::handle_connect_msg(ceph_msg_connect&, ceph::bufferlist&, ceph::bufferlist&)' thread 1fa41700 time 2018-07-20 10:07:05.674702
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-1465-g87df86c/rpm/el7/BUILD/ceph-14.0.0-1465-g87df86c/src/msg/async/AsyncConnection.cc: 1710: FAILED assert(can_write == WriteStatus::NOWRITE)

 ceph version 14.0.0-1465-g87df86c (87df86ce59edfaf7b4d7798ef52585152f1a85c3) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x50c828b]
 2: (()+0x292447) [0x50c8447]
 3: (AsyncConnection::handle_connect_msg(ceph_msg_connect&, ceph::buffer::list&, ceph::buffer::list&)+0x2904) [0x51f5b44]
 4: (AsyncConnection::_process_connection()+0xce1) [0x51f6fb1]
 5: (AsyncConnection::process()+0x600) [0x51fa5c0]
 6: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa67) [0x520c0f7]
 7: (()+0x3d8cc5) [0x520ecc5]
 8: (()+0x6d18df) [0x55078df]
 9: (()+0x7e25) [0x106b7e25]
 10: (clone()+0x6d) [0x1182cbad]

From: /ceph/teuthology-archive/pdonnell-2018-07-20_06:53:09-fs-wip-pdonnell-testing-20180720.044448-testing-basic-smithi/2800466/teuthology.log
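
Since this failure shows up across many teuthology runs, a small helper for spotting it in a log may be useful. The following is a minimal sketch, not part of any Ceph or teuthology tooling: the names `AssertFailure` and `find_assert_failures` are hypothetical, and the pattern only assumes the two line shapes quoted in this ticket (`FAILED assert(...)` and `FAILED ceph_assert(...)`, preceded by a `.cc` or `.h` path and a line number).

```python
import re
from typing import Iterable, Iterator, NamedTuple


class AssertFailure(NamedTuple):
    """One failed assertion extracted from a log line (hypothetical helper type)."""
    file: str
    line: int
    condition: str


# Matches "<path>.cc: <line>: FAILED assert(<cond>)" and the ceph_assert variant.
# The path part excludes ':' so a teuthology prefix such as
# "INFO:tasks.ceph.mon.c.smithi104.stderr:" is not captured as part of the path.
_ASSERT_RE = re.compile(
    r"(?P<file>[^\s:]+\.(?:cc|h)): (?P<line>\d+): "
    r"FAILED (?:ceph_)?assert\((?P<cond>.*)\)"
)


def find_assert_failures(lines: Iterable[str]) -> Iterator[AssertFailure]:
    """Yield an AssertFailure for every log line that reports a failed assert."""
    for text in lines:
        match = _ASSERT_RE.search(text)
        if match:
            yield AssertFailure(match["file"], int(match["line"]), match["cond"])
```

To scan a downloaded log, something like `with open("teuthology.log") as f: hits = list(find_assert_failures(f))` should work; grouping the hits by `(file, line, condition)` then gives a rough way to tell whether a run hit this assertion or a different one.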


Related issues 2 (1 open, 1 closed)

Related to Messengers - Bug #34525: MDS Daemon msgr-worker-2 thread crush (New)
Has duplicate Messengers - Bug #25208: msg/async/AsyncConnection.cc: 1710: FAILED assert(can_write == WriteStatus::NOWRITE) (Duplicate, 07/31/2018)

Actions #1

Updated by Josh Durgin over 5 years ago

  • Has duplicate Bug #25208: msg/async/AsyncConnection.cc: 1710: FAILED assert(can_write == WriteStatus::NOWRITE) added
Actions #2

Updated by Neha Ojha over 5 years ago

/a/yuriw-2018-08-01_19:35:55-rados-wip-yuri-testing-2018-08-01-1605-luminous-distro-basic-smithi/2849251/

Actions #3

Updated by Neha Ojha over 5 years ago

/a/yuriw-2018-08-09_19:46:31-rados-wip-yuri4-testing-2018-08-09-1603-luminous-distro-basic-smithi/2886515/

Actions #4

Updated by Ilya Dryomov over 5 years ago

http://qa-proxy.ceph.com/teuthology/dis-2018-08-31_21:17:20-krbd-wip-krbd-namespaces-testing-basic-smithi/2965219/teuthology.log:

2018-08-31T22:01:37.063 INFO:tasks.ceph:Starting mon daemons in cluster ceph...
2018-08-31T22:01:37.064 INFO:teuthology.orchestra.run.smithi104:Running: 'which systemctl'
2018-08-31T22:01:37.148 INFO:teuthology.orchestra.run.smithi104.stdout:/bin/systemctl
2018-08-31T22:01:37.148 INFO:tasks.ceph.mon.a:Restarting daemon
2018-08-31T22:01:37.149 INFO:teuthology.orchestra.run.smithi104:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i a'
2018-08-31T22:01:37.219 INFO:tasks.ceph.mon.a:Started
2018-08-31T22:01:37.219 INFO:tasks.ceph.mon.c:Restarting daemon
2018-08-31T22:01:37.219 INFO:teuthology.orchestra.run.smithi104:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i c'
2018-08-31T22:01:37.223 INFO:tasks.ceph.mon.c:Started
2018-08-31T22:01:37.223 INFO:teuthology.orchestra.run.smithi055:Running: 'which systemctl'
2018-08-31T22:01:37.233 INFO:teuthology.orchestra.run.smithi055.stdout:/bin/systemctl
2018-08-31T22:01:37.234 INFO:tasks.ceph.mon.b:Restarting daemon
2018-08-31T22:01:37.234 INFO:teuthology.orchestra.run.smithi055:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i b'
2018-08-31T22:01:37.303 INFO:tasks.ceph.mon.b:Started
2018-08-31T22:01:37.303 INFO:tasks.ceph:Starting mgr daemons in cluster ceph...
2018-08-31T22:01:37.304 INFO:tasks.ceph.mgr.x:Restarting daemon
2018-08-31T22:01:37.304 INFO:teuthology.orchestra.run.smithi104:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mgr -f --cluster ceph -i x'
2018-08-31T22:01:37.313 INFO:tasks.ceph.mgr.x:Started
2018-08-31T22:01:37.314 INFO:tasks.ceph.mgr.y:Restarting daemon
2018-08-31T22:01:37.314 INFO:teuthology.orchestra.run.smithi055:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mgr -f --cluster ceph -i y'
2018-08-31T22:01:37.316 INFO:tasks.ceph.mon.c.smithi104.stderr:/build/ceph-14.0.0-2724-g0a037ef/src/msg/async/AsyncConnection.cc: In function 'ssize_t AsyncConnection::handle_connect_msg(ceph_msg_connect&, ceph::bufferlist&, ceph::bufferlist&)' thread 7f476ef35700 time 2018-08-31 22:01:37.563207
2018-08-31T22:01:37.317 INFO:tasks.ceph.mon.c.smithi104.stderr:/build/ceph-14.0.0-2724-g0a037ef/src/msg/async/AsyncConnection.cc: 1710: FAILED ceph_assert(can_write == WriteStatus::NOWRITE)
2018-08-31T22:01:37.318 INFO:tasks.ceph.mgr.y:Started
2018-08-31T22:01:37.318 INFO:tasks.ceph:Setting crush tunables to default
2018-08-31T22:01:37.319 INFO:teuthology.orchestra.run.smithi104:Running: 'sudo ceph --cluster ceph osd crush tunables default'
2018-08-31T22:01:37.320 INFO:tasks.ceph.mon.c.smithi104.stderr: ceph version 14.0.0-2724-g0a037ef (0a037ef15dae9774d9a5032bf376d634c7797ed5) nautilus (dev)
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f477d890bc3]
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f477d890d45]
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 3: (AsyncConnection::handle_connect_msg(ceph_msg_connect&, ceph::buffer::list&, ceph::buffer::list&)+0x30a0) [0x7f477db19e70]
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 4: (AsyncConnection::_process_connection()+0x111f) [0x7f477db1b34f]
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 5: (AsyncConnection::process()+0x658) [0x7f477db1ec18]
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 6: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x6d5) [0x7f477db30b35]
2018-08-31T22:01:37.321 INFO:tasks.ceph.mon.c.smithi104.stderr: 7: (()+0x556f18) [0x7f477db34f18]
2018-08-31T22:01:37.322 INFO:tasks.ceph.mon.c.smithi104.stderr: 8: (()+0x7a13bf) [0x7f477dd7f3bf]
2018-08-31T22:01:37.322 INFO:tasks.ceph.mon.c.smithi104.stderr: 9: (()+0x76ba) [0x7f477c6bb6ba]
2018-08-31T22:01:37.322 INFO:tasks.ceph.mon.c.smithi104.stderr: 10: (clone()+0x6d) [0x7f477bee441d]
Actions #6

Updated by Victor Denisov over 5 years ago

  • Assignee set to Victor Denisov
Actions #7

Updated by Patrick Donnelly over 5 years ago

  • Related to Bug #34525: MDS Daemon msgr-worker-2 thread crush added
Actions #8

Updated by Victor Denisov over 5 years ago

  • Assignee deleted (Victor Denisov)
Actions #9

Updated by Patrick Donnelly over 5 years ago

  • Status changed from New to 12

/ceph/teuthology-archive/pdonnell-2018-09-13_04:59:57-multimds-wip-pdonnell-testing-20180913.024004-distro-basic-smithi/3015049/teuthology.log

Actions #10

Updated by Sage Weil over 5 years ago

/a/sage-2018-09-17_17:40:50-rados-wip-sage4-testing-2018-09-17-0823-distro-basic-smithi/3034156

rados/multimon/{clusters/21.yaml mon_kv_backend/leveldb.yaml msgr-failures/few.yaml msgr/async.yaml objectstore/filestore-xfs.yaml rados.yaml supported-random-distro$/{centos_latest.yaml} tasks/mon_recovery.yaml}

Actions #11

Updated by Neha Ojha over 5 years ago

/a/nojha-2018-10-16_22:54:08-rados-master-distro-basic-smithi/3149185/

Actions #12

Updated by Neha Ojha over 5 years ago

Seen in mimic: /a/yuriw-2018-10-24_20:03:55-rados-wip-yuri2-testing-2018-10-23-1513-mimic-distro-basic-smithi/3180310/

Actions #13

Updated by Victor Denisov over 5 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Victor Denisov
Actions #14

Updated by Patrick Donnelly over 5 years ago

The bug that keeps on giving: /ceph/teuthology-archive/pdonnell-2018-12-13_01:15:37-fs-wip-pdonnell-testing-20181212.231100-distro-basic-smithi/3334595/teuthology.log

Actions #15

Updated by Sage Weil over 5 years ago

  • Assignee changed from Victor Denisov to Sage Weil

Seeing this (more often) with the latest round of msg/async fixes.

Actions #16

Updated by Sage Weil over 5 years ago

  • Status changed from In Progress to Duplicate

Marking this as a duplicate of #25208. Reopen and unlink if this comes up again, though!

Actions #17

Updated by Ilya Dryomov over 5 years ago

Did you mean some other ticket? Josh closed #25208 as a Duplicate of this ticket 6 months ago.

Actions #18

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers
  • Category deleted (Correctness/Safety)