Bug #54529

closed

mon/mon-bind.sh: Failure due to cores found

Added by Laura Flores about 2 years ago. Updated about 2 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Looks like this failed due to external connection issues, but I'll log it for documentation.

/a/teuthology-2022-01-09_07:01:02-rados-master-distro-default-smithi/6604561
/a/yuriw-2022-03-10_02:41:10-rados-wip-yuri3-testing-2022-03-09-1350-distro-default-smithi/6729296

2022-01-10T19:57:37.509 INFO:tasks.workunit.client.0.smithi038.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mon-bind.sh:105: TEST_mon_quorum:  ceph quorum_status --format=json
2022-01-10T19:57:37.632 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 19:57:37 socat[40133] E write(6, 0x55c37d3266c0, 153): Broken pipe
2022-01-10T19:57:48.484 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 19:57:48 socat[40189] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused

...

2022-01-10T20:47:03.286 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:03 socat[42705] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:03.287 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:03 socat[42706] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused
2022-01-10T20:47:05.351 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:05 socat[42707] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:06.683 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:06 socat[42708] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused
2022-01-10T20:47:18.300 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:18 socat[42717] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:18.302 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:18 socat[42718] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:18.303 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:18 socat[42719] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused
2022-01-10T20:47:20.367 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:20 socat[42720] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:21.699 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:21 socat[42721] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused
2022-01-10T20:47:33.313 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:33 socat[42722] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:33.318 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:33 socat[42723] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:33.319 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:33 socat[42724] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused
2022-01-10T20:47:35.383 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:35 socat[42725] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused
2022-01-10T20:47:36.715 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:36 socat[42726] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused

...

2022-01-10T20:47:37.606 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mon-bind.sh:105: TEST_mon_quorum:  jqinput=
2022-01-10T20:47:37.607 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mon-bind.sh:106: TEST_mon_quorum:  jq_success '' '.monmap.mons | length == 3'

...

2022-01-10T20:47:37.733 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:199: teardown:  rm -rf /tmp/ceph-asok.39613
2022-01-10T20:47:37.734 INFO:tasks.workunit.client.0.smithi038.stdout:ERROR: Failure due to cores found
2022-01-10T20:47:37.735 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-01-10T20:47:37.736 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:200: teardown:  '[' yes = yes ']'
2022-01-10T20:47:37.736 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:201: teardown:  echo 'ERROR: Failure due to cores found'
2022-01-10T20:47:37.737 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:202: teardown:  '[' -n '' ']'
2022-01-10T20:47:37.737 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:205: teardown:  return 1
2022-01-10T20:47:37.737 INFO:tasks.workunit.client.0.smithi038.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:2263: main:  return 1
2022-01-10T20:47:37.738 INFO:tasks.workunit:Stopping ['mon'] on client.0...
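
For triage, a log like the one above can be summarized with a small script. The following is only a sketch: the function names are hypothetical, and the regular expression is written against the socat line format seen in this ticket, so it may need adjusting for other logs. It counts refused connections per address and checks whether the teardown reported cores.

```python
import re
from collections import Counter

# Matches socat connect failures in the format seen in this ticket's log, e.g.
# "socat[42705] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused"
REFUSED = re.compile(
    r"socat\[\d+\] E connect\(\d+, AF=2 (\S+?), \d+\): Connection refused"
)

CORES_MARKER = "ERROR: Failure due to cores found"


def count_refused(lines):
    """Return a Counter mapping 'host:port' to the number of refused connects."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def has_core_failure(lines):
    """True if any line carries the teardown's 'cores found' error."""
    return any(CORES_MARKER in line for line in lines)


# A few lines copied from the log above.
sample = [
    "2022-01-10T19:57:37.632 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 19:57:37 socat[40133] E write(6, 0x55c37d3266c0, 153): Broken pipe",
    "2022-01-10T20:47:03.286 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:03 socat[42705] E connect(5, AF=2 127.0.0.1:7137, 16): Connection refused",
    "2022-01-10T20:47:03.287 INFO:tasks.workunit.client.0.smithi038.stderr:2022/01/10 20:47:03 socat[42706] E connect(5, AF=2 127.0.0.1:7136, 16): Connection refused",
    "2022-01-10T20:47:37.734 INFO:tasks.workunit.client.0.smithi038.stdout:ERROR: Failure due to cores found",
]

refused = count_refused(sample)
cores = has_core_failure(sample)
print(dict(refused), cores)
```

A long run of refused connections on the monitor ports followed by the cores marker suggests looking for a crashed daemon first, rather than for a network problem.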


Related issues 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster (Resolved, Kamoltat (Junior) Sirivadhna)

Actions #1

Updated by Neha Ojha about 2 years ago

"Failure due to cores found" means that there is a coredump, and indeed there is a crash. Did we merge something recently or could this be in a testing batch?

2022-03-10T08:45:17.921+0000 7ff1dcbe9700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11051-gb5ca1477/rpm/el8/BUILD/ceph-17.0.0-11051-gb5ca1477/src/mon/MonMap.h: In function 'const entity_addrvec_t& MonMap::get_addrs(unsigned int) const' thread 7ff1dcbe9700 time 2022-03-10T08:45:17.921667+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11051-gb5ca1477/rpm/el8/BUILD/ceph-17.0.0-11051-gb5ca1477/src/mon/MonMap.h: 404: FAILED ceph_assert(m < ranks.size())

 ceph version 17.0.0-11051-gb5ca1477 (b5ca14771f888fa46234c86404e0aee4913031fe) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7ff1e8ea1204]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x284425) [0x7ff1e8ea1425]
 3: (Elector::send_peer_ping(int, utime_t const*)+0x440) [0x561824c71cc0]
 4: (Elector::begin_peer_ping(int)+0x1eb) [0x561824c71eeb]
 5: (Elector::handle_ping(boost::intrusive_ptr<MonOpRequest>)+0x281) [0x561824c72eb1]
 6: (Elector::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xc5) [0x561824c73a95]
 7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xc1d) [0x561824bd393d]
 8: (Monitor::_ms_dispatch(Message*)+0x457) [0x561824bd4ad7]
 9: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x561824c0505c]
 10: (DispatchQueue::entry()+0x14fa) [0x7ff1e912873a]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff1e91dfa61]
 12: /lib64/libpthread.so.0(+0x817f) [0x7ff1e6e2917f]
 13: clone()
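
To illustrate what the failed check guards against, here is a toy model in Python. It is not Ceph code: ToyMonMap, its methods, and the addresses are made up for illustration. It shows one plausible way a rank can fall outside the map, namely a caller holding on to a rank from before the monitor list shrank. That scenario is only inferred from the title of Bug #50089; the actual root cause is tracked in that ticket.

```python
class ToyMonMap:
    """Toy stand-in for a monitor map: rank i is the index into `ranks`."""

    def __init__(self, names):
        self.ranks = list(names)
        # Illustrative addresses only.
        self.addrs = {
            name: f"127.0.0.1:{7132 + i}" for i, name in enumerate(names)
        }

    def get_addrs(self, m):
        # Same condition as the failed check in the log:
        # ceph_assert(m < ranks.size())
        if not (m < len(self.ranks)):
            raise AssertionError("m < ranks.size()")
        return self.addrs[self.ranks[m]]

    def remove(self, name):
        """Drop a monitor; the ranks of later monitors shift down."""
        self.ranks.remove(name)
        del self.addrs[name]


monmap = ToyMonMap(["a", "b", "c"])
stale_peer_rank = 2  # rank remembered from before the map shrank (assumption)
monmap.remove("c")

try:
    monmap.get_addrs(stale_peer_rank)
    crashed = False
except AssertionError:
    crashed = True

print("assert tripped:", crashed)
```

In the real daemon a failed ceph_assert aborts the process, which is what leaves the core file that the test teardown then reports.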
Actions #2

Updated by Laura Flores about 2 years ago

Happened in quincy:
/a/yuriw-2022-03-19_14:39:53-rados-quincy-distro-default-smithi/6747175

The first occurrence happened back on January 10th according to the Sentry history: https://sentry.ceph.com/organizations/ceph/issues/14198/events/85ed10d93c5a4379a278c4b1ab988dfc/events/?project=2. I will take a look and see if there were any relevant merges around that time.

Actions #3

Updated by Laura Flores about 2 years ago

  • Backport set to quincy
Actions #4

Updated by Laura Flores about 2 years ago

Expected output from a successful run:

/a/yuriw-2022-03-18_00:42:20-rados-wip-yuri6-testing-2022-03-17-1547-distro-default-smithi/6743533

2022-03-18T02:51:06.225 INFO:tasks.workunit.client.0.smithi049.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mon-bind.sh:105: TEST_mon_quorum:  ceph quorum_status --format=json
2022-03-18T02:51:09.756 INFO:tasks.workunit.client.0.smithi049.stderr:2022/03/18 02:51:09 socat[41600] E write(6, 0x561c37c116c0, 68): Broken pipe
2022-03-18T02:51:09.983 INFO:tasks.workunit.client.0.smithi049.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/mon/mon-bind.sh:105: TEST_mon_quorum:  jqinput='
2022-03-18T02:51:09.983 INFO:tasks.workunit.client.0.smithi049.stderr:{"election_epoch":4,"quorum":[0,1,2],"quorum_names":["a","b","c"],"quorum_leader_name":"a","quorum_age":1,"features":{"quorum_con":"4540138303579357183","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging","quincy"]},"monmap":{"epoch":1,"fsid":"c8f27b4a-0323-4ffc-8154-da3f6a79cb0d","modified":"2022-03-18T02:51:03.949146Z","created":"2022-03-18T02:51:03.949146Z","min_mon_release":17,"min_mon_release_name":"quincy","election_strategy":1,"disallowed_leaders: ":"","stretch_mode":false,"tiebreaker_mon":"","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific","elector-pinging","quincy"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"127.0.0.1:7132","nonce":0}]},"addr":"127.0.0.1:7132/0","public_addr":"127.0.0.1:7132/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"127.0.0.1:7133","nonce":0}]},"addr":"127.0.0.1:7133/0","public_addr":"127.0.0.1:7133/0","priority":0,"weight":0,"crush_location":"{}"},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"127.0.0.1:7134","nonce":0}]},"addr":"127.0.0.1:7134/0","public_addr":"127.0.0.1:7134/0","priority":0,"weight":0,"crush_location":"{}"}]}}'
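
The check at mon-bind.sh:106 applies the jq filter '.monmap.mons | length == 3' to this output. A rough Python equivalent is sketched below. The function name is hypothetical, the sample JSON is heavily abbreviated from the output above, and treating empty input as a failure is an assumption: the log does not show how the real jq_success helper handles the empty jqinput from the failed run.

```python
import json


def mons_count_is(raw, expected):
    """Rough equivalent of the jq filter '.monmap.mons | length == N'."""
    if not raw.strip():
        # Empty input, as in the failed run (jqinput=). Treated as a
        # failure here by assumption.
        return False
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict):
        return False
    mons = doc.get("monmap", {}).get("mons", [])
    return len(mons) == expected


# Abbreviated from the successful quorum_status output above.
good = (
    '{"quorum":[0,1,2],"quorum_names":["a","b","c"],'
    '"monmap":{"epoch":1,"mons":['
    '{"rank":0,"name":"a"},{"rank":1,"name":"b"},{"rank":2,"name":"c"}]}}'
)

ok_good = mons_count_is(good, 3)
ok_empty = mons_count_is("", 3)
print(ok_good, ok_empty)
```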

Actions #5

Updated by Laura Flores about 2 years ago

  • Is duplicate of Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster added
Actions #6

Updated by Laura Flores about 2 years ago

  • Status changed from New to Duplicate