Bug #50089

closed

mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster

Added by Neha Ojha about 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

1bfa48148eee52e245e1d06fc24c58f9ce7afcb91c369a99f37e45a25aa52f83
3f975382699c40c8ac1ac12dba2e974a050365cf6f4cdb7efa680f93e6c14d49
3fe4f79db30422625bc0d5d967e6570000d166f2a621a96fc832dc27fedc31bf
4bd6d829bdd117a5f4c7f03eb85e9b6e889d009090af74421f1a21ac9bab4be6
541337866aef4900d3c4ab536694b5efa3c48a1422b9fdc2c35fbcd441614b4a
58ae1b1868b4566ed94ce7798ff840e0e611b4646bcddb014639cedfc6a7901f
5ad55dd4483662974892618f86c3484c74c939979ab6f781bdc165c297983a0f
614b25a1a3fff2ae344523df3d7f2d377ad653ea2f3cd14bc73a11f65551dd5c
6defcf68dc501e6ad721f1bb9154bb98d1b519cbfe8b1718a1497aeeae5a4517
7bb10076aaa32ffda8244ebce0ef12ba522af1d5162605c6225ce32b2b53d815
809365b772c688f5bcf09a9bddada817ea66f5a8ad30a12ce43068e74e56a0c9
96b49c839d59492286f04a76ececd021835a660aabcfedc92ead1b3b31aa9978
d8860eca1bbb09f5149f02114b62a0dcdfa0a65a399c9a628a3fa3f190518025
d92e036dce71f761a510d23ba1d3b7a857fc9c9ea01f60a363a91616dd74f28f
e932678c4790d707352613c005dc6074c072924c65db73dba35ad06ed159e3d7
e9e13cf41d815dd96f1d1014f9c144fa8e74c842164e8ed8d4fd4c268491ce16
f3fc8fc7e2bdbb7d14f1f6b000ef63b360ff153d1e8e73c9410b953559e71249
fe1851c46283d7dee4fed131b4bdac681635f617e9027d115f3ea0c1953550bf
13c496a12d26e28acfc1dc4160ed38576c2cc3e861548aaa98339fb85a031a40
4c54fe2531a5e37e16f9c4db836d98177671863fdf9b32160e49226edc2526b3
505059af5eee43a87e362af522e1b2d59d4b50af74fcac678a3de70c7caad121
f039aa5eece36634a4a9b4d9d6aff95374aa2950e775b18d7c609a2b2aa98e4a
227a4ff681489d6f6f93a5c516587508339956e8339b004849647dccedec5d71
505bc4de5eb8aec6e7f6b83c3d30f7b964c030ba9ca296c3b0f2543476258d8d
5ed46198af542faafdabb96d8b4189d853d082495671bce1412b4d54e0b347a2


Description

    -2> 2021-03-31T14:28:43.137+0000 7f348c4f3700  5 mon.pluto002@0(electing).elector(23)  so far i have { mon.0: features 4540138297136906239 mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus,octopus,pacific,elector-pinging]), mon.2: features 4540138297136906239 mon_feature_t([kraken,luminous,mimic,osdmap-prune,nautilus,octopus,pacific,elector-pinging]) }
    -1> 2021-03-31T14:28:43.420+0000 7f348ecf8700 -1 /builddir/build/BUILD/ceph-16.1.0-1323-g7e7e1f4e/src/mon/MonMap.h: In function 'const entity_addrvec_t& MonMap::get_addrs(unsigned int) const' thread 7f348ecf8700 time 2021-03-31T14:28:43.421216+0000
/builddir/build/BUILD/ceph-16.1.0-1323-g7e7e1f4e/src/mon/MonMap.h: 404: FAILED ceph_assert(m < ranks.size())
 ceph version 16.1.0-1323.el8cp (46ac37397f0332c20aceceb8022a1ac1ddf8fa73) pacific (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f349a0693b8]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2765d2) [0x7f349a0695d2]
 3: (Elector::send_peer_ping(int, utime_t const*)+0x448) [0x55a4b92a5868]
 4: (Elector::ping_check(int)+0x30f) [0x55a4b92a618f]
 5: (Context::complete(int)+0xd) [0x55a4b9226fdd]
 6: (SafeTimer::timer_thread()+0x1b7) [0x7f349a157be7]
 7: (SafeTimerThread::entry()+0x11) [0x7f349a1591c1]
 8: /lib64/libpthread.so.0(+0x815a) [0x7f3497b5d15a]
 9: clone()

Steps to reproduce: reduce the number of monitors in the cluster from 5 to 3.
Workaround: restart the crashed monitor (the crash is a transient error).
Source: https://bugzilla.redhat.com/show_bug.cgi?id=1945266
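
For readers unfamiliar with the assert, here is a minimal sketch of the failure shape seen in the backtrace: a rank that was valid in a 5-monitor map is used as an index after the map shrinks to 3. This is an illustration only, not Ceph code and not the fix that was merged. `MonMapSketch`, `contains_rank`, and `ping_target` are hypothetical names, and the idea that a stale peer rank reaches `get_addrs` from the ping timer is an assumption drawn from the trace above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a monitor map: rank i is stored at ranks[i].
struct MonMapSketch {
  std::vector<std::string> ranks;

  // Same shape as the failing check: the rank must index into ranks.
  const std::string& get_addrs(unsigned m) const {
    assert(m < ranks.size());
    return ranks[m];
  }

  bool contains_rank(unsigned m) const { return m < ranks.size(); }
};

// Hypothetical guarded lookup for a ping target. Returns nullptr when the
// peer rank is no longer in the map (for example, after monitors were
// removed), so the caller can skip the ping instead of hitting the assert.
const std::string* ping_target(const MonMapSketch& map, unsigned peer) {
  if (!map.contains_rank(peer)) {
    return nullptr;  // stale rank
  }
  return &map.get_addrs(peer);
}
```

With five entries, rank 4 resolves normally. After `ranks.resize(3)`, the same rank yields `nullptr` from `ping_target`, whereas calling `get_addrs(4)` directly would abort on the assert.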


Related issues 9 (2 open, 7 closed)

Related to RADOS - Bug #50088: rados: qa: suites do not test mon removal | New
Related to RADOS - Bug #55695: Shutting down a monitor forces Paxos to restart and sometimes disregard subsequent commands | Fix Under Review | Kamoltat (Junior) Sirivadhna
Related to RADOS - Bug #58155: mon:ceph_assert(m < ranks.size()) `different code path than tracker 50089` | Resolved | Kamoltat (Junior) Sirivadhna
Has duplicate RADOS - Bug #52183: crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: assert(m < ranks.size()) | Duplicate
Has duplicate RADOS - Bug #52170: crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: assert(m < ranks.size()) | Duplicate
Has duplicate RADOS - Bug #54529: mon/mon-bind.sh: Failure due to cores found | Duplicate
Has duplicate RADOS - Bug #59757: crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: assert(m < ranks.size()) | Duplicate | Kamoltat (Junior) Sirivadhna
Copied to RADOS - Backport #57704: quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster | Resolved | Kamoltat (Junior) Sirivadhna
Copied to RADOS - Backport #57705: pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster | Resolved | Kamoltat (Junior) Sirivadhna
