Bug #59757

closed

crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: assert(m < ranks.size())

Added by Telemetry Bot 12 months ago. Updated 11 months ago.

Status: Duplicate
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source: Telemetry
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

13c496a12d26e28acfc1dc4160ed38576c2cc3e861548aaa98339fb85a031a40
17fbc0171738851ffe6d7a79047b0d90efcfbf77382e8c1571c693ad57ef280e
227a4ff681489d6f6f93a5c516587508339956e8339b004849647dccedec5d71
2ea76e61543c5041a6f4d2a98425f1bb8a4488c69b2c385669f9fabdaf99e92c
3f975382699c40c8ac1ac12dba2e974a050365cf6f4cdb7efa680f93e6c14d49
499c8fad2548418212ddebba9fa150c83e953006f179ba93bb94066b33c48fb4
4bd6d829bdd117a5f4c7f03eb85e9b6e889d009090af74421f1a21ac9bab4be6
4c54fe2531a5e37e16f9c4db836d98177671863fdf9b32160e49226edc2526b3
505059af5eee43a87e362af522e1b2d59d4b50af74fcac678a3de70c7caad121
505bc4de5eb8aec6e7f6b83c3d30f7b964c030ba9ca296c3b0f2543476258d8d
537d13c2f601776e10e85fd07f3c6c1ae227ed69e1645538ca1a142ac4013606
58ae1b1868b4566ed94ce7798ff840e0e611b4646bcddb014639cedfc6a7901f
5ad55dd4483662974892618f86c3484c74c939979ab6f781bdc165c297983a0f
5ed46198af542faafdabb96d8b4189d853d082495671bce1412b4d54e0b347a2
6defcf68dc501e6ad721f1bb9154bb98d1b519cbfe8b1718a1497aeeae5a4517
809365b772c688f5bcf09a9bddada817ea66f5a8ad30a12ce43068e74e56a0c9
abf16b09ee44a60695c505922e57e32eeb683575874b089739ba92c28fd94c02
e932678c4790d707352613c005dc6074c072924c65db73dba35ad06ed159e3d7
e9e13cf41d815dd96f1d1014f9c144fa8e74c842164e8ed8d4fd4c268491ce16
f00b726c627095cfe369044b967b35bcf64aac33599970ad845c4661966939ad
f039aa5eece36634a4a9b4d9d6aff95374aa2950e775b18d7c609a2b2aa98e4a
f3fc8fc7e2bdbb7d14f1f6b000ef63b360ff153d1e8e73c9410b953559e71249


Description

New crash events were reported via Telemetry with newer versions (['17.2.1', '17.2.3', '17.2.4', '17.2.5']) than encountered in Tracker (17.2.0).

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=43c985d46c6cc0d5c60823bc8e8c2808af6be3872e3e6eb54d13c011253c1885

Assert condition: m < ranks.size()
Assert function: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const

Sanitized backtrace:

    Elector::send_peer_ping(int, utime_t const*)
    Elector::begin_peer_ping(int)
    Elector::handle_ping(boost::intrusive_ptr<MonOpRequest>)
    Elector::dispatch(boost::intrusive_ptr<MonOpRequest>)
    Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)
    Monitor::_ms_dispatch(Message*)
    Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)
    DispatchQueue::entry()
    DispatchQueue::DispatchThread::entry()
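
The failing check can be illustrated with a toy model. This is only a sketch, assuming (per the title of the related Bug #50089) that the lookup is made with a rank that is no longer valid once the number of monitors has been reduced. `ToyMonMap`, `ping_peer`, and the address strings are hypothetical names for illustration and do not reproduce Ceph's code.

```python
class ToyMonMap:
    """Toy stand-in for a monitor map: one address entry per rank."""

    def __init__(self, addrs):
        self.ranks = list(addrs)

    def get_addrs(self, m):
        # Mirrors the reported assert condition: m < ranks.size()
        assert 0 <= m < len(self.ranks), "m < ranks.size()"
        return self.ranks[m]

    def remove_last(self):
        # Shrink the map by one monitor (hypothetical helper).
        self.ranks.pop()


def ping_peer(monmap, peer_rank):
    """Hypothetical caller: looks up the peer's addresses before pinging."""
    return monmap.get_addrs(peer_rank)


monmap = ToyMonMap(["addr-a", "addr-b", "addr-c"])
stale_peer = 2         # rank remembered before the map shrank
monmap.remove_last()   # cluster reduced from 3 monitors to 2

try:
    ping_peer(monmap, stale_peer)
    failed = False
except AssertionError:
    failed = True
```

In this model the lookup for rank 2 fails because only ranks 0 and 1 remain, while a lookup for rank 1 still succeeds.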

Crash dump sample:
{
    "assert_condition": "m < ranks.size()",
    "assert_file": "mon/MonMap.h",
    "assert_func": "const entity_addrvec_t& MonMap::get_addrs(unsigned int) const",
    "assert_line": 404,
    "assert_msg": "mon/MonMap.h: In function 'const entity_addrvec_t& MonMap::get_addrs(unsigned int) const' thread 7fbe0bb4e700 time 2023-05-09T21:53:57.748358+0000\nmon/MonMap.h: 404: FAILED ceph_assert(m < ranks.size())",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7fbe15d60cf0]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7fbe17db5499]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x269605) [0x7fbe17db5605]",
        "(Elector::send_peer_ping(int, utime_t const*)+0x46e) [0x560d4ac4063e]",
        "(Elector::begin_peer_ping(int)+0x1ce) [0x560d4ac41efe]",
        "(Elector::handle_ping(boost::intrusive_ptr<MonOpRequest>)+0xb5) [0x560d4ac423d5]",
        "(Elector::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xb8) [0x560d4ac44948]",
        "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xebe) [0x560d4ab88bae]",
        "(Monitor::_ms_dispatch(Message*)+0x406) [0x560d4ab895e6]",
        "(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x560d4abb9dad]",
        "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7fbe1802fc88]",
        "(DispatchQueue::entry()+0x50f) [0x7fbe1802d0cf]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7fbe180f48f1]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7fbe15d561ca]",
        "clone()" 
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2023-05-09T21:53:57.760563Z_ccca3bd2-cfae-4d33-82c1-a93823f47bdc",
    "entity_name": "mon.089f5890280faeef78cb08c5cf2a46d3baa0eca4",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mon",
    "stack_sig": "499c8fad2548418212ddebba9fa150c83e953006f179ba93bb94066b33c48fb4",
    "timestamp": "2023-05-09T21:53:57.760563Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.3.18-150300.59.106-default",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Dec 12 13:16:24 UTC 2022 (774239c)" 
}
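
A minimal sketch of how such a crash dump could be summarized when matching it against a tracker issue. The function name `summarize_crash` and the choice of fields are assumptions for illustration; the JSON below is an abbreviated copy of the sample above.

```python
import json


def summarize_crash(dump_text):
    """Return the fields most useful for matching a crash to a tracker issue."""
    dump = json.loads(dump_text)
    return {
        "ceph_version": dump["ceph_version"],
        "process_name": dump["process_name"],
        "assert_site": f'{dump["assert_file"]}:{dump["assert_line"]}',
        "assert_condition": dump["assert_condition"],
        "stack_sig": dump["stack_sig"],
    }


# Abbreviated copy of the crash dump sample (only the fields used here).
sample = """
{
    "assert_condition": "m < ranks.size()",
    "assert_file": "mon/MonMap.h",
    "assert_line": 404,
    "ceph_version": "17.2.5",
    "process_name": "ceph-mon",
    "stack_sig": "499c8fad2548418212ddebba9fa150c83e953006f179ba93bb94066b33c48fb4"
}
"""

summary = summarize_crash(sample)
print(summary["assert_site"], summary["ceph_version"])
```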


Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #52170: crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: assert(m < ranks.size()) (Duplicate)
Related to RADOS - Backport #57704: quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster (Resolved, Kamoltat (Junior) Sirivadhna)
Related to RADOS - Backport #57705: pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster (Resolved, Kamoltat (Junior) Sirivadhna)
Is duplicate of RADOS - Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster (Resolved, Kamoltat (Junior) Sirivadhna)

Actions #1

Updated by Telemetry Bot 12 months ago

  • Related to Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster added
Actions #2

Updated by Telemetry Bot 12 months ago

  • Related to Bug #52170: crash: const entity_addrvec_t& MonMap::get_addrs(unsigned int) const: assert(m < ranks.size()) added
Actions #3

Updated by Telemetry Bot 12 months ago

  • Related to Backport #57704: quincy: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster added
Actions #4

Updated by Telemetry Bot 12 months ago

  • Related to Backport #57705: pacific: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster added
Actions #5

Updated by Telemetry Bot 12 months ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.0.0, v16.2.0, v16.2.1, v16.2.10, v16.2.4, v16.2.5, v16.2.6, v16.2.7, v16.2.9, v17.2.0, v17.2.1, v17.2.3, v17.2.4, v17.2.5 added
Actions #6

Updated by Kamoltat (Junior) Sirivadhna 11 months ago

  • Status changed from New to Resolved
  • Assignee set to Kamoltat (Junior) Sirivadhna
  • Crash signature (v1) updated (diff)

According to http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=43c985d46c6cc0d5c60823bc8e8c2808af6be3872e3e6eb54d13c011253c1885
the highest affected version is 17.2.5 for quincy and 16.2.10 for pacific. The PRs that fix the issue were merged only after 17.2.5 and 16.2.10 had been released.

quincy PR: https://github.com/ceph/ceph/pull/48321 was merged on Oct 21, 2022 while 17.2.5 was released on Oct 17, 2022.
pacific PR: https://github.com/ceph/ceph/pull/48320 was merged on Oct 14, 2022 while 16.2.10 was released on Jul 21, 2022.

In conclusion, the reports of this issue are due to the fix not being in the version that these users are using.

Marking it as resolved
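
The date comparison above can be sketched as follows. This is a rough check only: it assumes that a PR merged after a release date cannot be part of that release, and the helper name `release_predates_fix` is hypothetical. The dates are the ones quoted in this note.

```python
from datetime import date

# Release dates and fix merge dates as quoted in the note above.
released = {
    "17.2.5": date(2022, 10, 17),
    "16.2.10": date(2022, 7, 21),
}
fix_merged = {
    "17.2.5": date(2022, 10, 21),   # quincy PR 48321
    "16.2.10": date(2022, 10, 14),  # pacific PR 48320
}


def release_predates_fix(version):
    """True if the release shipped before the fix was merged."""
    return released[version] < fix_merged[version]


# Versions that, by this heuristic, cannot contain the fix.
unfixed = [v for v in released if release_predates_fix(v)]
print(unfixed)
```

Both releases predate their respective merges, which is consistent with the crash still being reported from 17.2.5 and 16.2.10.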

Actions #7

Updated by Yaarit Hatuka 11 months ago

  • Related to deleted (Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster)
Actions #8

Updated by Yaarit Hatuka 11 months ago

  • Is duplicate of Bug #50089: mon/MonMap.h: FAILED ceph_assert(m < ranks.size()) when reducing number of monitors in the cluster added
Actions #9

Updated by Yaarit Hatuka 11 months ago

  • Status changed from Resolved to Duplicate

Changing the status to Duplicate, since it's an extension of the original issue, which is Resolved (https://tracker.ceph.com/issues/50089).
