Bug #52392

open

crashes rebooting colocated mds/mgr with mon at boot

Added by Dan van der Ster over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just updated a box in a test cluster to 15.2.14, then rebooted the box.
This box has MON/MGR/MDS/OSD all co-located.

The mgr and mds both crashed similarly while establishing a mon client, but both restarted correctly on the second attempt.
Here is the crash info. For some reason I cannot post the reports (ceph crash post fails):

# ceph crash ls
ID                                                                ENTITY             NEW
2021-08-24T12:39:17.798297Z_a95ebe7d-8043-4e23-b647-2a68ba0f5b8c  mds.cephoctopus-2   *
2021-08-24T12:39:18.569890Z_3dae9cef-32b3-4eca-8630-22d8f8c92aca  mgr.cephoctopus-2   *

# ceph crash post
malformed crash metadata: Expecting value: line 1 column 1 (char 0)
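
The text after "malformed crash metadata:" matches the message Python's json module produces when asked to decode an empty string. One possible reading, which is an assumption and not confirmed anywhere in this report, is that the command received no metadata on its input (the documented form is ceph crash post -i <metafile>). The sketch below only reproduces the message with the standard library; parse_crash_metadata is a hypothetical helper, not Ceph code.

```python
import json


def parse_crash_metadata(raw: str) -> dict:
    """Hypothetical helper (not the Ceph implementation): decode crash
    metadata and wrap decode failures in the wording seen above."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed crash metadata: {exc}") from exc


# Decoding empty input fails at the very first character.
try:
    parse_crash_metadata("")
except ValueError as err:
    message = str(err)
    print(message)
```

If this reading is right, passing a crash's meta file explicitly would avoid the error, but that has not been verified against this cluster.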

# ceph crash info 2021-08-24T12:39:17.798297Z_a95ebe7d-8043-4e23-b647-2a68ba0f5b8c
{
    "backtrace": [
        "(()+0x12b20) [0x7fa222114b20]",
        "(MonClient::_add_conn(unsigned int)+0x1e1) [0x7fa2236541f1]",
        "(MonClient::_add_conns()+0x2c9) [0x7fa223654969]",
        "(MonClient::_reopen_session(int)+0x2b0) [0x7fa22365af70]",
        "(MonClient::authenticate(double)+0x265) [0x7fa22365cbe5]",
        "(MDSDaemon::init()+0x11d) [0x5566d9c6dc0d]",
        "(main()+0xf86) [0x5566d9c5e7d6]",
        "(__libc_start_main()+0xf3) [0x7fa220b4a493]",
        "(_start()+0x2e) [0x5566d9c645ae]" 
    ],
    "ceph_version": "15.2.14",
    "crash_id": "2021-08-24T12:39:17.798297Z_a95ebe7d-8043-4e23-b647-2a68ba0f5b8c",
    "entity_name": "mds.cephoctopus-2",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "f4399fc7e37a98751c21745d38e813191cf2c681fbf1a72ca37061c80aa2e013",
    "timestamp": "2021-08-24T12:39:17.798297Z",
    "utsname_hostname": "cephoctopus-2.cern.ch",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-193.14.2.el8_2.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Sun Jul 26 03:54:29 UTC 2020" 
}
# ceph crash info 2021-08-24T12:39:18.569890Z_3dae9cef-32b3-4eca-8630-22d8f8c92aca
{
    "backtrace": [
        "(()+0x12b20) [0x7f75e23e8b20]",
        "(std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph_mon_subscribe_item>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph_mon_subscribe_item> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph_mon_subscribe_item> > >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3c) [0x7f75e409066c]",
        "(MonSub::want(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned int)+0x44) [0x7f75e40901d4]",
        "(MgrStandby::init()+0x175) [0x55d0fab6a6f5]",
        "(main()+0x1b3) [0x55d0faa636e3]",
        "(__libc_start_main()+0xf3) [0x7f75e0e1e493]",
        "(_start()+0x2e) [0x55d0faa6865e]" 
    ],
    "ceph_version": "15.2.14",
    "crash_id": "2021-08-24T12:39:18.569890Z_3dae9cef-32b3-4eca-8630-22d8f8c92aca",
    "entity_name": "mgr.cephoctopus-2",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "bcec6d90a6621b7f045ab716abb2d49f1cc01bb5cf2cce3fed450b6f8a94dbc6",
    "timestamp": "2021-08-24T12:39:18.569890Z",
    "utsname_hostname": "cephoctopus-2.cern.ch",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-193.14.2.el8_2.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Sun Jul 26 03:54:29 UTC 2020" 
}
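
To compare reports like the two above, it can help to split each backtrace frame into symbol, offset, and address. The following is a minimal sketch that assumes every frame has the "(symbol+0xOFFSET) [0xADDRESS]" shape seen in this report; the regular expression and the helper names parse_frame and symbols are illustrative, not part of any Ceph tool.

```python
import re

# Assumed frame shape, inferred from this report only:
#   "(symbol+0xOFFSET) [0xADDRESS]"
FRAME_RE = re.compile(
    r"^\((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\) \[0x(?P<address>[0-9a-f]+)\]$"
)


def parse_frame(frame: str):
    """Return (symbol, offset, address) or None if the frame does not match."""
    match = FRAME_RE.match(frame.strip())
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("address"), 16),
    )


def symbols(backtrace):
    """Return only the symbol names of the frames that could be parsed."""
    return [parsed[0] for parsed in map(parse_frame, backtrace) if parsed]


# First frames of the mds backtrace, copied from the report above.
mds_backtrace = [
    "(()+0x12b20) [0x7fa222114b20]",
    "(MonClient::_add_conn(unsigned int)+0x1e1) [0x7fa2236541f1]",
    "(MonClient::_add_conns()+0x2c9) [0x7fa223654969]",
]

for name in symbols(mds_backtrace):
    print(name)
```

The greedy match on the symbol is deliberate: C++ signatures contain parentheses and spaces, so the pattern anchors on the trailing offset and address instead.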

I uploaded the mds and mgr log files:
ceph-post-file: 57b862ed-5af4-40d7-87e2-40a3ec0e236c
ceph-post-file: 5055a7f2-e754-4d43-84dc-68d44ad3e080


Related issues 1 (1 open, 0 closed)

Related to CephFS - Bug #51756: crash: std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (Need More Info)

#1

Updated by Dan van der Ster over 2 years ago

The mgr crash above is a duplicate of #51756.

#2

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #51756: crash: std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) added