Bug #48022

closed

mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

Added by Neha Ojha over 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-10-26T19:31:33.678 INFO:teuthology.orchestra.run.smithi120.stderr:2020-10-26T19:31:33.679+0000 7f5b56d9e700  1 -- 172.21.15.120:0/2411503374 shutdown_connections
2020-10-26T19:31:33.679 INFO:teuthology.orchestra.run.smithi120.stderr:2020-10-26T19:31:33.679+0000 7f5b56d9e700  1 -- 172.21.15.120:0/2411503374 wait complete.
2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-6666-g5af54f2e/rpm/el8/BUILD/ceph-16.0.0-6666-g5af54f2e/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f1d96de1700 time 2020-10-26T19:31:33.682982+0000
2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-6666-g5af54f2e/rpm/el8/BUILD/ceph-16.0.0-6666-g5af54f2e/src/mgr/DaemonServer.cc: 2816: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: ceph version 16.0.0-6666-g5af54f2e (5af54f2ebc54c8319e53a7e64fe7b4bdbbcd15bb) pacific (dev)
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f1da0acc274]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x27148e) [0x7f1da0acc48e]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 3: (DaemonServer::got_service_map()+0xb2d) [0x55c193bb051d]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 4: (Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x55c193bdfffb]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 5: (Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x55c193be2117]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 6: (MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x55c193beb2e5]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 7: (DispatchQueue::entry()+0x126a) [0x7f1da0cf077a]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f1da0da0291]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 9: /lib64/libpthread.so.0(+0x82de) [0x7f1d9ef592de]
2020-10-26T19:31:33.684 INFO:tasks.ceph.mgr.z.smithi105.stderr: 10: clone()

https://jenkins.ceph.com/blue/organizations/jenkins/ceph-api-nightly-octopus-backend/detail/ceph-api-nightly-octopus-backend/233/pipeline
/a/yuriw-2020-10-26_17:47:19-rados-wip-yuri-testing-2020-10-26-0817-distro-basic-smithi/5562236
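
For anyone triaging similar teuthology output, the failed assertion can be pulled out of the log with a small script. The following is only a sketch: the function name `find_failed_asserts` is hypothetical, the sample path is shortened for illustration, and the pattern assumes the `<file>: <line>: FAILED ceph_assert(<condition>)` layout seen in the log above.

```python
import re

# Matches "<absolute path>: <line>: FAILED ceph_assert(<condition>)" at the end of a line.
# Assumption: the source path starts with "/" and contains no spaces.
ASSERT_RE = re.compile(
    r"(?P<file>/\S+): (?P<line>\d+): FAILED ceph_assert\((?P<cond>.*)\)\s*$"
)


def find_failed_asserts(log_lines):
    """Return (file, line, condition) tuples for every failed assertion found."""
    hits = []
    for text in log_lines:
        match = ASSERT_RE.search(text)
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("cond"))
            )
    return hits


# Sample lines modeled on the log above; the build path is shortened (illustrative only).
sample = [
    "2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:"
    "/build/ceph-16.0.0/src/mgr/DaemonServer.cc: 2816: "
    "FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)",
    "2020-10-26T19:31:33.684 INFO:tasks.ceph.mgr.z.smithi105.stderr: 10: clone()",
]

for path, line_no, condition in find_failed_asserts(sample):
    print(f"{path}:{line_no} -> {condition}")
```

Only the line carrying the assertion matches; backtrace frames such as `10: clone()` are ignored.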


Related issues 11 (1 open, 10 closed)

Related to mgr - Bug #51835: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Resolved, Mykola Golub)
Related to mgr - Bug #54700: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) (New)
Has duplicate mgr - Bug #49255: src/mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Has duplicate mgr - Bug #49476: DaemonServer.cc: 2827: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Has duplicate mgr - Bug #51916: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Has duplicate mgr - Bug #51922: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Has duplicate mgr - Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Has duplicate mgr - Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Has duplicate mgr - Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) (Duplicate)
Copied to mgr - Backport #49908: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Resolved, Neha Ojha)
Copied to mgr - Backport #53198: octopus: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) (Resolved)
Actions #1

Updated by Neha Ojha over 3 years ago

  • Has duplicate Bug #49255: src/mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #2

Updated by Neha Ojha over 3 years ago

  • Priority changed from Normal to High
Actions #3

Updated by Neha Ojha about 3 years ago

  • Has duplicate Bug #49476: DaemonServer.cc: 2827: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #4

Updated by Neha Ojha about 3 years ago

  • Priority changed from High to Urgent
nojha@reesi001:~$ sudo ceph crash info 2021-03-15T17:01:04.327050Z_07ce30e5-460a-4a7a-9eff-70902f36f327
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2924,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f6f979a0700 time 2021-03-15T17:01:04.322742+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc: 2924: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f6f9fdb2b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f6fa11c738b]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276554) [0x7f6fa11c7554]",
        "(DaemonServer::got_service_map()+0xb2d) [0x555f7b3c730d]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x555f7b3f5b7b]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x884) [0x555f7b3f8764]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x555f7b401d25]",
        "(DispatchQueue::entry()+0x126a) [0x7f6fa13ff52a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f6fa14ada41]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f6f9fda814a]",
        "clone()" 
    ],
    "ceph_version": "16.1.0-736-g8191ac78",
    "crash_id": "2021-03-15T17:01:04.327050Z_07ce30e5-460a-4a7a-9eff-70902f36f327",
    "entity_name": "mgr.reesi004.tplfrt",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "aa3ccd51d5334df9e1e6472bcb9c51691650124acbe5d28d1759e0f9c13c079e",
    "timestamp": "2021-03-15T17:01:04.327050Z",
    "utsname_hostname": "reesi004",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-66-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021" 
}
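
A report like the one above can be reduced to the few fields that matter when matching it against this issue. This is a minimal sketch, not part of any Ceph tooling: `summarize_crash` is a hypothetical helper, the sample is a trimmed copy of the report with a shortened `assert_file` path, and the frame filter (keep frames containing `::`) is an assumption chosen to drop libc and unnamed frames.

```python
import json
import os


def summarize_crash(report_text):
    """Extract the fields most useful for matching a crash to a tracker issue."""
    report = json.loads(report_text)
    # Keep only frames with a C++ symbol and strip the "+0x..." offset and address.
    frames = [
        frame.split("+0x")[0].lstrip("(")
        for frame in report.get("backtrace", [])
        if "::" in frame
    ]
    location = "{}:{}".format(
        os.path.basename(report.get("assert_file", "")),
        report.get("assert_line"),
    )
    return {
        "condition": report.get("assert_condition"),
        "location": location,
        "version": report.get("ceph_version"),
        "frames": frames,
    }


# Trimmed sample based on the report above; the assert_file path is shortened.
sample_report = """
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/build/src/mgr/DaemonServer.cc",
    "assert_line": 2924,
    "backtrace": [
        "gsignal()",
        "(DaemonServer::got_service_map()+0xb2d) [0x555f7b3c730d]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x555f7b3f5b7b]"
    ],
    "ceph_version": "16.1.0-736-g8191ac78"
}
"""

summary = summarize_crash(sample_report)
print(summary["condition"], "at", summary["location"], "on", summary["version"])
```

Comparing on `condition` and the named frames rather than on `stack_sig` may be useful here, since the reports in this thread carry different `stack_sig` values for the same assertion.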
Actions #6

Updated by Mykola Golub about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Mykola Golub
Actions #7

Updated by Mykola Golub about 3 years ago

Neha, do you know of a recent enough failure in teuthology, so that I could look at the mgr and mon logs?

Actions #8

Updated by Neha Ojha about 3 years ago

Mykola Golub wrote:

Neha, do you know of a recent enough failure in teuthology, so that I could look at the mgr and mon logs?

https://tracker.ceph.com/issues/49255 is the most recent failure from teuthology I have seen, but it does not have logs because the job died. I have not been able to find logs for any of these yet.

Actions #9

Updated by Sage Weil about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Assignee deleted (Mykola Golub)
  • Pull request ID set to 40219
Actions #10

Updated by Sage Weil about 3 years ago

  • Backport set to pacific
Actions #11

Updated by Sage Weil about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #12

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49908: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #13

Updated by Loïc Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
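
The rule described in that message can be sketched as follows. This is an illustration of the stated condition only, not the actual "backport-create-issue" code: the function name and the status set are hypothetical, and requiring at least one backport is an added assumption.

```python
# Statuses that count as "done" for a backport, per the message above.
CLOSED_BACKPORT_STATUSES = {"Resolved", "Rejected"}


def parent_can_be_resolved(backport_statuses):
    """Return True when every backport is Resolved or Rejected.

    Assumption: a parent with no backports at all is left untouched.
    """
    statuses = list(backport_statuses)
    return bool(statuses) and all(
        status in CLOSED_BACKPORT_STATUSES for status in statuses
    )


print(parent_can_be_resolved(["Resolved"]))
print(parent_can_be_resolved(["Resolved", "In Progress"]))
```

Under this rule, adding a new backport that is still open (as happens later in this thread for octopus) would make the check fail until that backport is closed too.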

Actions #14

Updated by Wout van Heeswijk about 3 years ago

We are seeing the same crash on Octopus 15.2.8. I have not found an Octopus backport issue for this bug. Can and should the fix also be backported to Octopus?

Our crash report:

{
    "archived": "2021-04-22 07:53:13.871413",
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2796,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fe68dfc8700 time 2021-04-22T08:12:49.479400+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc: 2796: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "(()+0x12b20) [0x7fe69d76bb20]",
        "(gsignal()+0x10f) [0x7fe69c1bc7ff]",
        "(abort()+0x127) [0x7fe69c1a6c35]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7fe69f2ca735]",
        "(()+0x27a8fe) [0x7fe69f2ca8fe]",
        "(DaemonServer::got_service_map()+0x9b5) [0x560a272d3d95]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x560a2730cb6b]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x560a2730ebf7]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xb0) [0x560a27317ba0]",
        "(DispatchQueue::entry()+0x126a) [0x7fe69f4e909a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7fe69f58b8a1]",
        "(()+0x814a) [0x7fe69d76114a]",
        "(clone()+0x43) [0x7fe69c281f23]" 
    ],
    "ceph_version": "15.2.8",
    "crash_id": "2021-04-22T06:12:49.482382Z_c23cd42f-1c72-4598-a909-6519f6fcb842",
    "entity_name": "mgr.alpha",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "86ab3a48e25206c884ac52034d1562c68826296d4fcbe7ceff0b9d8b9b4a56a1",
    "timestamp": "2021-04-22T06:12:49.482382Z",
    "utsname_hostname": "alpha",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-240.10.1.el8_3.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Jan 18 17:05:51 UTC 2021" 
}
Actions #15

Updated by Yongseok Oh almost 3 years ago

{
"os_version_id": "7",
"assert_condition": "pending_service_map.epoch > service_map.epoch",
"utsname_release": "3.10.0-1062.18.1.el7.x86_64",
"os_name": "CentOS Linux",
"entity_name": "mgr.LNVSFSMDS1502",
"assert_file": "/builddir/build/BUILD/ceph-14.2.16/src/mgr/DaemonServer.cc",
"timestamp": "2021-06-05 06:07:23.855365Z",
"process_name": "ceph-mgr",
"utsname_machine": "x86_64",
"assert_line": 2795,
"utsname_sysname": "Linux",
"os_version": "7 (Core)",
"os_id": "centos",
"assert_thread_name": "ms_dispatch",
"utsname_version": "#1 SMP Tue Mar 17 23:49:17 UTC 2020",
"backtrace": [
"(()+0xc4c7aa) [0x55df1e00e7aa]",
"(()+0xf5f0) [0x7f6435c3e5f0]",
"(gsignal()+0x37) [0x7f643481c337]",
"(abort()+0x148) [0x7f643481da28]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x379) [0x7f6439394709]",
"(()+0x12a17bc) [0x7f64393947bc]",
"(()+0x9a179c) [0x55df1dd6379c]",
"(()+0x9a470a) [0x55df1dd6670a]",
"(DaemonServer::got_service_map()+0x6a) [0x55df1dd63888]",
"(Mgr::handle_service_map(MServiceMap*)+0x140) [0x55df1ddb83fe]",
"(Mgr::ms_dispatch(Message*)+0x3e8) [0x55df1ddb8aa8]",
"(MgrStandby::ms_dispatch(Message*)+0x20a) [0x55df1ddcfdf2]",
"(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55df1dd6d090]",
"(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7f643955ff99]",
"(DispatchQueue::entry()+0x5ec) [0x7f643955eb3a]",
"(DispatchQueue::DispatchThread::entry()+0x1c) [0x7f64396c9366]",
"(Thread::entry_wrapper()+0x78) [0x7f6439337228]",
"(Thread::_entry_func(void*)+0x18) [0x7f64393371a6]",
"(()+0x7e65) [0x7f6435c36e65]",
"(clone()+0x6d) [0x7f64348e488d]"
],
"utsname_hostname": "MDS002",
"assert_msg": "/builddir/build/BUILD/ceph-14.2.16/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f642d0e7700 time 2021-06-05 15:07:23.828349\n/builddir/build/BUILD/ceph-14.2.16/src/mgr/DaemonServer.cc: 2795: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
"crash_id": "2021-06-05_06:07:23.855365Z_badf1c8e-7978-4fb1-92a2-82b2137dbde0",
"assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
"ceph_version": "14.2.16"
}

Actions #16

Updated by Yongseok Oh almost 3 years ago

Can I edit or remove the above message written by me?

Actions #17

Updated by Neha Ojha almost 3 years ago

  • Backport changed from pacific to pacific, octopus
Actions #18

Updated by Neha Ojha almost 3 years ago

  • Related to Bug #51835: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #19

Updated by Telemetry Bot almost 3 years ago

  • Related to Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
Actions #20

Updated by Sage Weil almost 3 years ago

  • Has duplicate Bug #51916: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
Actions #21

Updated by Sage Weil almost 3 years ago

  • Has duplicate Bug #51922: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
Actions #22

Updated by Sage Weil almost 3 years ago

  • Has duplicate Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
Actions #23

Updated by Sage Weil almost 3 years ago

  • Has duplicate Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
Actions #24

Updated by Sage Weil almost 3 years ago

  • Related to deleted (Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
Actions #25

Updated by Sage Weil almost 3 years ago

  • Has duplicate Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
Actions #26

Updated by Sage Weil almost 3 years ago

  • Has duplicate Bug #51924: crash: Client::resolve_mds(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mds_gid_t, std::allocator<mds_gid_t> >*) added
Actions #27

Updated by Sage Weil almost 3 years ago

  • Has duplicate deleted (Bug #51924: crash: Client::resolve_mds(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mds_gid_t, std::allocator<mds_gid_t> >*))
Actions #29

Updated by David Galloway over 2 years ago

  • Status changed from Resolved to Pending Backport
Actions #30

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53198: octopus: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
Actions #31

Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #32

Updated by Telemetry Bot about 2 years ago

  • Related to Bug #54700: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added