Bug #48022
mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)
Description
2020-10-26T19:31:33.678 INFO:teuthology.orchestra.run.smithi120.stderr:2020-10-26T19:31:33.679+0000 7f5b56d9e700 1 -- 172.21.15.120:0/2411503374 shutdown_connections
2020-10-26T19:31:33.679 INFO:teuthology.orchestra.run.smithi120.stderr:2020-10-26T19:31:33.679+0000 7f5b56d9e700 1 -- 172.21.15.120:0/2411503374 wait complete.
2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-6666-g5af54f2e/rpm/el8/BUILD/ceph-16.0.0-6666-g5af54f2e/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f1d96de1700 time 2020-10-26T19:31:33.682982+0000
2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-6666-g5af54f2e/rpm/el8/BUILD/ceph-16.0.0-6666-g5af54f2e/src/mgr/DaemonServer.cc: 2816: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: ceph version 16.0.0-6666-g5af54f2e (5af54f2ebc54c8319e53a7e64fe7b4bdbbcd15bb) pacific (dev)
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f1da0acc274]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x27148e) [0x7f1da0acc48e]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 3: (DaemonServer::got_service_map()+0xb2d) [0x55c193bb051d]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 4: (Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x55c193bdfffb]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 5: (Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x55c193be2117]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 6: (MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x55c193beb2e5]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 7: (DispatchQueue::entry()+0x126a) [0x7f1da0cf077a]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f1da0da0291]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 9: /lib64/libpthread.so.0(+0x82de) [0x7f1d9ef592de]
2020-10-26T19:31:33.684 INFO:tasks.ceph.mgr.z.smithi105.stderr: 10: clone()
https://jenkins.ceph.com/blue/organizations/jenkins/ceph-api-nightly-octopus-backend/detail/ceph-api-nightly-octopus-backend/233/pipeline
/a/yuriw-2020-10-26_17:47:19-rados-wip-yuri-testing-2020-10-26-0817-distro-basic-smithi/5562236
Related issues
History
#1 Updated by Neha Ojha almost 3 years ago
- Duplicated by Bug #49255: src/mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
#2 Updated by Neha Ojha almost 3 years ago
- Priority changed from Normal to High
#3 Updated by Neha Ojha over 2 years ago
- Duplicated by Bug #49476: DaemonServer.cc: 2827: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
#4 Updated by Neha Ojha over 2 years ago
- Priority changed from High to Urgent
nojha@reesi001:~$ sudo ceph crash info 2021-03-15T17:01:04.327050Z_07ce30e5-460a-4a7a-9eff-70902f36f327
{
"assert_condition": "pending_service_map.epoch > service_map.epoch",
"assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc",
"assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
"assert_line": 2924,
"assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f6f979a0700 time 2021-03-15T17:01:04.322742+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc: 2924: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
"assert_thread_name": "ms_dispatch",
"backtrace": [
"/lib64/libpthread.so.0(+0x12b20) [0x7f6f9fdb2b20]",
"gsignal()",
"abort()",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f6fa11c738b]",
"/usr/lib64/ceph/libceph-common.so.2(+0x276554) [0x7f6fa11c7554]",
"(DaemonServer::got_service_map()+0xb2d) [0x555f7b3c730d]",
"(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x555f7b3f5b7b]",
"(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x884) [0x555f7b3f8764]",
"(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x555f7b401d25]",
"(DispatchQueue::entry()+0x126a) [0x7f6fa13ff52a]",
"(DispatchQueue::DispatchThread::entry()+0x11) [0x7f6fa14ada41]",
"/lib64/libpthread.so.0(+0x814a) [0x7f6f9fda814a]",
"clone()"
],
"ceph_version": "16.1.0-736-g8191ac78",
"crash_id": "2021-03-15T17:01:04.327050Z_07ce30e5-460a-4a7a-9eff-70902f36f327",
"entity_name": "mgr.reesi004.tplfrt",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "aa3ccd51d5334df9e1e6472bcb9c51691650124acbe5d28d1759e0f9c13c079e",
"timestamp": "2021-03-15T17:01:04.327050Z",
"utsname_hostname": "reesi004",
"utsname_machine": "x86_64",
"utsname_release": "5.4.0-66-generic",
"utsname_sysname": "Linux",
"utsname_version": "#74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021"
}
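For anyone triaging similar reports, the fields that matter for matching a crash against this bug can be pulled out of the `ceph crash info` JSON with a few lines of Python. This is only a sketch: the key names are the ones visible in the report above, while the helper name `summarize_crash` and the choice of which fields to keep are assumptions made for illustration.

```python
import json


def summarize_crash(report_json: str) -> dict:
    """Reduce a crash report (JSON text) to the fields useful for triage.

    Hypothetical helper: the key names are taken from the crash report
    in this issue; nothing here is part of Ceph itself.
    """
    report = json.loads(report_json)
    return {
        "version": report.get("ceph_version"),
        "daemon": report.get("entity_name"),
        "assertion": report.get("assert_condition"),
        "line": report.get("assert_line"),
        "thread": report.get("assert_thread_name"),
    }


# Trimmed copy of the report above, used as sample input.
sample = json.dumps({
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_line": 2924,
    "assert_thread_name": "ms_dispatch",
    "ceph_version": "16.1.0-736-g8191ac78",
    "entity_name": "mgr.reesi004.tplfrt",
})

summary = summarize_crash(sample)
print(summary)
```

The JSON part of the `ceph crash info <crash_id>` output could be passed to the helper in place of `sample`. Missing keys come back as `None` rather than raising, since not every report in this issue carries the same set of fields.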
#5 Updated by Neha Ojha over 2 years ago
#6 Updated by Mykola Golub over 2 years ago
- Status changed from New to In Progress
- Assignee set to Mykola Golub
#7 Updated by Mykola Golub over 2 years ago
Neha, do you know of a recent enough failure on teuthology so I could look at the mgr and mon logs?
#8 Updated by Neha Ojha over 2 years ago
Mykola Golub wrote:
Neha, do you know of a recent enough failure on teuthology so I could look at the mgr and mon logs?
https://tracker.ceph.com/issues/49255 is the most recent failure from teuthology I have seen, but it does not have logs because the job died. I have not been able to find logs for any of these yet.
#9 Updated by Sage Weil over 2 years ago
- Status changed from In Progress to Fix Under Review
- Assignee deleted (Mykola Golub)
- Pull request ID set to 40219
#10 Updated by Sage Weil over 2 years ago
- Backport set to pacific
#11 Updated by Sage Weil over 2 years ago
- Status changed from Fix Under Review to Pending Backport
#12 Updated by Backport Bot over 2 years ago
- Copied to Backport #49908: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
#13 Updated by Loïc Dachary over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
#14 Updated by Wout van Heeswijk over 2 years ago
We are seeing the same crash reports on Octopus 15.2.8. I have not found an Octopus backport issue for this bug. Can and should the fix also be backported to Octopus?
Our crash report:
{
"archived": "2021-04-22 07:53:13.871413",
"assert_condition": "pending_service_map.epoch > service_map.epoch",
"assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc",
"assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
"assert_line": 2796,
"assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fe68dfc8700 time 2021-04-22T08:12:49.479400+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc: 2796: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
"assert_thread_name": "ms_dispatch",
"backtrace": [
"(()+0x12b20) [0x7fe69d76bb20]",
"(gsignal()+0x10f) [0x7fe69c1bc7ff]",
"(abort()+0x127) [0x7fe69c1a6c35]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7fe69f2ca735]",
"(()+0x27a8fe) [0x7fe69f2ca8fe]",
"(DaemonServer::got_service_map()+0x9b5) [0x560a272d3d95]",
"(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x560a2730cb6b]",
"(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x560a2730ebf7]",
"(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xb0) [0x560a27317ba0]",
"(DispatchQueue::entry()+0x126a) [0x7fe69f4e909a]",
"(DispatchQueue::DispatchThread::entry()+0x11) [0x7fe69f58b8a1]",
"(()+0x814a) [0x7fe69d76114a]",
"(clone()+0x43) [0x7fe69c281f23]"
],
"ceph_version": "15.2.8",
"crash_id": "2021-04-22T06:12:49.482382Z_c23cd42f-1c72-4598-a909-6519f6fcb842",
"entity_name": "mgr.alpha",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-mgr",
"stack_sig": "86ab3a48e25206c884ac52034d1562c68826296d4fcbe7ceff0b9d8b9b4a56a1",
"timestamp": "2021-04-22T06:12:49.482382Z",
"utsname_hostname": "alpha",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-240.10.1.el8_3.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Mon Jan 18 17:05:51 UTC 2021"
}
#15 Updated by Yongseok Oh over 2 years ago
{
"os_version_id": "7",
"assert_condition": "pending_service_map.epoch > service_map.epoch",
"utsname_release": "3.10.0-1062.18.1.el7.x86_64",
"os_name": "CentOS Linux",
"entity_name": "mgr.LNVSFSMDS1502",
"assert_file": "/builddir/build/BUILD/ceph-14.2.16/src/mgr/DaemonServer.cc",
"timestamp": "2021-06-05 06:07:23.855365Z",
"process_name": "ceph-mgr",
"utsname_machine": "x86_64",
"assert_line": 2795,
"utsname_sysname": "Linux",
"os_version": "7 (Core)",
"os_id": "centos",
"assert_thread_name": "ms_dispatch",
"utsname_version": "#1 SMP Tue Mar 17 23:49:17 UTC 2020",
"backtrace": [
"(()+0xc4c7aa) [0x55df1e00e7aa]",
"(()+0xf5f0) [0x7f6435c3e5f0]",
"(gsignal()+0x37) [0x7f643481c337]",
"(abort()+0x148) [0x7f643481da28]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x379) [0x7f6439394709]",
"(()+0x12a17bc) [0x7f64393947bc]",
"(()+0x9a179c) [0x55df1dd6379c]",
"(()+0x9a470a) [0x55df1dd6670a]",
"(DaemonServer::got_service_map()+0x6a) [0x55df1dd63888]",
"(Mgr::handle_service_map(MServiceMap*)+0x140) [0x55df1ddb83fe]",
"(Mgr::ms_dispatch(Message*)+0x3e8) [0x55df1ddb8aa8]",
"(MgrStandby::ms_dispatch(Message*)+0x20a) [0x55df1ddcfdf2]",
"(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55df1dd6d090]",
"(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7f643955ff99]",
"(DispatchQueue::entry()+0x5ec) [0x7f643955eb3a]",
"(DispatchQueue::DispatchThread::entry()+0x1c) [0x7f64396c9366]",
"(Thread::entry_wrapper()+0x78) [0x7f6439337228]",
"(Thread::_entry_func(void*)+0x18) [0x7f64393371a6]",
"(()+0x7e65) [0x7f6435c36e65]",
"(clone()+0x6d) [0x7f64348e488d]"
],
"utsname_hostname": "MDS002",
"assert_msg": "/builddir/build/BUILD/ceph-14.2.16/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f642d0e7700 time 2021-06-05 15:07:23.828349\n/builddir/build/BUILD/ceph-14.2.16/src/mgr/DaemonServer.cc: 2795: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
"crash_id": "2021-06-05_06:07:23.855365Z_badf1c8e-7978-4fb1-92a2-82b2137dbde0",
"assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
"ceph_version": "14.2.16"
}
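The reports in this issue come from different releases and fail at different line numbers, so matching on `assert_line` alone would miss duplicates. A rough way to check whether several reports hit the same assertion is to group them by `assert_func` and `assert_condition`. The sketch below assumes the reports are already parsed into dictionaries; the function name `group_by_assertion` is hypothetical, and the sample values are the ones quoted in the reports in this issue.

```python
from collections import defaultdict


def group_by_assertion(reports):
    """Group crash reports by (assert_func, assert_condition).

    Hypothetical helper for illustration; each report is a dict using
    the key names seen in the crash reports in this issue.
    """
    groups = defaultdict(list)
    for report in reports:
        key = (report.get("assert_func"), report.get("assert_condition"))
        # Keep version and line so the spread across releases stays visible.
        groups[key].append((report.get("ceph_version"), report.get("assert_line")))
    return dict(groups)


FUNC = "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>"
COND = "pending_service_map.epoch > service_map.epoch"

# Version and line values copied from the three crash reports in this issue.
reports = [
    {"assert_func": FUNC, "assert_condition": COND,
     "ceph_version": "16.1.0-736-g8191ac78", "assert_line": 2924},
    {"assert_func": FUNC, "assert_condition": COND,
     "ceph_version": "15.2.8", "assert_line": 2796},
    {"assert_func": FUNC, "assert_condition": COND,
     "ceph_version": "14.2.16", "assert_line": 2795},
]

groups = group_by_assertion(reports)
for (func, cond), hits in groups.items():
    print(cond, hits)
```

Grouping on the function and condition rather than on `stack_sig` is a deliberate choice here: the 14.2.16 report carries no `stack_sig`, and the two signatures that are present differ from each other.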
#16 Updated by Yongseok Oh over 2 years ago
Can I edit or remove the above message written by me?
#17 Updated by Neha Ojha over 2 years ago
- Backport changed from pacific to pacific, octopus
#18 Updated by Neha Ojha over 2 years ago
- Related to Bug #51835: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
#19 Updated by Telemetry Bot over 2 years ago
- Related to Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#20 Updated by Sage Weil over 2 years ago
- Duplicated by Bug #51916: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#21 Updated by Sage Weil over 2 years ago
- Duplicated by Bug #51922: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#22 Updated by Sage Weil over 2 years ago
- Duplicated by Bug #51926: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#23 Updated by Sage Weil over 2 years ago
- Duplicated by Bug #51929: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#24 Updated by Sage Weil over 2 years ago
- Related to deleted (Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch))
#25 Updated by Sage Weil over 2 years ago
- Duplicated by Bug #51913: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added
#26 Updated by Sage Weil over 2 years ago
- Duplicated by Bug #51924: crash: Client::resolve_mds(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mds_gid_t, std::allocator<mds_gid_t> >*) added
#27 Updated by Sage Weil over 2 years ago
- Duplicated by deleted (Bug #51924: crash: Client::resolve_mds(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mds_gid_t, std::allocator<mds_gid_t> >*))
#29 Updated by David Galloway about 2 years ago
- Status changed from Resolved to Pending Backport
#30 Updated by Backport Bot about 2 years ago
- Copied to Backport #53198: octopus: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added
#31 Updated by Loïc Dachary almost 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
#32 Updated by Telemetry Bot over 1 year ago
- Related to Bug #54700: crash: DaemonServer::got_service_map()::<lambda(const ServiceMap&)>: assert(pending_service_map.epoch > service_map.epoch) added