Bug #48022

mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)

Added by Neha Ojha 7 months ago. Updated 22 days ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-10-26T19:31:33.678 INFO:teuthology.orchestra.run.smithi120.stderr:2020-10-26T19:31:33.679+0000 7f5b56d9e700  1 -- 172.21.15.120:0/2411503374 shutdown_connections
2020-10-26T19:31:33.679 INFO:teuthology.orchestra.run.smithi120.stderr:2020-10-26T19:31:33.679+0000 7f5b56d9e700  1 -- 172.21.15.120:0/2411503374 wait complete.
2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-6666-g5af54f2e/rpm/el8/BUILD/ceph-16.0.0-6666-g5af54f2e/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f1d96de1700 time 2020-10-26T19:31:33.682982+0000
2020-10-26T19:31:33.681 INFO:tasks.ceph.mgr.z.smithi105.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-6666-g5af54f2e/rpm/el8/BUILD/ceph-16.0.0-6666-g5af54f2e/src/mgr/DaemonServer.cc: 2816: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: ceph version 16.0.0-6666-g5af54f2e (5af54f2ebc54c8319e53a7e64fe7b4bdbbcd15bb) pacific (dev)
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f1da0acc274]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x27148e) [0x7f1da0acc48e]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 3: (DaemonServer::got_service_map()+0xb2d) [0x55c193bb051d]
2020-10-26T19:31:33.682 INFO:tasks.ceph.mgr.z.smithi105.stderr: 4: (Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x55c193bdfffb]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 5: (Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x55c193be2117]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 6: (MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x55c193beb2e5]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 7: (DispatchQueue::entry()+0x126a) [0x7f1da0cf077a]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 8: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f1da0da0291]
2020-10-26T19:31:33.683 INFO:tasks.ceph.mgr.z.smithi105.stderr: 9: /lib64/libpthread.so.0(+0x82de) [0x7f1d9ef592de]
2020-10-26T19:31:33.684 INFO:tasks.ceph.mgr.z.smithi105.stderr: 10: clone()

https://jenkins.ceph.com/blue/organizations/jenkins/ceph-api-nightly-octopus-backend/detail/ceph-api-nightly-octopus-backend/233/pipeline
/a/yuriw-2020-10-26_17:47:19-rados-wip-yuri-testing-2020-10-26-0817-distro-basic-smithi/5562236
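
For readers unfamiliar with the check, the assertion compares the epoch of the mgr's staged (pending) service map against the epoch of the map it has just received. The toy model below is only a sketch of that invariant. The names `ToyServiceMap` and `ToyDaemonServer`, and the "first map seen" branch, are hypothetical stand-ins; they do not reproduce the logic in DaemonServer.cc or any fix for this bug.

```python
from dataclasses import dataclass


@dataclass
class ToyServiceMap:
    """Stand-in for a service map; only the epoch matters here."""
    epoch: int = 0


class ToyDaemonServer:
    """Hypothetical model of the invariant, not the real DaemonServer."""

    def __init__(self) -> None:
        self.pending = ToyServiceMap(epoch=0)

    def got_service_map(self, received: ToyServiceMap) -> None:
        if self.pending.epoch == 0:
            # Assumed start-up case: stage the next epoch on top of the map.
            self.pending = ToyServiceMap(epoch=received.epoch + 1)
            return
        # The check named in this report: the staged map must be
        # strictly newer than the map that was just received.
        if not self.pending.epoch > received.epoch:
            raise AssertionError(
                "pending_service_map.epoch > service_map.epoch"
            )


server = ToyDaemonServer()
server.got_service_map(ToyServiceMap(epoch=5))  # pending epoch becomes 6
server.got_service_map(ToyServiceMap(epoch=5))  # 6 > 5, invariant holds

try:
    # A received epoch that has caught up with the staged one trips the check.
    server.got_service_map(ToyServiceMap(epoch=6))
    tripped = False
except AssertionError:
    tripped = True
```

In this model the failure requires a received epoch at or above the staged one. Whether that matches the sequence of events behind this bug is not established by the log excerpt above.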


Related issues

Duplicated by mgr - Bug #49255: src/mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) Duplicate
Duplicated by mgr - Bug #49476: DaemonServer.cc: 2827: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) Duplicate
Copied to mgr - Backport #49908: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) Resolved

History

#1 Updated by Neha Ojha 3 months ago

  • Duplicated by Bug #49255: src/mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added

#2 Updated by Neha Ojha 3 months ago

  • Priority changed from Normal to High

#3 Updated by Neha Ojha 2 months ago

  • Duplicated by Bug #49476: DaemonServer.cc: 2827: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added

#4 Updated by Neha Ojha about 2 months ago

  • Priority changed from High to Urgent

nojha@reesi001:~$ sudo ceph crash info 2021-03-15T17:01:04.327050Z_07ce30e5-460a-4a7a-9eff-70902f36f327
{
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2924,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7f6f979a0700 time 2021-03-15T17:01:04.322742+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-736-g8191ac78/rpm/el8/BUILD/ceph-16.1.0-736-g8191ac78/src/mgr/DaemonServer.cc: 2924: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f6f9fdb2b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f6fa11c738b]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276554) [0x7f6fa11c7554]",
        "(DaemonServer::got_service_map()+0xb2d) [0x555f7b3c730d]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x555f7b3f5b7b]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x884) [0x555f7b3f8764]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xa5) [0x555f7b401d25]",
        "(DispatchQueue::entry()+0x126a) [0x7f6fa13ff52a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f6fa14ada41]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f6f9fda814a]",
        "clone()" 
    ],
    "ceph_version": "16.1.0-736-g8191ac78",
    "crash_id": "2021-03-15T17:01:04.327050Z_07ce30e5-460a-4a7a-9eff-70902f36f327",
    "entity_name": "mgr.reesi004.tplfrt",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "aa3ccd51d5334df9e1e6472bcb9c51691650124acbe5d28d1759e0f9c13c079e",
    "timestamp": "2021-03-15T17:01:04.327050Z",
    "utsname_hostname": "reesi004",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-66-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021" 
}

#6 Updated by Mykola Golub about 2 months ago

  • Status changed from New to In Progress
  • Assignee set to Mykola Golub

#7 Updated by Mykola Golub about 2 months ago

Neha, do you know of a recent enough failure on teuthology so I could look at the mgr and mon logs?

#8 Updated by Neha Ojha about 2 months ago

Mykola Golub wrote:

Neha, do you know of a recent enough failure on teuthology so I could look at the mgr and mon logs?

https://tracker.ceph.com/issues/49255 is the most recent failure from teuthology I have seen, but it does not have logs because the job died. I have not been able to find logs for any of these yet.

#9 Updated by Sage Weil about 2 months ago

  • Status changed from In Progress to Fix Under Review
  • Assignee deleted (Mykola Golub)
  • Pull request ID set to 40219

#10 Updated by Sage Weil about 2 months ago

  • Backport set to pacific

#11 Updated by Sage Weil about 2 months ago

  • Status changed from Fix Under Review to Pending Backport

#12 Updated by Backport Bot about 2 months ago

  • Copied to Backport #49908: pacific: mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) added

#13 Updated by Loïc Dachary about 1 month ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#14 Updated by Wout van Heeswijk 22 days ago

We are seeing the same crash reports on Octopus 15.2.8. I have not found an Octopus backport issue for this bug. Can and should the fix also be backported to Octopus?

Our crash report:

{
    "archived": "2021-04-22 07:53:13.871413",
    "assert_condition": "pending_service_map.epoch > service_map.epoch",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc",
    "assert_func": "DaemonServer::got_service_map()::<lambda(const ServiceMap&)>",
    "assert_line": 2796,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc: In function 'DaemonServer::got_service_map()::<lambda(const ServiceMap&)>' thread 7fe68dfc8700 time 2021-04-22T08:12:49.479400+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/mgr/DaemonServer.cc: 2796: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "(()+0x12b20) [0x7fe69d76bb20]",
        "(gsignal()+0x10f) [0x7fe69c1bc7ff]",
        "(abort()+0x127) [0x7fe69c1a6c35]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7fe69f2ca735]",
        "(()+0x27a8fe) [0x7fe69f2ca8fe]",
        "(DaemonServer::got_service_map()+0x9b5) [0x560a272d3d95]",
        "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x560a2730cb6b]",
        "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x560a2730ebf7]",
        "(MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xb0) [0x560a27317ba0]",
        "(DispatchQueue::entry()+0x126a) [0x7fe69f4e909a]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7fe69f58b8a1]",
        "(()+0x814a) [0x7fe69d76114a]",
        "(clone()+0x43) [0x7fe69c281f23]" 
    ],
    "ceph_version": "15.2.8",
    "crash_id": "2021-04-22T06:12:49.482382Z_c23cd42f-1c72-4598-a909-6519f6fcb842",
    "entity_name": "mgr.alpha",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "86ab3a48e25206c884ac52034d1562c68826296d4fcbe7ceff0b9d8b9b4a56a1",
    "timestamp": "2021-04-22T06:12:49.482382Z",
    "utsname_hostname": "alpha",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-240.10.1.el8_3.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Jan 18 17:05:51 UTC 2021" 
}
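
To compare reports like the two in this thread, it can help to strip offsets and addresses from the backtrace frames so that builds of different versions line up. The following is a rough sketch: `normalize_frame` is a hypothetical helper name, and the sample lists contain only three frames copied from each report above.

```python
import re

# "+0x..." offsets and trailing "[0x...]" addresses vary per build.
_OFFSET = re.compile(r"\+0x[0-9a-f]+")
_ADDRESS = re.compile(r"\s*\[0x[0-9a-f]+\]\s*$")


def normalize_frame(frame: str) -> str:
    """Drop the address and offset, then outer parentheses if present."""
    frame = _ADDRESS.sub("", frame.strip())
    frame = _OFFSET.sub("", frame)
    if frame.startswith("(") and frame.endswith(")"):
        frame = frame[1:-1]
    return frame


# Frames copied from the 16.1.0-736-g8191ac78 report.
pacific_frames = [
    "(DaemonServer::got_service_map()+0xb2d) [0x555f7b3c730d]",
    "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x555f7b3f5b7b]",
    "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x884) [0x555f7b3f8764]",
]

# Frames copied from the 15.2.8 report.
octopus_frames = [
    "(DaemonServer::got_service_map()+0x9b5) [0x560a272d3d95]",
    "(Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>)+0x14b) [0x560a2730cb6b]",
    "(Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x2c7) [0x560a2730ebf7]",
]

octopus_normalized = {normalize_frame(f) for f in octopus_frames}

# Frames present in both reports, in the order of the first report.
shared = [
    normalize_frame(f)
    for f in pacific_frames
    if normalize_frame(f) in octopus_normalized
]

for name in shared:
    print(name)
```

Frames that name only a library or have no symbol (for example `/lib64/libpthread.so.0(+0x12b20)` versus `(()+0x12b20)`) will not line up with this simple normalization.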
