Bug #42347

nautilus assert during osd shutdown: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_flight_sharded.empty())

Added by Dan van der Ster over 4 years ago. Updated almost 4 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):

ff3b83173fba8c6a2bbd8dae328b69122fcafcc9410598fce18bc7528ca0d63f
e40e4f7fc7a2abd7de8ef2752ac31ba8ec92c769d572cb01d2ab400983124fb6
e6c8a528a50b59c42ad87bd3be3f5c1e683cf5b9d8f299caf498a7b2ef5c0f80
1dded87d42eecaef0287539022b86423608c9e0f76ae62b9d283d0a34da0c351

Crash signature (v2):

Description

Looks like #38377, but that is already fixed in nautilus.

We see this occasionally during OSD shutdown:

2019-10-17 10:54:45.473 7fb454aeedc0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/src/common/TrackedOp.cc: In function 'OpTracker::~OpTracker()' thread 7fb454aeedc0 time 2019-10-17 10:54:45.454071
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/src/common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_flight_sharded.empty())

 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x556087d77a24]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x556087d77bf2]
 3: (OpTracker::~OpTracker()+0x1c4) [0x55608809d714]
 4: (OSD::~OSD()+0x45c) [0x556087e2e96c]
 5: (OSD::~OSD()+0x9) [0x556087e2ed49]
 6: (main()+0x1693) [0x556087d7c373]
 7: (__libc_start_main()+0xf5) [0x7fb450515505]
 8: (()+0x4b2695) [0x556087db1695]

Related issues

Related to RADOS - Bug #44715: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_flight_sharded.empty())mon: Resolved
Related to RADOS - Bug #45008: [osd crash]The ceph-osd assert with rbd bench io New

History

#1 Updated by Dan van der Ster over 4 years ago

coredump and log file @ ceph-post-file: a0fcd877-46da-4491-9e58-5ae117cfb92b

#2 Updated by Sage Weil over 4 years ago

Seeing 3 clusters hitting this on 14.2.2 via telemetry.

#3 Updated by Sage Weil over 4 years ago

  • Crash signature (v1) updated (diff)

#4 Updated by Neha Ojha over 4 years ago

  • Priority changed from Normal to High

#5 Updated by David Zafman about 4 years ago

#6 Updated by Sage Weil about 4 years ago

  • Crash signature (v1) updated (diff)

#7 Updated by Sage Weil about 4 years ago

  • Crash signature (v1) updated (diff)

#8 Updated by Sage Weil about 4 years ago

  • Status changed from New to Won't Fix

We've backported the OSD fast shutdown (https://github.com/ceph/ceph/pull/32743), so this will effectively go away for users.

#9 Updated by Bastian Mäuser about 4 years ago

This is still an issue on 14.2.6 (at least the one shipped with Proxmox).

#10 Updated by Sage Weil about 4 years ago

Bastian Mäuser wrote:

This is still an issue on 14.2.6 (at least the one shipped with Proxmox).

The backport will appear in the next nautilus release.

#11 Updated by Brad Hubbard almost 4 years ago

  • Crash signature (v1) updated (diff)

#12 Updated by Josh Durgin almost 4 years ago

  • Related to Bug #44715: common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_flight_sharded.empty())mon: added

#13 Updated by Bastian Mäuser almost 4 years ago

This is still an issue on 14.2.8 (at least the one shipped with Proxmox):

root@pve05pp:/var/log# ceph crash info 2020-04-08_21:30:31.440500Z_0ed9bd70-a973-4567-80ee-7c60269483b9
{
    "os_version_id": "10", 
    "assert_condition": "(sharded_in_flight_list.back())->ops_in_flight_sharded.empty()", 
    "utsname_release": "5.3.10-1-pve", 
    "os_name": "Debian GNU/Linux 10 (buster)", 
    "entity_name": "mon.pve05pp", 
    "assert_file": "/mnt/pve/ceph-dev/ceph/ceph-14.2.8/src/common/TrackedOp.cc", 
    "timestamp": "2020-04-08 21:30:31.440500Z", 
    "process_name": "ceph-mon", 
    "utsname_machine": "x86_64", 
    "assert_line": 163, 
    "utsname_sysname": "Linux", 
    "os_version": "10 (buster)", 
    "os_id": "10", 
    "assert_thread_name": "ceph-mon", 
    "utsname_version": "#1 SMP PVE 5.3.10-1 (Thu, 14 Nov 2019 10:43:13 +0100)", 
    "backtrace": [
        "(()+0x12730) [0x7f95fd5a8730]", 
        "(gsignal()+0x10b) [0x7f95fd08b7bb]", 
        "(abort()+0x121) [0x7f95fd076535]", 
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f95fe6cdd33]", 
        "(()+0x26deba) [0x7f95fe6cdeba]", 
        "(OpTracker::~OpTracker()+0x35) [0x7f95fe7a8a25]", 
        "(Monitor::~Monitor()+0x2bb) [0x5587a073bffb]", 
        "(Monitor::~Monitor()+0x9) [0x5587a073c569]", 
        "(main()+0x26fa) [0x5587a06cfb3a]", 
        "(__libc_start_main()+0xeb) [0x7f95fd07809b]", 
        "(_start()+0x2a) [0x5587a06fed5a]" 
    ], 
    "utsname_hostname": "pve05pp", 
    "assert_msg": "/mnt/pve/ceph-dev/ceph/ceph-14.2.8/src/common/TrackedOp.cc: In function 'OpTracker::~OpTracker()' thread 7f95fcb62280 time 2020-04-08 23:30:31.438583\n/mnt/pve/ceph-dev/ceph/ceph-14.2.8/src/common/TrackedOp.cc: 163: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_flight_sharded.empty())\n", 
    "crash_id": "2020-04-08_21:30:31.440500Z_0ed9bd70-a973-4567-80ee-7c60269483b9", 
    "assert_func": "OpTracker::~OpTracker()", 
    "ceph_version": "14.2.8" 
}
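
For triage, it can help to pull out which daemon actually hit the assert. The report above names ceph-mon as the process and has Monitor::~Monitor() in the backtrace, whereas the original description shows OSD::~OSD(). Below is a minimal sketch, assuming the report is available as JSON text in the shape shown above. The helper name summarize_crash and the trimmed sample input are illustrative only and not part of Ceph.

```python
import json


def summarize_crash(report_json: str) -> dict:
    """Extract the fields that identify which daemon hit the assert.

    Hypothetical helper: expects JSON shaped like the crash report
    quoted in this comment.
    """
    report = json.loads(report_json)
    destructors = []
    # Frames look like "(Monitor::~Monitor()+0x2bb) [0x5587a073bffb]".
    for frame in report.get("backtrace", []):
        # Drop the leading "(" and everything from the "+offset" onward.
        symbol = frame.lstrip("(").split("+", 1)[0]
        # Keep each destructor symbol once, in backtrace order.
        if "::~" in symbol and symbol not in destructors:
            destructors.append(symbol)
    return {
        "process_name": report.get("process_name"),
        "ceph_version": report.get("ceph_version"),
        "assert_func": report.get("assert_func"),
        "destructors": destructors,
    }


# Trimmed copy of the report above, used only as sample input.
sample = json.dumps({
    "process_name": "ceph-mon",
    "ceph_version": "14.2.8",
    "assert_func": "OpTracker::~OpTracker()",
    "backtrace": [
        "(()+0x12730) [0x7f95fd5a8730]",
        "(OpTracker::~OpTracker()+0x35) [0x7f95fe7a8a25]",
        "(Monitor::~Monitor()+0x2bb) [0x5587a073bffb]",
        "(Monitor::~Monitor()+0x9) [0x5587a073c569]",
        "(main()+0x26fa) [0x5587a06cfb3a]",
    ],
})
summary = summarize_crash(sample)
print(summary)
```

The same function could be fed the saved output of the ceph crash info command for each crash ID, to separate OSD reports from monitor reports.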

#14 Updated by Neha Ojha almost 4 years ago

  • Related to Bug #45008: [osd crash]The ceph-osd assert with rbd bench io added

#15 Updated by Brad Hubbard almost 4 years ago

/a/teuthology-2020-04-26_02:30:03-rados-octopus-distro-basic-smithi/4984693
