Bug #38405

FAILED ceph_assert(weak_refs.empty()) on osd shutdown

Added by Casey Bodley about 5 years ago. Updated about 5 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This failure has been showing up again in almost every run of the rgw/multisite suite. The main difference in this suite is that it runs with 'wait-for-scrub: false' to cut down on runtime.

Example job with failed assertions in c1.osd.0, c2.osd.0, and c2.osd.2:

http://qa-proxy.ceph.com/teuthology/cbodley-2019-02-20_13:22:19-rgw-wip-cbodley-testing-distro-basic-smithi/3617546/teuthology.log

2019-02-20T14:55:21.147 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:20.308 15fb6700 -1 received  signal: Terminated from /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/c1.osd.0.log --time-stamp=yes --tool=memcheck ceph-osd -f --cluster c1 -i 0  (PID: 33895) UID: 0
2019-02-20T14:55:21.217 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:20.969 15fb6700 -1 osd.0 80 *** Got signal Terminated ***
2019-02-20T14:55:24.284 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.119 15fb6700 -1 osd.0 81 pgid 3.c has ref count of 2
2019-02-20T14:55:24.284 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 2.3c has ref count of 2
2019-02-20T14:55:24.285 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 2.4 has ref count of 2
2019-02-20T14:55:24.288 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 8.3c has ref count of 2
2019-02-20T14:55:24.288 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 1.4 has ref count of 2
2019-02-20T14:55:24.289 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 2.1c has ref count of 2
2019-02-20T14:55:24.309 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 3.1c has ref count of 2
2019-02-20T14:55:24.309 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 12.c has ref count of 2
2019-02-20T14:55:24.309 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 3.2c has ref count of 2
2019-02-20T14:55:24.310 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 12.3c has ref count of 2
2019-02-20T14:55:24.310 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 3.34 has ref count of 2
2019-02-20T14:55:24.310 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 6.4 has ref count of 2
2019-02-20T14:55:24.311 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 7.4 has ref count of 2
2019-02-20T14:55:25.527 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3884-gfef444a/rpm/el7/BUILD/ceph-14.0.1-3884-gfef444a/src/common/shared_cache.hpp: In function 'SharedLRU<K, V>::~SharedLRU() [with K = unsigned int; V = const OSDMap]' thread 9cbb140 time 2019-02-20 14:55:25.535887
2019-02-20T14:55:25.527 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3884-gfef444a/rpm/el7/BUILD/ceph-14.0.1-3884-gfef444a/src/common/shared_cache.hpp: 121: FAILED ceph_assert(weak_refs.empty())
2019-02-20T14:55:25.568 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: ceph version 14.0.1-3884-gfef444a (fef444a751955eb131303c9024fb2f08dbf30017) nautilus (dev)
2019-02-20T14:55:25.568 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x57488e]
2019-02-20T14:55:25.568 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x574a5c]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 3: (()+0x6294ce) [0x7314ce]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 4: (OSDService::~OSDService()+0x14d) [0x6e197d]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 5: (OSD::~OSD()+0x29c) [0x6e6a1c]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 6: (OSD::~OSD()+0x9) [0x6e7009]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 7: (main()+0x16e5) [0x578635]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 8: (__libc_start_main()+0xf5) [0xe4ce445]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 9: (()+0x5641c5) [0x66c1c5]
2019-02-20T14:55:25.571 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:*** Caught signal (Aborted) **
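To illustrate what the failed check means, here is a minimal, hypothetical C++ sketch of a cache that hands out `shared_ptr` references and tracks each live value in a `weak_refs` map, in the general shape suggested by the backtrace (`SharedLRU<K, V>` with `K = unsigned int`). It is not Ceph's `src/common/shared_cache.hpp`; the class name `WeakRefCache` and its methods are assumptions made for illustration. Each reference carries a custom deleter that unregisters its entry when the last strong reference is dropped, so a non-empty `weak_refs` at destruction means some holder still owns a reference. That is plausibly what the `pgid X has ref count of 2` lines logged just before the abort are pointing at.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical, simplified illustration; NOT Ceph's SharedLRU.
// Hands out shared_ptr references and remembers each live value via a
// weak_ptr so later lookups can find it while anyone still holds it.
template <typename K, typename V>
class WeakRefCache {
 public:
  using Ref = std::shared_ptr<const V>;

  WeakRefCache() = default;
  WeakRefCache(const WeakRefCache&) = delete;
  WeakRefCache& operator=(const WeakRefCache&) = delete;

  // Same shape as the failing check: a leftover entry means some holder
  // still owns a strong reference whose deleter would later call back
  // into this (by then destroyed) cache.
  ~WeakRefCache() { assert(weak_refs_.empty()); }

  // Takes ownership of `value` and returns a strong reference. The custom
  // deleter unregisters the key once the last strong reference drops.
  Ref add(const K& key, V* value) {
    Ref ref(value, [this, key](const V* p) {
      remove_if_expired(key);
      delete p;
    });
    std::lock_guard<std::mutex> l(lock_);
    weak_refs_[key] = ref;
    return ref;
  }

  // Returns a strong reference if some holder still keeps the value alive.
  Ref lookup(const K& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = weak_refs_.find(key);
    return it == weak_refs_.end() ? Ref() : it->second.lock();
  }

  // Number of tracked entries, i.e. values that still have a holder.
  std::size_t outstanding() const {
    std::lock_guard<std::mutex> l(lock_);
    return weak_refs_.size();
  }

 private:
  void remove_if_expired(const K& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = weak_refs_.find(key);
    // Only erase if this entry is the one that just died; a newer value
    // may have been added under the same key in the meantime.
    if (it != weak_refs_.end() && it->second.expired()) {
      weak_refs_.erase(it);
    }
  }

  mutable std::mutex lock_;
  std::map<K, std::weak_ptr<const V>> weak_refs_;
};
```

In this model, destroying the cache while a reference is still held trips the assertion (and, with assertions compiled out, would leave a deleter pointing at a destroyed cache), so every holder has to release its references before the cache's destructor runs.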

Related issues

Duplicates RADOS - Bug #37808: osd: osdmap cache weak_refs assert during shutdown (status: New)

History

#1 Updated by Casey Bodley about 5 years ago

  • Duplicates Bug #37808: osd: osdmap cache weak_refs assert during shutdown added

#2 Updated by Casey Bodley about 5 years ago

  • Status changed from New to Duplicate
