Bug #38405 (closed)
FAILED ceph_assert(weak_refs.empty()) on osd shutdown
Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This assertion failure has been showing up again in almost every run of the rgw/multisite suite. The main difference in this suite is that it runs with 'wait-for-scrub: false' to cut down on runtimes.
Example job with failed assertions in c1.osd.0, c2.osd.0, and c2.osd.2:
2019-02-20T14:55:21.147 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:20.308 15fb6700 -1 received signal: Terminated from /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/c1.osd.0.log --time-stamp=yes --tool=memcheck ceph-osd -f --cluster c1 -i 0 (PID: 33895) UID: 0
2019-02-20T14:55:21.217 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:20.969 15fb6700 -1 osd.0 80 *** Got signal Terminated ***
2019-02-20T14:55:24.284 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.119 15fb6700 -1 osd.0 81 pgid 3.c has ref count of 2
2019-02-20T14:55:24.284 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 2.3c has ref count of 2
2019-02-20T14:55:24.285 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 2.4 has ref count of 2
2019-02-20T14:55:24.288 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 8.3c has ref count of 2
2019-02-20T14:55:24.288 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 1.4 has ref count of 2
2019-02-20T14:55:24.289 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 2.1c has ref count of 2
2019-02-20T14:55:24.309 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 3.1c has ref count of 2
2019-02-20T14:55:24.309 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 12.c has ref count of 2
2019-02-20T14:55:24.309 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 3.2c has ref count of 2
2019-02-20T14:55:24.310 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 12.3c has ref count of 2
2019-02-20T14:55:24.310 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 3.34 has ref count of 2
2019-02-20T14:55:24.310 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 6.4 has ref count of 2
2019-02-20T14:55:24.311 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:2019-02-20 14:55:24.120 15fb6700 -1 osd.0 81 pgid 7.4 has ref count of 2
2019-02-20T14:55:25.527 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3884-gfef444a/rpm/el7/BUILD/ceph-14.0.1-3884-gfef444a/src/common/shared_cache.hpp: In function 'SharedLRU<K, V>::~SharedLRU() [with K = unsigned int; V = const OSDMap]' thread 9cbb140 time 2019-02-20 14:55:25.535887
2019-02-20T14:55:25.527 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3884-gfef444a/rpm/el7/BUILD/ceph-14.0.1-3884-gfef444a/src/common/shared_cache.hpp: 121: FAILED ceph_assert(weak_refs.empty())
2019-02-20T14:55:25.568 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: ceph version 14.0.1-3884-gfef444a (fef444a751955eb131303c9024fb2f08dbf30017) nautilus (dev)
2019-02-20T14:55:25.568 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x57488e]
2019-02-20T14:55:25.568 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x574a5c]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 3: (()+0x6294ce) [0x7314ce]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 4: (OSDService::~OSDService()+0x14d) [0x6e197d]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 5: (OSD::~OSD()+0x29c) [0x6e6a1c]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 6: (OSD::~OSD()+0x9) [0x6e7009]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 7: (main()+0x16e5) [0x578635]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 8: (__libc_start_main()+0xf5) [0xe4ce445]
2019-02-20T14:55:25.569 INFO:tasks.ceph.c1.osd.0.smithi184.stderr: 9: (()+0x5641c5) [0x66c1c5]
2019-02-20T14:55:25.571 INFO:tasks.ceph.c1.osd.0.smithi184.stderr:*** Caught signal (Aborted) **
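For readers unfamiliar with this kind of check: the log shows the assertion firing in the destructor of SharedLRU<unsigned int, const OSDMap>, reached from ~OSDService. The following is a minimal sketch of the general pattern, not Ceph's implementation. It assumes a cache that remembers every value it hands out through a weak reference, so that at teardown it can tell whether any caller still holds one. TinySharedCache, outstanding() and refs_left_at_shutdown() are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-in for a shared cache; NOT Ceph's SharedLRU.
template <typename K, typename V>
class TinySharedCache {
  // Non-owning record of every value handed out to callers.
  std::map<K, std::weak_ptr<V>> weak_refs;

 public:
  // Hand out a shared reference and remember it weakly.
  std::shared_ptr<V> add(const K& key, V value) {
    auto ptr = std::make_shared<V>(std::move(value));
    weak_refs[key] = ptr;
    return ptr;
  }

  // Prune entries whose last external reference is gone, then
  // return how many handed-out values are still referenced.
  std::size_t outstanding() {
    for (auto it = weak_refs.begin(); it != weak_refs.end();) {
      if (it->second.expired()) {
        it = weak_refs.erase(it);
      } else {
        ++it;
      }
    }
    return weak_refs.size();
  }
};

// Models a shutdown: returns the number of references still held
// when the cache is about to be destroyed. hold_ref = true stands
// for some component that never released its reference (assumption).
inline std::size_t refs_left_at_shutdown(bool hold_ref) {
  TinySharedCache<unsigned, int> cache;
  std::shared_ptr<int> pinned = cache.add(81, 0);
  if (!hold_ref) {
    pinned.reset();  // reference released before teardown
  }
  return cache.outstanding();
}
```

In this sketch, a destructor that asserted outstanding() == 0 would pass when every holder released its reference before teardown and would abort when one was still held, which is the shape of the failure reported above. Which component actually holds the leftover reference in the failing runs is not established by this example.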
Updated by Casey Bodley about 5 years ago
- Is duplicate of Bug #37808: osd: osdmap cache weak_refs assert during shutdown added