Bug #20000
closed
osd assert in shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
Description
version:
root@node0:~# ceph -v
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
Setup: bluestore + EC + overwrites, EC k/m = 4/2
2017-05-19 17:14:08.553617 7f290b016c80 -1 /build/ceph-12.0.2/src/common/shared_cache.hpp: In function 'SharedLRU<K, V, C, H>::~SharedLRU() [with K = unsigned int; V = const OSDMap; C = std::less<unsigned int>; H = std::hash<unsigned int>]' thread 7f290b016c80 time 2017-05-19 17:14:08.533752
/build/ceph-12.0.2/src/common/shared_cache.hpp: 107: FAILED assert(weak_refs.empty())
ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55aa6593a072]
2: (()+0x79d7f3) [0x55aa653c87f3]
3: (OSDService::~OSDService()+0x158) [0x55aa65350138]
4: (OSD::~OSD()+0x125) [0x55aa653a5f15]
5: (OSD::~OSD()+0x9) [0x55aa653a66b9]
6: (main()+0x30ff) [0x55aa6532787f]
7: (__libc_start_main()+0xf0) [0x7f2908481830]
8: (_start()+0x29) [0x55aa65333cc9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this
Updated by Sage Weil almost 7 years ago
- Status changed from New to 12
- Priority changed from Normal to Urgent
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr:dump_weak_refs 0xf6b5480 weak_refs: 521 = 0xf590950 with 5 refs
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr:
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr: 0> 2017-06-06 23:24:29.556669 9635080 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-2450-g0804911/rpm/el7/BUILD/ceph-12.0.2-2450-g0804911/src/common/shared_cache.hpp: In function 'SharedLRU<K, V, C, H>::~SharedLRU() [with K = unsigned int; V = const OSDMap; C = std::less<unsigned int>; H = std::hash<unsigned int>]' thread 9635080 time 2017-06-06 23:24:29.520142
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.2-2450-g0804911/rpm/el7/BUILD/ceph-12.0.2-2450-g0804911/src/common/shared_cache.hpp: 105: FAILED assert(weak_refs.empty())
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr:
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr: ceph version 12.0.2-2450-g0804911 (0804911540df0c38883250e101725c383f2486b5) luminous (dev)
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0xafb9f0]
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr: 2: (()+0x5036da) [0x60b6da]
2017-06-06T23:24:29.946 INFO:tasks.ceph.osd.1.smithi011.stderr: 3: (OSDService::~OSDService()+0x17c) [0x594f3c]
2017-06-06T23:24:29.947 INFO:tasks.ceph.osd.1.smithi011.stderr: 4: (OSD::~OSD()+0x133) [0x5e37a3]
2017-06-06T23:24:29.947 INFO:tasks.ceph.osd.1.smithi011.stderr: 5: (OSD::~OSD()+0x9) [0x5e3de9]
2017-06-06T23:24:29.947 INFO:tasks.ceph.osd.1.smithi011.stderr: 6: (main()+0x2f48) [0x4dc4e8]
2017-06-06T23:24:29.947 INFO:tasks.ceph.osd.1.smithi011.stderr: 7: (__libc_start_main()+0xf5) [0xd5c5b35]
2017-06-06T23:24:29.947 INFO:tasks.ceph.osd.1.smithi011.stderr: 8: (()+0x46d4a6) [0x5754a6]
/a/sage-2017-06-06_21:54:14-rados-wip-sage-testing-distro-basic-smithi/1265660
Updated by Zengran Zhang almost 7 years ago
We found that the messenger threads are still running after the `delete osd` in an async messenger environment, because asyncmsg::wait() does not join its worker threads. Could this be causing the issue here?
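To make the hypothesis above concrete, here is a minimal sketch, not Ceph's actual code. It assumes the cache tracks live values in a `weak_refs` map and removes an entry only when the last shared reference to that value is dropped. `TinySharedCache` and `outstanding_at_shutdown` are hypothetical names invented for this illustration. The worker thread stands in for a messenger worker that still holds a map reference when shutdown checks the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a shared cache: every live value is tracked in
// weak_refs, and the shared_ptr deleter erases the entry when the last
// reference goes away.
class TinySharedCache {
  struct State {
    std::mutex lock;
    std::map<unsigned, std::weak_ptr<const int>> weak_refs;
  };
  std::shared_ptr<State> state = std::make_shared<State>();

public:
  std::shared_ptr<const int> add(unsigned key, int value) {
    auto st = state;  // the deleter keeps the bookkeeping alive
    std::shared_ptr<const int> p(new int(value), [st, key](const int* v) {
      {
        std::lock_guard<std::mutex> l(st->lock);
        st->weak_refs.erase(key);
      }
      delete v;
    });
    std::lock_guard<std::mutex> l(state->lock);
    state->weak_refs[key] = p;
    return p;
  }

  // Number of values that still have outstanding references.
  std::size_t outstanding() const {
    std::lock_guard<std::mutex> l(state->lock);
    return state->weak_refs.size();
  }
};

// Returns how many entries are left in weak_refs at the point where a
// destructor-time check would run. join_first == false models a wait()
// that returns while a worker thread still holds a reference.
std::size_t outstanding_at_shutdown(bool join_first) {
  TinySharedCache cache;
  std::promise<void> holding, release;
  std::future<void> held = holding.get_future();
  std::future<void> released = release.get_future();

  std::thread worker([&] {
    auto ref = cache.add(521, 0);  // worker pins one cached value
    holding.set_value();
    released.wait();               // keep the reference until told to stop
  });
  held.wait();                     // the worker now owns a reference

  std::size_t n;
  if (join_first) {
    release.set_value();
    worker.join();                 // reference dropped before the check
    n = cache.outstanding();
  } else {
    n = cache.outstanding();       // check runs while the worker is alive
    release.set_value();
    worker.join();
  }
  return n;
}
```

In this sketch, `outstanding_at_shutdown(false)` reports one leftover entry, which is the condition `assert(weak_refs.empty())` would trip on, while `outstanding_at_shutdown(true)` reports none. Whether unjoined messenger workers are what actually holds the OSDMap references in this bug is not confirmed anywhere in this ticket.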
Updated by Sage Weil almost 7 years ago
- Related to Bug #20273: osd/OSD.h: 1957: FAILED assert(peering_queue.empty()) added
Updated by Sage Weil almost 7 years ago
- Status changed from 12 to Need More Info
Updated by Yuri Weinstein almost 7 years ago
2017-06-21T01:36:20.088 INFO:tasks.ceph.c2.osd.2.smithi044.stderr:/build/ceph-12.0.3-1946-g503de20/src/common/shared_cache.hpp: In function 'SharedLRU<K, V, C, H>::~SharedLRU() [with K = unsigned int; V = const OSDMap; C = std::less<unsigned int>; H = std::hash<unsigned int>]' thread 96903c0 time 2017-06-21 01:36:20.091545
2017-06-21T01:36:20.089 INFO:tasks.ceph.c2.osd.2.smithi044.stderr:/build/ceph-12.0.3-1946-g503de20/src/common/shared_cache.hpp: 108: FAILED assert(weak_refs.empty())
2017-06-21T01:36:20.127 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: ceph version 12.0.3-1946-g503de20 (503de209275b5d54a41747e19bca5495259bec43) luminous (dev)
2017-06-21T01:36:20.127 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0xaea5be]
2017-06-21T01:36:20.127 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 2: (()+0x512445) [0x61a445]
2017-06-21T01:36:20.127 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 3: (OSDService::~OSDService()+0x16c) [0x5a871c]
2017-06-21T01:36:20.127 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 4: (OSD::~OSD()+0x123) [0x5f5c83]
2017-06-21T01:36:20.127 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 5: (OSD::~OSD()+0x9) [0x5f62a9]
2017-06-21T01:36:20.128 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 6: (main()+0x2ae7) [0x4f3037]
2017-06-21T01:36:20.128 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 7: (__libc_start_main()+0xf5) [0xc760f45]
2017-06-21T01:36:20.128 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: 8: (()+0x485096) [0x58d096]
2017-06-21T01:36:20.128 INFO:tasks.ceph.c2.osd.2.smithi044.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Casey Bodley almost 7 years ago
These osd assertion failures reproduce consistently on shutdown in the rgw:multisite suite.
Updated by Sage Weil almost 7 years ago
- Related to Bug #20432: pgid 0.7 has ref count of 2 added
Updated by Sage Weil almost 7 years ago
/a/sage-2017-06-27_05:44:05-rados-wip-sage-testing-distro-basic-smithi/1331957
Updated by Patrick Donnelly almost 7 years ago
/a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333726
/a/pdonnell-2017-06-27_19:50:40-fs-wip-pdonnell-20170627---basic-smithi/1333675
Updated by Kefu Chai almost 7 years ago
- Priority changed from Urgent to High
Lowering the priority since we haven't spotted it for a while.
Updated by Sage Weil over 6 years ago
- Related to Bug #21823: on_flushed: object ... obc still alive (ec + cache tiering) added
Updated by Sage Weil over 6 years ago
- Status changed from Need More Info to Can't reproduce
Updated by Sage Weil over 5 years ago
- Status changed from Can't reproduce to 12
i see a zillion of these in this run
http://pulpito.ceph.com/teuthology-2019-01-05_03:09:02-powercycle-master-distro-basic-smithi/#
for example,
/a/teuthology-2019-01-05_03:09:02-powercycle-master-distro-basic-smithi/3424141
2019-01-07 20:43:38.514 7f98ab336900 1 -- shutdown_connections
2019-01-07 20:43:38.514 7f98ab336900 1 -- wait complete.
2019-01-07 20:43:38.518 7f98ab336900 -1 leaked refs:
dump_weak_refs 0xbf768e0 weak_refs: 24 = 0xc2a6d00 with 1 refs
2019-01-07 20:43:38.522 7f98ab336900 -1 /build/ceph-14.0.1-2314-g4331a92/src/common/shared_cache.hpp: In function 'SharedLRU<K, V>::~SharedLRU() [with K = unsigned int; V = const OSDMap]' thread 7f98ab336900 time 2019-01-07 20:43:38.519177
/build/ceph-14.0.1-2314-g4331a92/src/common/shared_cache.hpp: 121: FAILED ceph_assert(weak_refs.empty())
ceph version 14.0.1-2314-g4331a92 (4331a92ab70f878fac574bd60a1ce3bc310680f2) nautilus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x82e1a3]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x82e37e]
3: ceph-osd() [0x9d5001]
4: (OSDService::~OSDService()+0x144) [0x988d14]
5: (OSD::~OSD()+0x2c3) [0x98e7b3]
6: (OSD::~OSD()+0x9) [0x98ede9]
Updated by Josh Durgin over 4 years ago
- Status changed from 12 to Can't reproduce
Re-open if this still occurs.