Bug #10892
OSD: leaked pg refs on shutdown breaking FS nightlies
Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
The FS runs have been seeing 2-4 jobs fail on this in every run for a couple (several?) weeks, but never before that. We were under the impression that it was the same bug as #9630, #10421, etc.
Unfortunately, it is still recurring despite the fix for #10421.
2015-02-14T12:32:15.502 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: In function 'int OSD::shutdown()' thread 2bb3a700 time 2015-02-14 12:32:15.477439
2015-02-14T12:32:15.502 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: 2368: FAILED assert(0)
2015-02-14T12:32:15.540 INFO:tasks.ceph.osd.1.burnupi26.stderr: ceph version 0.92-1099-gf62fb72 (f62fb72f78c5d3956bcbb5e6582c283ca77abe7b)
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc2b0b]
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 2: (OSD::shutdown()+0x1808) [0x6881b8]
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 3: (OSD::handle_signal(int)+0x60) [0x688840]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 4: (SignalHandler::entry()+0x117) [0xacca37]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 5: (()+0x8182) [0x5d72182]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 6: (clone()+0x6d) [0x784438d]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:2015-02-14 12:32:15.475196 2bb3a700 -1 osd.1 10 pgid 1.0 has ref count of 65
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:2015-02-14 12:32:15.541058 2bb3a700 -1 osd/OSD.cc: In function 'int OSD::shutdown()' thread 2bb3a700 time 2015-02-14 12:32:15.477439
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: 2368: FAILED assert(0)
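The assert fires right after the "pgid 1.0 has ref count of 65" message. That message suggests the shutdown path checks that each PG holds only the OSD's own reference by that point. Any extra count is a get() that was never matched by a put(). The block below is a simplified C++ model of that kind of check, not Ceph's actual code. The names PG, get(), put(), report_leaked_pg_refs() and shutdown_check() are hypothetical.

```cpp
// Hypothetical, simplified model of a shutdown-time PG reference check.
// Not Ceph's actual code: PG, get(), put(), report_leaked_pg_refs() and
// shutdown_check() are illustrative names.
#include <atomic>
#include <cassert>
#include <cstdio>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct PG {
  std::string pgid;
  // Starts at 1: the reference held by the OSD's own PG map.
  std::atomic<int> ref{1};
  explicit PG(std::string id) : pgid(std::move(id)) {}
  void get() { ++ref; }  // every other holder takes a reference...
  void put() { --ref; }  // ...and must drop it when done
};

using PGMap = std::map<std::string, std::unique_ptr<PG>>;

// Prints a message for every PG whose count is not exactly 1 (only the
// map's reference left) and returns how many such PGs were found.
int report_leaked_pg_refs(const PGMap& pg_map) {
  int leaked = 0;
  for (const auto& kv : pg_map) {
    const int r = kv.second->ref.load();
    if (r != 1) {
      std::fprintf(stderr, "pgid %s has ref count of %d\n",
                   kv.first.c_str(), r);
      ++leaked;
    }
  }
  return leaked;
}

// At shutdown a leaked reference is treated as fatal, in the spirit of
// the FAILED assert(0) in the log above.
void shutdown_check(const PGMap& pg_map) {
  const int leaked = report_leaked_pg_refs(pg_map);
  assert(leaked == 0 && "leaked pg refs on shutdown");
  (void)leaked;  // avoid an unused-variable warning when NDEBUG is set
}
```

In this model, every code path that calls get() without a matching put() leaves the count above 1. A count of 65 would therefore point to 64 unmatched references on pg 1.0 by the time the OSD was told to shut down.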
Some recent examples:
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755699/
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755697/
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755701/
Updated by Greg Farnum about 9 years ago
I should also note that many of these runs include dirty valgrind logs, so there are probably plenty of hints about whatever's going wrong!
Updated by Sage Weil about 9 years ago
- Status changed from 12 to Fix Under Review
Updated by Sage Weil about 9 years ago
- Status changed from Fix Under Review to Resolved