Bug #10892

closed

OSD: leaked pg refs on shutdown breaking FS nightlies

Added by Greg Farnum about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The FS runs have been seeing 2-4 jobs fail on this in every run for a couple of (several?) weeks, but not at all before that. We were under the impression that it was the same bug as #9630, #10421, etc.

Unfortunately this is recurring despite the fix for #10421.

2015-02-14T12:32:15.502 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: In function 'int OSD::shutdown()' thread 2bb3a700 time 2015-02-14 12:32:15.477439
2015-02-14T12:32:15.502 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: 2368: FAILED assert(0)
2015-02-14T12:32:15.540 INFO:tasks.ceph.osd.1.burnupi26.stderr: ceph version 0.92-1099-gf62fb72 (f62fb72f78c5d3956bcbb5e6582c283ca77abe7b)
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc2b0b]
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 2: (OSD::shutdown()+0x1808) [0x6881b8]
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 3: (OSD::handle_signal(int)+0x60) [0x688840]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 4: (SignalHandler::entry()+0x117) [0xacca37]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 5: (()+0x8182) [0x5d72182]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 6: (clone()+0x6d) [0x784438d]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:2015-02-14 12:32:15.475196 2bb3a700 -1 osd.1 10 pgid 1.0 has ref count of 65
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:2015-02-14 12:32:15.541058 2bb3a700 -1 osd/OSD.cc: In function 'int OSD::shutdown()' thread 2bb3a700 time 2015-02-14 12:32:15.477439
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: 2368: FAILED assert(0)
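
When triaging runs like these, it can help to pull only the "has ref count of" lines out of a teuthology log, since they name the OSD and PG that still held references at shutdown. The following is a minimal sketch, not part of Ceph or teuthology: the function name `find_leaked_pg_refs` and the regular expression are assumptions derived solely from the log format quoted above.

```python
import re
from typing import Iterable, List, Tuple

# Assumed pattern, based only on the log line quoted in this report, e.g.
#   "... -1 osd.1 10 pgid 1.0 has ref count of 65"
# Captures the OSD name, the pgid, and the leftover reference count.
LEAK_RE = re.compile(r"(osd\.\d+) \d+ pgid (\S+) has ref count of (\d+)")


def find_leaked_pg_refs(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (osd, pgid, ref_count) for every leaked-ref line in `lines`.

    Hypothetical helper for log triage; lines that do not match are skipped.
    """
    leaks = []
    for line in lines:
        match = LEAK_RE.search(line)
        if match:
            osd, pgid, refs = match.groups()
            leaks.append((osd, pgid, int(refs)))
    return leaks
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `find_leaked_pg_refs(open(path))` where `path` points at a downloaded job log.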

Some recent examples:
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755699/
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755697/
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755701/

Actions #1

Updated by Greg Farnum about 9 years ago

I should also note that many of these runs include dirty valgrind logs, so there are probably plenty of hints about whatever's going wrong!

Actions #2

Updated by Sage Weil about 9 years ago

  • Status changed from 12 to Fix Under Review

Actions #3

Updated by Sage Weil about 9 years ago

  • Status changed from Fix Under Review to Resolved