Bug #10892

OSD: leaked pg refs on shutdown breaking FS nightlies

Added by Greg Farnum over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
Start date: 02/16/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

For the past couple of weeks (possibly several), 2-4 jobs in every FS run have failed on this, although it never showed up before that. We had assumed it was the same bug as #9630, #10421, etc.

Unfortunately, it is still recurring despite the fix for #10421.

2015-02-14T12:32:15.502 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: In function 'int OSD::shutdown()' thread 2bb3a700 time 2015-02-14 12:32:15.477439
2015-02-14T12:32:15.502 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: 2368: FAILED assert(0)
2015-02-14T12:32:15.540 INFO:tasks.ceph.osd.1.burnupi26.stderr: ceph version 0.92-1099-gf62fb72 (f62fb72f78c5d3956bcbb5e6582c283ca77abe7b)
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc2b0b]
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 2: (OSD::shutdown()+0x1808) [0x6881b8]
2015-02-14T12:32:15.541 INFO:tasks.ceph.osd.1.burnupi26.stderr: 3: (OSD::handle_signal(int)+0x60) [0x688840]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 4: (SignalHandler::entry()+0x117) [0xacca37]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 5: (()+0x8182) [0x5d72182]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: 6: (clone()+0x6d) [0x784438d]
2015-02-14T12:32:15.542 INFO:tasks.ceph.osd.1.burnupi26.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:2015-02-14 12:32:15.475196 2bb3a700 -1 osd.1 10 pgid 1.0 has ref count of 65
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:2015-02-14 12:32:15.541058 2bb3a700 -1 osd/OSD.cc: In function 'int OSD::shutdown()' thread 2bb3a700 time 2015-02-14 12:32:15.477439
2015-02-14T12:32:15.549 INFO:tasks.ceph.osd.1.burnupi26.stderr:osd/OSD.cc: 2368: FAILED assert(0)

Some recent examples:
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755699/
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755697/
http://pulpito.ceph.com/teuthology-2015-02-12_23:04:02-fs-hammer-testing-basic-multi/755701/

Associated revisions

Revision 487c2053 (diff)
Added by Sage Weil over 4 years ago

osd: clear obc cache on_shutdown

Fixes: #10892
Signed-off-by: Sage Weil <>
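
A minimal sketch of the failure mode and of the approach named in the commit title, under one assumption the report does not confirm: that each entry in a PG's obc (object context) cache pins a reference to that PG. If nothing clears the cache before shutdown, the shutdown check sees a nonzero count, like the "pgid 1.0 has ref count of 65" line in the log. FakePG, PGRef, CachedContext, ObcCache and count_leaked are hypothetical stand-ins, not Ceph's classes, and count_leaked reports leaks instead of calling assert(0).

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a placement group: it only counts outstanding refs.
struct FakePG {
  explicit FakePG(std::string id) : pgid(std::move(id)) {}
  std::string pgid;
  int refs = 0;
  void get() { ++refs; }
  void put() {
    assert(refs > 0);  // a put() without a matching get() is a bug
    --refs;
  }
};

// RAII reference: takes a ref on construction, drops it on destruction.
class PGRef {
 public:
  explicit PGRef(FakePG* pg) : pg_(pg) { pg_->get(); }
  PGRef(const PGRef&) = delete;
  PGRef& operator=(const PGRef&) = delete;
  ~PGRef() { pg_->put(); }

 private:
  FakePG* pg_;
};

// Hypothetical cached object context; assumed (not confirmed) to pin its PG.
struct CachedContext {
  explicit CachedContext(FakePG* pg) : pg_ref(pg) {}
  PGRef pg_ref;
};

// Hypothetical per-PG object-context cache keyed by object name.
// The FakePG must outlive every cache that points at it.
class ObcCache {
 public:
  explicit ObcCache(FakePG* pg) : pg_(pg) {}

  // Creates and caches a context on first lookup; later lookups reuse it.
  void lookup(const std::string& oid) {
    if (cache_.find(oid) == cache_.end())
      cache_.emplace(oid, std::make_unique<CachedContext>(pg_));
  }

  // Analogue of "clear obc cache on_shutdown": destroying every cached
  // context releases the PG refs those contexts held.
  void on_shutdown() { cache_.clear(); }

 private:
  FakePG* pg_;
  std::map<std::string, std::unique_ptr<CachedContext>> cache_;
};

// Shutdown-time check modeled on the log line; it prints and counts
// PGs that still hold refs instead of asserting.
int count_leaked(const std::vector<FakePG*>& pgs) {
  int leaked = 0;
  for (const FakePG* pg : pgs) {
    if (pg->refs != 0) {
      std::fprintf(stderr, "pgid %s has ref count of %d\n",
                   pg->pgid.c_str(), pg->refs);
      ++leaked;
    }
  }
  return leaked;
}
```

For example, after lookups of "a", "b" and "a" again on PG "1.0", its count is 2 and count_leaked reports it; calling on_shutdown() drops the cached contexts and the count returns to 0. Tying each reference to the cached object's lifetime means clearing the container is enough to release it.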

History

#1 Updated by Greg Farnum over 4 years ago

I should also note that many of these runs include dirty valgrind logs, so there are probably plenty of hints about whatever's going wrong!

#2 Updated by Sage Weil over 4 years ago

  • Status changed from Verified to Need Review

#3 Updated by Sage Weil over 4 years ago

  • Status changed from Need Review to Resolved