Bug #3221

closed

disconnect_session_watches missing pg

Added by Samuel Just over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Development

Description

/a/teuthology-2012-09-24_04:00:04-regression-stable-master-basic/28971/remote

--- end dump of recent events ---
2012-09-24 17:03:56.764890 7fb2e9495700 -1 *** Caught signal (Aborted) **
 in thread 7fb2e9495700

ceph version 0.48.1argonaut-37-g3e02b2f (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x6fd971]
2: (()+0xfcb0) [0x7fb2f6c1ccb0]
3: (gsignal()+0x35) [0x7fb2f4efc445]
4: (abort()+0x17b) [0x7fb2f4effbab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb2f584a69d]
6: (()+0xb5846) [0x7fb2f5848846]
7: (()+0xb5873) [0x7fb2f5848873]
8: (()+0xb596e) [0x7fb2f584896e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1e9) [0x7bc799]
10: (OSD::disconnect_session_watches(OSD::Session*)+0x917) [0x5bd3f7]
11: (OSD::ms_handle_reset(Connection*)+0x11a) [0x5bd53a]
12: (SimpleMessenger::DispatchQueue::entry()+0xe35) [0x799a95]
13: (SimpleMessenger::dispatch_entry()+0x27) [0x79a3b7]
14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x75889d]
15: (()+0x7e9a) [0x7fb2f6c14e9a]
16: (clone()+0x6d) [0x7fb2f4fb84bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Related issues 2 (0 open, 2 closed)

Related to Ceph - Bug #2533: osd: watchers tracked by entity_name_t, not by cookie (Duplicate, Samuel Just, 06/07/2012)

Has duplicate Ceph - Bug #3612: disconnect_session_watches assert(pg) failed (Duplicate, Samuel Just, 12/12/2012)

Actions #1

Updated by Samuel Just over 11 years ago

  • Assignee set to Samuel Just
Actions #2

Updated by Sage Weil over 11 years ago

  • Status changed from New to Resolved
Actions #3

Updated by Tamilarasi muthamizhan over 11 years ago

  • Status changed from Resolved to In Progress

recent log: ubuntu@teuthology:/a/teuthology-2012-12-09_19:00:03-regression-master-testing-gcov/10911

Actions #4

Updated by Sage Weil over 11 years ago

I think the locking here is just broken. The obc always goes away at PG reset time, when it is removed from the session watch list. This method grabs those pointers, drops the lock, and then tries to take the PG, but by that point the PG or the obc could be gone because the method didn't take a reference. I don't think it's trivially patched; this should get fixed along with the pending watch cleanup, #2533.
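
To make the hazard concrete, here is a minimal sketch of the pattern described above. It is not Ceph's code: `ObjectContext`, `Session`, and both snapshot functions are hypothetical stand-ins, and `std::weak_ptr` is used in place of the raw pointers so the "object is gone" case can be observed safely. As noted above, the real fix is probably not this simple.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Ceph's real classes.
struct ObjectContext {
  std::string oid;
};

struct Session {
  std::mutex lock;
  // Non-owning entries, playing the role of the unreferenced pointers.
  std::vector<std::weak_ptr<ObjectContext>> watches;
};

// Copies the watch list under the lock WITHOUT taking references.
// Once the lock is dropped, nothing keeps the entries alive.
std::vector<std::weak_ptr<ObjectContext>> snapshot_unreferenced(Session& s) {
  std::lock_guard<std::mutex> g(s.lock);
  return s.watches;
}

// Copies the watch list under the lock and takes a reference to each
// entry that is still alive, so it survives after the lock is dropped.
std::vector<std::shared_ptr<ObjectContext>> snapshot_referenced(Session& s) {
  std::lock_guard<std::mutex> g(s.lock);
  std::vector<std::shared_ptr<ObjectContext>> out;
  for (const auto& w : s.watches) {
    if (auto sp = w.lock()) {
      out.push_back(sp);
    }
  }
  return out;
}
```

With the unreferenced snapshot, an entry can expire as soon as its owner releases it; with the referenced snapshot, the entry stays valid until the caller drops its copy.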

Actions #5

Updated by Sage Weil over 11 years ago

  • Status changed from In Progress to 12
  • Assignee deleted (Samuel Just)
Actions #6

Updated by Sage Weil over 11 years ago

  • Priority changed from Normal to Urgent

This has popped up again.

I think the easy fix is to make the map reference PGs and obcs by name, and do the lookups from the other direction after grabbing the locks. An annoying change, but cleaner than the fugly void* stuff anyway.
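
A rough sketch of that idea, under stated assumptions: every type and function here (`PG`, `OSDSketch`, `SessionSketch`, `lookup_pg`, `disconnect_watches`) is hypothetical and does not reflect the actual OSD code or the fix that was eventually committed. The session stores (pgid, object name) pairs instead of pointers, and each PG is looked up by name under the map lock, so a PG that has gone away is skipped instead of tripping an assert.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; not Ceph's real classes.
struct PG {
  std::mutex lock;
  std::set<std::string> objects_with_watch;
};

struct OSDSketch {
  std::mutex pg_map_lock;
  std::map<std::string, std::shared_ptr<PG>> pg_map;

  // Look up a PG by name under the map lock; the returned shared_ptr
  // keeps the PG alive for the caller. Returns null if the PG is gone.
  std::shared_ptr<PG> lookup_pg(const std::string& pgid) {
    std::lock_guard<std::mutex> g(pg_map_lock);
    auto it = pg_map.find(pgid);
    if (it == pg_map.end()) {
      return nullptr;
    }
    return it->second;
  }
};

struct SessionSketch {
  std::mutex lock;
  // (pgid, object name) pairs instead of raw PG/obc pointers.
  std::set<std::pair<std::string, std::string>> watches;
};

// Returns the number of watches that were actually disconnected.
int disconnect_watches(OSDSketch& osd, SessionSketch& s) {
  std::set<std::pair<std::string, std::string>> names;
  {
    // Take the names under the session lock, then drop it.
    std::lock_guard<std::mutex> g(s.lock);
    names.swap(s.watches);
  }
  int removed = 0;
  for (const auto& n : names) {
    std::shared_ptr<PG> pg = osd.lookup_pg(n.first);
    if (!pg) {
      continue;  // PG no longer exists: nothing to clean up.
    }
    std::lock_guard<std::mutex> g(pg->lock);
    removed += static_cast<int>(pg->objects_with_watch.erase(n.second));
  }
  return removed;
}
```

The design point is that names stay meaningful after the session lock is dropped, while a stale name simply fails the lookup.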

Actions #7

Updated by Ian Colle over 11 years ago

  • Assignee set to Samuel Just
Actions #8

Updated by Sage Weil over 11 years ago

  • Status changed from 12 to In Progress
Actions #9

Updated by Sage Weil over 11 years ago

  • Status changed from In Progress to 7
Actions #10

Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved