Bug #3221

disconnect_session_watchers missing pg

Added by Samuel Just almost 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

/a/teuthology-2012-09-24_04:00:04-regression-stable-master-basic/28971/remote

--- end dump of recent events ---
2012-09-24 17:03:56.764890 7fb2e9495700 -1 ** Caught signal (Aborted) *
in thread 7fb2e9495700

ceph version 0.48.1argonaut-37-g3e02b2f (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x6fd971]
2: (()+0xfcb0) [0x7fb2f6c1ccb0]
3: (gsignal()+0x35) [0x7fb2f4efc445]
4: (abort()+0x17b) [0x7fb2f4effbab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb2f584a69d]
6: (()+0xb5846) [0x7fb2f5848846]
7: (()+0xb5873) [0x7fb2f5848873]
8: (()+0xb596e) [0x7fb2f584896e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1e9) [0x7bc799]
10: (OSD::disconnect_session_watches(OSD::Session*)+0x917) [0x5bd3f7]
11: (OSD::ms_handle_reset(Connection*)+0x11a) [0x5bd53a]
12: (SimpleMessenger::DispatchQueue::entry()+0xe35) [0x799a95]
13: (SimpleMessenger::dispatch_entry()+0x27) [0x79a3b7]
14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x75889d]
15: (()+0x7e9a) [0x7fb2f6c14e9a]
16: (clone()+0x6d) [0x7fb2f4fb84bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Related issues

Related to Ceph - Bug #2533: osd: watchers tracked by entity_name_t, not by cookie (Duplicate, 06/07/2012)
Duplicated by Ceph - Bug #3612: disconnect_session_watches assert(pg) failed (Duplicate, 12/12/2012)

History

#1 Updated by Samuel Just almost 8 years ago

  • Assignee set to Samuel Just

#2 Updated by Sage Weil almost 8 years ago

  • Status changed from New to Resolved

#3 Updated by Tamilarasi muthamizhan almost 8 years ago

  • Status changed from Resolved to In Progress

recent log: ubuntu@teuthology:/a/teuthology-2012-12-09_19:00:03-regression-master-testing-gcov/10911

#4 Updated by Sage Weil almost 8 years ago

I think the locking here is just broken. The obc always goes away at PG reset time, when it is removed from the session watch list. This method grabs those pointers, drops the lock, and then tries to take the PG lock, but by that point the PG or the obc could be gone because the method didn't take a reference. I don't think it's trivially patched. This should get fixed along with the pending watch cleanup, #2533.
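To make the hazard described above concrete, here is a minimal sketch, not the actual OSD code. `Pg`, `Session`, `snapshot_watches`, and `count_live` are hypothetical names invented for illustration. `std::weak_ptr` stands in for the non-owning pointers kept in the session watch list, so the sketch can detect a dangling entry safely instead of dereferencing freed memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a PG; not Ceph's real class.
struct Pg {
    std::string name;
};

// Hypothetical session. weak_ptr models a watch entry that does not
// own (hold a reference on) the object it points at.
struct Session {
    std::mutex lock;
    std::vector<std::weak_ptr<Pg>> watches;
};

// Copy the watch list while holding the session lock. The lock is
// released on return, so nothing keeps the PGs alive afterwards.
std::vector<std::weak_ptr<Pg>> snapshot_watches(Session& s) {
    std::lock_guard<std::mutex> g(s.lock);
    return s.watches;
}

// Count how many snapshotted entries still refer to a live PG.
std::size_t count_live(const std::vector<std::weak_ptr<Pg>>& snap) {
    std::size_t live = 0;
    for (const auto& w : snap) {
        if (w.lock()) {
            ++live;
        }
    }
    return live;
}
```

If the only owner releases the PG between the snapshot and the later use, the snapshotted entry no longer resolves. With a raw pointer the same sequence would be a use-after-free or, as in the trace above, a failed assert.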

#5 Updated by Sage Weil almost 8 years ago

  • Status changed from In Progress to 12
  • Assignee deleted (Samuel Just)

#6 Updated by Sage Weil almost 8 years ago

  • Priority changed from Normal to Urgent

This has popped up again.

I think the easy fix is to make the map reference PGs and obcs by name, and do the lookups from the other direction after grabbing the locks. An annoying change, but cleaner than the fugly void* stuff anyway.
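A rough sketch of the by-name idea, under stated assumptions: `PgRegistry`, `Pg`, and `disconnect_by_name` are hypothetical names, and the watcher count is a simplification. This is not the fix that was actually committed. It only shows the pattern of storing names, then resolving each name to an owning reference under a lock and skipping entries that no longer exist.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical PG with its own lock and a simplified watcher count.
struct Pg {
    std::mutex lock;
    int watchers = 0;
};

// Hypothetical registry mapping PG names to owning references.
class PgRegistry {
public:
    void add(const std::string& name, int watchers) {
        std::lock_guard<std::mutex> g(lock_);
        auto pg = std::make_shared<Pg>();
        pg->watchers = watchers;
        pgs_[name] = pg;
    }

    void remove(const std::string& name) {
        std::lock_guard<std::mutex> g(lock_);
        pgs_.erase(name);
    }

    // Returns an owning reference, or an empty pointer if the PG is gone.
    std::shared_ptr<Pg> lookup(const std::string& name) {
        std::lock_guard<std::mutex> g(lock_);
        auto it = pgs_.find(name);
        if (it == pgs_.end()) {
            return std::shared_ptr<Pg>();
        }
        return it->second;
    }

private:
    std::mutex lock_;
    std::map<std::string, std::shared_ptr<Pg>> pgs_;
};

// The session keeps only names. Each name is resolved at cleanup time;
// a missing PG is skipped rather than asserted on.
// Returns how many named PGs were still present.
int disconnect_by_name(PgRegistry& reg, const std::vector<std::string>& names) {
    int handled = 0;
    for (const auto& n : names) {
        std::shared_ptr<Pg> pg = reg.lookup(n);
        if (!pg) {
            continue;  // PG went away; nothing left to clean up
        }
        std::lock_guard<std::mutex> g(pg->lock);
        if (pg->watchers > 0) {
            --pg->watchers;
        }
        ++handled;
    }
    return handled;
}
```

Because the lookup hands back a `std::shared_ptr`, the PG stays alive for as long as the cleanup code holds it, even if it is removed from the registry concurrently.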

#7 Updated by Ian Colle almost 8 years ago

  • Assignee set to Samuel Just

#8 Updated by Sage Weil almost 8 years ago

  • Status changed from 12 to In Progress

#9 Updated by Sage Weil almost 8 years ago

  • Status changed from In Progress to 7

#10 Updated by Sage Weil almost 8 years ago

  • Status changed from 7 to Resolved
