Bug #3221
closed
disconnect_session_watchers missing pg
Description
/a/teuthology-2012-09-24_04:00:04-regression-stable-master-basic/28971/remote
--- end dump of recent events ---
2012-09-24 17:03:56.764890 7fb2e9495700 -1 *** Caught signal (Aborted) **
in thread 7fb2e9495700
ceph version 0.48.1argonaut-37-g3e02b2f (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x6fd971]
2: (()+0xfcb0) [0x7fb2f6c1ccb0]
3: (gsignal()+0x35) [0x7fb2f4efc445]
4: (abort()+0x17b) [0x7fb2f4effbab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb2f584a69d]
6: (()+0xb5846) [0x7fb2f5848846]
7: (()+0xb5873) [0x7fb2f5848873]
8: (()+0xb596e) [0x7fb2f584896e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1e9) [0x7bc799]
10: (OSD::disconnect_session_watches(OSD::Session*)+0x917) [0x5bd3f7]
11: (OSD::ms_handle_reset(Connection*)+0x11a) [0x5bd53a]
12: (SimpleMessenger::DispatchQueue::entry()+0xe35) [0x799a95]
13: (SimpleMessenger::dispatch_entry()+0x27) [0x79a3b7]
14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x75889d]
15: (()+0x7e9a) [0x7fb2f6c14e9a]
16: (clone()+0x6d) [0x7fb2f4fb84bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2012-09-24 17:03:56.764890 7fb2e9495700 -1 *** Caught signal (Aborted) **
in thread 7fb2e9495700
ceph version 0.48.1argonaut-37-g3e02b2f (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x6fd971]
2: (()+0xfcb0) [0x7fb2f6c1ccb0]
3: (gsignal()+0x35) [0x7fb2f4efc445]
4: (abort()+0x17b) [0x7fb2f4effbab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb2f584a69d]
6: (()+0xb5846) [0x7fb2f5848846]
7: (()+0xb5873) [0x7fb2f5848873]
8: (()+0xb596e) [0x7fb2f584896e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1e9) [0x7bc799]
10: (OSD::disconnect_session_watches(OSD::Session*)+0x917) [0x5bd3f7]
11: (OSD::ms_handle_reset(Connection*)+0x11a) [0x5bd53a]
12: (SimpleMessenger::DispatchQueue::entry()+0xe35) [0x799a95]
13: (SimpleMessenger::dispatch_entry()+0x27) [0x79a3b7]
14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x75889d]
15: (()+0x7e9a) [0x7fb2f6c14e9a]
16: (clone()+0x6d) [0x7fb2f4fb84bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---
Updated by Tamilarasi muthamizhan over 11 years ago
- Status changed from Resolved to In Progress
recent log: ubuntu@teuthology:/a/teuthology-2012-12-09_19:00:03-regression-master-testing-gcov/10911
Updated by Sage Weil over 11 years ago
I think the locking here is just broken. The obc always goes away at PG reset time, when it is removed from the session watch list. This method grabs those pointers, drops the lock, and then tries to take the PG, but by then the PG or the obc could be gone because it didn't take a reference. I don't think it's trivially patched; this should get fixed along with the pending watch cleanup, #2533.
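For illustration only, here is a minimal sketch of the "take a reference before dropping the lock" pattern that the method is missing. `ObjContext`, `Session`, `snapshot_watches` and `reset_session` are hypothetical stand-ins, not the real Ceph types, and `std::shared_ptr` stands in for whatever reference counting the OSD actually uses.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an object context (obc); not Ceph's real type.
struct ObjContext {
    int watchers = 1;
};

// Hypothetical session holding a watch list protected by a lock.
struct Session {
    std::mutex lock;
    std::vector<std::shared_ptr<ObjContext>> watches;
};

// Copy the shared_ptrs while holding the lock. Each copy is a reference,
// so the objects stay alive after the lock is dropped.
std::vector<std::shared_ptr<ObjContext>> snapshot_watches(Session& s) {
    std::lock_guard<std::mutex> guard(s.lock);
    return s.watches;
}

// Simulates the reset path emptying the session watch list.
void reset_session(Session& s) {
    std::lock_guard<std::mutex> guard(s.lock);
    s.watches.clear();
}
```

If the snapshot held raw pointers instead, a `reset_session` call between the snapshot and its use would leave them dangling, which is the kind of race described above.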
Updated by Sage Weil over 11 years ago
- Status changed from In Progress to 12
- Assignee deleted (Samuel Just)
Updated by Sage Weil over 11 years ago
- Priority changed from Normal to Urgent
This has popped up again.
I think the easy fix is to make the map reference pgs and obcs by name, and do the lookups from the other direction after grabbing the locks. An annoying change, but cleaner than the fugly void* stuff anyway.
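A rough sketch of that idea, under stated assumptions: the session keeps only names, and each name is resolved through a locked registry at disconnect time, so a PG that has already gone away is skipped instead of tripping an assert. `PG`, `PGRegistry` and `disconnect_by_name` are made-up names for illustration and do not correspond to the actual OSD code.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical PG with its own lock and a watch counter.
struct PG {
    std::mutex lock;
    int watch_count = 0;
};

// Hypothetical registry mapping PG names to live PGs.
struct PGRegistry {
    std::mutex lock;
    std::map<std::string, std::shared_ptr<PG>> pgs;

    // Resolve a name under the registry lock; returns null if the PG is gone.
    std::shared_ptr<PG> lookup(const std::string& name) {
        std::lock_guard<std::mutex> guard(lock);
        auto it = pgs.find(name);
        if (it == pgs.end()) {
            return nullptr;
        }
        return it->second;
    }
};

// Walk the names a session was watching and drop one watch on each PG
// that still exists. Returns how many watches were actually dropped.
int disconnect_by_name(PGRegistry& reg, const std::vector<std::string>& watched) {
    int dropped = 0;
    for (const auto& name : watched) {
        std::shared_ptr<PG> pg = reg.lookup(name);
        if (!pg) {
            continue;  // PG already removed; nothing to clean up
        }
        std::lock_guard<std::mutex> guard(pg->lock);
        if (pg->watch_count > 0) {
            --pg->watch_count;
            ++dropped;
        }
    }
    return dropped;
}
```

The session never holds a pointer that can outlive its target: a stale name simply fails the lookup.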