Bug #5432
msgr: bad locking mark_down_all
Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Description
2013-06-23 02:01:16.973299 ab43700 -1 osd.2 29 *** Got signal Terminated ***
2013-06-23 02:01:17.160994 ab43700 -1 *** Caught signal (Segmentation fault) **
 in thread ab43700
 ceph version 0.64-607-gb89d742 (b89d7420e3501247d6ed282d2253c95c758526b1)
 1: ceph-osd() [0x7ee8ca]
 2: (()+0xfcb0) [0x5043cb0]
 3: (OSDService::prepare_to_stop()+0x98) [0x654b18]
 4: (OSD::shutdown()+0x28) [0x663b18]
 5: (OSD::handle_signal(int)+0x118) [0x665638]
 6: (SignalHandler::entry()+0x1ac) [0x7ef76c]
 7: (()+0x7e9a) [0x503be9a]
 8: (clone()+0x6d) [0x6c71ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
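For triage, the frames of a backtrace like the one above can be pulled apart with a short script. This is only a minimal sketch: the frame pattern is an assumption inferred from the lines in this report, and `parse_frames`, `FRAME_RE`, and `SAMPLE` are hypothetical names, not part of Ceph or teuthology.

```python
import re

# Assumed frame shape, taken from the trace in this report:
#   "3: (OSDService::prepare_to_stop()+0x98) [0x654b18]"
#   "1: ceph-osd() [0x7ee8ca]"
FRAME_RE = re.compile(r"(\d+): (.+?) \[(0x[0-9a-f]+)\]")


def parse_frames(text):
    """Return a list of (index, symbol, offset, address) tuples.

    offset is None when the frame carries no "+0x..." part.
    """
    frames = []
    for match in FRAME_RE.finditer(text):
        index, location, address = match.groups()
        symbol, offset = location, None
        # Frames of the form "(symbol+0xoff)" are wrapped in parentheses.
        if location.startswith("(") and location.endswith(")"):
            inner = location[1:-1]
            head, plus, tail = inner.rpartition("+")
            if plus:
                symbol, offset = head, tail
            else:
                symbol = inner
        frames.append((int(index), symbol, offset, address))
    return frames


# A few frames copied from the report, used only as sample input.
SAMPLE = """\
 1: ceph-osd() [0x7ee8ca]
 2: (()+0xfcb0) [0x5043cb0]
 3: (OSDService::prepare_to_stop()+0x98) [0x654b18]
 4: (OSD::shutdown()+0x28) [0x663b18]
"""

for index, symbol, offset, address in parse_frames(SAMPLE):
    print(index, symbol, offset, address)
```

Frames whose symbol comes out as `()` are unresolved; as the NOTE in the log says, a copy of the executable is still needed to interpret the addresses.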
job is
ubuntu@teuthology:/a/teuthology-2013-06-23_01:00:12-rados-master-testing-basic/43249$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: b89d7420e3501247d6ed282d2253c95c758526b1
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  install:
    ceph:
      flavor: notcmalloc
      sha1: b89d7420e3501247d6ed282d2253c95c758526b1
  s3tests:
    branch: master
  workunit:
    sha1: b89d7420e3501247d6ed282d2253c95c758526b1
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- mon_recovery: null
History
#1 Updated by Sage Weil almost 11 years ago
- Status changed from 12 to Fix Under Review
wip-msgr fixes this already, needs review!
#2 Updated by Sage Weil almost 11 years ago
- Subject changed from osd: segfault in prepare_to_stop to msgr: bad locking mark_down_all
#3 Updated by Greg Farnum almost 11 years ago
- Status changed from Fix Under Review to 7
Merged into master with 134d08a9654f66634b893d493e4a92f38acc63cf. Does wip-msgr need any backports? I think they're small enough we could if we want to.
#4 Updated by Sage Weil almost 11 years ago
- Status changed from 7 to Pending Backport
- Priority changed from Urgent to High
The crash came from the earlier changes that are only in master, and this whole series just fixes Connection and *Session leaks, so I'm not in a hurry to backport until it has seen more testing in master/next. Eventually it's probably a good idea, though!
#5 Updated by Greg Farnum over 10 years ago
It's been 23 days; time to backport or decide we don't need to?
#6 Updated by Sage Weil over 10 years ago
- Status changed from Pending Backport to Resolved
Ooof, just looked and it's going to be about 20 patches. I'm thinking we skip it, unless the more important reconnect stuff depends on it.