Bug #5432

msgr: bad locking mark_down_all

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-06-23 02:01:16.973299 ab43700 -1 osd.2 29 *** Got signal Terminated ***
2013-06-23 02:01:17.160994 ab43700 -1 *** Caught signal (Segmentation fault) **
 in thread ab43700

 ceph version 0.64-607-gb89d742 (b89d7420e3501247d6ed282d2253c95c758526b1)
 1: ceph-osd() [0x7ee8ca]
 2: (()+0xfcb0) [0x5043cb0]
 3: (OSDService::prepare_to_stop()+0x98) [0x654b18]
 4: (OSD::shutdown()+0x28) [0x663b18]
 5: (OSD::handle_signal(int)+0x118) [0x665638]
 6: (SignalHandler::entry()+0x1ac) [0x7ef76c]
 7: (()+0x7e9a) [0x503be9a]
 8: (clone()+0x6d) [0x6c71ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

job is
ubuntu@teuthology:/a/teuthology-2013-06-23_01:00:12-rados-master-testing-basic/43249$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: b89d7420e3501247d6ed282d2253c95c758526b1
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  install:
    ceph:
      flavor: notcmalloc
      sha1: b89d7420e3501247d6ed282d2253c95c758526b1
  s3tests:
    branch: master
  workunit:
    sha1: b89d7420e3501247d6ed282d2253c95c758526b1
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- mon_recovery: null

History

#1 Updated by Sage Weil almost 11 years ago

  • Status changed from 12 to Fix Under Review

wip-msgr already fixes this; it needs review!

#2 Updated by Sage Weil almost 11 years ago

  • Subject changed from osd: segfault in prepare_to_stop to msgr: bad locking mark_down_all

#3 Updated by Greg Farnum almost 11 years ago

  • Status changed from Fix Under Review to 7

Merged into master with 134d08a9654f66634b893d493e4a92f38acc63cf. Does wip-msgr need any backports? I think the patches are small enough that we could backport them if we want to.

#4 Updated by Sage Weil almost 11 years ago

  • Status changed from 7 to Pending Backport
  • Priority changed from Urgent to High

The crash was from the earlier changes that are only in master, and this whole series is just to fix Connection and *Session leaks, so I'm not in a hurry to backport until it's seen more testing in master/next. Eventually it's probably a good idea, though!

#5 Updated by Greg Farnum over 10 years ago

It's been 23 days; time to backport or decide we don't need to?

#6 Updated by Sage Weil over 10 years ago

  • Status changed from Pending Backport to Resolved

Ooof, just looked and it's going to be about 20 patches. I'm thinking we skip it, unless the more important reconnect stuff depends on it.
