Bug #5624

osd: prepare_to_stop() segfaults if we get a signal during startup

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
High
Assignee:
David Zafman
Category:
OSD
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We probably need to delay installing the signal handlers until startup has finished?

2013-07-13 02:04:37.379866 4035cc0  0 filestore(/var/lib/ceph/osd/ceph-2) mount FIEMAP ioctl is supported and appears to work
2013-07-13 02:04:37.379980 4035cc0  0 filestore(/var/lib/ceph/osd/ceph-2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-07-13 02:04:37.380997 4035cc0  0 filestore(/var/lib/ceph/osd/ceph-2) mount detected btrfs
2013-07-13 02:04:37.381166 4035cc0  0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs CLONE_RANGE ioctl is supported
2013-07-13 02:04:37.496285 4035cc0  0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_CREATE is supported
2013-07-13 02:04:37.498189 af43700 -1 osd.2 0 *** Got signal Terminated ***
2013-07-13 02:04:37.687195 af43700 -1 *** Caught signal (Segmentation fault) **
 in thread af43700

 ceph version 0.66-579-gbf4f802 (bf4f8024ba39c7a44258b5035698efa796300727)
 1: ceph-osd() [0x7fe0aa]
 2: (()+0xfcb0) [0x5043cb0]
 3: (OSDService::prepare_to_stop()+0x98) [0x664798]
 4: (OSD::shutdown()+0x28) [0x672fa8]
 5: (OSD::handle_signal(int)+0x118) [0x674ac8]
 6: (SignalHandler::entry()+0x1ac) [0x7fef4c]
 7: (()+0x7e9a) [0x503be9a]
 8: (clone()+0x6d) [0x6c71ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
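
To illustrate the failure mode in the backtrace (the signal thread reaches OSD::shutdown() and OSDService::prepare_to_stop() while startup is still in progress), here is a minimal sketch of one way to make an early signal safe. It is not the fix that went into the tree. Daemon, Service, and every member name are hypothetical stand-ins, not Ceph classes. Instead of delaying handler installation as suggested above, this variant records a stop request that arrives during startup and acts on it once init() has finished.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the service object that shutdown has to touch.
struct Service {
    bool stopping = false;
    void prepare_to_stop() { stopping = true; }
};

// Hypothetical stand-in for the daemon; not the real OSD class.
class Daemon {
public:
    enum class State { Initializing, Active, Stopped };

    // Startup path: build the service, then honor any stop request
    // that arrived while startup was still running.
    void init() {
        auto svc = std::make_unique<Service>();  // slow startup work goes here
        std::lock_guard<std::mutex> lock(mutex_);
        service_ = std::move(svc);
        if (stop_requested_) {
            shutdown_locked();
        } else {
            state_ = State::Active;
        }
    }

    // Called from the signal-handling thread.
    // Returns true only if this call performed the shutdown.
    bool handle_signal(int /*signum*/) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ == State::Initializing) {
            // Nothing is set up yet: remember the request, touch no state.
            stop_requested_ = true;
            return false;
        }
        if (state_ == State::Active) {
            shutdown_locked();
            return true;
        }
        return false;  // already stopped
    }

    State state() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

private:
    // Caller must hold mutex_; only reachable after init() set service_.
    void shutdown_locked() {
        assert(service_ && "shutdown requires a constructed service");
        service_->prepare_to_stop();
        state_ = State::Stopped;
    }

    mutable std::mutex mutex_;
    State state_ = State::Initializing;
    bool stop_requested_ = false;
    std::unique_ptr<Service> service_;
};
```

With this shape, a handle_signal() call that arrives before init() completes returns false and leaves the daemon untouched, and init() then moves straight to Stopped. The mutex keeps the state check and the stop request atomic with respect to the end of init(), so a request cannot be lost between the two.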

The job was:
ubuntu@teuthology:/a/teuthology-2013-07-13_01:00:18-rados-next-testing-basic/65369$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: bf4f8024ba39c7a44258b5035698efa796300727
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  install:
    ceph:
      flavor: notcmalloc
      sha1: bf4f8024ba39c7a44258b5035698efa796300727
  s3tests:
    branch: next
  workunit:
    sha1: bf4f8024ba39c7a44258b5035698efa796300727
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- workunit:
    clients:
      client.0:
      - rados/test.sh

History

#1 Updated by Sage Weil over 10 years ago

ubuntu@teuthology:/a/sage-2013-07-17_16:50:00-osdleaks-wip-osd-leaks-testing-basic/70827

#2 Updated by Sage Weil over 10 years ago

  • Assignee set to David Zafman

#3 Updated by David Zafman over 10 years ago

  • Status changed from New to Fix Under Review

In wip-4624

#4 Updated by Sage Weil over 10 years ago

  • Status changed from Fix Under Review to Resolved
