Bug #4253

closed

radosgw: segfault in lockdep register

Added by Sage Weil about 11 years ago. Updated about 10 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is failing on the master branch:

2013-02-23T11:26:22.325 INFO:teuthology.orchestra.run.err:*** Caught signal (Segmentation fault) **
2013-02-23T11:26:22.325 INFO:teuthology.orchestra.run.err: in thread 7f06d7caa780
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: ceph version 0.57-493-g704db85 (704db850131643b26bafe6594946cacce483c171)
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 1: radosgw-admin() [0x44d93a]
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 2: (()+0xfcb0) [0x7f06d6906cb0]
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 3: (lockdep_register(char const*)+0x151) [0x7f06d701ab51]
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 4: (Mutex::Mutex(char const*, bool, bool, bool, CephContext*)+0x1a4) [0x7f06d6ec3374]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 5: (OSDMap::OSDMap()+0x446) [0x7f06d6e5de66]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 6: (librados::RadosClient::RadosClient(CephContext*)+0x53) [0x7f06d6e5af33]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 7: (rados_create_with_context()+0x31) [0x7f06d6e4a971]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 8: (RGWRados::initialize()+0x3d) [0x4af53d]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 9: (RGWCache<RGWRados>::initialize()+0x17) [0x4c67e7]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 10: (RGWStoreManager::init_storage_provider(CephContext*, bool)+0x2c9) [0x4b3119]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 11: (main()+0x10b9) [0x439bc9]
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err: 12: (__libc_start_main()+0xed) [0x7f06d525376d]
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err: 13: radosgw-admin() [0x441b41]
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err:2013-02-23 11:26:17.960912 7f06d7caa780 -1 *** Caught signal (Segmentation fault) **
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err: in thread 7f06d7caa780

The job is:
ubuntu@teuthology:/a/sage-2013-02-23_08:44:35-regression-master-testing-basic$ cat 10368/orig.config.yaml 
kernel:
  kdb: true
  sha1: 92a49fb0f79f3300e6e50ddf56238e70678e4202
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        lockdep: true
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 704db850131643b26bafe6594946cacce483c171
  s3tests:
    branch: master
  workunit:
    sha1: 704db850131643b26bafe6594946cacce483c171
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- rgw:
    client.0:
      valgrind:
      - --tool=memcheck
- swift:
    client.0:
      rgw_server: client.0

Actions #1

Updated by Sage Weil about 11 years ago

  • Assignee set to Yehuda Sadeh
Actions #2

Updated by Yehuda Sadeh about 11 years ago

I can't reproduce it with a local build, only with the specific binary package. It looks like a linking issue:

Breakpoint 3, common_init_finish (cct=0xed0070) at common/common_init.cc:111
111    {
(gdb) p &g_lockdep_ceph_ctx
$13 = (CephContext **) 0x72b328
(gdb) c
Continuing.
[New Thread 0x7ffff4b55700 (LWP 2282)]
[New Thread 0x7ffff4354700 (LWP 2283)]

Breakpoint 5, lockdep_register_ceph_context (cct=0xed0070) at common/lockdep.cc:61
61    {
(gdb) p &g_lockdep_ceph_ctx
$14 = (CephContext **) 0x72b328
(gdb) c
Continuing.
2013-02-25 12:32:48.551868 7ffff7fe6780  0 lockdep is enabled

Program received signal SIGSEGV, Segmentation fault.
lockdep_register (name=0x7ffff738e0e3 "CrushWrapper::mapper_lock") at common/lockdep.cc:118
warning: Source file is more recent than executable.
118        lockdep_dout(10) << "registered '" << name << "' as " << id << dendl;
(gdb) p &g_lockdep_ceph_ctx
$15 = (CephContext **) 0x7ffff7638228

Note the different address for g_lockdep_ceph_ctx.

(gdb) info files
Symbols from "/usr/bin/radosgw-admin".
Unix child process:
    Using the running image of child Thread 0x7ffff7fe6780 (LWP 2276).
    While running this, GDB does not access memory from...
Local exec file:
    `/usr/bin/radosgw-admin', file type elf64-x86-64.
    Entry point: 0x441b18
    0x0000000000400238 - 0x0000000000400254 is .interp
    0x0000000000400254 - 0x0000000000400274 is .note.ABI-tag
    0x0000000000400274 - 0x0000000000400298 is .note.gnu.build-id
    0x0000000000400298 - 0x0000000000404d14 is .gnu.hash
    0x0000000000404d18 - 0x0000000000413dc0 is .dynsym
    0x0000000000413dc0 - 0x0000000000432433 is .dynstr
...
    0x0000000000729740 - 0x0000000000eccde0 is .bss
...
    0x00007ffff738d6a0 - 0x00007ffff73ca7e8 is .rodata in /usr/lib/librados.so.2
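
The address comparison above can be checked mechanically. The following is a minimal sketch, not part of Ceph: `Section`, `kSections`, and `find_section` are hypothetical names, and the table holds only the two ranges that survive in the truncated `info files` output. It assumes gdb's ranges are half-open, which matches the adjacent sections in the listing.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; only the address ranges come
// from the gdb "info files" output quoted in this comment.
struct Section {
  const char* name;
  std::uint64_t begin;  // inclusive
  std::uint64_t end;    // exclusive
};

constexpr Section kSections[] = {
  {".bss in /usr/bin/radosgw-admin",    0x0000000000729740, 0x0000000000eccde0},
  {".rodata in /usr/lib/librados.so.2", 0x00007ffff738d6a0, 0x00007ffff73ca7e8},
};

// Returns the index into kSections of the range containing addr,
// or -1 if addr lies in none of the listed ranges (needs C++14).
constexpr int find_section(std::uint64_t addr) {
  for (int i = 0; i < 2; ++i) {
    if (addr >= kSections[i].begin && addr < kSections[i].end) {
      return i;
    }
  }
  return -1;
}

// &g_lockdep_ceph_ctx at the first two breakpoints: the executable's .bss.
static_assert(find_section(0x72b328) == 0, "copy inside radosgw-admin");
// The lock name string passed to lockdep_register: librados .rodata.
static_assert(find_section(0x7ffff738e0e3) == 1, "caller is librados");
// &g_lockdep_ceph_ctx at the crash: outside the executable's image.
static_assert(find_section(0x7ffff7638228) == -1, "a different copy");
```

So the first two breakpoints see the copy of `g_lockdep_ceph_ctx` in the executable's own `.bss`, while the crashing frame is called from librados and sees a copy at an address outside the executable, in the range where the shared libraries are mapped. Which library owns that second address cannot be read from the elided part of the listing.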
Actions #3

Updated by Yehuda Sadeh about 11 years ago

Probably due to dual linkage with libcommon (libglobal -> libcommon, librados -> libcommon): libglobal is linked statically and librados is linked dynamically, so the process ends up with two copies of libcommon's globals. A quick workaround may be to create a new libglobal that doesn't depend on libcommon.
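
A toy model of that suspicion, as a sketch only: the two namespaces stand in for the two copies of libcommon (one linked statically into the executable, one inside librados), and `Context` is a hypothetical stand-in for `CephContext`. None of this is the real Ceph code, and the model assumes the second copy of `g_lockdep_ceph_ctx` is simply never initialized.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for CephContext.
struct Context {
  int debug_level = 10;
};

// Models the copy of libcommon linked statically into the executable.
namespace static_copy {
Context* g_lockdep_ceph_ctx = nullptr;

void lockdep_register_ceph_context(Context* cct) {
  g_lockdep_ceph_ctx = cct;  // only this copy is initialized
}
}  // namespace static_copy

// Models the copy of libcommon inside the dynamically linked librados.
namespace shared_copy {
Context* g_lockdep_ceph_ctx = nullptr;

// Returns false at the point where code that logs through the context
// would dereference a null pointer; the model checks instead of crashing.
bool lockdep_register(const char* name) {
  if (g_lockdep_ceph_ctx == nullptr) {
    return false;
  }
  std::cout << "registered '" << name << "' at debug level "
            << g_lockdep_ceph_ctx->debug_level << "\n";
  return true;
}
}  // namespace shared_copy
```

In this model, calling `static_copy::lockdep_register_ceph_context(&cct)` during startup and then `shared_copy::lockdep_register("CrushWrapper::mapper_lock")` from the library side returns false, because the library's own pointer was never set. That is consistent with the two different addresses for `g_lockdep_ceph_ctx` seen in gdb, but it remains an unconfirmed explanation.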

Actions #4

Updated by Sage Weil about 11 years ago

  • Priority changed from Urgent to High

I updated the suite to not run lockdep against radosgw for now.

Actions #5

Updated by Tamilarasi muthamizhan about 11 years ago

recent log: ubuntu@teuthology:/a/teuthology-2013-02-25_01:00:05-regression-master-testing-gcov/11592

Actions #6

Updated by Ian Colle about 11 years ago

  • Assignee deleted (Yehuda Sadeh)
Actions #7

Updated by Sage Weil about 11 years ago

  • Project changed from rgw to Ceph
Actions #8

Updated by Sage Weil almost 11 years ago

  • Priority changed from High to Normal
Actions #9

Updated by Sage Weil about 10 years ago

  • Status changed from 12 to Can't reproduce