Bug #4253
radosgw: segfault in lockdep register
Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This is failing on the master branch:
2013-02-23T11:26:22.325 INFO:teuthology.orchestra.run.err:*** Caught signal (Segmentation fault) **
2013-02-23T11:26:22.325 INFO:teuthology.orchestra.run.err: in thread 7f06d7caa780
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: ceph version 0.57-493-g704db85 (704db850131643b26bafe6594946cacce483c171)
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 1: radosgw-admin() [0x44d93a]
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 2: (()+0xfcb0) [0x7f06d6906cb0]
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 3: (lockdep_register(char const*)+0x151) [0x7f06d701ab51]
2013-02-23T11:26:22.333 INFO:teuthology.orchestra.run.err: 4: (Mutex::Mutex(char const*, bool, bool, bool, CephContext*)+0x1a4) [0x7f06d6ec3374]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 5: (OSDMap::OSDMap()+0x446) [0x7f06d6e5de66]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 6: (librados::RadosClient::RadosClient(CephContext*)+0x53) [0x7f06d6e5af33]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 7: (rados_create_with_context()+0x31) [0x7f06d6e4a971]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 8: (RGWRados::initialize()+0x3d) [0x4af53d]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 9: (RGWCache<RGWRados>::initialize()+0x17) [0x4c67e7]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 10: (RGWStoreManager::init_storage_provider(CephContext*, bool)+0x2c9) [0x4b3119]
2013-02-23T11:26:22.334 INFO:teuthology.orchestra.run.err: 11: (main()+0x10b9) [0x439bc9]
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err: 12: (__libc_start_main()+0xed) [0x7f06d525376d]
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err: 13: radosgw-admin() [0x441b41]
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err:2013-02-23 11:26:17.960912 7f06d7caa780 -1 *** Caught signal (Segmentation fault) **
2013-02-23T11:26:22.335 INFO:teuthology.orchestra.run.err: in thread 7f06d7caa780
The job config is:
ubuntu@teuthology:/a/sage-2013-02-23_08:44:35-regression-master-testing-basic$ cat 10368/orig.config.yaml
kernel:
  kdb: true
  sha1: 92a49fb0f79f3300e6e50ddf56238e70678e4202
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        lockdep: true
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 704db850131643b26bafe6594946cacce483c171
  s3tests:
    branch: master
  workunit:
    sha1: 704db850131643b26bafe6594946cacce483c171
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- rgw:
    client.0:
      valgrind:
      - --tool=memcheck
- swift:
    client.0:
      rgw_server: client.0
Updated by Yehuda Sadeh about 11 years ago
Can't reproduce it locally, only using the specific binary package. Looks like some linking issue:
Breakpoint 3, common_init_finish (cct=0xed0070) at common/common_init.cc:111
111     {
(gdb) p &g_lockdep_ceph_ctx
$13 = (CephContext **) 0x72b328
(gdb) c
Continuing.
[New Thread 0x7ffff4b55700 (LWP 2282)]
[New Thread 0x7ffff4354700 (LWP 2283)]

Breakpoint 5, lockdep_register_ceph_context (cct=0xed0070) at common/lockdep.cc:61
61      {
(gdb) p &g_lockdep_ceph_ctx
$14 = (CephContext **) 0x72b328
(gdb) c
Continuing.
2013-02-25 12:32:48.551868 7ffff7fe6780  0 lockdep is enabled

Program received signal SIGSEGV, Segmentation fault.
lockdep_register (name=0x7ffff738e0e3 "CrushWrapper::mapper_lock") at common/lockdep.cc:118
warning: Source file is more recent than executable.
118         lockdep_dout(10) << "registered '" << name << "' as " << id << dendl;
(gdb) p &g_lockdep_ceph_ctx
$15 = (CephContext **) 0x7ffff7638228
Note the different address for g_lockdep_ceph_ctx.
(gdb) info files
Symbols from "/usr/bin/radosgw-admin".
Unix child process:
        Using the running image of child Thread 0x7ffff7fe6780 (LWP 2276).
        While running this, GDB does not access memory from...
Local exec file:
        `/usr/bin/radosgw-admin', file type elf64-x86-64.
        Entry point: 0x441b18
        0x0000000000400238 - 0x0000000000400254 is .interp
        0x0000000000400254 - 0x0000000000400274 is .note.ABI-tag
        0x0000000000400274 - 0x0000000000400298 is .note.gnu.build-id
        0x0000000000400298 - 0x0000000000404d14 is .gnu.hash
        0x0000000000404d18 - 0x0000000000413dc0 is .dynsym
        0x0000000000413dc0 - 0x0000000000432433 is .dynstr
        ...
        0x0000000000729740 - 0x0000000000eccde0 is .bss
        ...
        0x00007ffff738d6a0 - 0x00007ffff73ca7e8 is .rodata in /usr/lib/librados.so.2
Updated by Yehuda Sadeh about 11 years ago
Probably due to dual linkage of libcommon (both libglobal -> libcommon and librados -> libcommon): libglobal is linked statically while librados is linked dynamically, so the process ends up with two copies of libcommon's globals. A quick workaround may be to create a new libglobal that doesn't depend on libcommon.
Updated by Sage Weil about 11 years ago
- Priority changed from Urgent to High
I updated the suite to not run lockdep against radosgw for now.
Updated by Tamilarasi muthamizhan about 11 years ago
recent log: ubuntu@teuthology:/a/teuthology-2013-02-25_01:00:05-regression-master-testing-gcov/11592
Updated by Sage Weil about 10 years ago
- Status changed from 12 to Can't reproduce