Bug #2536

closed

librados crashed while getting stat of an object

Added by Xiaopong Tran almost 12 years ago. Updated over 11 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: librados
Target version: -
% Done: 0%
Source: Community (user)

Description

librados crashed while getting the stat of an object:

./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7f52adaf3700 time 2012-06-11 18:33:47.455897
./log/SubsystemMap.h: 74: FAILED assert(sub < m_subsys.size())
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: (()+0xebbc0) [0x7f5298a92bc0]
2: (librados::IoCtxImpl::stat(object_t const&, unsigned long*, long*)+0x597) [0x7f5298ab2a87]
3: (rados_stat()+0x3a) [0x7f5298a975fa]
4: (x_stat(enif_environment_t*, int, unsigned long const*)+0x1cd) [0x7f5298f11f65]
5: (process_main()+0x4f32) [0x544122]
6: /usr/local/erlang/lib/erlang/erts-5.9.1/bin/beam.smp() [0x4a7d08]
7: /usr/local/erlang/lib/erlang/erts-5.9.1/bin/beam.smp() [0x5bcb20]
8: (()+0x7e9a) [0x7f52b378be9a]
9: (clone()+0x6d) [0x7f52b32b14bd]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted (core dumped)

Attached is the objdump file.


Files

objdump.txt (1.24 MB), Xiaopong Tran, 06/11/2012 04:01 AM
objdump.txt (8.24 MB), Benjamin Schulz, 11/25/2012 06:36 PM

Actions #1

Updated by Sage Weil almost 12 years ago

Have you seen this problem since then? It looks like it could be due to racing with rados startup or shutdown...
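To make the failing check concrete, here is a minimal, hypothetical C++ model of it. It assumes the subsystem map is a table indexed by subsystem id and that `should_gather()` checks the index against the table size before reading the per-subsystem levels, as the assert text `sub < m_subsys.size()` suggests. The class and field names, and the rule for when a message is gathered, are invented for illustration; this is not Ceph's actual code. The failed assert is modeled as a thrown exception because the log shows `ceph::FailedAssertion` being thrown. In this model, a call made before the table is populated, or after it has been torn down, fails the same check, which is consistent with the startup/shutdown race suggested above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ceph::FailedAssertion.
struct FailedAssertionSketch : std::logic_error {
  using std::logic_error::logic_error;
};

// Hypothetical per-subsystem entry; field names are assumptions.
struct SubsystemSketch {
  std::string name;
  int log_level;
  int gather_level;
};

// Minimal model of a subsystem map: a table indexed by subsystem id.
class SubsystemMapSketch {
 public:
  // Register a subsystem at index `sub`, growing the table as needed.
  void add(unsigned sub, const std::string& name, int log, int gather) {
    if (sub >= subsys_.size()) subsys_.resize(sub + 1);
    subsys_[sub] = SubsystemSketch{name, log, gather};
  }

  // Simulates teardown: afterwards, every index is out of range.
  void clear() { subsys_.clear(); }

  std::size_t size() const { return subsys_.size(); }

  // Same precondition as the failing assert: sub < m_subsys.size().
  // Assumed rule: gather if `level` is at or below either threshold.
  bool should_gather(unsigned sub, int level) const {
    if (!(sub < subsys_.size()))
      throw FailedAssertionSketch("FAILED assert(sub < m_subsys.size())");
    return level <= subsys_[sub].gather_level ||
           level <= subsys_[sub].log_level;
  }

 private:
  std::vector<SubsystemSketch> subsys_;
};
```

For example, calling `should_gather(0, 1)` on a freshly constructed map throws, the same call succeeds after `add(0, ...)`, and it throws again after `clear()`.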

Actions #2

Updated by Sage Weil almost 12 years ago

  • Status changed from New to Need More Info
Actions #3

Updated by Sage Weil over 11 years ago

  • Priority changed from High to Normal
Actions #4

Updated by Sage Weil over 11 years ago

  • Status changed from Need More Info to Can't reproduce
Actions #5

Updated by Benjamin Schulz over 11 years ago

Hi,

I got the same assertion:

radosgw-admin user create
./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7f8fa8546760 time 2012-11-26 02:41:55.267779
./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
ceph version 0.48.1argonaut (a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
1: (()+0x17c3a7) [0x7f8fa770f3a7]
2: (MonClient::build_initial_monmap()+0x1e9) [0x7f8fa77da249]
3: (librados::RadosClient::connect()+0x48) [0x7f8fa77231b8]
4: (RGWRados::initialize()+0x49) [0x48ece9]
5: (RGWCache<RGWRados>::initialize()+0x17) [0x4a31e7]
6: (RGWRados::init_storage_provider(CephContext*)+0x30) [0x48ebe0]
7: (main()+0xfd0) [0x42c8c0]
8: (__libc_start_main()+0xfd) [0x7f8fa5cbcead]
9: radosgw-admin() [0x432761]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Segmentation fault) **
in thread 7f8fa8546760
ceph version 0.48.1argonaut (a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
1: radosgw-admin() [0x43d8b2]
2: (()+0xf030) [0x7f8fa715c030]
3: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x38f) [0x445edf]
4: (()+0x17c3a7) [0x7f8fa770f3a7]
5: (MonClient::build_initial_monmap()+0x1e9) [0x7f8fa77da249]
6: (librados::RadosClient::connect()+0x48) [0x7f8fa77231b8]
7: (RGWRados::initialize()+0x49) [0x48ece9]
8: (RGWCache<RGWRados>::initialize()+0x17) [0x4a31e7]
9: (RGWRados::init_storage_provider(CephContext*)+0x30) [0x48ebe0]
10: (main()+0xfd0) [0x42c8c0]
11: (__libc_start_main()+0xfd) [0x7f8fa5cbcead]
12: radosgw-admin() [0x432761]
Segmentation fault

I'm running v0.48.1 on Debian wheezy. The system is set up in two VMs, and I can reproduce it every time. Contact me if you're interested in the VM images.

Best regards,
-- Benjamin

Actions #6

Updated by Greg Farnum over 11 years ago

Hey Benjamin, this is the same assert but quite a different call chain. Can you create a new bug, preferably with some logs of the crash?
