Bug #2084

closed

segfault in tcmalloc

Added by Sage Weil about 12 years ago. Updated almost 12 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%


Description

Heap corruption? The segfault hits inside libunwind, reached from tcmalloc's GetStackTrace() during tc_new:


(gdb) bt
#0  0x00007f844f073a0b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x00000000009bce23 in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  0x00000000009bcfa7 in handle_fatal_signal (signum=11) at global/signal_handler.cc:95
#3  <signal handler called>
#4  0x00007f844d3fdd52 in ?? () from /usr/lib/libunwind.so.7
#5  0x00007f844d3fbc75 in ?? () from /usr/lib/libunwind.so.7
#6  0x00007f844d3fc24c in ?? () from /usr/lib/libunwind.so.7
#7  0x00007f844d3fc409 in ?? () from /usr/lib/libunwind.so.7
#8  0x00007f844d3fe6ea in _ULx86_64_step () from /usr/lib/libunwind.so.7
#9  0x00007f844e15ba3b in GetStackTrace(void**, int, int) () from /usr/lib/libtcmalloc.so.0
#10 0x00007f844e142fb5 in ?? () from /usr/lib/libtcmalloc.so.0
#11 0x00007f844e15fe44 in tc_new () from /usr/lib/libtcmalloc.so.0
#12 0x00000000009dfd3c in __gnu_cxx::new_allocator<std::_List_node<Message*> >::allocate(unsigned long, void const*) ()
#13 0x00000000009de790 in std::_List_base<Message*, std::allocator<Message*> >::_M_get_node() ()
#14 0x00000000009dc055 in std::list<Message*, std::allocator<Message*> >::_M_create_node(Message* const&) ()
#15 0x00000000009da277 in std::list<Message*, std::allocator<Message*> >::_M_insert(std::_List_iterator<Message*>, Message* const&) ()
#16 0x00000000009d9224 in std::list<Message*, std::allocator<Message*> >::push_back(Message* const&) ()
#17 0x00000000009ce1bf in SimpleMessenger::Pipe::writer (this=0x2e1cc80) at msg/SimpleMessenger.cc:1764
#18 0x000000000077142c in SimpleMessenger::Pipe::Writer::entry (this=0x2e1cec8) at msg/SimpleMessenger.h:173
#19 0x00000000008cb311 in Thread::_entry_func (arg=0x2e1cec8) at common/Thread.cc:41
#20 0x00007f844f06b971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#21 0x00007f844d6f692d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#22 0x0000000000000000 in ?? ()

Saw this yesterday, too.

Actions #1

Updated by Sage Weil about 12 years ago

And again (hammer b.yaml). Right before the crash, sched_scrub() was called:


2012-02-21 04:26:17.470245 7fbc67b7b700 -- 10.3.14.199:0/17191 <== osd.3 10.3.14.199:6802/17190 4069 ==== osd_ping(heartbeat e2072 as_of 2059) v1 ==== 39+0+0 (1304125321 0 0) 0x3edfa80 con 0x3085b40
2012-02-21 04:26:17.474387 7fbc6074d700 osd.4 2072 OSD::ms_get_authorizer type=osd
2012-02-21 04:26:17.474639 7fbc6074d700 cephx: verify_authorizer_reply exception in decode_decrypt with AQBYhENPSLe4EBAAZQ1y/Gvw1kmkJukiQpHlwQ==
2012-02-21 04:26:17.474658 7fbc6074d700 -- 10.3.14.199:6807/17191 >> 10.3.14.199:6803/17188 pipe(0x30d2500 sd=44 pgs=0 cs=0 l=0).failed verifying authorize reply
2012-02-21 04:26:17.523352 7fbc6d386700 journal do_write latency 0.292478
2012-02-21 04:26:17.523371 7fbc6d386700 journal do_write queueing finishers through seq 119056
2012-02-21 04:26:17.523385 7fbc6d386700 journal queue_completions_thru seq 119056 queueing seq 119056 0x358ba20 lat 0.292744
2012-02-21 04:26:17.523416 7fbc6d386700 journal put_throttle finished 1 ops and 160 bytes, now 0 ops and 0 bytes
2012-02-21 04:26:17.523432 7fbc6d386700 journal write_thread_entry going to sleep
2012-02-21 04:26:17.523468 7fbc6c384700 filestore(/tmp/cephtest/data/osd.4.data) _journaled_ahead 119056 0x37cf500
2012-02-21 04:26:17.523482 7fbc6c384700 journal op_apply_start 119056 open_ops 0 -> 1
2012-02-21 04:26:17.523497 7fbc6c384700 filestore(/tmp/cephtest/data/osd.4.data) queue_op 0x7aadb40 seq 119056 155 bytes   (queue has 1 ops and 155 bytes)
2012-02-21 04:26:17.523556 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) _do_op 0x7aadb40 119056 osr 0x304e0f0/0x3056770 start
2012-02-21 04:26:17.523587 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) _do_transaction on 0x37cf500
2012-02-21 04:26:17.523642 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) remove meta/1f9f1b4e/pglog_0.0p2/0
2012-02-21 04:26:17.551663 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) remove meta/1f9f1b4e/pglog_0.0p2/0 = 0
2012-02-21 04:26:17.551688 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) remove meta/a04c80d2/pginfo_0.0p2/0
2012-02-21 04:26:17.551840 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) remove meta/a04c80d2/pginfo_0.0p2/0 = 0
2012-02-21 04:26:17.551855 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) _destroy_collection /tmp/cephtest/data/osd.4.data/current/0.0p2_head
2012-02-21 04:26:17.551948 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) _destroy_collection /tmp/cephtest/data/osd.4.data/current/0.0p2_head = 0
2012-02-21 04:26:17.551960 7fbc6b382700 journal op_apply_finish 119056 open_ops 1 -> 0
2012-02-21 04:26:17.551971 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) _do_op 0x7aadb40 119056 r = 0, finisher 0x36fb300 0
2012-02-21 04:26:17.551982 7fbc6b382700 filestore(/tmp/cephtest/data/osd.4.data) _finish_op on osr 0x304e0f0/0x3056770
2012-02-21 04:26:17.733734 7fbc7038c700 osd.4 2072 tick
2012-02-21 04:26:17.733833 7fbc7038c700 osd.4 2072 scrub_should_schedule loadavg 3.66 < max 5 = yes
2012-02-21 04:26:17.733845 7fbc7038c700 osd.4 2072 sched_scrub
2012-02-21 04:26:17.733862 7fbc7038c700 osd.4 2072  on 2012-02-21 04:16:11.508702 2.e
2012-02-21 04:26:17.733888 7fbc7038c700 osd.4 2072  2.1p0 at 2012-02-21 04:16:35.823932 > 2012-02-21 04:16:17.733854 (600 seconds ago)
2012-02-21 04:26:17.733898 7fbc7038c700 osd.4 2072 sched_scrub done
 ceph version 0.42-69-g9927671 (commit:9927671b3ddce5c3edaa6be00ef2e8923aea6e6b)
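
For readers unfamiliar with the scrub lines at the end of the log, here is a minimal sketch of the two checks they appear to record: a load-average gate ("loadavg 3.66 < max 5 = yes") and an age cutoff ("600 seconds ago"). This is one reading of the log output, not Ceph's actual source. The function names, the `ScrubEntry` type, and the oldest-first ordering are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout: this mirrors the log lines, not Ceph's code.

// Load gate: scheduling is allowed only while the load average is below max.
bool scrub_should_schedule(double loadavg, double max_load) {
    assert(max_load > 0.0);  // sanity check on the assumed configuration value
    return loadavg < max_load;
}

// One entry per placement group: (last scrub time in seconds, PG id).
using ScrubEntry = std::pair<double, std::string>;

// Walk entries ordered oldest-first and collect the PGs whose last scrub is
// at or before the cutoff (now - min_interval). Stop at the first entry that
// is newer than the cutoff, since all later entries are newer still.
std::vector<std::string> scrub_candidates(const std::vector<ScrubEntry>& by_age,
                                          double now, double min_interval) {
    const double cutoff = now - min_interval;
    std::vector<std::string> out;
    for (const auto& entry : by_age) {
        if (entry.first > cutoff) {
            break;
        }
        out.push_back(entry.second);
    }
    return out;
}
```

With the timestamps from the log expressed as seconds since midnight, the cutoff is 04:26:17.733854 minus 600 seconds, i.e. 04:16:17.733854. PG 2.e (last scrubbed 04:16:11.508702) falls before the cutoff and is considered, while 2.1p0 (04:16:35.823932) is newer and ends the walk, which matches the "sched_scrub done" line.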

Actions #2

Updated by Sage Weil about 12 years ago

  • Priority changed from High to Normal
Actions #3

Updated by Sage Weil about 12 years ago

  • Target version changed from v0.43 to v0.44
Actions #4

Updated by Sage Weil almost 12 years ago

  • Status changed from New to Can't reproduce