Bug #11055 (closed)

firefly ceph-osd SEGV in tcmalloc testing wip-cot-firefly

Added by David Zafman about 9 years ago. Updated almost 9 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One detail of my testing that may be relevant: pg 3.3s2 was exported, removed, and reimported, and then ceph-osd 5 was started again. I don't see any relationship to the segmentation fault, but I include the activity on that pg because it happened shortly before the core dump. The thread that crashed appears to be finishing a flush in pg 2.39.

Could this be a tcmalloc bug?
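
For context, a segfault inside tcmalloc::ThreadCache::ReleaseToCentralCache is more often a symptom of heap corruption in the application than of a bug in tcmalloc itself: tcmalloc threads its free lists through the freed blocks, so a double free or a write to freed memory corrupts list pointers that are only dereferenced later, on an unrelated deallocation. A minimal, purely illustrative C++ sketch of that failure mode (not Ceph code; everything here is made up for illustration):

#include <cstring>

int main() {
  char *a = new char[64];
  char *b = new char[64];

  delete[] a;
  // Use-after-free: scribbling on the freed block clobbers the free-list
  // pointer that tcmalloc stored inside it.
  std::memset(a, 0xdd, 64);

  // With tcmalloc linked in, some later deallocation (or the periodic
  // ReleaseToCentralCache sweep that returns thread-cache entries to the
  // central free list) walks the corrupted list and faults there, far
  // from the statement that actually did the damage.
  delete[] b;
  return 0;
}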

2015-03-06 10:33:54.534707 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] state<Started>: Started advmap
2015-03-06 10:33:54.534717 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] check_recovery_sources no source osds () went down
2015-03-06 10:33:54.534728 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] handle_activate_map
2015-03-06 10:33:54.534736 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] take_waiters
2015-03-06 10:33:54.534743 7fc59e293700 20 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] handle_activate_map: Not dirtying info: last_persisted is 298 while current is 303
2015-03-06 10:33:54.534750 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] handle_peering_event: epoch_sent: 303 epoch_requested: 303 NullEvt
...
2015-03-06 10:33:54.554075 7fc5a81e1700 10 osd.5 pg_epoch: 303 pg[2.39( empty local-les=302 n=0 ec=1 les/c 302/302 303/303/256) [1,2] r=-1 lpr=303 pi=256-302/2 crt=0'0 inactive NOTIFY] flushed
...
2015-03-06 10:33:54.556541 7fc5a81e1700 -1 *** Caught signal (Segmentation fault) **
in thread 7fc5a81e1700

ceph version 0.80.8-180-g91b2aca (91b2acaadee1b62c1fcac73147908ec4477840f3)
1: ceph-osd() [0x98284f]
2: (()+0x10340) [0x7fc5b417d340]
3: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x103) [0x7fc5b43aeac3]
4: (tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long)+0x1b) [0x7fc5b43aeb7b]
5: (operator delete(void*)+0x1f8) [0x7fc5b43bdf68]
6: (std::_Rb_tree<std::pair<unsigned long, unsigned long>, std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> >, std::_Select1st<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >, std::less<std::pair<unsigned long, unsigned long> >, std::allocator<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >*)+0x33) [0xa70083]
7: (std::_Rb_tree<std::pair<unsigned long, unsigned long>, std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> >, std::_Select1st<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >, std::less<std::pair<unsigned long, unsigned long> >, std::allocator<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >*)+0x27) [0xa70077]
8: (ceph::buffer::raw_posix_aligned::~raw_posix_aligned()+0x46) [0xa70916]
9: (ceph::buffer::ptr::release()+0x3e) [0xa6a57e]
10: (std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()+0x27) [0x5f53c7]
11: (ObjectStore::C_DeleteTransaction::finish(int)+0x12) [0x665102]
12: (Context::complete(int)+0x9) [0x655a09]
13: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x94) [0x65dbf4]
14: (Context::complete(int)+0x9) [0x655a09]
15: (Finisher::finisher_thread_entry()+0x1b8) [0x9a5998]
16: (()+0x8182) [0x7fc5b4175182]
17: (clone()+0x6d) [0x7fc5b28e7fbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
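
To make the upper frames easier to follow: frames 8-15 are the normal teardown path, in which the Finisher thread completes a C_DeleteTransaction, destroying the transaction's buffer list, dropping the last reference on an aligned raw buffer, and returning its memory to tcmalloc (the _Rb_tree::_M_erase frames presumably come from bookkeeping done in the raw buffer's destructor). A minimal sketch of that chain, using simplified, hypothetical stand-ins rather than the real Ceph classes:

#include <atomic>
#include <cstdlib>
#include <list>

struct Raw {                          // cf. ceph::buffer::raw_posix_aligned (frame 8)
  void *data = nullptr;
  std::atomic<int> nref{1};
  explicit Raw(std::size_t len) { posix_memalign(&data, 4096, len); }
  ~Raw() { free(data); }              // memory goes back to tcmalloc here
};

struct Ptr {                          // cf. ceph::buffer::ptr (frame 9)
  Raw *raw;
  explicit Ptr(Raw *r) : raw(r) {}
  Ptr(const Ptr &o) : raw(o.raw) { if (raw) ++raw->nref; }
  ~Ptr() { release(); }
  void release() {                    // drop one reference; free on the last
    if (raw && --raw->nref == 0) delete raw;
    raw = nullptr;
  }
};

struct Transaction {                  // cf. ObjectStore::Transaction
  std::list<Ptr> bl;                  // frame 10: the list cleared on destruction
};

struct C_DeleteTransaction {          // frame 11: completion context
  Transaction *t;
  void finish(int) { delete t; }      // deleting t releases every buffer it holds
};

int main() {
  auto *t = new Transaction;
  t->bl.emplace_back(new Raw(4096));
  C_DeleteTransaction c{t};
  c.finish(0);                        // frames 12-15: the Finisher runs the context;
                                      // if the heap is already corrupted, the crash
                                      // surfaces in tcmalloc at this point
  return 0;
}

If the heap was corrupted earlier, for example by a stray write to an already-freed buffer, this perfectly ordinary release path is simply where tcmalloc happens to trip over the damage.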

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #9584: OpTracker segfault on shutdown (firefly) (Can't reproduce, 09/24/2014)

#1

Updated by David Zafman about 9 years ago

Information in

ubuntu@teuthology:/a/dzafman-2015-03-06_10:19:48-rados-wip-cot-firefly---basic-multi/793119

#2

Updated by Loïc Dachary almost 9 years ago

  • Status changed from New to Need More Info
  • Regression set to No

Did it show up again?

#3

Updated by Sage Weil almost 9 years ago

  • Status changed from Need More Info to Duplicate
