Bug #11055
closed
firefly ceph-osd SEGV in tcmalloc testing wip-cot-firefly
Description
As part of my testing, pg 3.3s2 was exported, removed, and reimported, and then ceph-osd 5 was started again. I don't see any relationship to the segmentation fault, but the log below shows the activity on that pg shortly before the core dump. The thread that crashed (7fc5a81e1700) appears to have been finishing a flush in pg 2.39.
Could this be a tcmalloc bug?
2015-03-06 10:33:54.534707 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] state<Started>: Started advmap
2015-03-06 10:33:54.534717 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] check_recovery_sources no source osds () went down
2015-03-06 10:33:54.534728 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] handle_activate_map
2015-03-06 10:33:54.534736 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] take_waiters
2015-03-06 10:33:54.534743 7fc59e293700 20 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] handle_activate_map: Not dirtying info: last_persisted is 298 while current is 303
2015-03-06 10:33:54.534750 7fc59e293700 10 osd.5 pg_epoch: 303 pg[3.3s2( v 215'115 (0'0,215'115] local-les=298 n=1 ec=9 les/c 298/298 297/297/9) [4,3,5] r=2 lpr=297 pi=231-296/2 luod=0'0 crt=215'115 active] handle_peering_event: epoch_sent: 303 epoch_requested: 303 NullEvt
...
2015-03-06 10:33:54.554075 7fc5a81e1700 10 osd.5 pg_epoch: 303 pg[2.39( empty local-les=302 n=0 ec=1 les/c 302/302 303/303/256) [1,2] r=-1 lpr=303 pi=256-302/2 crt=0'0 inactive NOTIFY] flushed
...
2015-03-06 10:33:54.556541 7fc5a81e1700 -1 *** Caught signal (Segmentation fault) **
in thread 7fc5a81e1700
ceph version 0.80.8-180-g91b2aca (91b2acaadee1b62c1fcac73147908ec4477840f3)
1: ceph-osd() [0x98284f]
2: (()+0x10340) [0x7fc5b417d340]
3: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0x103) [0x7fc5b43aeac3]
4: (tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long)+0x1b) [0x7fc5b43aeb7b]
5: (operator delete(void*)+0x1f8) [0x7fc5b43bdf68]
6: (std::_Rb_tree<std::pair<unsigned long, unsigned long>, std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> >, std::_Select1st<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >, std::less<std::pair<unsigned long, unsigned long> >, std::allocator<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >)+0x33) [0xa70083]
7: (std::_Rb_tree<std::pair<unsigned long, unsigned long>, std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> >, std::_Select1st<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >, std::less<std::pair<unsigned long, unsigned long> >, std::allocator<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned long, unsigned long> const, std::pair<unsigned int, unsigned int> > >)+0x27) [0xa70077]
8: (ceph::buffer::raw_posix_aligned::~raw_posix_aligned()+0x46) [0xa70916]
9: (ceph::buffer::ptr::release()+0x3e) [0xa6a57e]
10: (std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()+0x27) [0x5f53c7]
11: (ObjectStore::C_DeleteTransaction::finish(int)+0x12) [0x665102]
12: (Context::complete(int)+0x9) [0x655a09]
13: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x94) [0x65dbf4]
14: (Context::complete(int)+0x9) [0x655a09]
15: (Finisher::finisher_thread_entry()+0x1b8) [0x9a5998]
16: (()+0x8182) [0x7fc5b4175182]
17: (clone()+0x6d) [0x7fc5b28e7fbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
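The link between the crashing thread and pg 2.39 comes from matching the thread id in the signal line against earlier log lines. A minimal sketch of that filtering is below, assuming the "date time thread-id level message" layout seen in the excerpt above. The function name `lines_for_thread` and the `SAMPLE` list are made up for illustration; the sample lines are abbreviated copies of the log lines above, not the full log.

```python
import re

# Assumed layout, taken from the excerpt: "<date> <time> <thread id> <level> <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+) +(-?\d+) (.*)$"
)
# pg ids such as "2.39" or "3.3s2" (erasure-coded shard suffix) inside "pg[...("
PG_RE = re.compile(r"pg\[([0-9a-f]+\.[0-9a-f]+(?:s\d+)?)\(")


def lines_for_thread(lines, thread_id):
    """Yield (time, level, pg, message) for each line logged by thread_id."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group(3) != thread_id:
            continue
        pg = PG_RE.search(m.group(5))
        yield m.group(2), int(m.group(4)), pg.group(1) if pg else None, m.group(5)


# Abbreviated copies of three lines from the excerpt above (illustration only).
SAMPLE = [
    "2015-03-06 10:33:54.534707 7fc59e293700 10 osd.5 pg_epoch: 303 "
    "pg[3.3s2( v 215'115 active] state<Started>: Started advmap",
    "2015-03-06 10:33:54.554075 7fc5a81e1700 10 osd.5 pg_epoch: 303 "
    "pg[2.39( empty inactive NOTIFY] flushed",
    "2015-03-06 10:33:54.556541 7fc5a81e1700 -1 *** Caught signal "
    "(Segmentation fault) **",
]

if __name__ == "__main__":
    for time, level, pg, message in lines_for_thread(SAMPLE, "7fc5a81e1700"):
        print(time, level, pg, message)
```

Run against the full osd.5 log, the same filter would list everything the finisher thread logged before the signal, which is how the "flushed" line for pg 2.39 was picked out above.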
Updated by David Zafman about 9 years ago
Information in
ubuntu@teuthology:/a/dzafman-2015-03-06_10:19:48-rados-wip-cot-firefly---basic-multi/793119
Updated by Loïc Dachary almost 9 years ago
- Status changed from New to Need More Info
- Regression set to No
Did it show up again?
Updated by Sage Weil almost 9 years ago
- Status changed from Need More Info to Duplicate