Bug #5139

closed

Seg fault if listsnaps request with missing clones

Added by David Zafman almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: High
Assignee: David Zafman
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After doing the following to a cloned object named obj2:

$ find dev -name obj2_* -ls
660758 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd3/current/3.0_head/obj2__head_3F1EE208__3
667244 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd3/current/3.0_head/obj2__64_3F1EE208__3
660683 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd0/current/3.0_head/obj2__head_3F1EE208__3
667235 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd0/current/3.0_head/obj2__64_3F1EE208__3
660703 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd4/current/3.0_head/obj2__head_3F1EE208__3
667240 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd4/current/3.0_head/obj2__64_3F1EE208__3
$ rm dev/osd3/current/3.0_head/obj2__64_3F1EE208__3 dev/osd0/current/3.0_head/obj2__64_3F1EE208__3 dev/osd4/current/3.0_head/obj2__64_3F1EE208__3
$ find dev -name obj2_* -ls
660758 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd3/current/3.0_head/obj2__head_3F1EE208__3
660683 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd0/current/3.0_head/obj2__head_3F1EE208__3
660703 8 -rw-r--r-- 1 dzafman dzafman 13 May 21 16:33 dev/osd4/current/3.0_head/obj2__head_3F1EE208__3
$ rados -p testpool listsnaps obj2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe682e700 (LWP 59632)]
0x0000000000e9fe88 in std::vector<snapid_t, std::allocator<snapid_t> >::size (this=0xd8) at /usr/include/c++/4.6/bits/stl_vector.h:571
571 { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }
(gdb) bt
#0 0x0000000000e9fe88 in std::vector<snapid_t, std::allocator<snapid_t> >::size (this=0xd8) at /usr/include/c++/4.6/bits/stl_vector.h:571
#1 0x0000000000e687f7 in ReplicatedPG::find_object_context (this=0x20e3000, oid=..., oloc=..., pobc=0x7fffe682d040, can_create=false, psnapid=0x7fffe682cd70) at osd/ReplicatedPG.cc:4478
#2 0x0000000000e436e6 in ReplicatedPG::do_op (this=0x20e3000, op=std::tr1::shared_ptr (count 4, weak 0) 0x22c6000) at osd/ReplicatedPG.cc:865
#3 0x000000000101c94a in PG::do_request (this=0x20e3000, op=std::tr1::shared_ptr (count 4, weak 0) 0x22c6000) at osd/PG.cc:1814
#4 0x0000000000f412d7 in OSD::dequeue_op (this=0x20c9000, pg=..., op=std::tr1::shared_ptr (count 4, weak 0) 0x22c6000) at osd/OSD.cc:6411
#5 0x0000000000f40f2e in OSD::OpWQ::_process (this=0x20c9dc0, pg=...) at osd/OSD.cc:6385
#6 0x0000000000fab556 in ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_process (this=0x20c9dc0, u=...) at ./common/WorkQueue.h:168
#7 0x0000000000fab370 in ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process (this=0x20c9dc0, handle=...) at ./common/WorkQueue.h:189
#8 0x0000000001211e0d in ThreadPool::worker (this=0x20c9458, wt=0x2157e40) at common/WorkQueue.cc:119
#9 0x0000000001213a9d in ThreadPool::WorkThread::entry (this=0x2157e40) at common/WorkQueue.h:316
#10 0x000000000120aef9 in Thread::_entry_func (arg=0x2157e40) at common/Thread.cc:41
#11 0x00007ffff79c2e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007ffff5b584bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x0000000000000000 in ?? ()
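
In frame #0 above, `this=0xd8` is a very small address. That is consistent with a container member being reached through a null pointer, in which case `this` would equal the member's offset inside the enclosing object. A minimal sketch of that arithmetic, assuming a hypothetical layout (`FakeSnapSet`, `FakeVector`, and the 0xd8 bytes of padding are illustrative and not taken from the Ceph source):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical three-pointer vector layout, used only for illustration.
struct FakeVector {
  const uint64_t* start;
  const uint64_t* finish;
  const uint64_t* end_of_storage;
};

// Hypothetical enclosing object; NOT the real Ceph structure.
struct FakeSnapSet {
  char preceding_members[0xd8];  // stands in for whatever precedes the vector
  FakeVector clones;             // the member whose size() appears in frame #0
};

// Address that `this` would hold inside clones.size() if the enclosing
// object lived at address `base`. With base == 0 (a null pointer) the
// result is just the member offset.
inline std::uintptr_t member_address(std::uintptr_t base) {
  return base + offsetof(FakeSnapSet, clones);
}

// Same arithmetic as the faulting line: finish minus start.
// Reading these fields through an address like 0xd8 is what faults.
inline std::size_t fake_size(const FakeVector* v) {
  return static_cast<std::size_t>(v->finish - v->start);
}
```

Under this assumed layout, `member_address(0)` evaluates to `0xd8`, matching the `this` value gdb reports.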

Actions #1

Updated by David Zafman almost 11 years ago

  • Assignee set to David Zafman

A similar crash is caused by push_to_replica() not checking the return value of get_snapset_context().

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe602d700 (LWP 60182)]
0x0000000000eada38 in std::_Rb_tree<snapid_t, std::pair<snapid_t const, unsigned long>, std::_Select1st<std::pair<snapid_t const, unsigned long> >, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::_M_begin (this=0x80) at /usr/include/c++/4.6/bits/stl_tree.h:493
493 { return static_cast<_Link_type>(this->_M_impl._M_header._M_parent); }
(gdb) bt
#0 0x0000000000eada38 in std::_Rb_tree<snapid_t, std::pair<snapid_t const, unsigned long>, std::_Select1st<std::pair<snapid_t const, unsigned long> >, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::_M_begin (this=0x80) at /usr/include/c++/4.6/bits/stl_tree.h:493
#1 0x0000000000ecb366 in std::_Rb_tree<snapid_t, std::pair<snapid_t const, unsigned long>, std::_Select1st<std::pair<snapid_t const, unsigned long> >, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::lower_bound (this=0x80, __k=...)
at /usr/include/c++/4.6/bits/stl_tree.h:828
#2 0x0000000000eb5f3d in std::map<snapid_t, unsigned long, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::lower_bound (this=0x80, __x=...)
at /usr/include/c++/4.6/bits/stl_map.h:784
#3 0x0000000000ea4ad2 in std::map<snapid_t, unsigned long, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::operator[] (this=0x80, __k=...)
at /usr/include/c++/4.6/bits/stl_map.h:450
#4 0x0000000000e6cbfe in ReplicatedPG::calc_clone_subsets (this=0x22fd000, snapset=..., soid=...,
Python Exception <type 'exceptions.IndexError'> list index out of range:
missing=..., last_backfill=..., data_subset=..., clone_subsets=std::map with 0 elements)
at osd/ReplicatedPG.cc:4927
#5 0x0000000000e6fb1c in ReplicatedPG::push_to_replica (this=0x22fd000, obc=0x24c6780, soid=...,
peer=5, prio=10) at osd/ReplicatedPG.cc:5166
#6 0x0000000000e7f9fd in ReplicatedPG::recover_object_replicas (this=0x22fd000, soid=..., v=...,
prio=10) at osd/ReplicatedPG.cc:6797
#7 0x0000000000e807ff in ReplicatedPG::recover_replicas (this=0x22fd000, max=5)
at osd/ReplicatedPG.cc:6846
#8 0x0000000000e7c5f2 in ReplicatedPG::start_recovery_ops (this=0x22fd000, max=5,
prctx=0x7fffe602c7c0) at osd/ReplicatedPG.cc:6509
#9 0x0000000000f3c62b in OSD::do_recovery (this=0x2155000, pg=0x22fd000) at osd/OSD.cc:5971
#10 0x0000000000f4c336 in OSD::RecoveryWQ::_process (this=0x21563e0, pg=0x22fd000) at osd/OSD.h:1311
#11 0x0000000000fabd68 in ThreadPool::WorkQueue<PG>::_process (this=0x21563e0, t=0x22fd000)
at ./common/WorkQueue.h:246

#12 0x0000000000fabd03 in ThreadPool::WorkQueue<PG>::_void_process (this=0x21563e0, p=0x22fd000,
handle=...) at ./common/WorkQueue.h:254
#13 0x0000000001211fa5 in ThreadPool::worker (this=0x21555e0, wt=0x22b4e80)
at common/WorkQueue.cc:119
#14 0x0000000001213c35 in ThreadPool::WorkThread::entry (this=0x22b4e80) at common/WorkQueue.h:316
#15 0x000000000120b091 in Thread::_entry_func (arg=0x22b4e80) at common/Thread.cc:41
#16 0x00007ffff79c2e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#17 0x00007ffff5b584bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000000000000000 in ?? ()
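
The missing return-value check described in this update can be sketched as follows. This is a simplified illustration, not the Ceph code: `SnapCtx`, `SnapStore`, `lookup_snap_ctx`, and `clone_size_checked` are hypothetical names, and returning `-ENOENT` is an assumed error convention.

```cpp
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a per-object snapshot context; not the real Ceph type.
struct SnapCtx {
  std::map<uint64_t, uint64_t> clone_size;  // snap id -> clone size in bytes
};

using SnapStore = std::map<std::string, SnapCtx>;

// Hypothetical lookup: returns nullptr when the object has no snapshot context.
inline SnapCtx* lookup_snap_ctx(SnapStore& store, const std::string& oid) {
  auto it = store.find(oid);
  return it == store.end() ? nullptr : &it->second;
}

// Checks the lookup result before touching any member.
// Returns 0 and fills *out on success, -ENOENT if the context or clone is missing.
inline int clone_size_checked(SnapStore& store, const std::string& oid,
                              uint64_t snap, uint64_t* out) {
  SnapCtx* ctx = lookup_snap_ctx(store, oid);
  if (!ctx)
    return -ENOENT;  // without this check, the map access below would go through a null pointer
  auto it = ctx->clone_size.find(snap);  // find() rather than operator[], so nothing is inserted
  if (it == ctx->clone_size.end())
    return -ENOENT;
  *out = it->second;
  return 0;
}
```

A caller would test the return code before using the size, for example `if (clone_size_checked(store, "obj2", 0x64, &size) < 0) { /* skip or report */ }`.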

Actions #2

Updated by Sage Weil almost 11 years ago

  • Priority changed from Normal to Urgent

Actions #3

Updated by David Zafman almost 11 years ago

  • Status changed from New to Fix Under Review

Actions #4

Updated by David Zafman almost 11 years ago

  • Category set to OSD

Actions #5

Updated by Samuel Just almost 11 years ago

  • Priority changed from Urgent to High

Actions #6

Updated by David Zafman almost 11 years ago

  • Status changed from Fix Under Review to Resolved

To trigger this, you have to delete all copies of a clone or of the head. We aren't going to handle that gracefully, but the OSD should at least assert instead of segfaulting.
3fa65852e1ff769a47e19709b9e7cc2c316351c7
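
The "assert instead of segfault" approach mentioned in this update can be illustrated with a minimal sketch. It does not reproduce the referenced commit: `CloneList` and `clone_count_or_assert` are hypothetical names, and the standard `assert` macro is used here only as a stand-in for whatever assertion facility the OSD uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a snapshot context; not the real Ceph type.
struct CloneList {
  std::vector<uint64_t> clones;
};

// Fails fast with a readable assertion message when the context is missing,
// rather than dereferencing a null pointer further down the call chain.
inline std::size_t clone_count_or_assert(const CloneList* ctx) {
  assert(ctx != nullptr && "snapshot context missing; clone or head files removed?");
  return ctx->clones.size();
}
```

Note that the standard `assert` is compiled out when `NDEBUG` is defined, so this sketch only fails fast in builds where assertions are enabled.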
