Bug #460

closed

OSD crash: ReplicatedPG::push_to_replica / Rb_tree

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%

Description

After my cluster recovered from the latest crashes, I wanted to check whether my RBD data was still intact.

This caused osd0 to crash:

Core was generated by `/usr/bin/cosd -i 0 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  std::_Rb_tree<snapid_t, std::pair<snapid_t const, unsigned long>, std::_Select1st<std::pair<snapid_t const, unsigned long> >, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::_M_begin (this=0x2218000, 
    snapset=..., soid=<value optimized out>, missing=..., data_subset=..., clone_subsets=...)
    at /usr/include/c++/4.4/bits/stl_tree.h:482
482          { return static_cast<_Link_type>(this->_M_impl._M_header._M_parent); }
(gdb) bt
#0  std::_Rb_tree<snapid_t, std::pair<snapid_t const, unsigned long>, std::_Select1st<std::pair<snapid_t const, unsigned long> >, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::_M_begin (this=0x2218000, 
    snapset=..., soid=<value optimized out>, missing=..., data_subset=..., clone_subsets=...)
    at /usr/include/c++/4.4/bits/stl_tree.h:482
#1  std::_Rb_tree<snapid_t, std::pair<snapid_t const, unsigned long>, std::_Select1st<std::pair<snapid_t const, unsigned long> >, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::lower_bound (this=0x2218000, 
    snapset=..., soid=<value optimized out>, missing=..., data_subset=..., clone_subsets=...)
    at /usr/include/c++/4.4/bits/stl_tree.h:745
#2  std::map<snapid_t, unsigned long, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::lower_bound (this=0x2218000, snapset=..., soid=<value optimized out>, missing=..., data_subset=..., clone_subsets=...)
    at /usr/include/c++/4.4/bits/stl_map.h:701
#3  std::map<snapid_t, unsigned long, std::less<snapid_t>, std::allocator<std::pair<snapid_t const, unsigned long> > >::operator[] (this=0x2218000, snapset=..., soid=<value optimized out>, missing=..., data_subset=..., clone_subsets=...)
    at /usr/include/c++/4.4/bits/stl_map.h:447
#4  ReplicatedPG::calc_clone_subsets (this=0x2218000, snapset=..., soid=<value optimized out>, missing=..., 
    data_subset=..., clone_subsets=...) at osd/ReplicatedPG.cc:2613
#5  0x000000000049571e in ReplicatedPG::push_to_replica (this=0x2218000, obc=<value optimized out>, soid=..., peer=8)
    at osd/ReplicatedPG.cc:2831
#6  0x0000000000496083 in ReplicatedPG::recover_object_replicas (this=0x2218000, soid=...) at osd/ReplicatedPG.cc:3682
#7  0x00000000004964ab in ReplicatedPG::recover_replicas (this=0x2218000, max=<value optimized out>)
    at osd/ReplicatedPG.cc:3715
#8  0x000000000049f0ba in ReplicatedPG::start_recovery_ops (this=0x2218000, max=1) at osd/ReplicatedPG.cc:3524
#9  0x00000000004d7c6c in OSD::do_recovery (this=0x1332000, pg=0x2218000) at osd/OSD.cc:4332
#10 0x00000000005c6c0f in ThreadPool::worker (this=0x13325f8) at common/WorkQueue.cc:44
#11 0x00000000004fd9ed in ThreadPool::WorkThread::entry() ()
#12 0x000000000046e82a in Thread::_entry_func (arg=0x2218000) at ./common/Thread.h:39
#13 0x00007fcfa13459ca in start_thread () from /lib/libpthread.so.0
#14 0x00007fcfa02fd6fd in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()

Restarting the OSD caused it to crash again within a few seconds.

The core, binary and logs are available on logger.pcextreme.nl:/srv/ceph/issues/osd_crash_rb_tree
