Fix #8914

Updated by Loïc Dachary over 9 years ago

h3. Steps to reproduce 

 <pre> 
 ./stop.sh 
 rm -fr dev out ;    mkdir -p dev ; CEPH_NUM_MON=1 CEPH_NUM_OSD=3 ./vstart.sh -d -n -X -l mon osd 
 ./rados --pool rbd put SOMETHING /etc/group 
 # sleep 60 # comment this out and the problem does not show 
 rm dev/osd1/current/*/*SOMETHING* # osd.1 is the primary 
 ceph pg scrub 0.7 
 sleep 60 
 </pre> 

It crashes in "build_push_op":https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.cc#L8760 because "get_omap_iterator":https://github.com/ceph/ceph/blob/master/src/os/FileStore.cc#L4704 returned a null iterator: the object's backing file had been removed, and build_push_op dereferences the iterator without checking for null.

 h3. Original description 

The OSD crashed with an assert in ReplicatedBackend::build_push_op.
Steps followed:

<pre>
sudo ceph pg map 3.151
osdmap e1274 pg 3.151 (3.151) -> up [2,9,20] acting [2,9,20]
</pre>
I removed the object file1 (inserted using rados) by running rm -f on /var/lib/ceph/osd/ceph-9/current/3.151/file1* and /var/lib/ceph/osd/ceph-2/current/3.151/file1*.

Checked for scrub errors using:
<pre>
ceph pg scrub 3.151
</pre>
ceph -w showed scrub errors on osd.2 and osd.9.
Ran:
<pre>
ceph osd repair 2
</pre>
Got a segmentation fault in osd.2:
<pre>
2014-06-16 10:33:19.906324 7fb9e9543700 0 log [ERR] : 3.151 shard 2 missing a086551/file1/head//3
2014-06-16 10:33:19.906330 7fb9e9543700 0 log [ERR] : 3.151 shard 9 missing a086551/file1/head//3
2014-06-16 10:33:19.906362 7fb9e9543700 0 log [ERR] : 3.151 repair 1 missing, 0 inconsistent objects
2014-06-16 10:33:19.906378 7fb9e9543700 0 log [ERR] : 3.151 repair 2 errors, 2 fixed
2014-06-16 10:33:19.924977 7fb9e9d44700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fb9e9d44700
</pre>

<pre>
 1: /usr/bin/ceph-osd() [0x974a1f]
 2: (()+0x10340) [0x7fba089de340]
 3: (ReplicatedBackend::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*, object_stat_sum_t*)+0xc1c) [0x7d209c]
 4: (ReplicatedBackend::prep_push(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, eversion_t, interval_set<unsigned long>&, std::map<hobject_t, interval_set<unsigned long>, std::less<hobject_t>, std::allocator<std::pair<hobject_t const, interval_set<unsigned long> > > >&, PushOp*)+0x3d8) [0x7d2b48]
 5: (ReplicatedBackend::prep_push_to_replica(std::tr1::shared_ptr<ObjectContext>, hobject_t const&, pg_shard_t, PushOp*)+0x3af) [0x7d6b8f]
 6: (ReplicatedBackend::start_pushes(hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x1af) [0x7d9c6f]
 7: (C_ReplicatedBackend_OnPullComplete::finish(ThreadPool::TPHandle&)+0x143) [0x84b083]
 8: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x661a09]
 9: (ReplicatedPG::BlessedGenContext<ThreadPool::TPHandle&>::finish(ThreadPool::TPHandle&)+0x95) [0x824f65]
 10: (GenContext<ThreadPool::TPHandle&>::complete(ThreadPool::TPHandle&)+0x9) [0x661a09]
 11: (ThreadPool::WorkQueueVal<GenContext<ThreadPool::TPHandle&>*, GenContext<ThreadPool::TPHandle&>*>::_void_process(void*, ThreadPool::TPHandle&)+0x62) [0x6697d2]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0xaf1) [0xa4b351]
 13: (ThreadPool::WorkThread::entry()+0x10) [0xa4c440]
 14: (()+0x8182) [0x7fba089d6182]
 15: (clone()+0x6d) [0x7fba06d7730d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
</pre>
