Bug #23031

Updated by David Zafman about 6 years ago

Started 3 OSDs with vstart, passing -o "filestore debug inject read err=1".

Manually ran injectdataerr on all replicas of object "foo".

PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES    LOG DISK_LOG STATE                                         STATE_STAMP                VERSION REPORTED UP    UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           SNAPTRIMQ_LEN
1.0     1       1                  1        0         1       72750205 18  18       active+recovery_unfound+degraded+inconsistent 2018-02-16 18:01:18.212552 12'18   32:289   [1,0] 1          [1,0]  1              12'18      2018-02-16 17:44:18.140880 12'18           2018-02-16 17:44:18.140880 0

Marked all OSDs out and back in.

PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES    LOG DISK_LOG STATE                                         STATE_STAMP                VERSION REPORTED UP    UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           SNAPTRIMQ_LEN
1.0     1       1                  2        0         1       72750205 18  18       active+recovery_unfound+degraded+remapped     2018-02-16 18:03:44.924607 12'18   41:301   [1]   1          [1,2]  1              12'18      2018-02-16 17:44:18.140880 12'18           2018-02-16 17:44:18.140880 0

Killed all OSDs, then restarted them in the order 0, 1, 2. osd.0 crashed as shown below.

 # killall ceph-osd 
 # for i in $(seq 0 2); do bin/ceph-osd -i $i -c ceph.conf; done 

 ''' 
     

     -2> 2018-02-16 18:11:49.233 7f2643a9e700 10 osd.0 pg_epoch: 44 pg[1.0( v 12'18 lc 12'17 (0'0,12'18] local-lis/les=43/44 n=1 ec=10/10 lis/c 43/31 les/c/f 44/32/0 43/43/43) [1,0] r=1 lpr=43 pi=[31,43)/3 luod=0'0 crt=12'18 lcod 0'0 active m=1] _handle_message: 0x55644b334bc0 
     -1> 2018-02-16 18:11:49.233 7f2643a9e700 10 osd.0 pg_epoch: 44 pg[1.0( v 12'18 lc 12'17 (0'0,12'18] local-lis/les=43/44 n=1 ec=10/10 lis/c 43/31 les/c/f 44/32/0 43/43/43) [1,0] r=1 lpr=43 pi=[31,43)/3 luod=0'0 crt=12'18 lcod 0'0 active m=1] do_repop 1:602f83fe:::foo:head v 44'19 (transaction) 143 
      0> 2018-02-16 18:11:49.233 7f2643a9e700 -1 /home/dzafman/ceph/src/osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::do_repop(OpRequestRef)' thread 7f2643a9e700 time 2018-02-16 18:11:49.235446 
 /home/dzafman/ceph/src/osd/ReplicatedBackend.cc: 1085: FAILED assert(!parent->get_log().get_missing().is_missing(soid)) 

  ceph version 13.0.1-2022-gcb1cf5e (cb1cf5ef1b559f2bffe54abdf6b6cae453db6ebf) mimic (dev) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xf5) [0x7f26633c3615] 
  2: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0x11c) [0x5564494375ac] 
  3: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x237) [0x55644943a347] 
  4: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x55644934bee7] 
  5: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x675) [0x5564492f9605] 
  6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x341) [0x556449155d21] 
  7: (PGOpItem::run(OSD*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5564493cdb52] 
  8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xf24) [0x55644915df94] 
  9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4f2) [0x7f26633c90e2] 
  10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f26633cb440] 
  11: (()+0x76ba) [0x7f2661edb6ba] 
  12: (clone()+0x6d) [0x7f266116582d] 
 '''
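One reading of the trace, as a minimal C++ sketch. In the pg state printed just before the crash, osd.0 is a replica (r=1) of pg 1.0 whose log still reports one missing object (m=1, with lc 12'17 behind v 12'18). Since the pg holds a single object (n=1), that object is presumably foo, yet the primary (osd.1) sends osd.0 a repop for 1:602f83fe:::foo:head at 44'19, which is exactly what the assert at ReplicatedBackend.cc:1085 forbids. The types and functions below (EVersion, MissingSet, repop_allowed, do_repop) are hypothetical simplifications for illustration, not Ceph's pg_missing_t or ReplicatedBackend API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for Ceph's eversion_t, printed as "epoch'version"
// (e.g. 12'18 in the log above). Not Ceph's real class.
struct EVersion {
  std::uint32_t epoch = 0;
  std::uint64_t version = 0;
};

// Hypothetical, simplified model of a PG replica's missing set. Ceph's real
// type is pg_missing_t keyed by hobject_t (e.g. 1:602f83fe:::foo:head);
// here objects are keyed by plain name and only the needed version is kept.
class MissingSet {
 public:
  // Record that this replica still needs `oid` at version `need`.
  void add(const std::string& oid, EVersion need) { need_[oid] = need; }

  // Drop `oid` once it has been recovered from a good copy.
  void got(const std::string& oid) { need_.erase(oid); }

  bool is_missing(const std::string& oid) const {
    return need_.find(oid) != need_.end();
  }

  // Plays the role of the "m=" count in the pg state line.
  std::size_t num_missing() const { return need_.size(); }

 private:
  std::map<std::string, EVersion> need_;
};

// The invariant checked in the trace:
//   assert(!parent->get_log().get_missing().is_missing(soid))
// i.e. a replica must not be sent a replicated write (repop) for an object
// it has not yet recovered.
bool repop_allowed(const MissingSet& missing, const std::string& soid) {
  return !missing.is_missing(soid);
}

// Replica-side check only: as in the real OSD, a violation is fatal
// (assert aborts the process). Applying the write is out of scope here.
void do_repop(const MissingSet& missing, const std::string& soid) {
  assert(repop_allowed(missing, soid));
}
```

Loading the state inferred from the log (`m.add("foo", EVersion{12, 18})`, an assumption based on m=1 and n=1) makes `repop_allowed(m, "foo")` false, so `do_repop` would abort just as osd.0 did; only after `m.got("foo")` is the same write permitted.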