Bug #41923

Updated by David Zafman over 4 years ago

 
 Change the config osd_pool_default_pg_autoscale_mode to "on". 

 Saw these 4 core dumps in 3 different sub-tests. 

 ../qa/run-standalone.sh "osd-scrub-repair.sh TEST_XXXXXXXX" 
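 For context, a minimal sketch of how the autoscaler default could be flipped globally before the test run, assuming the option is supplied through a ceph.conf fragment or the config database (the exact mechanism used for this run is not shown here): 

 <pre> 
 # Hypothetical ceph.conf override for reproduction 
 [global] 
     osd_pool_default_pg_autoscale_mode = on 

 # Or, on a running cluster with the centralized config db: 
 #   ceph config set global osd_pool_default_pg_autoscale_mode on 
 </pre> 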

 <pre> 
 TEST_corrupt_scrub_erasure_overwrites 
 /home/dzafman/ceph/src/osd/ECBackend.cc: 641: FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset( after_progress.data_recovered_to - op.recovery_progress.data_recovered_to)) 
 /home/dzafman/ceph/src/osd/ECBackend.cc: 583: FAILED ceph_assert(op.hinfo) 

 TEST_repair_stats  
    -12> 2019-09-18T14:13:30.560-0700 7fd515db1700    1 -- 127.0.0.1:0/19306 <== osd.0 v2:127.0.0.1:6816/19643 43 ==== osd_ping(ping_reply e26 up_from 23 ping_stamp 2019-09-18T14:13:30.563990-0700/166.699510694s send_stamp 161.692563315s delta_ub -5.006947379s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55c3725fc400 con 0x55c36fbc3180 
    -11> 2019-09-18T14:13:30.560-0700 7fd515db1700 20 osd.1 26 handle_osd_ping new stamps hbstamp(osd.0 up_from 23 peer_clock_delta [-5.007942020s,-5.006947379s]) 
    -10> 2019-09-18T14:13:30.796-0700 7fd50b357700    5 prioritycache tune_memory target: 4294967296 mapped: 57991168 unmapped: 81920 heap: 58073088 old mem: 2845415832 new mem: 2845415832 
     -9> 2019-09-18T14:13:31.084-0700 7fd512571700 10 osd.1 26 tick 
     -8> 2019-09-18T14:13:31.084-0700 7fd512571700 10 osd.1 26 do_waiters -- start 
     -7> 2019-09-18T14:13:31.084-0700 7fd512571700 10 osd.1 26 do_waiters -- finish 
     -6> 2019-09-18T14:13:31.084-0700 7fd512571700 20 osd.1 26 tick last_purged_snaps_scrub 2019-09-18T14:10:08.873595-0700 next 2019-09-19T14:10:08.873595-0700 
     -5> 2019-09-18T14:13:31.811-0700 7fd50b357700    5 prioritycache tune_memory target: 4294967296 mapped: 57991168 unmapped: 81920 heap: 58073088 old mem: 2845415832 new mem: 2845415832 
     -4> 2019-09-18T14:13:31.875-0700 7fd515db1700    1 -- [v2:127.0.0.1:6808/19306,v1:127.0.0.1:6809/19306] <== osd.0 127.0.0.1:0/19643 57 ==== osd_ping(ping e26 up_from 23 ping_stamp 2019-09-18T14:13:31.880600-0700/163.008322854s send_stamp 163.008322854s delta_ub -5.006947379s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55c3725fc400 con 0x55c3724d6d80 
     -3> 2019-09-18T14:13:31.875-0700 7fd515db1700 20 osd.1 26 handle_osd_ping new stamps hbstamp(osd.0 up_from 23 peer_clock_delta [-5.008228232s,-5.006947379s]) 
     -2> 2019-09-18T14:13:31.875-0700 7fd515db1700    1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd4fa541700' had timed out after 15 
     -1> 2019-09-18T14:13:31.875-0700 7fd515db1700    1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd4fa541700' had suicide timed out after 150 
              0> 2019-09-18T14:13:31.899-0700 7fd4fa541700 -1 *** Caught signal (Aborted) ** 
  in thread 7fd4fa541700 thread_name:tp_osd_tp 

  ceph version v15.0.0-5169-g0d5f330188 (0d5f33018877851db13181511d4868396079a5b9) octopus (dev) 
  1: (()+0x2e29f48) [0x55c363fb6f48] 
  2: (()+0x12890) [0x7fd519e93890] 
  3: (pthread_cond_wait()+0x243) [0x7fd519e8e9f3] 
  4: (ceph::condition_variable_debug::wait(std::unique_lock<ceph::mutex_debug_detail::mutex_debug_impl<false> >&)+0xab) [0x55c3640ab3a7] 
  5: (BlueStore::OpSequencer::drain_preceding(BlueStore::TransContext*)+0x6b) [0x55c363e47be3] 
  6: (BlueStore::_osr_drain_preceding(BlueStore::TransContext*)+0x268) [0x55c363e0b0a8] 
  7: (BlueStore::_split_collection(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Collection>&, unsigned int, int)+0x24b) [0x55c363e2f7a5] 
  8: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x54d) [0x55c363e15277] 
  9: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5e8) [0x55c363e14204] 
  10: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x96) [0x55c363628000] 
  11: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0x806) [0x55c3635ffc1e] 
  12: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x380) [0x55c3636057ee] 
  13: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6d) [0x55c363aaa28b] 
  14: (OpQueueItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4b) [0x55c36363211d] 
  15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x3631) [0x55c363612cf1] 
  16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x59c) [0x55c36405b882] 
  17: (ShardedThreadPool::WorkThreadSharded::entry()+0x25) [0x55c36405d25f] 
  18: (Thread::entry_wrapper()+0x78) [0x55c364047ec6] 
  19: (Thread::_entry_func(void*)+0x18) [0x55c364047e44] 
  20: (()+0x76db) [0x7fd519e886db] 
  21: (clone()+0x3f) [0x7fd518bd188f] 

 TEST_repair_stats_ec 
 /home/dzafman/ceph/src/osd/ECBackend.cc: 478: FAILED ceph_assert(op.xattrs.size()) 

 </pre>
