Bug #41923
3 different ceph-osd asserts caused by enabling auto-scaler
Status:
closed
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Changed the config osd_pool_default_pg_autoscale_mode to "on".
Saw these 4 core dumps across 3 different sub-tests.
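For reference, a sketch of how the autoscaler mode in question can be enabled, either globally (the option toggled in this report) or per pool; assumes a running cluster with the ceph CLI, and the pool name "mypool" is a placeholder, not from this report:

```shell
# Default autoscale mode applied to newly created pools; this is the
# option changed above. Accepted values are "on", "warn", and "off".
ceph config set global osd_pool_default_pg_autoscale_mode on

# Equivalent per-pool setting for an existing pool ("mypool" is hypothetical).
ceph osd pool set mypool pg_autoscale_mode on
```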
../qa/run-standalone.sh "osd-scrub-repair.sh TEST_XXXXXXXX"
TEST_corrupt_scrub_erasure_overwrites

/home/dzafman/ceph/src/osd/ECBackend.cc: 641: FAILED ceph_assert(pop.data.length() == sinfo.aligned_logical_offset_to_chunk_offset( after_progress.data_recovered_to - op.recovery_progress.data_recovered_to))
/home/dzafman/ceph/src/osd/ECBackend.cc: 583: FAILED ceph_assert(op.hinfo)

TEST_repair_stats

-12> 2019-09-18T14:13:30.560-0700 7fd515db1700 1 -- 127.0.0.1:0/19306 <== osd.0 v2:127.0.0.1:6816/19643 43 ==== osd_ping(ping_reply e26 up_from 23 ping_stamp 2019-09-18T14:13:30.563990-0700/166.699510694s send_stamp 161.692563315s delta_ub -5.006947379s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55c3725fc400 con 0x55c36fbc3180
-11> 2019-09-18T14:13:30.560-0700 7fd515db1700 20 osd.1 26 handle_osd_ping new stamps hbstamp(osd.0 up_from 23 peer_clock_delta [-5.007942020s,-5.006947379s])
-10> 2019-09-18T14:13:30.796-0700 7fd50b357700 5 prioritycache tune_memory target: 4294967296 mapped: 57991168 unmapped: 81920 heap: 58073088 old mem: 2845415832 new mem: 2845415832
-9> 2019-09-18T14:13:31.084-0700 7fd512571700 10 osd.1 26 tick
-8> 2019-09-18T14:13:31.084-0700 7fd512571700 10 osd.1 26 do_waiters -- start
-7> 2019-09-18T14:13:31.084-0700 7fd512571700 10 osd.1 26 do_waiters -- finish
-6> 2019-09-18T14:13:31.084-0700 7fd512571700 20 osd.1 26 tick last_purged_snaps_scrub 2019-09-18T14:10:08.873595-0700 next 2019-09-19T14:10:08.873595-0700
-5> 2019-09-18T14:13:31.811-0700 7fd50b357700 5 prioritycache tune_memory target: 4294967296 mapped: 57991168 unmapped: 81920 heap: 58073088 old mem: 2845415832 new mem: 2845415832
-4> 2019-09-18T14:13:31.875-0700 7fd515db1700 1 -- [v2:127.0.0.1:6808/19306,v1:127.0.0.1:6809/19306] <== osd.0 127.0.0.1:0/19643 57 ==== osd_ping(ping e26 up_from 23 ping_stamp 2019-09-18T14:13:31.880600-0700/163.008322854s send_stamp 163.008322854s delta_ub -5.006947379s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55c3725fc400 con 0x55c3724d6d80
-3> 2019-09-18T14:13:31.875-0700 7fd515db1700 20 osd.1 26 handle_osd_ping new stamps hbstamp(osd.0 up_from 23 peer_clock_delta [-5.008228232s,-5.006947379s])
-2> 2019-09-18T14:13:31.875-0700 7fd515db1700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd4fa541700' had timed out after 15
-1> 2019-09-18T14:13:31.875-0700 7fd515db1700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd4fa541700' had suicide timed out after 150
0> 2019-09-18T14:13:31.899-0700 7fd4fa541700 -1 *** Caught signal (Aborted) ** in thread 7fd4fa541700 thread_name:tp_osd_tp

ceph version v15.0.0-5169-g0d5f330188 (0d5f33018877851db13181511d4868396079a5b9) octopus (dev)
1: (()+0x2e29f48) [0x55c363fb6f48]
2: (()+0x12890) [0x7fd519e93890]
3: (pthread_cond_wait()+0x243) [0x7fd519e8e9f3]
4: (ceph::condition_variable_debug::wait(std::unique_lock<ceph::mutex_debug_detail::mutex_debug_impl<false> >&)+0xab) [0x55c3640ab3a7]
5: (BlueStore::OpSequencer::drain_preceding(BlueStore::TransContext*)+0x6b) [0x55c363e47be3]
6: (BlueStore::_osr_drain_preceding(BlueStore::TransContext*)+0x268) [0x55c363e0b0a8]
7: (BlueStore::_split_collection(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Collection>&, unsigned int, int)+0x24b) [0x55c363e2f7a5]
8: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x54d) [0x55c363e15277]
9: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5e8) [0x55c363e14204]
10: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x96) [0x55c363628000]
11: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0x806) [0x55c3635ffc1e]
12: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x380) [0x55c3636057ee]
13: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6d) [0x55c363aaa28b]
14: (OpQueueItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x4b) [0x55c36363211d]
15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x3631) [0x55c363612cf1]
16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x59c) [0x55c36405b882]
17: (ShardedThreadPool::WorkThreadSharded::entry()+0x25) [0x55c36405d25f]
18: (Thread::entry_wrapper()+0x78) [0x55c364047ec6]
19: (Thread::_entry_func(void*)+0x18) [0x55c364047e44]
20: (()+0x76db) [0x7fd519e886db]
21: (clone()+0x3f) [0x7fd518bd188f]

TEST_repair_stats_ec

/home/dzafman/ceph/src/osd/ECBackend.cc: 478: FAILED ceph_assert(op.xattrs.size())