Bug #52445: OSD asserts on starting too many pushes - RADOS - Ceph

I am running a Ceph 15.2.5 cluster. In recent days, scrub reported errors and a few PGs failed because OSDs stop at random. Restarting an affected OSD does not help: it fails again after some time.

 The cluster is in an unusable state. 

 Attached is the crash log from one of the failed OSDs. 

 <pre>/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/osd/OSD.cc: 9521: FAILED ceph_assert(started <= reserved_pushes) 

  ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55fcb6621dbe] 
  2: (()+0x504fd8) [0x55fcb6621fd8] 
  3: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25] 
  4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d] 
  5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df] 
  6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x55fcb6d5b224] 
  7: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84] 
  8: (()+0x82de) [0x7f04c1b1c2de] 
  9: (clone()+0x43) [0x7f04c0853e83]</pre>
