
Backport #18132

Updated by Nathan Cutler over 7 years ago

In Ceph deployments with large numbers of objects (typically generated by using radosgw for object storage), it is quite possible for OSDs recovering data to hit their suicide timeout and shut down, because of the number of objects each was trying to recover in a single chunk between heartbeats.

 https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1628750/comments/0 

build_push_op assumes that 8MB of omap entries is about as much work to read as 8MB of object data. This is probably false.
Add a config option (osd_recovery_max_omap_entries_per_chunk?) with a sane default (50k?) and change build_push_op to use it.
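The proposed change could be sketched roughly as follows. This is an illustrative standalone sketch, not Ceph's actual build_push_op code: the function name chunk_omap, the entry representation, and the parameter names are assumptions. The point is that a recovery chunk is closed when EITHER the existing byte budget OR the proposed per-chunk omap-entry limit is reached, so a chunk containing millions of tiny entries cannot keep an OSD busy past its heartbeat deadline.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// An omap entry is a key/value pair of strings.
using OmapEntry = std::pair<std::string, std::string>;

// Split a stream of omap entries into recovery chunks, capping each
// chunk by both total bytes (the existing 8MB-style budget) and entry
// count (the proposed osd_recovery_max_omap_entries_per_chunk limit).
std::vector<std::vector<OmapEntry>> chunk_omap(
    const std::vector<OmapEntry>& entries,
    size_t max_bytes_per_chunk,
    size_t max_entries_per_chunk) {
  std::vector<std::vector<OmapEntry>> chunks;
  std::vector<OmapEntry> cur;
  size_t cur_bytes = 0;
  for (const auto& e : entries) {
    size_t sz = e.first.size() + e.second.size();
    // Close the current chunk if adding this entry would exceed the
    // byte budget, or if the entry-count limit has been reached.
    if (!cur.empty() &&
        (cur_bytes + sz > max_bytes_per_chunk ||
         cur.size() >= max_entries_per_chunk)) {
      chunks.push_back(std::move(cur));
      cur.clear();
      cur_bytes = 0;
    }
    cur_bytes += sz;
    cur.push_back(e);
  }
  if (!cur.empty()) chunks.push_back(std::move(cur));
  return chunks;
}
```

With only the byte budget, ten tiny entries would all land in one chunk; with an entry limit of four, they split into chunks of four, four, and two, bounding the work done between heartbeats.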
