jewel: ReplicatedBackend::build_push_op: add a second config to limit omap entries/chunk independently of object data
build_push_op assumes that 8MB of omap entries takes about as much work to read as 8MB of object data, which is probably false. Add a config option (osd_recovery_max_omap_entries_per_chunk?) with a sane default (50k?) and change build_push_op to use it.
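To make the proposal concrete, here is a minimal, standalone sketch of a chunking loop that stops on whichever budget runs out first: bytes or omap entry count. It is not the actual build_push_op code. ChunkLimits, next_omap_chunk, and the std::map stand-in for an object's omap are hypothetical names chosen for illustration, and the defaults simply mirror the 8MB and 50k figures floated in this ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two limits. The names and defaults follow
// the suggestion in this ticket; they are not taken from the Ceph source.
struct ChunkLimits {
  uint64_t max_bytes = 8ull << 20;    // existing per-chunk byte budget (8MB)
  uint64_t max_omap_entries = 50000;  // proposed per-chunk omap entry budget
};

using OmapEntries = std::map<std::string, std::string>;
using OmapChunk = std::vector<std::pair<std::string, std::string>>;

// Copy entries starting at `it` into a chunk until either budget is used up.
// `it` is advanced past the copied entries so the caller can resume later.
// At least one entry is always taken so that recovery makes progress even
// if a single entry exceeds the byte budget or the entry budget is zero.
OmapChunk next_omap_chunk(OmapEntries::const_iterator& it,
                          OmapEntries::const_iterator end,
                          const ChunkLimits& limits)
{
  OmapChunk chunk;
  uint64_t bytes_left = limits.max_bytes;
  for (; it != end; ++it) {
    const uint64_t sz = it->first.size() + it->second.size();
    if (!chunk.empty() &&
        (chunk.size() >= limits.max_omap_entries || sz > bytes_left)) {
      break;  // one of the two budgets is exhausted
    }
    chunk.emplace_back(it->first, it->second);
    bytes_left = (sz > bytes_left) ? 0 : bytes_left - sz;
  }
  // Either the input is fully consumed or this call made progress.
  assert(it == end || !chunk.empty());
  return chunk;
}
```

With many tiny entries the entry budget ends the chunk long before the byte budget would, which is the behaviour this ticket asks for. With a few large entries the byte budget still applies as before.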
#2 Updated by Alexey Sheplyakov about 3 years ago
In Ceph deployments with large numbers of objects (typically generated by using radosgw for object storage),
OSDs recovering data can hit their suicide timeout and shut down because of the number
of objects each was trying to recover in a single chunk between heartbeats.