Bug #21798
Bluestore cache overcommits OSD memory resulting in OOM kill
Description
The default value for bluestore_cache_size_ssd is 3G. Our server has 6 OSDs and 16G of RAM, so with the default value the caches alone can claim 6 x 3G = 18G. Memory usage eventually grows to >18G and triggers the OOM killer. We are running luminous (12.2.1-1xenial). Lowering the value to 1G solved the problem.
This default value should only be honored if the available memory allows it. Maybe a warning state should be present if the cache setting would cause this, and/or if the OSD server is running out of memory.
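A minimal sketch of the kind of check suggested above, using the numbers from this report. The function name and the optional per-OSD overhead parameter are hypothetical and not part of Ceph; the sketch only compares the configured cache total against the host's RAM.

```python
GIB = 1024 ** 3


def check_cache_overcommit(num_osds, cache_bytes_per_osd, total_ram_bytes,
                           overhead_bytes_per_osd=0):
    """Return (committed_bytes, overcommitted) for a host.

    Hypothetical helper: committed memory is the per-OSD cache size plus an
    assumed per-OSD overhead, multiplied by the number of OSDs on the host.
    """
    committed = num_osds * (cache_bytes_per_osd + overhead_bytes_per_osd)
    return committed, committed > total_ram_bytes


if __name__ == "__main__":
    # Values from this report: 6 OSDs, 16G of RAM, 3G default vs. 1G lowered.
    for cache_gib in (3, 1):
        committed, over = check_cache_overcommit(6, cache_gib * GIB, 16 * GIB)
        state = "WARN: overcommitted" if over else "ok"
        print(f"cache={cache_gib}G committed={committed // GIB}G of 16G -> {state}")
```

With the 3G default the caches alone commit 18G against 16G of RAM, while the 1G setting commits 6G. Real OSD memory use is higher than the cache size, so a practical check would need a nonzero overhead estimate.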
Updated by Ian Kelling over 6 years ago
Related: http://docs.ceph.com/docs/jewel/start/hardware-recommendations/ says "OSDs do not require as much RAM for regular operations (e.g., 500MB of RAM per daemon instance);" Yet in this case each daemon had roughly 2.67G available (16G across 6 OSDs), several times that recommendation, and it was still below the actual minimum. Should the docs be updated?
Updated by Sage Weil over 6 years ago
- Status changed from New to Resolved
This is (mostly) fixed by commit 80c60fcde22cf1269ada45d0914543b8f7d49b3e, which landed after 12.2.1; see #21417. There is still some overcommit due to allocator fragmentation etc., but it's much better now.