Bug #21798 (closed): Bluestore cache overcommits OSD memory resulting in OOM kill

Added by Ruben Rodriguez over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The default value for bluestore_cache_size_ssd is 3 GB. Our server has 6 OSDs and 16 GB of RAM, so with the default value memory usage eventually grows to more than 18 GB and triggers the OOM killer. We are running Luminous (12.2.1-1xenial). Lowering the value to 1 GB solved the problem.
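
For reference, a minimal ceph.conf sketch of the workaround described above (a sketch only; the option takes a value in bytes, 1 GB here, so adjust for your OSD count and RAM):

  [osd]
  # Cap the BlueStore cache per OSD at 1 GB instead of the 3 GB SSD default,
  # so that 6 OSDs fit within 16 GB of RAM.
  bluestore_cache_size_ssd = 1073741824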

This default should only be honored if the available memory allows it. Perhaps a warning should be raised when the configured cache sizes would overcommit memory, and/or when the OSD host is running out of memory.
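
As a rough illustration of the kind of check being suggested (illustrative only, not an existing Ceph feature; the OSD count and cache size are taken from this report), a host-level overcommit test could look like:

  # Illustrative sketch: compare the combined BlueStore cache of all local OSDs
  # against the host's physical RAM and warn when it would overcommit.
  import os

  osd_count = 6                         # OSDs on this host (from the report)
  cache_per_osd = 3 * 1024 ** 3         # bluestore_cache_size_ssd default, in bytes

  total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
  cache_total = osd_count * cache_per_osd

  if cache_total > total_ram:
      print("WARNING: %d OSDs x %d GiB cache = %d GiB, exceeding %d GiB of RAM"
            % (osd_count, cache_per_osd >> 30, cache_total >> 30, total_ram >> 30))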

#1

Updated by Ian Kelling over 6 years ago

Related: http://docs.ceph.com/docs/jewel/start/hardware-recommendations/ says "OSDs do not require as much RAM for regular operations (e.g., 500MB of RAM per daemon instance);" Yet in this case the server had 266% of that recommendation and it was still below the actual minimum. Should the docs be updated?

#2

Updated by Sage Weil over 6 years ago

  • Status changed from New to Resolved

This is (mostly) fixed by 80c60fcde22cf1269ada45d0914543b8f7d49b3e post 12.2.1; see #21417. There is still some overcommit due to allocator fragmentation, etc., but it's much better now.
