Bug #21798: Bluestore cache overcommits OSD memory resulting in oom kill - Ceph - Ceph

closed

Added by Ruben Rodriguez over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Category: OSD
Source: Community (user)
Severity: 2 - major
Affected Versions: v12.2.1

Description

The default value for bluestore_cache_size_ssd is 3G. Our server has 6 OSDs and 16G of RAM, so with the default value the memory usage eventually grows to >18G and triggers the OOM killer. We are running Luminous (12.2.1-1xenial). Lowering the value to 1G solved the problem.
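The arithmetic behind the report can be sketched as follows. This is only an illustration using the figures quoted above (6 OSDs, 16G of RAM, 3G default, 1G workaround); the helper name `total_cache_bytes` is hypothetical, and the sketch counts only the configured cache, whereas the real resident memory of each OSD is higher.

```python
GIB = 1024 ** 3

def total_cache_bytes(num_osds: int, cache_per_osd: int) -> int:
    """Sum of the configured BlueStore cache sizes across all OSDs on a host."""
    return num_osds * cache_per_osd

ram = 16 * GIB
default_total = total_cache_bytes(6, 3 * GIB)  # default bluestore_cache_size_ssd
lowered_total = total_cache_bytes(6, 1 * GIB)  # value after the workaround

print(default_total // GIB, default_total > ram)  # 18 True
print(lowered_total // GIB, lowered_total > ram)  # 6 False
```

With the default, the caches alone are configured for more memory than the host has; at 1G per OSD they account for 6G, which leaves headroom for the rest of each daemon.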

This default value should only be honored if the available memory allows it. Maybe a warning state should be raised if the cache setting would cause this, and/or if the OSD server is running out of memory.
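One way to picture the suggestion is the sketch below. It is not how Ceph behaves or how the issue was fixed; the function names and the `usable_fraction` of 0.5 are assumptions made purely for illustration. It reads total memory from /proc/meminfo-style text, clamps the per-OSD cache so that all caches fit in a fraction of host RAM, and reports whether the default had to be reduced.

```python
GIB = 1024 ** 3

def parse_mem_total(meminfo_text: str) -> int:
    """Return MemTotal in bytes from /proc/meminfo-style text (values are in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")

def effective_cache_size(default_size: int, num_osds: int, mem_total: int,
                         usable_fraction: float = 0.5):
    """Clamp the per-OSD cache so all caches fit in a fraction of host RAM.

    usable_fraction is an arbitrary, hypothetical budget, not a Ceph setting.
    Returns (size_in_bytes, warn); warn is True when the default was reduced.
    """
    per_osd_budget = int(mem_total * usable_fraction) // num_osds
    size = min(default_size, per_osd_budget)
    return size, size < default_size

# Host from the report: 6 OSDs, 16G of RAM, 3G default cache.
size, warn = effective_cache_size(3 * GIB, 6, 16 * GIB)
if warn:
    print(f"warning: cache reduced to {size} bytes per OSD")
```

On a Linux host, `mem_total` could come from `parse_mem_total(open("/proc/meminfo").read())`. On the 16G host from the report the default is reduced and a warning is printed; on a host with 64G the 3G default would be honored unchanged.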

Updated by Ian Kelling over 6 years ago

Related: http://docs.ceph.com/docs/jewel/start/hardware-recommendations/ says "OSDs do not require as much RAM for regular operations (e.g., 500MB of RAM per daemon instance);" Yet in this case the server had 266% of that recommendation, which was still below the actual minimum. Should the docs be updated?

Updated by Sage Weil over 6 years ago

Status changed from New to Resolved

This is (mostly) fixed by commit 80c60fcde22cf1269ada45d0914543b8f7d49b3e, which landed after 12.2.1; see #21417. There is still some overcommit due to allocator fragmentation, etc., but it's much better now.
