Bug #36725: luminous: Apparent Memory Leak in OSD - RADOS - Ceph

Added by John Jaser over 5 years ago. Updated over 5 years ago.

Status: Closed
Priority: Urgent
Target version: Ceph - v12.2.9
Severity: 3 - minor
Component(RADOS): OSD

Description

Since the last update (late October), I have been experiencing an apparent memory leak in the OSD process on two Ceph servers in a small business environment.
Debian Stretch, kernel 4.9.0-8-amd64, Luminous with BlueStore.
Two servers, each with two OSD daemons, 8 TB of storage (2x 4 TB) and 8 GB of RAM.

Memory use of the OSD process has been observed to grow at about 100 MB per hour per OSD. I have been rebooting the servers when each OSD process approaches 50% of physical memory. After a reboot they return to about 10% use each and begin growing again.

Since no one else has reported this, I believe it is likely due to my configuration, but both systems have been extremely stable up to this point.
Maybe it is related to my non-optimal replica count? (three replicas on two servers)
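The reported figures allow a quick sanity check of the reboot interval. The sketch below assumes linear growth and treats "8 GB" as 8 GiB (8192 MiB) and "100 MB" as 100 MiB; the function name is hypothetical.

```python
# Rough estimate of how long one OSD runs before reaching a memory threshold.
# Assumes memory grows linearly at a constant rate after a reboot.

def hours_until_threshold(ram_mib: float, start_frac: float,
                          threshold_frac: float, growth_mib_per_hour: float) -> float:
    """Hours for one OSD to grow from start_frac to threshold_frac of RAM."""
    if growth_mib_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    if threshold_frac <= start_frac:
        raise ValueError("threshold must be above the starting fraction")
    headroom_mib = (threshold_frac - start_frac) * ram_mib
    return headroom_mib / growth_mib_per_hour


if __name__ == "__main__":
    # Assumption: 8 GB RAM = 8192 MiB, growth of 100 MiB per hour per OSD,
    # starting at 10% of RAM and rebooting at 50%.
    hours = hours_until_threshold(ram_mib=8192, start_frac=0.10,
                                  threshold_frac=0.50, growth_mib_per_hour=100)
    print(f"{hours:.1f} hours")  # 32.8 hours
```

With these figures each OSD needs roughly 33 hours to climb from 10% to 50% of RAM, at which point the two OSDs on a server together occupy all physical memory.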

Files

3.1G_free.txt (4.2 KB), John Jaser, 11/20/2018 09:41 PM
2.0G_free.txt (4.21 KB), John Jaser, 11/20/2018 09:41 PM
4.5G_free.txt (4.21 KB), John Jaser, 11/20/2018 09:41 PM
4.1G_free (3.93 KB), John Jaser, 12/02/2018 07:29 PM
1.6G_free (3.94 KB), John Jaser, 12/02/2018 07:29 PM
344M_free (3.94 KB), John Jaser, 12/02/2018 07:29 PM
Selection_001.png (109 KB), Konstantin Shalygin, 12/13/2018 09:20 AM

Updated by John Jaser over 5 years ago

Note: Downgrading both OSD servers to v12.2.8 returned memory usage to normal.

Updated by Nathan Cutler over 5 years ago

Priority changed from Normal to Urgent

Raising priority since this might be a regression in 12.2.9.

Updated by Greg Farnum over 5 years ago

Project changed from Ceph to RADOS
Category deleted (OSD)
Component(RADOS) OSD added

Updated by Sage Weil over 5 years ago

Subject changed from Apparent Memory Leak in OSD to luminous: Apparent Memory Leak in OSD
Status changed from New to Need More Info

Can you dump the mempools (ceph daemon osd.NNN dump_mempools) several times over the growth of the process so we can see what is consuming the memory?
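A minimal sketch of collecting and comparing such dumps, assuming Python 3.7+ on the OSD host with access to the OSD admin socket. The helper names are hypothetical. The handling of the JSON layout is also an assumption: Luminous is expected to print the pools at the top level, while later releases nest them under "mempool" -> "by_pool", so both shapes are accepted; check this against real output.

```python
import json
import subprocess
from typing import Dict, List, Tuple


def dump_mempools(osd_id: int) -> dict:
    """Run `ceph daemon osd.<id> dump_mempools` on the OSD host and parse the JSON."""
    out = subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "dump_mempools"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def pool_bytes(dump: dict) -> Dict[str, int]:
    """Map pool name -> bytes, skipping the aggregate "total" entry.

    Assumed layouts: flat {"pool": {"items", "bytes"}, ...} or
    {"mempool": {"by_pool": {...}, "total": {...}}}.
    """
    pools = dump.get("mempool", {}).get("by_pool", dump)
    return {name: stats["bytes"] for name, stats in pools.items()
            if name != "total" and isinstance(stats, dict) and "bytes" in stats}


def growth(before: dict, after: dict, top: int = 5) -> List[Tuple[str, int]]:
    """Pools with the largest byte increase between two dumps, largest first."""
    b, a = pool_bytes(before), pool_bytes(after)
    deltas = [(name, a[name] - b.get(name, 0)) for name in a]
    return sorted(deltas, key=lambda kv: kv[1], reverse=True)[:top]
```

For example, take first = dump_mempools(0) shortly after a restart and last = dump_mempools(0) several hours later; growth(first, last) then lists the pools that grew the most. If no pool grows much while the process keeps growing, the memory is going somewhere the mempools do not track.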

Updated by John Jaser over 5 years ago

File 2.0G_free.txt added
File 3.1G_free.txt added
File 4.5G_free.txt added

Upgraded one OSD server to 12.2.9. Clean reboot. Generating hourly report on memory and mempools. Three examples attached.
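Hourly samples like these can be reduced to a single growth rate with an ordinary least-squares fit. This is a sketch assuming each sample is (hours since reboot, used memory in MiB) and that growth is roughly linear; the function name and the example values are hypothetical and are not taken from the attached files.

```python
from typing import Sequence


def growth_rate(hours: Sequence[float], used_mib: Sequence[float]) -> float:
    """Least-squares slope of memory use against time, in MiB per hour."""
    if len(hours) != len(used_mib) or len(hours) < 2:
        raise ValueError("need at least two (hour, usage) pairs of equal length")
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(used_mib) / n
    # Covariance of (time, usage) divided by variance of time gives the slope.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, used_mib))
    den = sum((x - mean_x) ** 2 for x in hours)
    if den == 0:
        raise ValueError("sample times must not all be equal")
    return num / den
```

For example, growth_rate([0, 1, 2], [800, 900, 1000]) returns 100.0, i.e. 100 MiB per hour, which could then be compared across releases.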

Updated by John Jaser over 5 years ago

File 1.6G_free added
File 4.1G_free added
File 344M_free added

Upgraded one OSD server to 12.2.10: same symptom observed (see attached). The two OSD daemons use up all physical memory in about 25 hours. 12.2.8 runs stable.

Updated by birong huang over 5 years ago

I have the same problem.

Updated by Konstantin Shalygin over 5 years ago

John, are you aware of the new 12.2.9 options osd_memory_target and bluestore_cache_autotune?
You could try disabling autotune or lowering osd_memory_target. I don't know exactly how the two work together; the documentation is still unclear.

Updated by John Jaser over 5 years ago

Konstantin: thanks for pointing that out; that looks like the issue. Both OSD servers have 8 GB of RAM in total, each running two OSD daemons, so the default osd_memory_target of 4294967296 leaves no RAM for the OS. (This sort of breaks the "1 GB of RAM per TB of storage" rule of thumb for my setup.)

I changed the setting to osd_memory_target = 2684354560

After 55 hours of uptime, free memory is about 2.1 GB, which is just slightly over target, and it looks stable.

Thanks to all.
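The sizing reasoning above can be written out as a small calculation. This is a sketch assuming "8 GB" means 8 GiB and that the target is split evenly between the OSD daemons after setting aside a fixed amount for the OS. The 3 GiB reserve is a hypothetical figure, chosen only because it reproduces 2684354560 (2.5 GiB); the thread does not say how that value was derived. Keep in mind that osd_memory_target is a best-effort target for the OSD process, not a hard limit.

```python
GIB = 1024 ** 3

# Default osd_memory_target in 12.2.9, as quoted in the thread (4 GiB).
DEFAULT_TARGET = 4294967296


def per_osd_target(total_ram: int, os_reserve: int, num_osds: int) -> int:
    """Split the RAM left after a hypothetical OS reserve evenly across OSDs (bytes)."""
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    budget = total_ram - os_reserve
    if budget <= 0:
        raise ValueError("OS reserve leaves no memory for the OSDs")
    return budget // num_osds


if __name__ == "__main__":
    # Two OSDs at the default target already claim all 8 GiB of RAM.
    print(2 * DEFAULT_TARGET == 8 * GIB)  # True
    # A hypothetical 3 GiB OS reserve yields the value chosen above.
    print(per_osd_target(8 * GIB, 3 * GIB, 2))  # 2684354560
```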

Updated by Greg Farnum over 5 years ago

Status changed from Need More Info to Closed

Updated by Konstantin Shalygin over 5 years ago

File Selection_001.png added

I made dumps while tuning the osd_memory_target value. Perhaps this data will be useful in the future.

ceph-post-file: 97de181d-f5f9-44cd-8798-877a803df70f

I set osd_memory_target to 3 GB; memory consumption is about 7-9% higher than with the default BlueStore settings in 12.2.8.
