Bug #7539

Firefly EC pool massive memory leak during writes

Added by Mark Nelson about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Category: OSD
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor

Description

Erasure-coded (EC) pools appear to leak memory badly during writes. It looks like messages aren't being properly cleaned up. Massif output is attached. This was from a 5-minute rados bench run under valgrind with 3 OSDs.

ceph-osd.0.out (420 KB) Mark Nelson, 02/25/2014 02:19 PM

ceph-osd.1.out (134 KB) Mark Nelson, 02/28/2014 06:55 AM

osd.0.log.gz (1.99 MB) Mark Nelson, 02/28/2014 12:35 PM

Associated revisions

Revision fbb1ec88
Added by Samuel Just about 9 years ago

ECBackend: don't leak transactions

Fixes: #7539
Signed-off-by: Samuel Just <>

Revision 62fd382f
Added by Samuel Just about 9 years ago

osd_types,PG: trim mod_desc for log entries to min size

In the event that mod_desc.bl contains pointers into a large
message buffer, we'd otherwise end up keeping around the entire
MOSDECSubOpWrite which created each log entry.

Fixes: #7539
Signed-off-by: Samuel Just <>

History

#1 Updated by Samuel Just about 9 years ago

ubuntu@teuthology:/a/teuthology-2014-02-26_23:00:27-rados-firefly-testing-basic-plana/106918/remote

Messing up nightlies also!

#2 Updated by Samuel Just about 9 years ago

  • Status changed from New to 7

testing wip-7542

#3 Updated by Mark Nelson about 9 years ago

Tested wip-7542 this morning using 0.77-616-g36fda70-1saucy. Still seeing rapid memory growth (without valgrind, about 12 GB per OSD after 3 minutes of 4 MB rados bench writes). Also re-ran under Massif and attached the results again.

#4 Updated by Mark Nelson about 9 years ago

Ran more tests on 0.77-619-gfbb1ec8-1saucy and am still seeing high memory usage. Ran with optracker debugging; output for osd.0 is attached.

#5 Updated by Sage Weil about 9 years ago

  • Status changed from 7 to Resolved
