Feature #1583
osd: bound pg log memory usage
Status: Closed
Description
Back in 0.34, my cluster with 3210 PGs, 3x-replicated across 3 OSDs, required some 4GB of RAM for the OSDs: with all 3 OSDs up it would use about 1.3GB on each, 2GB each with 2 OSDs up, or 4GB with a single OSD.
With 0.35 and 3594, each OSD eats up 4.5GB of RAM just reading its local state, before even trying to contact a monitor. Two OSDs complete recovery using some 6 or 7GB each, and a single OSD skyrockets past 14GB before even moving PGs to peering. I could never complete recovery with a single 0.35 OSD :-(
I'm attaching the results of some memory profiling. I selected 3 relevant snapshots from the following sequence of events:
0. I brought osd.2 down and let the others recover, then brought them all down
1. I started osd.2 with memory profiling, and took snapshot alldown when it completed transitioning all PGs to crashed+down+degraded+peering
2. I started osd.1, and took snapshot cleanboth of osd.2 when all PGs were active+clean+degraded
3. I stopped osd.1, and took snapshot recovering of osd.2 when I ran out of time
Then I generated graphs from each snapshot, as well as graphs of the incremental memory use between consecutive snapshots. They're all attached.
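For reference, here is a sketch of how such per-snapshot and incremental graphs can be produced with google-perftools' pprof, assuming the snapshots were taken with tcmalloc's heap profiler; the binary path and snapshot file names below are hypothetical, not the actual ones used:

```shell
# Per-snapshot graph: allocation call graph for one heap snapshot.
pprof --pdf ./cosd alldown.heap > alldown.pdf

# Incremental graph: memory allocated between two consecutive
# snapshots, using the earlier snapshot as the baseline.
pprof --pdf --base=alldown.heap ./cosd cleanboth.heap > cleanboth-incr.pdf
```

The `--base` form subtracts the baseline profile, so the resulting graph shows only allocations made between the two snapshots.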