Bug #1590

occasionally excessive mon memory footprint

Added by Alexandre Oliva almost 10 years ago. Updated almost 10 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Regression:
No
Severity:
3 - minor

Description

I have 3 mons that share disks with osds. Sometimes, when btrfs gets into a mode in which syncs are delayed, the mons enter a state in which successive elections keep producing different results, and mons that used to be in the active set end up being kicked out for lagging behind. In these circumstances, if they were primary, they appear to start piling up messages to be relayed to the primary, and memory use grows, apparently exponentially.

The attached memory profile is from mon.1, which had grown from the baseline memory use of about 120MB to 16GB of virtual memory (12.5GB heap) before I killed it. At the same time, mon.0 had grown from the same baseline to some 3.5GB of virtual memory, but its heap, which peaked at 2.5GB, had gone back down to 125MB. mon.2 never went past the baseline.

This was collected with 0.35, but I have run into this with many earlier versions of Ceph.

hugemon.pdf - peak memory use graph for mon.1, before I killed it (7.98 KB) Alexandre Oliva, 10/01/2011 01:23 PM


Related issues

Duplicates Ceph - Feature #1646: mon: catch up on committed items before attempting to join quorum Resolved 10/21/2011

History

#1 Updated by Alexandre Oliva almost 10 years ago

I've just run into this while only two of the 3 mons were up: mon.0 was taking several minutes to complete a sync (a btrfs bug I've been looking into), and mon.1's memory use was at almost 16GB when I restarted it. So it doesn't take a third lagging monitor to trigger the problem; perhaps a lagging primary is the trigger.

#2 Updated by Sage Weil almost 10 years ago

  • Category set to Monitor

This will go away with #1646.

#3 Updated by Sage Weil almost 10 years ago

  • Status changed from New to Duplicate
