Bug #9113

closed

osd: snap trimming eats memory, linearly

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: firefly, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

- rados pool snapshot taken weekly
- trimmed when >30 days old
- trimming makes some osds consume memory linearly
- restarting the osd resets its memory usage, but the growth resumes and continues until trimming completes

The Google heap profiler has been unable to produce any useful data on the leak, except some weak evidence that some memory is consumed by the transactions doing the trimming. But the total tracked heap was small, so this may just be normal operation.

Actions #1

Updated by Sage Weil over 9 years ago

a few notes:


we think trims remove no more than 10-20 TB. the delta between live data and the bottom of the snap stack at the moment is 250 TB. we also know that the memory issue is proportional to the delta between snapshots.
this situation is abnormal, because we removed 250 TB from the cluster
i.e. it's not normal to be growing by 250 TB that fast, heh

basically, the memory consumption progresses linearly across all osds housing primary pgs
the slope of the progression is directly related to the snap_trim_sleep osd setting
the lower the snap_trim_sleep, the greater the slope
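
The relationship described above can be illustrated with a toy model. This is only a sketch under loud assumptions that are not from the report: one trim op is queued every snap_trim_sleep seconds, each queued op pins a fixed number of bytes, and no op completes during the window. The function names and the bytes_per_op parameter are hypothetical; this is not Ceph code.

```python
def slope_bytes_per_second(snap_trim_sleep_s, bytes_per_op):
    """Growth rate of pinned memory in the toy model.

    Assumption (not from the report): ops are issued at 1 / snap_trim_sleep_s
    per second and none of them complete, so memory grows at a constant rate.
    """
    if snap_trim_sleep_s <= 0:
        raise ValueError("the toy model needs a positive sleep interval")
    return bytes_per_op / snap_trim_sleep_s


def pinned_memory(elapsed_s, snap_trim_sleep_s, bytes_per_op):
    """Bytes pinned after elapsed_s seconds: linear in elapsed time."""
    return elapsed_s * slope_bytes_per_second(snap_trim_sleep_s, bytes_per_op)


if __name__ == "__main__":
    # bytes_per_op=4096 is an arbitrary illustrative value.
    for sleep_s in (2.0, 1.0, 0.5):
        print(sleep_s, slope_bytes_per_second(sleep_s, 4096))
```

In this model, halving the sleep doubles the slope, which matches the observation that a lower snap_trim_sleep gives a steeper climb. A sleep of zero is outside the model.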

Actions #2

Updated by Samuel Just over 9 years ago

  • Subject changed from osd: snap trimming eats memory, linearly (dumpling) to osd: snap trimming eats memory, linearly
  • Assignee set to Samuel Just

It's not just dumpling; the repops set in the snap trimmer is just wonky. We need to trim a bounded set of objects, wait, trim another bounded set of objects, wait, etc.
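
A minimal sketch of the "trim a bounded set, wait, repeat" idea, contrasted with queueing everything up front. It is a plain Python simulation, not the actual OSD fix: trim_unbounded, trim_bounded, and max_batch are hypothetical names, and "waiting" is modelled as the whole batch completing before the next one is issued.

```python
from collections import deque


def trim_unbounded(objects):
    """Queue a trim op for every object at once; return the peak in-flight count."""
    in_flight = deque(objects)
    peak = len(in_flight)  # everything is outstanding at the same time
    while in_flight:
        in_flight.popleft()  # an op completes
    return peak


def trim_bounded(objects, max_batch):
    """Trim at most max_batch objects, wait for them, then take the next batch.

    Returns (objects in the order they were trimmed, peak in-flight count).
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    pending = deque(objects)
    trimmed = []
    peak = 0
    while pending:
        batch_size = min(max_batch, len(pending))
        batch = [pending.popleft() for _ in range(batch_size)]
        peak = max(peak, len(batch))
        # "wait": every op in this batch finishes before more are issued
        trimmed.extend(batch)
    return trimmed, peak


if __name__ == "__main__":
    print(trim_unbounded(range(10)))
    print(trim_bounded(range(10), max_batch=3))
```

With the bounded variant, the number of outstanding ops never exceeds max_batch regardless of how many objects the snapshot delta covers, so the memory held by in-flight work stays capped instead of tracking the size of the trim.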

Actions #3

Updated by Samuel Just over 9 years ago

  • Status changed from New to 7

Actions #4

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Fix Under Review

Actions #5

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to firefly, dumpling

Actions #6

Updated by Samuel Just over 9 years ago

  • Status changed from Pending Backport to 7

There's another piece. The trimmer is constantly requeueing.

Actions #7

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Fix Under Review

Actions #8

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport

Actions #9

Updated by Samuel Just over 9 years ago

Backported to firefly.

Actions #10

Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved