Bug #9113
osd: snap trimming eats memory, linearly
Status: closed
% Done: 0%
Source: Community (user)
Tags:
Backport: firefly, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
- rados pool snapshot taken weekly
- trimmed when >30 days old
- during trimming, memory use on some OSDs grows linearly
- restarting the OSD resets memory use, but it climbs again until trimming completes
The Google heap profiler has been unable to produce any useful data on the leak, apart from some weak evidence that memory is consumed by the transactions doing the trimming. However, the total tracked heap was small, so this may just be normal operation.
Updated by Sage Weil over 9 years ago
a few notes:
- we think trims remove no more than 10-20 TB.
- the delta between live data and the bottom of the snap stack is currently 250 TB.
- we also know that the memory issue is proportional to the delta between snapshots.
- this situation is abnormal: the delta exists because we removed 250 TB from the cluster, i.e. the cluster is not normally growing by 250 TB that fast.
- basically, memory consumption grows linearly across all OSDs housing primary PGs.
- the slope of that growth depends directly on the osd snap_trim_sleep setting: the lower the snap_trim_sleep, the greater the slope.
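The observation that a lower snap_trim_sleep gives a steeper slope is what you would expect if every trim step leaves some memory allocated until the whole trim finishes. The toy model below illustrates that arithmetic. It is a sketch under stated assumptions, not a description of the OSD code: `retained_bytes`, `slope_bytes_per_second`, and the 4 KiB-per-step figure are hypothetical.

```python
def retained_bytes(elapsed_ms: int, snap_trim_sleep_ms: int, bytes_per_trim: int) -> int:
    """Toy model: every trim step leaves bytes_per_trim allocated until trimming ends.

    Assumption (not from the ticket): the trimmer wakes once per sleep
    interval, and nothing it allocates is freed before the trim finishes.
    """
    if snap_trim_sleep_ms <= 0:
        raise ValueError("the model assumes a positive snap_trim_sleep")
    trim_steps = elapsed_ms // snap_trim_sleep_ms  # whole sleep intervals elapsed so far
    return trim_steps * bytes_per_trim


def slope_bytes_per_second(snap_trim_sleep_ms: int, bytes_per_trim: int) -> float:
    """Growth rate implied by the model: inversely proportional to the sleep."""
    return bytes_per_trim * 1000 / snap_trim_sleep_ms


if __name__ == "__main__":
    # Hypothetical numbers: 4 KiB retained per trim step, one minute of trimming.
    for sleep_ms in (100, 10):
        print(sleep_ms, retained_bytes(60_000, sleep_ms, 4096))
```

Under this model, cutting the sleep by 10x raises the slope by 10x, and restarting the process frees everything retained so far, which matches the reported symptoms. The model says nothing about which structure actually holds the memory.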
Updated by Samuel Just over 9 years ago
- Subject changed from osd: snap trimming eats memory, linearly (dumpling) to osd: snap trimming eats memory, linearly
- Assignee set to Samuel Just
It's not just dumpling; the repops set in the snap trimmer is just wonky. We need to trim a bounded set of objects, wait, trim another bounded set of objects, wait, and so on.
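The "trim a bounded set, wait, repeat" approach can be sketched as follows. This is a minimal illustration, assuming a backend that can submit one trim and block until all submitted trims complete; `trim_bounded`, `FakeBackend`, `trim_one`, and `wait_for_completions` are hypothetical names, not Ceph's ReplicatedPG API.

```python
from typing import Callable, Iterable, List


def trim_bounded(objects: Iterable[str],
                 batch_size: int,
                 trim_one: Callable[[str], None],
                 wait_for_completions: Callable[[], None]) -> int:
    """Trim at most batch_size objects, wait for them, then continue.

    Returns the number of batches issued. In-flight work never exceeds
    batch_size, so memory held by pending ops stays bounded.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    in_flight = 0
    batches = 0
    for oid in objects:
        trim_one(oid)
        in_flight += 1
        if in_flight == batch_size:
            wait_for_completions()  # drain before issuing more work
            in_flight = 0
            batches += 1
    if in_flight:
        wait_for_completions()  # drain the final partial batch
        batches += 1
    return batches


class FakeBackend:
    """Hypothetical stand-in that records how many ops are pending at once."""

    def __init__(self) -> None:
        self.pending: List[str] = []
        self.done: List[str] = []
        self.peak = 0

    def trim_one(self, oid: str) -> None:
        self.pending.append(oid)
        self.peak = max(self.peak, len(self.pending))

    def wait_for_completions(self) -> None:
        self.done.extend(self.pending)
        self.pending.clear()
```

With `batch_size=4` and ten objects, this issues three batches and never has more than four trims pending, whereas an unbounded loop would let the pending set grow with the number of objects left to trim.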
Updated by Samuel Just over 9 years ago
- Status changed from 7 to Fix Under Review
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to firefly, dumpling
Updated by Samuel Just over 9 years ago
- Status changed from Pending Backport to 7
There's another piece. The trimmer is constantly requeueing.
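One way to stop a worker from constantly requeueing is to queue it only when it has work, is not waiting on in-flight ops, and is not already queued. The sketch below shows that guard in isolation; it is an assumption about the general shape of such a fix, not the actual change, and `TrimScheduler` and `request_trim` are hypothetical names.

```python
from collections import deque
from typing import Deque, Set


class TrimScheduler:
    """Hypothetical work-queue guard: each PG is queued for trimming at most once."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.queued: Set[str] = set()

    def request_trim(self, pg_id: str, has_work: bool, waiting_on_ops: bool) -> bool:
        """Enqueue pg_id only if it can make progress; return True if enqueued."""
        if not has_work or waiting_on_ops or pg_id in self.queued:
            return False  # nothing to do, blocked, or already queued
        self.queue.append(pg_id)
        self.queued.add(pg_id)
        return True

    def dequeue(self) -> str:
        """Pop the next PG; it may be requeued once it has more work."""
        pg_id = self.queue.popleft()
        self.queued.discard(pg_id)
        return pg_id
```

The design choice is that a blocked trimmer is woken by op completion rather than by polling itself back onto the queue.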
Updated by Samuel Just over 9 years ago
- Status changed from 7 to Fix Under Review
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sage Weil over 9 years ago
- Status changed from Pending Backport to Resolved