Bug #9113
closed
osd: snap trimming eats memory, linearly
Added by Sage Weil over 9 years ago.
Updated over 9 years ago.
Backport:
firefly, dumpling
Description
- rados pool snapshot taken weekly
- trimmed when >30 days old
- trimming makes some osds consume memory linearly
- restarting the osd resets, but memory consumption continues until trimming completes
The Google heap profiler has not produced any useful data on the leak, apart from weak evidence that some memory is consumed by the transactions doing the trimming. The total tracked heap was small, though, so this may just be normal operation.
a few notes:
we think trims remove no more than 10-20 TB. The delta between live data and the bottom of the snap stack is 250TB at the moment. We also know that the memory issue is proportional to the delta between snapshots.
this situation is abnormal, because we removed 250TB from the cluster
i.e. it's not normal to be growing at 250TB that fast, heh
basically, the memory consumption progresses linearly across all osds housing primary pgs
the slope of the progression is directly related to the snap_trim_sleep osd setting
the lower the snap_trim_sleep, the greater the slope
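To make the snap_trim_sleep observation concrete, here is a toy model, not a measurement. It assumes each trim operation leaves a fixed amount of state behind and that one trim takes a fixed amount of work plus the configured sleep. The names `op_seconds` and `bytes_per_trim` and their default values are hypothetical and exist only for illustration.

```python
def trim_memory_slope(snap_trim_sleep, op_seconds=0.01, bytes_per_trim=4096):
    """Toy model: bytes of memory retained per second while trimming.

    Assumption (hypothetical): every trim op leaves `bytes_per_trim` of
    state behind, and takes `op_seconds` of work followed by
    `snap_trim_sleep` seconds of idle time.
    """
    return bytes_per_trim / (snap_trim_sleep + op_seconds)


# A lower sleep means more trims per second, hence a steeper slope.
for sleep in (0.0, 0.05, 0.5):
    print(f"snap_trim_sleep={sleep}: {trim_memory_slope(sleep):.0f} bytes/s")
```

Under these assumptions the slope is inversely related to the sleep, which matches the direction reported above but says nothing about where the memory actually goes.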
- Subject changed from osd: snap trimming eats memory, linearly (dumpling) to osd: snap trimming eats memory, linearly
- Assignee set to Samuel Just
It's not just dumpling, the repops set in the snap trimmer is just wonky. We need to trim a bounded set of objects, wait, trim a bounded set of objects, wait, etc.
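A minimal sketch of the "trim a bounded set, wait, repeat" pattern, in Python rather than the OSD's C++. This is not the actual fix. `submit_trim`, `wait_for_batch`, `batch_size` and `sleep_seconds` are hypothetical names standing in for whatever the trimmer uses to issue repops and wait for them to complete.

```python
import time
from itertools import islice


def trim_in_bounded_batches(objects, submit_trim, wait_for_batch,
                            batch_size=16, sleep_seconds=0.0):
    """Trim `objects` so that at most `batch_size` ops are outstanding.

    Returns the largest number of ops that were in flight at once.
    """
    it = iter(objects)
    peak = 0
    while True:
        batch = list(islice(it, batch_size))  # next bounded set of objects
        if not batch:
            break
        pending = [submit_trim(obj) for obj in batch]
        peak = max(peak, len(pending))
        wait_for_batch(pending)    # block until this whole batch is done
        time.sleep(sleep_seconds)  # optional pause before the next batch
    return peak


# Stand-in callbacks: "submitting" returns the object, "waiting" records it.
done = []
peak = trim_in_bounded_batches(range(100), submit_trim=lambda obj: obj,
                               wait_for_batch=done.extend, batch_size=16)
print(peak, len(done))
```

The point of the design is that the number of outstanding operations is capped by `batch_size` regardless of how many objects need trimming, so memory held by in-flight work cannot grow with the size of the snapshot delta.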
- Status changed from New to 7
- Status changed from 7 to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Backport set to firefly, dumpling
- Status changed from Pending Backport to 7
There's another piece. The trimmer is constantly requeueing.
- Status changed from 7 to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Status changed from Pending Backport to Resolved