Bug #8938
OSD memory leak seen with fs-master-testing-basic/kernel_untar_build.sh
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor
Description
Initial symptom was stuck test.
teuthology.log shows two OSDs died as if killed by OOM killer:
2014-07-26T09:40:07.565 INFO:teuthology.task.ceph.osd.0.plana62.stderr:daemon-helper: command crashed with signal 9
2014-07-26T10:38:46.369 INFO:teuthology.task.ceph.osd.2.plana91.stderr:daemon-helper: command crashed with signal 9
Logged in and saw remaining OSDs on each host using 80% and 65% of RAM respectively. At 2014-07-26T14:04:54.195 I killed the filesystem client so that the test would complete with a failure. Logs are zipping up now...
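For anyone triaging a similar run, the signal 9 entries quoted from teuthology.log can be picked out with a short script. This is only a sketch: the regular expression is inferred from the two log lines in this report, and the name `find_signal_kills` is hypothetical, not part of teuthology.

```python
import re

# Pattern inferred from the two lines quoted in this report (an assumption,
# not a documented teuthology log format).
CRASH_RE = re.compile(
    r"(?P<ts>\S+) INFO:teuthology\.task\.ceph\."
    r"(?P<daemon>[a-z]+\.\w+)\.(?P<host>[^.\s]+)\.stderr:"
    r"daemon-helper: command crashed with signal (?P<sig>\d+)"
)


def find_signal_kills(lines, signal=9):
    """Return (timestamp, daemon, host) for each daemon killed by `signal`."""
    hits = []
    for line in lines:
        # finditer copes with several entries run together on one line
        for m in CRASH_RE.finditer(line):
            if int(m.group("sig")) == signal:
                hits.append((m.group("ts"), m.group("daemon"), m.group("host")))
    return hits


sample = [
    "2014-07-26T09:40:07.565 INFO:teuthology.task.ceph.osd.0.plana62.stderr:"
    "daemon-helper: command crashed with signal 9",
    "2014-07-26T10:38:46.369 INFO:teuthology.task.ceph.osd.2.plana91.stderr:"
    "daemon-helper: command crashed with signal 9",
]

for ts, daemon, host in find_signal_kills(sample):
    print(ts, daemon, host)
```

Signal 9 alone does not prove the OOM killer was responsible; the kernel log on each host would be the place to confirm that.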
Updated by John Spray almost 10 years ago
- Status changed from New to Resolved
This was fixed at about the same time:
commit 288908b3316bc975a2b3f75aea5131d7c1cba57f
Author: Sage Weil <sage@redhat.com>
Date:   Sat Jul 26 21:19:34 2014 -0700

    Revert "Merge pull request #2129 from ceph/wip-librbd-oc"

    This reverts commit 74b386f03e4ca9970256db72c575589aea077534, reversing
    changes made to 36265d0db0d7c0eb31d25a0f77ac233b3fd198f8.

    The dirty_or_tx list is used by flush_set, which means we can resubmit new
    IOs for writes that are already in progress. This has a compounding effect
    that overwhelms the OSDs with dup IOs and stalls out the client. See, for
    example, teh failues in this run:

    /a/sage-2014-07-25_17:14:20-fs-wip-msgr-testing-basic-plana

    The fix is probably pretty simple, but reverting for now to make the tests
    pass.

    Signed-off-by: Sage Weil <sage@inktank.com>
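The compounding effect described in that commit message can be illustrated with a toy model. This is not Ceph's ObjectCacher code: the class `ToyCache`, the `include_tx` flag, and the buffer states are all hypothetical, chosen only to show why a flush that walks buffers already in flight ("tx") as well as dirty ones sends duplicate writes.

```python
from dataclasses import dataclass, field

DIRTY, TX, CLEAN = "dirty", "tx", "clean"


@dataclass
class ToyCache:
    """Toy stand-in for a write-back cache; not the real ObjectCacher."""
    state: dict = field(default_factory=dict)      # buffer id -> state
    submitted: list = field(default_factory=list)  # writes sent to the "OSD"

    def write(self, buf):
        self.state[buf] = DIRTY

    def flush(self, include_tx):
        # include_tx=True mimics flushing from a combined dirty-or-tx list:
        # buffers whose write is still in flight get submitted again.
        for buf, st in self.state.items():
            if st == DIRTY or (include_tx and st == TX):
                self.submitted.append(buf)
                self.state[buf] = TX

    def ack(self, buf):
        self.state[buf] = CLEAN


def writes_sent(include_tx, flushes=3):
    cache = ToyCache()
    for buf in ("a", "b"):
        cache.write(buf)
    for _ in range(flushes):
        cache.flush(include_tx)  # no acks arrive between flushes
    return len(cache.submitted)


print(writes_sent(include_tx=False))  # 2: each dirty buffer is sent once
print(writes_sent(include_tx=True))   # 6: every flush resends in-flight writes
```

In this model the number of writes grows with every flush issued before the acks come back, which is consistent with the OSDs being overwhelmed by duplicate IOs as the commit message describes.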