https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2017-02-14T12:07:26Z
Ceph
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=86050
2017-02-14T12:07:26Z
Marek Panek
<ul></ul><p>We observe the same effect in 11.2. After some time, each OSD consumes ~6 GB of RAM (12 OSDs per server with 64 GB of RAM) and is eventually killed by the OOM killer.</p>
<pre><code>8700 ceph 20 0 6687476 5,286g 0 S 6,0 8,4 453:41.07 ceph-osd <br /> 8525 ceph 20 0 6531072 5,286g 0 S 3,3 8,4 410:53.50 ceph-osd <br /> 8921 ceph 20 0 6403132 5,178g 0 S 3,3 8,2 478:13.86 ceph-osd <br /> 8942 ceph 20 0 6360136 5,147g 0 S 4,0 8,2 463:37.74 ceph-osd <br /> 8841 ceph 20 0 6387304 5,141g 0 S 6,3 8,2 479:56.43 ceph-osd <br /> 8625 ceph 20 0 6439932 5,132g 0 S 4,0 8,2 485:43.86 ceph-osd <br /> 8703 ceph 20 0 6366108 5,114g 0 S 2,0 8,1 433:00.83 ceph-osd <br /> 8621 ceph 20 0 6366424 5,082g 0 S 12,3 8,1 499:04.82 ceph-osd <br /> 8577 ceph 20 0 6398816 5,036g 0 S 3,6 8,0 390:14.41 ceph-osd <br /> 8551 ceph 20 0 6274940 4,969g 0 S 5,0 7,9 417:29.32 ceph-osd <br /> 5105 ceph 20 0 6016388 4,898g 0 S 5,6 7,8 200:36.20 ceph-osd <br />27040 ceph 20 0 5793032 4,685g 0 S 6,6 7,5 211:34.08 ceph-osd</code></pre>
<p>They are 4TB OSDs.</p>
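<p>To make the memory pressure concrete, the resident sizes in the listing above can be summed. This is a minimal sketch, assuming the default <code>top</code> column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), a comma as the decimal separator, and that the <code>g</code> suffix means GiB; the helper name <code>parse_res_gib</code> is made up for illustration.</p>

```python
# Sum the RES column of the ceph-osd processes quoted in the comment above.
# Assumption: default `top` layout, so RES is the sixth whitespace-separated field.

TOP_OUTPUT = """\
8700 ceph 20 0 6687476 5,286g 0 S 6,0 8,4 453:41.07 ceph-osd
8525 ceph 20 0 6531072 5,286g 0 S 3,3 8,4 410:53.50 ceph-osd
8921 ceph 20 0 6403132 5,178g 0 S 3,3 8,2 478:13.86 ceph-osd
8942 ceph 20 0 6360136 5,147g 0 S 4,0 8,2 463:37.74 ceph-osd
8841 ceph 20 0 6387304 5,141g 0 S 6,3 8,2 479:56.43 ceph-osd
8625 ceph 20 0 6439932 5,132g 0 S 4,0 8,2 485:43.86 ceph-osd
8703 ceph 20 0 6366108 5,114g 0 S 2,0 8,1 433:00.83 ceph-osd
8621 ceph 20 0 6366424 5,082g 0 S 12,3 8,1 499:04.82 ceph-osd
8577 ceph 20 0 6398816 5,036g 0 S 3,6 8,0 390:14.41 ceph-osd
8551 ceph 20 0 6274940 4,969g 0 S 5,0 7,9 417:29.32 ceph-osd
5105 ceph 20 0 6016388 4,898g 0 S 5,6 7,8 200:36.20 ceph-osd
27040 ceph 20 0 5793032 4,685g 0 S 6,6 7,5 211:34.08 ceph-osd
"""


def parse_res_gib(top_line: str) -> float:
    """Return the RES value of one `top` process line in GiB."""
    res = top_line.split()[5]
    if not res.endswith("g"):
        raise ValueError("expected a value with a 'g' suffix, got %r" % res)
    # The listing uses a decimal comma (locale-dependent), e.g. "5,286g".
    return float(res[:-1].replace(",", "."))


res_values = [parse_res_gib(line) for line in TOP_OUTPUT.splitlines() if line.strip()]
total_gib = sum(res_values)

print("%d OSDs, %.1f GiB resident in total" % (len(res_values), total_gib))
```

<p>Under these assumptions the twelve OSDs together hold roughly 61 GiB resident, which leaves very little headroom on a 64 GB server.</p>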
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=86051
2017-02-14T12:09:03Z
Marek Panek
<ul></ul><p>Marek Panek wrote:</p>
<blockquote>
<p>We observe the same effect in 11.2 with bluestore. After some time, each OSD consumes ~6 GB of RAM (12 OSDs per server with 64 GB of RAM) and is eventually killed by the OOM killer.</p>
<p>8700 ceph 20 0 6687476 5,286g 0 S 6,0 8,4 453:41.07 ceph-osd <br />8525 ceph 20 0 6531072 5,286g 0 S 3,3 8,4 410:53.50 ceph-osd <br />8921 ceph 20 0 6403132 5,178g 0 S 3,3 8,2 478:13.86 ceph-osd <br />8942 ceph 20 0 6360136 5,147g 0 S 4,0 8,2 463:37.74 ceph-osd <br />8841 ceph 20 0 6387304 5,141g 0 S 6,3 8,2 479:56.43 ceph-osd <br />8625 ceph 20 0 6439932 5,132g 0 S 4,0 8,2 485:43.86 ceph-osd <br />8703 ceph 20 0 6366108 5,114g 0 S 2,0 8,1 433:00.83 ceph-osd <br />8621 ceph 20 0 6366424 5,082g 0 S 12,3 8,1 499:04.82 ceph-osd <br />8577 ceph 20 0 6398816 5,036g 0 S 3,6 8,0 390:14.41 ceph-osd <br />8551 ceph 20 0 6274940 4,969g 0 S 5,0 7,9 417:29.32 ceph-osd <br />5105 ceph 20 0 6016388 4,898g 0 S 5,6 7,8 200:36.20 ceph-osd <br />27040 ceph 20 0 5793032 4,685g 0 S 6,6 7,5 211:34.08 ceph-osd</p>
<p>They are 4TB OSDs.</p>
</blockquote>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=86150
2017-02-16T11:20:12Z
Wido den Hollander
wido@42on.com
<ul><li><strong>Category</strong> set to <i>107</i></li><li><strong>Source</strong> set to <i>Community (user)</i></li><li><strong>Tags</strong> set to <i>bluestore,kraken,memory</i></li></ul><p>This was discussed during yesterday's performance meeting, and Sage suggested that this is indeed a memory leak.</p>
<p>Although lowering bluestore_cache_size seems to fix it, that only masks the real issue.</p>
<p>bluestore_cache_size is also not a per-shard setting; the code divides it by the number of shards.</p>
<p>Try running with 'debug_mempools = true' and then run:</p>
<pre>ceph daemon osd.X dump_mempools</pre>
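<p>Once the dump is available, the largest pools can be ranked to see where the memory goes. This is only a sketch: it assumes the command prints a flat JSON object that maps each pool name to <code>items</code> and <code>bytes</code> counters plus a <code>total</code> entry, and that layout may differ between releases. The pool names and numbers in <code>SAMPLE</code> are invented for illustration and are not real output.</p>

```python
import json
from typing import Dict, List, Tuple


def rank_pools(dump: Dict[str, dict]) -> List[Tuple[str, int]]:
    """Return (pool, bytes) pairs sorted by bytes, largest first.

    Assumes a flat {"pool": {"items": N, "bytes": N}, ..., "total": {...}}
    layout; the aggregate "total" entry is skipped.
    """
    pools = [
        (name, stats["bytes"])
        for name, stats in dump.items()
        if name != "total" and isinstance(stats, dict) and "bytes" in stats
    ]
    return sorted(pools, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample, NOT real dump_mempools output.
SAMPLE = json.loads("""
{
  "pool_a": {"items": 10, "bytes": 4096},
  "pool_b": {"items": 500, "bytes": 1048576},
  "pool_c": {"items": 3, "bytes": 512},
  "total":  {"items": 513, "bytes": 1053184}
}
""")

for name, size in rank_pools(SAMPLE):
    print("%-10s %10d bytes" % (name, size))
```

<p>In practice the JSON text would come from saving the output of the command above to a file and loading it with <code>json.load</code>. Comparing two dumps taken some time apart should show which pool, if any, keeps growing.</p>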
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=86186
2017-02-16T21:56:40Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-10 priority-5 priority-high3 closed" href="/issues/18926">Bug #18926</a>: Why osds do not release memory?</i> added</li></ul>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=88093
2017-03-28T14:51:00Z
Jaime Ruiz
<ul></ul><p>Fixed: the memory is released by running the following commands on a content node:</p>
<p>968 28/03/17 12:51:28 systemctl start ceph-mgr@cn5.service<br />969 28/03/17 12:51:33 systemctl status ceph-mgr@cn5.service<br />972 28/03/17 12:52:37 systemctl stop ceph-mgr@cn5.service</p>
<p>We can see how the memory is released here:</p>
<p>cn5.vn1ldv1c1.cdn ~# sar -r 1 -f /var/log/sa/sa28 | grep PM<br />12:00:02 PM 26310136 369715512 93.36 354416 1753244 376096532 94.97 288524784 854352 32<br />12:10:01 PM 30571164 365454484 92.28 1146264 1861208 370760812 93.62 283487992 1661652 48<br />12:20:01 PM 29749384 366276264 92.49 1641044 1863668 370914752 93.66 283952176 2013512 188<br />12:30:01 PM 29285088 366740560 92.61 2174092 1794624 371147340 93.72 284120960 2328764 32<br />12:40:01 PM 28929392 367096256 92.70 2389656 1802116 371380628 93.78 284346764 2394176 40<br />12:50:01 PM 28524076 367501572 92.80 2349836 1796560 371526164 93.81 284879696 2197292 40<br />01:00:01 PM 160678156 235347492 59.43 2859676 1806632 371668548 93.85 152318728 2561288 516<br />01:10:01 PM 154097068 241928580 61.09 8666108 1804952 373251920 94.25 155266968 6171812 32<br />01:20:01 PM 153453208 242572440 61.25 8883900 1916188 373617584 94.34 156084024 5992300 16<br />01:30:01 PM 153278356 242747292 61.30 8734576 1925748 373754680 94.38 156598692 5622352 8<br />01:40:01 PM 153312836 242712812 61.29 8087932 1928984 374131420 94.47 157366324 4794936 32<br />01:50:01 PM 154681508 241344140 60.94 6972484 1934452 374446920 94.55 156447128 4292416 8<br />02:00:01 PM 154454348 241571300 61.00 7026772 1938780 374585740 94.59 156656448 4277764 8<br />02:10:01 PM 154680956 241344692 60.94 7001188 1942524 374724676 94.62 156383224 4289180 8<br />02:20:01 PM 154634124 241391524 60.95 6920016 1935084 374887256 94.66 156468928 4223128 8</p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=88095
2017-03-28T14:55:52Z
Jaime Ruiz
<ul></ul><p>We decided to stop the ceph-mgr service on all nodes because it was using a lot of CPU and, as we understood it, the service is not required.</p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=88172
2017-03-29T06:42:07Z
Muthusamy Muthiah
<ul></ul><p>Hi Jaime,</p>
<p>The issue is not fixed by this workaround; we will address the workaround in a separate issue related to ceph-mgr:<br /><a class="external" href="http://tracker.ceph.com/issues/18994">http://tracker.ceph.com/issues/18994</a>. The updates above are clearly side effects of stopping ceph-mgr because of its high CPU utilisation.</p>
<p>This ticket is still valid: the memory leak still has to be solved, and the current workaround is to reduce bluestore_cache_size from 1GB to 100MB.</p>
<p>Regards,<br />Muthu</p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=88183
2017-03-29T13:00:24Z
Nokia ceph-users
<ul></ul><p>This is a bug in the ceph-mgr service: <a class="external" href="http://tracker.ceph.com/issues/19407">http://tracker.ceph.com/issues/19407</a>, currently in the Need Review state.</p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=88186
2017-03-29T13:07:46Z
Nokia ceph-users
<ul></ul><p>Sorry, wrong window. Please ignore my previous comment.</p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=88283
2017-03-31T15:41:20Z
Aaron T
ceph-redmine@aarontc.com
<ul></ul><p>I'm experiencing this runaway memory issue as well. It only appeared a couple of days ago. I tried setting the bluestore cache directive to ~100MB ("bluestore_cache_size": "107374182"), but this has had no noticeable effect on the problem. Two OSD hosts, each with 64GiB RAM and ~13 bluestore OSDs, have suddenly started OOM-killing ceph-osd processes once memory usage exceeds 128GiB (RAM+swap)...</p>
<p>This 11.2.0 cluster has been running just fine for ~2 months before this issue suddenly appeared, with no discernible change in usage patterns and no hardware/software reconfiguration. Very strange.</p>
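<p>For scale, the cache value quoted above can be converted to binary units and multiplied by the number of OSDs per host. This is a rough sketch, assuming the setting is expressed in bytes and taking 13 OSDs per host from the description; the variable names are illustrative only.</p>

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

cache_size_bytes = 107374182   # value of bluestore_cache_size quoted in the comment
osds_per_host = 13             # "~13 bluestore OSDs" per host, taken as exactly 13 here

# Per-OSD cache budget in MiB and the combined budget for one host in GiB.
per_osd_mib = cache_size_bytes / MIB
host_cache_gib = cache_size_bytes * osds_per_host / GIB

print("per OSD:  %.1f MiB" % per_osd_mib)
print("per host: %.2f GiB" % host_cache_gib)
```

<p>The value is one tenth of 1 GiB, about 102.4 MiB per OSD, or about 1.3 GiB of configured cache for the whole host. That is far below the 128GiB at which the OOM killer reportedly steps in, which is consistent with the cache setting alone not accounting for the growth.</p>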
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=92799
2017-06-14T05:23:36Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>RADOS</i></li><li><strong>Category</strong> changed from <i>107</i> to <i>Performance/Resource Usage</i></li><li><strong>Component(RADOS)</strong> <i>BlueStore</i> added</li></ul>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=93484
2017-06-21T02:21:37Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/15792">https://github.com/ceph/ceph/pull/15792</a></p>
<p>should help</p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=93485
2017-06-21T02:22:02Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=93506
2017-06-21T08:44:12Z
Nathan Cutler
ncutler@suse.cz
<ul></ul><p><strong>master PR</strong>: <a class="external" href="https://github.com/ceph/ceph/pull/15295">https://github.com/ceph/ceph/pull/15295</a><br /><strong>kraken backport PR</strong>: <a class="external" href="https://github.com/ceph/ceph/pull/15792">https://github.com/ceph/ceph/pull/15792</a></p>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=93513
2017-06-21T09:36:54Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Backport</strong> set to <i>kraken</i></li></ul>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=93521
2017-06-21T11:50:56Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/20366">Backport #20366</a>: kraken: kraken-bluestore 11.2.0 memory leak issue</i> added</li></ul>
RADOS - Bug #18924: kraken-bluestore 11.2.0 memory leak issue
https://tracker.ceph.com/issues/18924?journal_id=94416
2017-07-05T20:05:38Z
Nathan Cutler
ncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>