Ceph : Issues https://tracker.ceph.com/ 2017-06-28T04:48:22Z Ceph Redmine
Ceph - Backport #20443 (Resolved): kraken: osd: client IOPS drops to zero frequently https://tracker.ceph.com/issues/20443 2017-06-28T04:48:22Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15962">https://github.com/ceph/ceph/pull/15962</a></p>
Ceph - Backport #20428 (Resolved): jewel: osd: client IOPS drops to zero frequently https://tracker.ceph.com/issues/20428 2017-06-27T11:13:38Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15947">https://github.com/ceph/ceph/pull/15947</a></p>
Ceph - Bug #20427 (Resolved): osd: client IOPS drops to zero frequently https://tracker.ceph.com/issues/20427 2017-06-27T10:17:03Z Alexey Sheplyakov asheplyakov@mirantis.com
<p>[From <a class="external" href="http://www.spinics.net/lists/ceph-devel/msg37163.html">http://www.spinics.net/lists/ceph-devel/msg37163.html</a>]</p>
<p>At Alibaba, we experienced unstable performance with Jewel on one production cluster, and we can now easily reproduce it on several small test clusters. One test cluster has 30 SSDs and another has 120 SSDs. We use filestore with the async messenger on the backend and fio with librbd to test them. When the issue occurs, client fio IOPS frequently drop to zero (or close to zero) during fio runs. The drops are very short, about 1 second each.</p>
<p>For the 30-SSD test cluster, we use 135 fio clients, each writing to its own rbd image (135 images in total); each fio instance runs a single job with a rate limit of 3 MB/s. On this freshly created test cluster, client IOPS and each OSD server's throughput were very stable across all 135 fio runs for the first 15 minutes or so. After 15 minutes and 360 GB of data written, the test cluster entered an unstable state: client fio IOPS frequently dropped to zero (or close to it), and each OSD server's throughput became very spiky (swinging from 500 MB/s to less than 1 MB/s). We let all fio clients keep writing for about 16 hours, and the cluster remained in this swinging state.</p>
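<p>As a quick consistency check on the figures in this report, the sketch below multiplies the client count by the per-client rate limit and the length of the stable period. The function names and the choice of decimal units (1 GB = 1000 MB) are assumptions made for illustration; the calculation covers client-side writes only and ignores replication and journal traffic on the OSD servers.</p>

```python
# Back-of-the-envelope check of the numbers quoted in the report.
# Assumption: 1 GB = 1000 MB (decimal units); only client-side writes are counted.

def aggregate_rate_mb_s(clients: int, rate_mb_s: float) -> float:
    """Combined client write rate when every fio job is held at its rate limit."""
    return clients * rate_mb_s


def data_written_gb(clients: int, rate_mb_s: float, minutes: float) -> float:
    """Total client data written after the given number of minutes."""
    seconds = minutes * 60
    return aggregate_rate_mb_s(clients, rate_mb_s) * seconds / 1000


if __name__ == "__main__":
    # Values taken from the report: 135 fio clients, 3 MB/s each, 15 stable minutes.
    print(aggregate_rate_mb_s(135, 3))   # combined client rate in MB/s
    print(data_written_gb(135, 3, 15))   # client data written in GB
```

<p>With the reported values this gives 405 MB/s in aggregate and about 364.5 GB after 15 minutes, which is in line with the "15 minutes and 360 GB" at which the cluster became unstable.</p>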
<p>This is very easily reproducible. I don't think it is caused by filestore folder splitting, since the splits were all done during the first 15 minutes. Also, OSD server memory, CPU, and disk were far from saturated. One thing we noticed from the perf counters is that op_latency increased from 0.7 ms to more than 20 ms after entering this unstable state. Is this normal Jewel/filestore behavior? Does anyone know what causes it?</p>
rbd - Backport #19957 (Resolved): jewel: rbd: Lock release requests not honored after watch is re... https://tracker.ceph.com/issues/19957 2017-05-17T09:14:43Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/17385">https://github.com/ceph/ceph/pull/17385</a></p>
Ceph - Backport #19928 (Resolved): kraken: mon crash on shutdown, lease_ack_timeout event https://tracker.ceph.com/issues/19928 2017-05-15T09:32:40Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15084">https://github.com/ceph/ceph/pull/15084</a></p>
Ceph - Backport #19926 (Resolved): jewel: mon crash on shutdown, lease_ack_timeout event https://tracker.ceph.com/issues/19926 2017-05-15T08:16:42Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15083">https://github.com/ceph/ceph/pull/15083</a></p>
Ceph - Backport #19916 (Resolved): kraken: osd/OSD.h: 706: FAILED assert(removed) in PG::unreg_ne... https://tracker.ceph.com/issues/19916 2017-05-12T13:12:21Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15066">https://github.com/ceph/ceph/pull/15066</a></p>
Ceph - Backport #19915 (Resolved): jewel: osd/OSD.h: 706: FAILED assert(removed) in PG::unreg_nex... https://tracker.ceph.com/issues/19915 2017-05-12T12:42:17Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15065">https://github.com/ceph/ceph/pull/15065</a></p>
Ceph - Backport #19910 (Resolved): jewel: random OSDs fail to start after reboot with systemd https://tracker.ceph.com/issues/19910 2017-05-11T13:49:37Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/15051">https://github.com/ceph/ceph/pull/15051</a></p>
Ceph - Backport #19647 (Resolved): kraken: ceph-disk: directory-backed OSDs do not start on boot https://tracker.ceph.com/issues/19647 2017-04-18T09:18:50Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/14604">https://github.com/ceph/ceph/pull/14604</a></p>
Ceph - Backport #19646 (Resolved): jewel: ceph-disk: directory-backed OSDs do not start on boot https://tracker.ceph.com/issues/19646 2017-04-18T08:18:58Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/14602">https://github.com/ceph/ceph/pull/14602</a></p>
Ceph - Backport #19508 (Resolved): Upgrading from 0.94.6 to 10.2.6 can overload monitors (failed ... https://tracker.ceph.com/issues/19508 2017-04-06T08:42:09Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/14392">https://github.com/ceph/ceph/pull/14392</a></p>
Ceph - Backport #19323 (Rejected): hammer: segfault in FileStore::fiemap() https://tracker.ceph.com/issues/19323 2017-03-21T12:18:47Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/14069">https://github.com/ceph/ceph/pull/14069</a></p>
rgw - Backport #19322 (Resolved): kraken: multisite: possible infinite loop in RGWFetchAllMetaCR https://tracker.ceph.com/issues/19322 2017-03-21T11:00:41Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="https://github.com/ceph/ceph/pull/14067">https://github.com/ceph/ceph/pull/14067</a></p>
rgw - Backport #19321 (Resolved): jewel: multisite: possible infinite loop in RGWFetchAllMetaCR https://tracker.ceph.com/issues/19321 2017-03-21T10:42:15Z Alexey Sheplyakov asheplyakov@mirantis.com
<p><a class="external" href="http://tracker.ceph.com/issues/17655">http://tracker.ceph.com/issues/17655</a></p>