rbd - Bug #3737: Higher ping-latency observed in qemu with rbd_cache=true during disk-write
https://tracker.ceph.com/issues/3737

[2013-01-10T09:50:26Z] Sage Weil (sage@newdream.net)
* Priority changed from Normal to High

[2013-01-10T12:52:53Z] Ian Colle (icolle@redhat.com)
* Project changed from Ceph to rbd
* Category deleted (qemu)
* Target version deleted (v0.56)

[2013-01-21T09:35:50Z] Oliver Francke (Oliver@filoo.de)
<ul><li><strong>File</strong> <a href="/attachments/download/637/905_test.log.xz">905_test.log.xz</a> added</li><li><strong>File</strong> <a href="/attachments/download/638/ping.log.xz">ping.log.xz</a> added</li></ul><p>Hi Josh,</p>
as discussed, I did some testing. I started the dd if=/dev... of=/tmp/doof.dat bs=4k count=256000 run at around 18:10:00, as you can see in my ping.log. I think the highest RTT was 500 ms; anything above, say, 3-5 ms I do not see at all with rbd_cache=false.
Best regards,

Oliver.

[2013-02-20T23:18:42Z] Chris Dunlop (chris@onthe.net.au)
Confirmed here with ceph-0.56.3 and qemu-1.3.1.

See attached test output.

In summary, both the average ping time and its standard deviation are much worse with rbd_cache=1:

rbd_cache=0: avg 0.493 ms, std 0.109 ms
rbd_cache=1: avg 148.107 ms, std 219.786 ms

[2013-02-20T23:20:00Z] Chris Dunlop (chris@onthe.net.au)
<ul><li><strong>File</strong> <a href="/attachments/download/691/test.log">test.log</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/691/test.log">View</a> added</li></ul><p>Sigh. The attachment might help...</p> rbd - Bug #3737: Higher ping-latency observed in qemu with rbd_cache=true during disk-writehttps://tracker.ceph.com/issues/3737?journal_id=177152013-02-21T14:50:53ZJosh Durgin
I've looked at the logs, and I think #4091 (ObjectCacher: optionally make readx/writex calls never block, https://tracker.ceph.com/issues/4091) should fix this. The high ping times tend to occur around when the cache fills up, making aio_write() block.
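To make the failure mode concrete, here is a minimal reproducer sketch (mine, not from the ticket) that drives the librbd C API directly and times each rbd_aio_write() submission call. The pool name "rbd", image name "test", the 4 KB write size, and the 5 ms reporting threshold are all assumptions for illustration; rbd_cache=true is assumed in ceph.conf, and the image is assumed to be a few hundred MB. With the pre-#4091 ObjectCacher, the submission call itself stalls once the cache fills with dirty data, which matches the latency spikes reported from inside the guest.

    /* Hypothetical reproducer: measure how long each rbd_aio_write() submission
     * takes as the librbd writeback cache fills up.
     * Build (sketch): gcc -o aio_write_latency aio_write_latency.c -lrbd -lrados */
    #include <rados/librados.h>
    #include <rbd/librbd.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    static double now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
    }

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t ioctx;
        rbd_image_t image;
        char buf[4096];
        memset(buf, 0xab, sizeof(buf));

        if (rados_create(&cluster, NULL) < 0 ||
            rados_conf_read_file(cluster, NULL) < 0 ||   /* default ceph.conf, rbd_cache=true assumed */
            rados_connect(cluster) < 0)
            return 1;
        if (rados_ioctx_create(cluster, "rbd", &ioctx) < 0)  /* pool name is a placeholder */
            return 1;
        if (rbd_open(ioctx, "test", &image, NULL) < 0)       /* image name is a placeholder */
            return 1;

        for (uint64_t i = 0; i < 100000; i++) {              /* ~400 MB of 4 KB writes */
            rbd_completion_t c;
            rbd_aio_create_completion(NULL, NULL, &c);

            double t0 = now_ms();
            rbd_aio_write(image, i * sizeof(buf), sizeof(buf), buf, c);
            double dt = now_ms() - t0;

            /* Submission is supposed to be asynchronous; if it takes more than a
             * few ms, the cache is full and the "aio" write is blocking (#4091). */
            if (dt > 5.0)
                printf("write %llu: submission blocked for %.1f ms\n",
                       (unsigned long long)i, dt);

            rbd_aio_wait_for_complete(c);  /* completes once the data is in the cache */
            rbd_aio_release(c);
        }

        rbd_close(image);
        rados_ioctx_destroy(ioctx);
        rados_shutdown(cluster);
        return 0;
    }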

[2013-03-01T11:35:52Z] Sage Weil (sage@newdream.net)

* Tracker changed from Bug to Fix

[2013-03-01T11:36:24Z] Ian Colle (icolle@redhat.com)
* Target version set to v0.60

[2013-03-01T11:40:11Z] Sage Weil (sage@newdream.net)
* Story points set to 8.00

[2013-03-11T16:53:39Z] Neil Levine (nlevine@redhat.com)
* Status changed from New to 12

[2013-03-15T11:17:57Z] Sage Weil (sage@newdream.net)
* Status changed from 12 to 7

[2013-03-18T11:24:03Z] Ian Colle (icolle@redhat.com)
* Target version changed from v0.60 to v0.61 - Cuttlefish

[2013-03-26T01:04:34Z] Josh Durgin
Looks like I finally found a fix: using an explicitly asynchronous flush (instead of the sync flush made async by qemu coroutines) fixes the problem in my environment. The rest of the I/O through qemu already uses explicitly async calls, so it's something about the interaction with coroutines, or the way in which qemu uses coroutines to make the sync flush async. I'd still like to dig deeper to see what the underlying issue is, and whether it's a generic problem in qemu or a known bad idea to mix aio and qemu coroutines.

[2013-03-26T12:52:31Z] Josh Durgin
There's no way around it: we need an async flush in librbd. Using coroutines vs. callbacks doesn't matter in this case; if the flush is not async, there's no way for the coroutine to yield.
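For reference, the librbd C API difference Josh is describing looks roughly like the sketch below (mine, not from the ticket; the function names are placeholders and "image" is assumed to be an already-open rbd_image_t). rbd_flush() blocks the calling thread until all dirty cache data has been written out, whereas the asynchronous rbd_aio_flush() (the librbd entry point for the async flush added here) returns immediately and reports completion through a callback, so the caller can keep servicing other I/O while the flush is in flight.

    /* Sync vs. async flush from the caller's point of view (sketch). */
    #include <rbd/librbd.h>
    #include <stdio.h>

    /* Invoked by librbd once all dirty cache data has been written out. */
    void flush_done(rbd_completion_t c, void *arg)
    {
        (void)arg;
        printf("flush finished, rc=%d\n", (int)rbd_aio_get_return_value(c));
    }

    /* Blocking: the calling thread, and anything else it services (e.g. a qemu
     * coroutine that cannot yield), waits here until the cache is clean. */
    void flush_sync(rbd_image_t image)
    {
        rbd_flush(image);
    }

    /* Asynchronous: returns as soon as the flush is queued; completion arrives
     * via the callback, so other I/O can make progress in the meantime. */
    void flush_async(rbd_image_t image)
    {
        rbd_completion_t c;
        rbd_aio_create_completion(NULL, flush_done, &c);
        rbd_aio_flush(image, c);
        /* release the completion (rbd_aio_release) after it has fired */
    }

The corresponding qemu patch (tracked in #4581, mentioned below) is about moving qemu's rbd flush path onto this asynchronous call.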

[2013-03-29T11:16:13Z] Josh Durgin

* Status changed from 7 to Fix Under Review

[2013-03-29T13:29:07Z] Josh Durgin
* Status changed from Fix Under Review to Resolved

Fixed by commit 95c4a81be1af193786d0483fcbe81104d3da7c40. Note that the qemu patch still needs to get merged upstream (#4581: qemu: use asynchronous flush, https://tracker.ceph.com/issues/4581).

[2013-03-29T13:42:36Z] Josh Durgin
* Tracker changed from Fix to Bug
* Status changed from Resolved to Pending Backport
* Backport set to bobtail

[2013-03-31T13:10:28Z] Stefan Priebe (s.priebe@profihost.ag)
Thanks for your great work! Is there already a way / branch to test this with bobtail?

[2013-04-15T00:16:20Z] Josh Durgin
* Status changed from Pending Backport to 7

The branch wip-bobtail-rbd-backports-req-order has the fix for this plus several other bugs backported on top of the current bobtail branch. It passes simple testing, and is going through more thorough testing overnight.

[2013-04-16T05:50:10Z] Oliver Francke (Oliver@filoo.de)
Hi Josh,

that sounds promising; unfortunately I'm currently on 0.60... in our lab. We will probably move to the latest bobtail in our production environment next week. Do you think the fix will make it into that package?

Thanks and best regards,

Oliver.

[2013-04-16T10:46:37Z] Josh Durgin
Yeah, the backports should definitely be merged by next week. On your lab cluster, you could try librbd from the 'next' branch, which has the librbd side of the fix for this.

[2013-04-17T01:00:43Z] Oliver Francke (Oliver@filoo.de)
Well,

could it be that the fix already made it into "ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)"? I did not see any high latencies while writing...

Oliver.

[2013-04-22T03:02:40Z] Oliver Francke (Oliver@filoo.de)
Oops, sorry...

I was a bit misled, because "cache=writeback" was still in the config file.

Oliver.

[2013-04-23T07:09:47Z] Wido den Hollander (wido@42on.com)
I just tested the Qemu patch, cherry-picked onto Qemu 1.2, together with the wip-bobtail-rbd-backports-req-order branch, and it does indeed seem to improve write performance a lot.

I saw about a 90% performance increase on this particular system.

[2013-04-23T12:07:26Z] Josh Durgin
* Status changed from 7 to Resolved

Thanks for testing it out, everyone. It's now in the bobtail branch too.

[2013-05-25T00:24:36Z] Edwin Peer (epeer@isoho.st)
Using ceph 0.61.2 and qemu 1.4.2 (or earlier qemu versions with the patch), the following hangs after a few iterations:

    phobos ~ # i=0; while [ $i -lt 30 ]; do dd if=/dev/zero of=test bs=4k count=1000000 conv=fdatasync; i=$[$i+1]; done
    1000000+0 records in
    1000000+0 records out
    4096000000 bytes (4.1 GB) copied, 141.949 s, 28.9 MB/s
    1000000+0 records in
    1000000+0 records out
    4096000000 bytes (4.1 GB) copied, 115.936 s, 35.3 MB/s

If I revert the qemu patch, it no longer locks up, but the latency issue is present (even with caching disabled).

Any ideas?

[2013-05-25T07:27:19Z] Edwin Peer (epeer@isoho.st)
Update: it seems to work fine if I turn writeback caching back on again (it had previously been turned off, before patching).