
Bug #3737

Higher ping-latency observed in qemu with rbd_cache=true during disk-write

Added by Oliver Francke over 7 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
bobtail
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Hi Josh,

as per our short conversation in IRC #ceph, there is an issue with latency/responsiveness with rbd_cache enabled, regardless of what cache= is set to.
In the lab we currently have qemu-1.2.2, as well as ceph version 0.56-111-ga14a36e (a14a36ed78d9febb7fbf1f6bf209d9bd58daace6).

Please advise necessary debug-switches to narrow down the problem.

Thnx and have a pretty good year to all of you ;)

Oliver.
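For context, rbd caching and client-side debug logging of the sort Oliver is asking about are normally controlled from the [client] section of ceph.conf; the options below are real, but the levels shown are illustrative, not the ones Josh later recommended:

```ini
[client]
    rbd cache = true
    # client-side debug logging to narrow down cache-related latency:
    debug rbd = 20
    debug objectcacher = 20
    debug ms = 1
    log file = /var/log/ceph/client.$pid.log
```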

905_test.log.xz (2.27 MB) Oliver Francke, 01/21/2013 09:35 AM

ping.log.xz (1.14 KB) Oliver Francke, 01/21/2013 09:35 AM

test.log View - /tmp/test.log (15.6 KB) Chris Dunlop, 02/20/2013 11:20 PM


Related issues

Related to rbd - Subtask #4091: ObjectCacher: optionally make readx/writex calls never block Resolved 02/11/2013

Associated revisions

Revision 302b93c4 (diff)
Added by Josh Durgin over 7 years ago

librbd: add an async flush

At this point it's a simple wrapper around the ObjectCacher or
librados.

This is needed for QEMU so that its main thread can continue while a
flush is occurring. Since this will be backported, don't update the
librbd version yet, just add a #define that QEMU and others can use to
detect the presence of aio_flush().

Refs: #3737
Signed-off-by: Josh Durgin <>

Revision 31a45e8e (diff)
Added by Josh Durgin over 7 years ago

librbd: add an async flush

At this point it's a simple wrapper around the ObjectCacher or
librados.

This is needed for QEMU so that its main thread can continue while a
flush is occurring. Since this will be backported, don't update the
librbd version yet, just add a #define that QEMU and others can use to
detect the presence of aio_flush().

Refs: #3737
Signed-off-by: Josh Durgin <>
(cherry picked from commit 302b93c478b3f4bc2c82bfb08329e3c98389dd97)

History

#1 Updated by Sage Weil over 7 years ago

  • Priority changed from Normal to High

#2 Updated by Ian Colle over 7 years ago

  • Project changed from Ceph to rbd
  • Category deleted (qemu)
  • Target version deleted (v0.56)

#3 Updated by Oliver Francke over 7 years ago

Hi Josh,

according to our conversation I did some testing.
I started the dd if=/dev... of=/tmp/doof.dat bs=4k count=256000 at around 18:10:00, as you can see from my ping.log.
I think the highest RTT was 500 ms, and everything above, say, 3-5 ms I do not see with rbd_cache=false.

Best regards,

Oliver.
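The test above can be sketched roughly as follows (the byte count is reduced here for brevity; the original run used count=256000, with the pings recorded from outside the guest):

```shell
# Sustained buffered write inside the guest, as in the report:
dd if=/dev/zero of=/tmp/doof.dat bs=4k count=1000 2>/dev/null
# Meanwhile, from another machine, record RTTs to the guest, e.g.:
#   ping -i 0.2 <guest-ip> > ping.log
ls -l /tmp/doof.dat
```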

#4 Updated by Chris Dunlop over 7 years ago

Confirmed here, with ceph-0.56.3 and qemu-1.3.1.

See attached test output.

In summary, the average ping time and its standard deviation are much worse with rbd_cache=1:

rbd_cache=0: Avg: 0.493 ms Std: 0.109 ms
rbd_cache=1: Avg: 148.107 ms Std: 219.786 ms
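Summary numbers like these can be reproduced from a ping log with a short awk pipeline (the log contents below are made-up sample values for illustration):

```shell
# Stand-in for real ping output; each line carries a time=<rtt> field.
printf 'time=0.4\ntime=0.5\ntime=0.6\n' > /tmp/ping.log
# Pull out the RTTs and compute mean and (population) standard deviation:
grep -o 'time=[0-9.]*' /tmp/ping.log | cut -d= -f2 |
  awk '{s+=$1; ss+=$1*$1; n++} END {m=s/n; printf "Avg: %.3f ms Std: %.3f ms\n", m, sqrt(ss/n - m*m)}'
```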

#5 Updated by Chris Dunlop over 7 years ago

Sigh. The attachment might help...

#6 Updated by Josh Durgin over 7 years ago

I've looked at the logs, and I think #4091 should fix this. The high ping times tend to occur around when the cache fills up, making aio_write() block.

#7 Updated by Sage Weil over 7 years ago

  • Tracker changed from Bug to Fix

#8 Updated by Ian Colle over 7 years ago

  • Target version set to v0.60

#9 Updated by Sage Weil over 7 years ago

  • Story points set to 8.00

#10 Updated by Neil Levine over 7 years ago

  • Status changed from New to 12

#11 Updated by Sage Weil over 7 years ago

  • Status changed from 12 to 7

#12 Updated by Ian Colle over 7 years ago

  • Target version changed from v0.60 to v0.61 - Cuttlefish

#13 Updated by Josh Durgin over 7 years ago

Looks like I finally found a fix: using an explicitly asynchronous flush (instead of the synchronous flush made asynchronous by qemu coroutines) fixes the problem in my environment. The rest of the I/O through qemu already uses explicitly async calls, so the issue is something about the interaction with coroutines, or the way qemu uses coroutines to make the sync flush async. I'd still like to dig deeper to see what the underlying issue is, and whether it's a generic problem in qemu or a known-bad idea to mix aio and qemu coroutines.

#14 Updated by Josh Durgin over 7 years ago

There's no way around it - we need an async flush in librbd. Using coroutines vs. callbacks doesn't matter in this case: if the flush is not async, there's no way for the coroutine to yield.

#15 Updated by Josh Durgin over 7 years ago

  • Status changed from 7 to Fix Under Review

#16 Updated by Josh Durgin over 7 years ago

  • Status changed from Fix Under Review to Resolved

commit:95c4a81be1af193786d0483fcbe81104d3da7c40 Note that the qemu patch still needs to get merged upstream (#4581).

#17 Updated by Josh Durgin over 7 years ago

  • Tracker changed from Fix to Bug
  • Status changed from Resolved to Pending Backport
  • Backport set to bobtail

#18 Updated by Stefan Priebe over 7 years ago

Thanks for your great work! Is there already a way / branch to test this with bobtail?

#19 Updated by Josh Durgin over 7 years ago

  • Status changed from Pending Backport to 7

The branch wip-bobtail-rbd-backports-req-order has the fix for this plus several other bugs backported on top of the current bobtail branch. It passes simple testing, and is going through more thorough testing overnight.

#20 Updated by Oliver Francke over 7 years ago

Hi Josh,

sounds promising; unfortunately I'm currently on 0.60... in our lab. We are going to move to the latest bobtail in our production environment next week, perhaps. Do you think it will make it into this package?

Thnx n best regards,

Oliver.

#21 Updated by Josh Durgin over 7 years ago

Yeah, the backports should definitely be merged by next week. On your lab cluster, you could try librbd from the 'next' branch, which has the librbd side of the fix for this.

#22 Updated by Oliver Francke over 7 years ago

Well,

could it be that the fix already made it into "ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)"? I did not see any high latencies while writing...

Oliver.

#23 Updated by Oliver Francke over 7 years ago

Oops, sorry...,

I was a bit misled, because "cache=writeback" was still in the config file.

Oliver.

#24 Updated by Wido den Hollander over 7 years ago

I just tested the Qemu patch, cherry-picked onto Qemu 1.2, together with the wip-bobtail-rbd-backports-req-order branch, and that does indeed seem to improve the write performance a lot.

I saw about a 90% performance increase on this particular system.

#25 Updated by Josh Durgin over 7 years ago

  • Status changed from 7 to Resolved

Thanks for testing it out everyone. It's now in the bobtail branch too.

#26 Updated by Edwin Peer about 7 years ago

Using ceph 0.61.2 and qemu 1.4.2 or earlier versions with the patch:

The following hangs after a few iterations:

phobos ~ # i=0; while [ $i -lt 30 ]; do dd if=/dev/zero of=test bs=4k count=1000000 conv=fdatasync; i=$[$i+1]; done
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 141.949 s, 28.9 MB/s
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 115.936 s, 35.3 MB/s

If I revert the qemu patch, then it no longer locks up, but the latency issue is present (even with caching disabled).

Any ideas?

#27 Updated by Edwin Peer about 7 years ago

Update: it seems to work fine if I turn writeback caching back on again (it had previously been turned off, before patching).
