Bug #3889

krbd: handle zero-length requests

Added by Alex Elder about 11 years ago. Updated almost 11 years ago.

Status: Won't Fix
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm pretty sure there are some special zero-length
requests (like flush) that can come down from the
block layer. Right now we're not handling them,
and we should (or should explicitly decide not to.)

History

#1 Updated by Sage Weil about 11 years ago

  • Project changed from Linux kernel client to rbd

#2 Updated by Alex Elder almost 11 years ago

  • Priority changed from Normal to High

#3 Updated by Alex Elder almost 11 years ago

  • Assignee set to Alex Elder

#4 Updated by Alex Elder almost 11 years ago

  • Status changed from New to In Progress

I just sent the following in an e-mail to Josh and Sage,
but thought I might as well document it here. If we want
to support FUA and FLUSH we simply have to tell the block
layer we support them. If we don't, that's fine: things
will work as-is, and now that we're (finally) completing
writes only when they're on disk, it's no different from
any other disk with no write cache.

There are two zero-length request types that we might benefit
from providing. They would be marked with FUA (force unit
access) and/or FLUSH flags, and both provide a guarantee
about whether and when data is persistent in the storage medium.

FUA could be implemented pretty easily--especially given
the patch I just posted which switches to using the
safe callback for write requests. If set, the driver must
guarantee that the data in the request is persistent
when the request is marked complete.

HOWEVER, it looks as though FUA is also supposed to ensure
that all prior writes have completed as well. So marking a
request FUA also causes an implicit FLUSH.

If FLUSH is set on a request, it guarantees the "disk write
cache" is flushed before doing any I/O described by the
request. A FLUSH can be an otherwise empty request. In fact
it always will be, because the block layer converts a data
request with FLUSH set into an empty FLUSH request followed
by the original request with the flag cleared.
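
The split described above can be sketched in user-space C. This is
only a model of the behavior as I understand it, not kernel code:
the SK_FLUSH/SK_FUA flags, struct sk_request and sk_split_flush()
are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags and request type; NOT the kernel's definitions. */
enum { SK_FLUSH = 1 << 0, SK_FUA = 1 << 1 };

struct sk_request {
    unsigned int flags;
    size_t len;               /* payload bytes; 0 for an empty request */
};

/*
 * Model of the split: a request carrying SK_FLUSH becomes an empty
 * flush request followed by the original request without SK_FLUSH.
 * Writes the result to out[0..1] and returns how many were produced.
 */
static int sk_split_flush(const struct sk_request *in,
                          struct sk_request out[2])
{
    int n = 0;

    if (in->flags & SK_FLUSH) {
        out[n].flags = SK_FLUSH;
        out[n].len = 0;
        n++;
        if (in->len == 0)
            return n;         /* already an empty flush; nothing follows */
    }

    out[n].flags = in->flags & ~(unsigned int)SK_FLUSH;
    out[n].len = in->len;
    n++;
    return n;
}
```

With this model, a 4096-byte request flagged FLUSH|FUA comes out as
an empty FLUSH followed by a 4096-byte FUA request, so the driver
itself would only ever see FLUSH on empty requests.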

For this, we could return to using the non-safe completion for
write requests. But it would mean we'd have to support what
amounts to a sync command to all the osds when we got a FLUSH
request. That is probably a fine way to handle it, though it
could be refined to sync only those we knew had been written
to since the last FLUSH if we cared a lot.
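
The "sync only those written since the last FLUSH" refinement could
look roughly like the following. It is a user-space sketch under
stated assumptions: osds are identified by a small index (at most
64 here), and struct sk_image, sk_note_write() and sk_flush() are
hypothetical names. Nothing like an osd sync request is assumed to
exist; the sketch only counts which osds would need one.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_OSDS 64u       /* assumption: fits in one 64-bit mask */

/* Hypothetical per-image state; not an rbd structure. */
struct sk_image {
    uint64_t dirty;           /* bit i set: osd i written since last FLUSH */
};

/* Record that a write was sent to the given osd. Returns 0, or -1 if
 * the index is out of range for this sketch. */
static int sk_note_write(struct sk_image *img, unsigned int osd)
{
    if (osd >= SK_MAX_OSDS)
        return -1;
    img->dirty |= (uint64_t)1 << osd;
    return 0;
}

/* Handle a FLUSH: visit each dirty osd once, then clear the set.
 * Returns the number of osds that would have been sent a sync. */
static unsigned int sk_flush(struct sk_image *img)
{
    unsigned int synced = 0;

    for (unsigned int i = 0; i < SK_MAX_OSDS; i++) {
        if (img->dirty & ((uint64_t)1 << i)) {
            /* a real driver would send its sync to osd i and wait here */
            synced++;
        }
    }
    img->dirty = 0;
    return synced;
}
```

Writes to the same osd collapse into one bit, so two writes to osd 3
and one to osd 7 lead to two syncs on the next FLUSH, and a second
FLUSH with no writes in between leads to none.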

This would mean supporting a mode of osd operation in which
we only occasionally or lazily do syncs, rather than doing
them after each operation (isn't that what we do for XFS?).

#5 Updated by Alex Elder almost 11 years ago

We'll discuss details at our standup, but here is an update.

Unless I misunderstand him, Sage believes that requests only
ever supply one response (ACK, or if requested, ONDISK), and
that for write requests the ACK is only returned after the
data is durable.

In that case http://tracker.ceph.com/issues/5146 doesn't really
change behavior, but it's more explicit that we're waiting for
data to be on disk.

He also felt that we could change things to not wait for the
ONDISK notice, but wasn't sure it's worth the complexity.

At this point we'll resolve this one of two ways:
- Leave things the way they are. We don't report support for
  FUA or FLUSH, and we already behave as a device without
  a write cache, and that's all fine.
- Add support for FUA and FLUSH:
    - Change rbd to not request the ONDISK acknowledgement
      for write requests.
    - Change the osds (if necessary) to return ACK without
      waiting for the data to be durable.
    - Add (if necessary) support for an osd sync request.
    - Change rbd to translate a FLUSH request into an osd
      sync for osds used by an image.
    - (Possibly) ensure that a zero-length request with FUA
      set results in a successful zero-length request to
      an osd, and that a FUA request waits for the ONDISK
      callback.
    - Change the osd code (if necessary) to not require data
      to be persistent before sending the ACK for a write
      request back to the osd client.
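
For the second option, the per-request decision in rbd might reduce
to something like this. It is a sketch of the plan in the list above,
not existing rbd code: the flag values, enum sk_wait and
sk_completion_for() are all hypothetical, and it assumes FLUSH only
ever arrives on empty requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags; NOT the kernel's definitions. */
enum { SK_FLUSH = 1 << 0, SK_FUA = 1 << 1 };

/* What the driver waits for before completing a request. */
enum sk_wait {
    SK_WAIT_ACK,              /* ordinary write: ACK is enough */
    SK_WAIT_ONDISK,           /* FUA: wait until the data is durable */
    SK_WAIT_OSD_SYNC          /* FLUSH: sync the osds used by the image */
};

static enum sk_wait sk_completion_for(unsigned int flags, size_t len)
{
    (void)len;                /* FUA applies to empty and non-empty requests */

    if (flags & SK_FLUSH)
        return SK_WAIT_OSD_SYNC;   /* assumed to arrive as an empty request */
    if (flags & SK_FUA)
        return SK_WAIT_ONDISK;
    return SK_WAIT_ACK;
}
```

Under the first option none of this is needed: every write waits for
ONDISK and the device never advertises FLUSH or FUA.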

Whether the net result of this is a performance win or
not is not clear without actually doing some experiments.

#6 Updated by Alex Elder almost 11 years ago

  • Status changed from In Progress to Won't Fix

OK, after a little more discussion... We're going to
go the easy route and just close this issue. We'll
continue to act as a disk without a write cache.
