Bug #17564: close race window when handling writes on a file descriptor opened with O_APPEND - CephFS - Ceph

Actions

Copy link

Bug #17564

open

close race window when handling writes on a file descriptor opened with O_APPEND

Added by Jeff Layton over 7 years ago. Updated about 5 years ago.

Status:

New

Priority:

Low

Assignee:

Category:

Target version:

% Done:

Source:

other

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

This comment is in _write() in the userland client code:

  // use/adjust fd pos?
  if (offset < 0) {
    lock_fh_pos(f);
    /*
     * FIXME: this is racy in that we may block _after_ this point waiting for caps, and size may
     * change out from under us.
     */
    if (f->flags & O_APPEND) {
      int r = _lseek(f, 0, SEEK_END);
      if (r < 0) {
        unlock_fh_pos(f);
        return r;
      }
    }

I think we can close this race window by ensuring that when we're doing a write to an O_APPEND file descriptor, that we request the caps we'll need to fetch the size and then just fetch that size out of the Inode directly before repositioning the offset.

Actions

Copy link

Updated by Zheng Yan over 7 years ago

Not that easy to do. MDS does not always issue caps what client wants. In this case, client wants Fsw, but MDS may only give client Fw.

Actions

Copy link

Updated by Jeff Layton over 7 years ago

For the record -- I really don't care for O_APPEND semantics, but...

_write() does this currently:

  int r = get_caps(in, CEPH_CAP_FILE_WR, CEPH_CAP_FILE_BUFFER, &have, endoff);

What I'd propose is that when we're doing an write to eof on an O_APPEND fd, that we request CEPH_CAP_FILE_WR|CEPH_CAP_FILE_SHARED, and then just fail the write if we can't get Fsw. If it succeeds, we do the lseek to EOF (maybe open-coding it to avoid losing mutex) and then issue the actual write to the correct offset.

That at least allows you to know that the size hasn't changed since you acquired caps, at least up to the point where the client_lock is released after issuing the write.

Actions

Copy link

Updated by Greg Farnum over 7 years ago

Are we allowed to fail writes like that? :/

It doesn't actually close the race in all cases, but for sane use it's probably safe to do something like grab the size as we do here, and send the op off with a rados assert that the final object is the expected size, and do a retry loop if we fail. (That doesn't cover cases where somebody writes to a larger different object.)

Actions

Copy link

Updated by Jeff Layton over 7 years ago

Good point. Yeah, a retry loop may be the best we can do in that case. I'll have to read up on rados asserts to make sure I understand what you're proposing there though.

Actions

Copy link

Updated by Patrick Donnelly about 5 years ago

Assignee deleted (~~Jeff Layton~~)

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » CephFS

Custom queries

Bug #17564

close race window when handling writes on a file descriptor opened with O_APPEND

Updated by Zheng Yan over 7 years ago

Updated by Jeff Layton over 7 years ago

Updated by Greg Farnum over 7 years ago

Updated by Jeff Layton over 7 years ago

Updated by Patrick Donnelly about 5 years ago