Documentation #44503: Document CephFS's behaviour on O_APPEND - CephFS - Ceph

Documentation #44503

open

Document CephFS's behaviour on O_APPEND

Added by Niklas Hambuechen about 4 years ago. Updated about 4 years ago.

Status:

New

Priority:

Normal

Description

I have noticed that on my CephFS (13.2.2) file system mounted via FUSE, if multiple writers simultaneously append to a file opened with `O_APPEND` while keeping the file descriptor open (the typical logging use case), many bytes get lost.
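The scenario above can be sketched as a small reproducer. This is only a minimal sketch under assumptions: threads with separate file descriptors stand in for the separate writers (separate processes or hosts would be the more realistic test), and the record size, writer count, and function names are hypothetical choices, not taken from the original setup.

```python
import os
import threading

RECORD_SIZE = 64          # bytes per appended record (hypothetical choice)
RECORDS_PER_WRITER = 500  # appends issued by each writer
NUM_WRITERS = 4           # concurrent writers


def writer(path: str, writer_id: int) -> None:
    """Open the file once with O_APPEND, keep the FD open, append records."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # One letter per writer, repeated, terminated by a newline.
        record = bytes([ord("a") + writer_id]) * (RECORD_SIZE - 1) + b"\n"
        for _ in range(RECORDS_PER_WRITER):
            written = os.write(fd, record)
            assert written == RECORD_SIZE  # short writes are not handled here
    finally:
        os.close(fd)


def run_append_test(path: str) -> tuple[int, int]:
    """Return (expected_bytes, actual_bytes) after all writers have finished."""
    open(path, "wb").close()  # start from an empty file
    threads = [
        threading.Thread(target=writer, args=(path, i))
        for i in range(NUM_WRITERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    expected = RECORD_SIZE * RECORDS_PER_WRITER * NUM_WRITERS
    return expected, os.path.getsize(path)
```

Calling `run_append_test` with a path on the mount under test (for example a hypothetical `/mnt/cephfs/append-test.log`) and comparing the two returned numbers shows whether bytes went missing; a smaller actual size means some appends overwrote each other.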

I have been trying to figure out what the intended, or current, semantics are, but the documentation does not cover this and should be improved.

On https://docs.ceph.com/docs/master/cephfs/posix/ `O_APPEND` is not mentioned. The only thing that sounds tangentially relevant is:

> In shared simultaneous writer situations, a write that crosses object boundaries is not necessarily atomic. This means that you could have writer A write “aa|aa” and writer B write “bb|bb” simultaneously (where | is the object boundary), and end up with “aa|bb” rather than the proper “aa|aa” or “bb|bb”.

However, even that doesn't quite cover it, because with `O_APPEND` I would expect "aaaabbbb", "aabbaabb" or any other interleaving of these 8 characters, that is, a result in which no byte is lost.
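To make the difference between the two outcomes concrete, here is a small sketch that counts which bytes are missing from a result, given the writes that were issued. The helper name `lost_bytes` is hypothetical, and the check only compares byte counts; it does not verify ordering within each write.

```python
from collections import Counter


def lost_bytes(result: bytes, writes: list[bytes]) -> Counter:
    """Return a count of the bytes that were written but are absent from result."""
    expected = Counter()
    for w in writes:
        expected.update(w)
    # Counter subtraction keeps only positive counts, i.e. the missing bytes.
    return expected - Counter(result)


writes = [b"aaaa", b"bbbb"]

# The non-append case quoted from the documentation: 4 of the 8 bytes are gone.
print(sum(lost_bytes(b"aabb", writes).values()))      # 4

# Any interleaving expected under O_APPEND keeps all 8 bytes.
print(sum(lost_bytes(b"aabbaabb", writes).values()))  # 0
```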

Beyond that, I could only find:

- A mailing list post from 2015 (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004280.html) with the quote "This fix is still racy for multiple writer case. If you want strict append behaviour, please wrap each write with file lock". I don't know whether that is still the case.
- #17564 (maybe related?)
- #2825 (maybe related?)

None of that qualifies as proper documentation.

Regarding the "please wrap each write with file lock" hint, it is also unclear from the same documentation page how good CephFS's lock support is (see also http://0pointer.de/blog/projects/locking.html for the general problem and the various choices).
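For illustration, the "wrap each write with file lock" hint could look roughly like the sketch below. It assumes `flock(2)`-style advisory locks, and it assumes that CephFS honours them between all writers, which is exactly the part the documentation leaves unclear. The function name `locked_append` is hypothetical.

```python
import fcntl
import os


def locked_append(fd: int, data: bytes) -> None:
    """Append data to fd (opened with O_APPEND) while holding an exclusive lock."""
    fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other writer holds the lock
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            n = os.write(fd, view)
            view = view[n:]
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
```

Each writer would open the log file once with `os.O_WRONLY | os.O_APPEND | os.O_CREAT`, keep the descriptor, and call `locked_append(fd, line)` for every record. Since the locks are advisory, this only helps if every writer follows the same protocol.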

I think https://docs.ceph.com/docs/master/cephfs/posix/ should be extended to document how users should expect `O_APPEND` to behave. It would be extremely useful.

Thanks!

Updated by Niklas Hambuechen about 4 years ago

Also perhaps relevant:

#7333
