Documentation #44503

open

Document CephFS's behaviour on O_APPEND

Added by Niklas Hambuechen about 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:
Labels (FS):
Pull request ID:

Description

I have noticed that on my CephFS (13.2.2) file system, mounted via FUSE, many bytes get lost when multiple writers append to the same file with `O_APPEND` simultaneously while keeping their FDs open (the typical logging use case).
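A minimal reproduction sketch of that logging pattern, in Python: several writers each keep their own `O_APPEND` FD open and write fixed-size records, and afterwards the file size and the number of intact records are compared with what was written. The record size, writer count, and the names `writer` and `check_appends` are illustrative assumptions, not anything from Ceph. For simplicity the sketch uses threads in one process; the scenario in this report (several processes or several CephFS clients) would need the writer function run from separate processes or hosts against the same file.

```python
import os
import threading

RECORD_SIZE = 64  # bytes per simulated log line (arbitrary choice for this sketch)


def make_record(writer_id: int, seq: int) -> bytes:
    """Build a fixed-size, newline-terminated record identifying its writer."""
    body = f"writer={writer_id} seq={seq} ".encode()
    return body.ljust(RECORD_SIZE - 1, b".") + b"\n"


def writer(path: str, writer_id: int, count: int) -> None:
    # Each writer keeps one FD open for the whole run, as a logger would.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(count):
            # One write() per record; with O_APPEND the kernel/filesystem
            # is expected to place each record at the current end of file.
            os.write(fd, make_record(writer_id, seq))
    finally:
        os.close(fd)


def check_appends(path: str, writers: int = 4, count: int = 1000) -> dict:
    """Run concurrent appenders, then report expected vs. observed bytes/records."""
    open(path, "wb").close()  # start from an empty file
    threads = [
        threading.Thread(target=writer, args=(path, w, count)) for w in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    with open(path, "rb") as f:
        data = f.read()
    # A record is intact if its line has the full length and starts with the tag.
    intact = sum(
        1
        for line in data.split(b"\n")
        if len(line) == RECORD_SIZE - 1 and line.startswith(b"writer=")
    )
    return {
        "expected_bytes": writers * count * RECORD_SIZE,
        "actual_bytes": len(data),
        "expected_records": writers * count,
        "intact_records": intact,
    }
```

On a local Linux filesystem both pairs of numbers should match; pointing `path` at a file on the CephFS mount (for example a hypothetical `/mnt/cephfs/append-test.log`) is one way to check whether bytes go missing there.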

I have been trying to figure out what the intended, or current, semantics are, but it seems the documentation is insufficient and should be improved.

On https://docs.ceph.com/docs/master/cephfs/posix/ `O_APPEND` is not mentioned. The only thing that sounds tangentially relevant is

In shared simultaneous writer situations, a write that crosses object boundaries is not necessarily atomic. This means that you could have writer A write “aa|aa” and writer B write “bb|bb” simultaneously (where | is the object boundary), and end up with “aa|bb” rather than the proper “aa|aa” or “bb|bb”.

However, even that doesn't quite capture it, because with `O_APPEND` I would expect all 8 characters to end up in the file, as "aaaabbbb", "aabbaabb", or any other interleaving of them, but never a result with bytes missing.
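That expectation can be stated precisely: every byte of both writes appears exactly once, and each writer's bytes keep their own order. The Python check below tests that property with a standard dynamic-programming interleaving test; `is_interleaving` is an illustrative name, not a Ceph API. It accepts "aaaabbbb" and "aabbaabb" but rejects "aabb", a result in which half of the written bytes are missing.

```python
def is_interleaving(result: str, a: str, b: str) -> bool:
    """True iff `result` merges `a` and `b`, keeping each one's internal order."""
    if len(result) != len(a) + len(b):
        return False  # bytes were lost (or duplicated)
    # ok[j] (while processing row i) is True iff result[:i + j]
    # can be formed from a[:i] and b[:j].
    ok = [False] * (len(b) + 1)
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                ok[j] = True
                continue
            # ok[j] still holds row i-1 here; ok[j-1] already holds row i.
            from_a = i > 0 and ok[j] and a[i - 1] == result[i + j - 1]
            from_b = j > 0 and ok[j - 1] and b[j - 1] == result[i + j - 1]
            ok[j] = from_a or from_b
    return ok[len(b)]


print(is_interleaving("aabbaabb", "aaaa", "bbbb"))  # True
print(is_interleaving("aabb", "aaaa", "bbbb"))      # False
```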

Beyond that, I could only find:

None of that qualifies as proper documentation.

Regarding the "please wrap each write with file lock" hint, it is also unclear from the same documentation page how good CephFS's lock support is (see also http://0pointer.de/blog/projects/locking.html for the general problem and the various choices).
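A sketch of the "wrap each write with a file lock" approach, assuming Python's `fcntl` module on a Unix system. It takes an exclusive whole-file lock around each append, either with `flock(2)` or with a POSIX record lock via `lockf`; which of the two (if either) CephFS enforces across clients is exactly what the documentation leaves open, so this only shows the calling pattern. `locked_append` is a hypothetical helper name.

```python
import fcntl
import os


def locked_append(fd: int, data: bytes, use_posix_locks: bool = False) -> int:
    """Append `data` to an O_APPEND file descriptor while holding an exclusive lock.

    use_posix_locks=False -> BSD-style flock(2) on the whole file.
    use_posix_locks=True  -> POSIX record lock via lockf (fcntl F_SETLKW),
                             covering the whole file (start=0, len=0).
    """
    if use_posix_locks:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
    else:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
    try:
        return os.write(fd, data)
    finally:
        if use_posix_locks:
            fcntl.lockf(fd, fcntl.LOCK_UN)
        else:
            fcntl.flock(fd, fcntl.LOCK_UN)
```

The two variants differ in ownership: `flock` locks belong to the open file description, while POSIX record locks belong to the process and are released when any of its FDs for that file is closed, which is one of the pitfalls the linked blog post discusses.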

I think https://docs.ceph.com/docs/master/cephfs/posix/ should be extended to document how users should expect `O_APPEND` to behave. It would be extremely useful.

Thanks!

#1

Updated by Niklas Hambuechen about 4 years ago

Also perhaps relevant:
