Documentation #24642 (open)

Parent task: Documentation #43031: CephFS Documentation Sprint 3

doc: visibility semantics to other clients

Added by Niklas Hambuechen almost 6 years ago. Updated almost 2 years ago.

Status: In Progress
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:
Labels (FS): offline
Pull request ID:

Description

I believe I have just run into a situation where one CephFS (FUSE) mount created a file, and another CephFS mount still did not see that file as existing 3 minutes later.

So far I have not found anything in the CephFS docs that explains whether there are any guarantees on visibility semantics for other clients, such as upper bounds or operations after which created files are supposed to be visible to other mounted clients.

The only possibly relevant thing I have found so far is that http://docs.ceph.com/docs/mimic/cephfs/client-config-ref/ has:

    client oc max dirty age
    Description: Set the maximum age in seconds of dirty data in the object cache before writeback.
    Default: 5.0 (seconds)
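
For reference, tuning this on a client would look roughly like the following ceph.conf sketch; `client_oc_max_dirty_age` is the underscore spelling of the option above, and the value shown is its documented default:

    [client]
        # Maximum age in seconds of dirty data in the object cache
        # before it is written back (default: 5.0).
        client_oc_max_dirty_age = 5.0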

It would be great if this topic could be documented explicitly, as it is rather important when building distributed systems where the order of events matters (e.g. where other nodes need to know at what point they can expect a file to be on the FS after another client has written it).
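
To make the request concrete, here is a minimal writer-side sketch in C; the path and the final notify step are hypothetical. The question for the docs is which of these calls, if any, bounds when other mounts see the file:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create and write a file on one CephFS mount. */
        int fd = open("/mnt/cephfs/job.done", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }
        if (write(fd, "ok\n", 3) != 3) { perror("write"); return EXIT_FAILURE; }

        /* Is it fsync() that makes the file visible to other mounts?
         * The close()? Neither? That is what this issue asks to document. */
        if (fsync(fd) < 0) { perror("fsync"); return EXIT_FAILURE; }
        if (close(fd) < 0) { perror("close"); return EXIT_FAILURE; }

        /* ...only now signal other nodes that the file exists
         * (RPC, message queue, etc.). */
        return EXIT_SUCCESS;
    }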

Actions #1

Updated by Niklas Hambuechen almost 6 years ago

User `SeanR` on freenode `#ceph` reports:

nh2, I found (depending on mount options) that if I don't call fsync() then other clients may not see the data until the dirty pages are flushed. I even had a case where another client would see the file's metadata from the MDS, but on trying to read the contents it would get 0 bytes (as the contents were not yet flushed); worse, the read would populate the fscache on the reading client.
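
For illustration, a minimal reader-side sketch of that anomaly (the path is hypothetical; this shows the reported behaviour, not documented semantics):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* Metadata from the MDS may already be visible... */
        struct stat st;
        if (stat("/mnt/cephfs/job.done", &st) == 0)
            printf("metadata visible: size=%lld\n", (long long)st.st_size);

        /* ...while the data has not been flushed yet. */
        int fd = open("/mnt/cephfs/job.done", O_RDONLY);
        if (fd >= 0) {
            char buf[64];
            ssize_t n = read(fd, buf, sizeof(buf));
            /* Per the report, n can be 0 here even though st.st_size > 0,
             * and the empty result can then populate the reading client's
             * cache. */
            printf("read returned %zd bytes\n", n);
            close(fd);
        }
        return 0;
    }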

Actions #2

Updated by Patrick Donnelly almost 6 years ago

Niklas Hambuechen wrote:

I believe I have just run into a situation where one CephFS (FUSE) mount created a file, and another CephFS mount still did not see that file as existing 3 minutes later.

It sounds like you may have found a bug (possibly relating to a recent readdir bug). Can you post more information about your cluster version/setup and what client versions you're using?

Actions #3

Updated by Zheng Yan almost 6 years ago

You found a bug. This may be http://tracker.ceph.com/issues/23894

Actions #4

Updated by Niklas Hambuechen almost 6 years ago

Thanks everyone for the quick answers.

Patrick Donnelly wrote:

It sounds like you may have found a bug (possibly relating to a recent readdir bug). Can you post more information about your cluster version/setup and what client versions you're using?

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)

and using the fuse client from that same version.

As far as I can tell, the fix for http://tracker.ceph.com/issues/23894 is in 13.2.0, so I can upgrade and check whether the problem ever appears again (though it already took me a while to trigger it, so it seems relatively rare, at least under my current workload; I may want to allow some time before I report back).

I also noticed that http://tracker.ceph.com/issues/23894 still has a `Backport: Luminous` tag -- do you plan to backport the fix to that?

Actions #5

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from New to Closed

Niklas Hambuechen wrote:

Thanks everyone for the quick answers.

Patrick Donnelly wrote:

It sounds like you may have found a bug (possibly relating to a recent readdir bug). Can you post more information about your cluster version/setup and what client versions you're using?

ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)

and using the fuse client from that same version.

As far as I can tell, the fix for http://tracker.ceph.com/issues/23894 is in 13.2.0, so I can upgrade and check whether the problem ever appears again (though it already took me a while to trigger it, so it seems relatively rare, at least under my current workload; I may want to allow some time before I report back).

I also noticed that http://tracker.ceph.com/issues/23894 still has a `Backport: Luminous` tag -- do you plan to backport the fix to that?

It has already been backported to the Luminous branch: http://tracker.ceph.com/issues/24049

12.2.6 should be out soon so you can also wait for that or upgrade to Mimic.

Actions #6

Updated by Niklas Hambuechen almost 6 years ago

I have upgraded to Mimic now and will watch whether the three-minute delays appear again.

I would like to ask you to reopen the issue though, because I originally filed this as a documentation issue:

To find out and document what Ceph's guarantees, non-guarantees, and expected timeouts are.

(The fact that a three-minute delay is unexpected is only one bit of that information; I certainly appreciate a related issue being fixed, but that issue is not this issue.)

Actions #7

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from Closed to New
  • Target version set to v14.0.0
  • Backport set to mimic,luminous
  • Affected Versions deleted (v12.2.5)
  • Labels (FS) offline added

Niklas Hambuechen wrote:

I have upgraded to Mimic now and will watch whether the three-minute delays appear again.

I would like to ask you to reopen the issue though, because I originally filed this as a documentation issue:

To find out and document what Ceph's guarantees, non-guarantees, and expected timeouts are.

(The fact that a three-minute delay is unexpected is only one bit of that information; I certainly appreciate a related issue being fixed, but that issue is not this issue.)

Thanks, we'll try to address this documentation fix.

Actions #8

Updated by Patrick Donnelly about 5 years ago

  • Target version changed from v14.0.0 to v15.0.0

Actions #9

Updated by Patrick Donnelly about 5 years ago

  • Target version deleted (v15.0.0)

Actions #10

Updated by Patrick Donnelly over 4 years ago

  • Subject changed from Document visibility semantics to other clients to doc: visibility semantics to other clients
  • Assignee set to Jeff Layton
  • Target version set to v15.0.0
  • Start date deleted (06/24/2018)
  • Backport deleted (mimic,luminous)

Actions #11

Updated by Patrick Donnelly over 4 years ago

  • Status changed from New to In Progress
  • Parent task set to #43031

Actions #12

Updated by Patrick Donnelly about 4 years ago

  • Target version deleted (v15.0.0)

Actions #13

Updated by Jeff Layton over 2 years ago

It's not clear to me what this tracker bug is actually asking for. I get that you want some documentation about "guarantees", but that's rather vague. Niklas, I know it has been a few years since this was asked, but what did you have in mind here?

If you want to propose a starter document, then that's even better.

Actions #14

Updated by Niklas Hambuechen over 2 years ago

I had in mind that hopefully CephFS could have user-facing documentation about file visibility semantics across multiple mounts, addressing for example:

  • I `open(O_CREAT)` a new file. When will this file be visible in mounts on other machines? Immediately (blocking the `open()`)? After a certain configurable time period that flushes some cache? After the corresponding `close()`? Or without any guarantees, i.e. "whenever Ceph had wall-clock time available to propagate things over the network"?
  • Does this apply equally to various types of dirents (e.g. file vs. directory)? Are there any differences between the kernel and FUSE mounts in this regard?
  • Event ordering: If I create files `A` and `B` in that order, on the same mount, is it possible that other mounts observe a situation in which `B` exists but `A` only appears later? (See the sketch after this list.) Some classification of operations' behaviour would be useful, perhaps in the distributed-systems terminology that is nicely laid out at https://jepsen.io/consistency and commonly used for analysing the behaviour of (distributed and non-distributed) databases.
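
For illustration, a minimal sketch of the ordering question from the last bullet (paths hypothetical):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void create_file(const char *path)
    {
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror(path); exit(EXIT_FAILURE); }
        close(fd);
    }

    int main(void)
    {
        create_file("/mnt/cephfs/A");   /* created first... */
        create_file("/mnt/cephfs/B");   /* ...created second. */
        /* Question for the docs: may a readdir() on another machine's
         * mount ever return B but not A, i.e. is creation order within
         * a directory preserved across clients? */
        return 0;
    }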

This is similar in spirit to https://docs.ceph.com/en/latest/cephfs/posix/, but POSIX mainly deals with what a machine can see after it has modified something itself; instead, I'd love to see stated what CephFS aims to guarantee (or not guarantee) when other machines modify the FS.

I believe this is important to have, so that a CephFS user, when debugging a problem with files being visible from one machine but not another, can easily look up which behaviours are allowed/explained by the design, and which ones are likely bugs.

Actions #15

Updated by Jeff Layton almost 2 years ago

  • Assignee deleted (Jeff Layton)