Bug #46574: Cephfs with kernel client mtime stuck when multiple clients append to file - Linux kernel client - Ceph

Bug #46574

closed

Cephfs with kernel client mtime stuck when multiple clients append to file

Added by Jozef Kováč almost 4 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: Jeff Layton
Target version: Ceph - v15.2.4
Severity: 2 - major
Affected Versions: Ceph - v15.2.3

Description

The file modification timestamp (mtime) stops updating when multiple clients (on different nodes) append to the same file at the same time or within a few seconds of each other.
For example:
node1 => echo "test1" >> /mnt/ceph/mtime_test
node2 => echo "test2" >> /mnt/ceph/mtime_test
node1 => echo ....
node2 => echo ....

node1 => watch stat /mnt/ceph/mtime_test # mtime changes for the first 1-2 appends, then stops changing even after a few minutes of waiting

The timestamp starts updating again once no client has appended to the file for 20-120 s.

Sometimes the mtime does not update even when only one client is appending to the file, but that is rare.

OS: Ubuntu 18.04.4, Ubuntu 20.04, Fedora 32
Kernel: 5.4.0-40-generic, 4.15.0-108-generic, 4.15.0-109-generic, and even the latest 5.7.7/5.7.8

Ceph: v15.2.3, v15.2.4, v14.2.10

With the ceph-fuse client the problem does not occur, although performance is poor.
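
The manual steps in this description could be scripted roughly as follows. This is only a sketch: the helper names `append_and_sample` and `stalled_appends` are made up for illustration, and the script assumes it is run on each node against the same CephFS path (for example /mnt/ceph/mtime_test). It appends to the file the way `echo ... >>` does and records the mtime the local client reports after each append.

```python
import os
import time


def append_and_sample(path, count=5, interval=1.0, payload=b"test\n"):
    """Append `payload` to `path` `count` times.

    Returns the st_mtime_ns value reported by stat() after each append.
    """
    samples = []
    for _ in range(count):
        # O_APPEND mirrors the shell's `>>` redirection.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            os.write(fd, payload)
        finally:
            os.close(fd)
        samples.append(os.stat(path).st_mtime_ns)
        time.sleep(interval)
    return samples


def stalled_appends(samples):
    """Return the indices of appends after which mtime did not advance."""
    return [i for i in range(1, len(samples)) if samples[i] <= samples[i - 1]]
```

Calling `stalled_appends(append_and_sample("/mnt/ceph/mtime_test"))` on two nodes at once should return a non-empty list if the behaviour described here is present. On a healthy mount with a one-second interval the list would be expected to stay empty.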

Updated by Jozef Kováč almost 4 years ago

Sometimes it stays stuck for longer than 2 minutes. Even 30 minutes after the last append, a new append to the file does not update the timestamp, no matter how long I wait.

Updated by Greg Farnum almost 3 years ago

Project changed from Ceph to Linux kernel client

This presumably has to do with when we force a refresh on the relevant timestamp-managing caps and gather the timestamps from clients which are all in Fsw.

Updated by Jeff Layton over 2 years ago

Assignee set to Jeff Layton

Updated by Jeff Layton over 2 years ago

I can confirm the behavior. I'll note that the time does seem to be updated after the I/O stops, but that's not really the level of cache-coherency that we're going for here. This involves some of the less-traveled paths of the kcephfs client.

In this case, the clients basically don't have the necessary caps to cache the attributes and they are continually issuing GETATTR calls to the MDS. Meanwhile, the client is doing uncached writes since it doesn't have the Fb caps, which are needed to use the pagecache.

I'll have to do a bit of debugging and observation to figure out the cause.

Updated by Jeff Layton over 2 years ago

Basically, the mtime only freezes once there are competing clients writing to the file. If you kill one of the writers then the clients will start seeing mtime updates again once the writer is using cached I/O.

So, I think the problem is confined to the uncached codepaths in the kclient. Still looking at the cause. It's not clear to me how the client is intended to update the MDS as to the new mtime, when it doesn't have the caps to flush back an update for that field. Are we supposed to do a SETATTR?

Updated by Jeff Layton over 2 years ago

I've been looking at this today. The problem seems to be in the order of operations by the client. It requests Fw caps fairly late in the process, after the inode's i_mtime has already been reset to the current time. We then end up requesting Fw caps from the MDS and that cap update clobbers the mtime. I think the right fix is to ensure that we request Fw caps before doing any changes to the inode, so that we're certain of the point from which the change is being made.
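
The ordering problem described above can be illustrated with a toy model. This is not kernel code: `ToyMDS`, `ToyInode`, and every function below are hypothetical names, and the model assumes only what the comment states, namely that being granted Fw caps applies the MDS's mtime to the client's inode.

```python
from dataclasses import dataclass


@dataclass
class ToyMDS:
    """Stands in for the MDS's view of the file (hypothetical)."""
    mtime: int = 100


@dataclass
class ToyInode:
    """Stands in for the client's in-memory inode (hypothetical)."""
    mtime: int = 100
    has_fw: bool = False


def request_fw_caps(inode, mds):
    # Model assumption: the cap grant carries the MDS's attributes and the
    # client applies them, overwriting whatever mtime it had set locally.
    inode.mtime = mds.mtime
    inode.has_fw = True


def flush_and_release(inode, mds):
    # With Fw held, the client's mtime is reported back to the MDS.
    if inode.has_fw:
        mds.mtime = inode.mtime
    # A competing writer makes the MDS take the caps away again.
    inode.has_fw = False


def write_buggy(inode, mds, now):
    inode.mtime = now             # timestamp set first
    request_fw_caps(inode, mds)   # grant clobbers it with the old value
    flush_and_release(inode, mds)


def write_fixed(inode, mds, now):
    request_fw_caps(inode, mds)   # caps first
    inode.mtime = now             # then change the inode
    flush_and_release(inode, mds)
```

In this model `write_buggy` leaves both the client and the MDS at the old mtime, while `write_fixed` propagates the new one, which matches the proposal to request Fw caps before making any changes to the inode.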

Updated by Jeff Layton over 2 years ago

Status changed from New to Fix Under Review

Patch posted to ceph-devel:

https://lore.kernel.org/ceph-devel/20210811112324.8870-1-jlayton@kernel.org/T/#u

Updated by Jeff Layton over 2 years ago

Status changed from Fix Under Review to Resolved

Merged for v5.15.
