CephFS kernel client: mtime stuck when multiple clients append to a file
The file modification timestamp (mtime) stops updating when multiple clients (on different nodes) append to the same file at the same time or within a few seconds of each other.
node1 => echo "test1" >> /mnt/ceph/mtime_test
node2 => echo "test2" >> /mnt/ceph/mtime_test
node1 => echo ....
node2 => echo ....
node1 => watch stat /mnt/ceph/mtime_test # mtime changes for the first 1-2 appends, then stays unchanged even after a few minutes of waiting
The timestamp starts updating again once no client has appended to the file for 20-120 s.
Occasionally the mtime fails to update even when only one client is appending to the file, but this is rare.
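The manual steps above could be scripted roughly as follows. This is only a sketch and was not part of the original report: the function names `mtime_of` and `append_and_check` are hypothetical, and it assumes GNU coreutils `stat` (for `-c %Y`). It compares mtime at one-second resolution, so the delay between appends should be at least 1 second for the stuck count to mean anything.

```shell
# Hypothetical helper: print a file's mtime as seconds since the epoch.
# Assumes GNU coreutils stat (-c %Y).
mtime_of() {
    stat -c %Y "$1"
}

# Hypothetical helper: append to a file repeatedly and count how many
# appends did NOT advance the mtime as seen by this client.
#   $1 = file, $2 = number of appends, $3 = seconds to sleep between appends
append_and_check() {
    file=$1; count=$2; delay=$3
    stuck=0
    i=0
    : >> "$file"                      # make sure the file exists
    prev=$(mtime_of "$file")
    while [ "$i" -lt "$count" ]; do
        sleep "$delay"
        echo "$(uname -n) $i" >> "$file"
        now=$(mtime_of "$file")
        if [ "$now" -le "$prev" ]; then
            stuck=$((stuck + 1))      # mtime did not move forward
        fi
        prev=$now
        i=$((i + 1))
    done
    echo "$stuck"                     # number of appends with a stale mtime
}
```

To approximate the report, one would run something like `append_and_check /mnt/ceph/mtime_test 30 2` on node1 and node2 at the same time; a nonzero result on either node would suggest the mtime is not advancing.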
OS: Ubuntu 18.04.4, Ubuntu 20.04, Fedora 32
Kernel: 5.4.0-40-generic, 4.15.0-108-generic, 4.15.0-109-generic, and even the latest 5.7.7/5.7.8
Ceph: v15.2.3, v15.2.4, v14.2.10
The ceph-fuse client does not show the problem, apart from poor performance.
#4 Updated by Jeff Layton over 1 year ago
I can confirm the behavior. I'll note that the time does seem to be updated after the I/O stops, but that's not really the level of cache-coherency that we're going for here. This involves some of the less-traveled paths of the kcephfs client.
In this case, the clients basically don't have the necessary caps to cache the attributes and they are continually issuing GETATTR calls to the MDS. Meanwhile, the client is doing uncached writes since it doesn't have the Fb caps, which are needed to use the pagecache.
I'll have to do a bit of debugging and observation to figure out the cause.
#5 Updated by Jeff Layton over 1 year ago
Basically, the mtime only freezes once there are competing clients writing to the file. If you kill one of the writers, then the clients will start seeing mtime updates again once the remaining writer is using cached I/O.
So, I think the problem is confined to the uncached codepaths in the kclient. Still looking at the cause. It's not clear to me how the client is intended to update the MDS as to the new mtime, when it doesn't have the caps to flush back an update for that field. Are we supposed to do a SETATTR?
#6 Updated by Jeff Layton over 1 year ago
I've been looking at this today. The problem seems to be in the order of operations by the client. It requests Fw caps fairly late in the process, after the inode's i_mtime has already been reset to the current time. We then end up requesting Fw caps from the MDS and that cap update clobbers the mtime. I think the right fix is to ensure that we request Fw caps before doing any changes to the inode, so that we're certain of the point from which the change is being made.
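The ordering problem described above can be illustrated with a toy model. This is not kernel code and not the posted patch: plain shell variables stand in for the inode state, all function names and timestamp values are hypothetical, and the cap grant is reduced to "the MDS reply overwrites the client's mtime".

```shell
# Toy model only: shell variables stand in for kernel and MDS state.
mds_mtime=100      # mtime the MDS currently holds (made-up value)
inode_mtime=100    # the client's in-memory i_mtime

# Stand-in for requesting Fw caps: the reply from the MDS carries its
# own mtime, which overwrites the client's copy.
request_fw_caps() { inode_mtime=$mds_mtime; }

# Stand-in for the write path stamping the inode with the current time.
stamp_mtime() { inode_mtime=$1; }

# Order described as the bug: stamp first, request Fw caps late.
write_buggy() { stamp_mtime "$1"; request_fw_caps; }

# Order proposed as the fix: request Fw caps first, then stamp.
write_fixed() { request_fw_caps; stamp_mtime "$1"; }

write_buggy 105; echo "buggy order: i_mtime=$inode_mtime"   # prints 100
inode_mtime=100
write_fixed 105; echo "fixed order: i_mtime=$inode_mtime"   # prints 105
```

In the buggy order the new timestamp is lost as soon as the cap update arrives; in the fixed order the timestamp is applied on top of whatever state the cap update delivered.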
#7 Updated by Jeff Layton over 1 year ago
- Status changed from New to Fix Under Review
Patch posted to ceph-devel: