Bug #23291
closed
client: add way to sync setattr operations to MDS
Added by Jeff Layton about 6 years ago.
Updated almost 6 years ago.
Category:
Correctness/Safety
Component(FS):
Client, libcephfs
Description
Zheng pointed out that we could end up with a setattr request being cached in the libcephfs client after it has replied with success to the application. This is problematic for ganesha:
1/ NFS client does a NFS SETATTR
2/ ganesha issues setattrx to libcephfs, result is cached in client and not sent to MDS
3/ ganesha replies with success to client
4/ ganesha crashes
setattrx changes are then lost.
NFS expects setattr requests to be durable. The right fix is probably to allow the application to request that a setattrx be forced through to the MDS. In practice, ganesha and samba will almost never want the client to cache the results of a setattr.
In the libcephfs code the decision to cache the change is done in _do_setattr. We just need to provide some way to force the change to go to the MDS, even when it holds the appropriate caps. Alternately, we could just always have ceph_ll_setattr do that. AFAIK, ganesha is the only caller of that function so far.
- Description updated (diff)
Jeff Layton wrote:
Zheng pointed out that we could end up with a setattr request being cached in the libcephfs client after it has replied with success to the application. This is problematic for ganesha:
1/ NFS client does a NFS SETATTR
2/ ganesha issues setattrx to libcephfs, result is cached in client and not sent to MDS
3/ ganesha replies with success to client
4/ ganesha crashes
setattrx changes are then lost.
NFS expects setattr requests to be durable.
That requires synchronous requirements. e.g. fchmod(fd, buf); fsync(fd);
Is that what Ganesha does for the local file system?
The right fix is probably to allow the application to request that a setattrx be forced through to the MDS. In practice, ganesha and samba will almost never want the client to cache the results of a setattr.
To satisfy durability requirements, the client must also wait for the safe (i.e. durable) reply. Right?
- Subject changed from Allow caller to request that setattr request be synchronous to client: allow caller to request that setattr request be synchronous
- Category set to Correctness/Safety
- Target version set to v13.0.0
- Source set to Development
- Component(FS) Client added
- Related to Bug #16739: Client::setxattr always sends setxattr request to MDS added
Jeff, see also #16739. Maybe that's why it hasn't been a problem?
No. I don't think that bug is related. That one is dealing with setxattr (setting extended attributes) and yes, the client does always send those to the MDS. This is more about changing inode metadata (truncate, chown, chmod, etc.) and we do cache those changes locally if we have the right caps.
- Related to deleted (Bug #16739: Client::setxattr always sends setxattr request to MDS)
The big question here is how to achieve this:
The kernel has an AT_STATX_FORCE_SYNC flag, and we could repurpose it for use in setattr commands. Ganesha however uses ceph_ll_setattr, and that does not accept a flags field. We'd have to add a variant that does, which would be an API change (at a minimum), and then handle that at build-time in ganesha.
Alternately, we could have some conf setting in the client that forces setattrs to be synchronous. In practice, we will always want these to be synchronous from Ganesha anyway so a client-wide setting would be fine for this purpose.
Jeff Layton wrote:
Zheng pointed out that we could end up with a setattr request being cached in the libcephfs client after it has replied with success to the application. This is problematic for ganesha:
1/ NFS client does a NFS SETATTR
2/ ganesha issues setattrx to libcephfs, result is cached in client and not sent to MDS
3/ ganesha replies with success to client
4/ ganesha crashes
setattrx changes are then lost.
NFS expects setattr requests to be durable. The right fix is probably to allow the application to request that a setattrx be forced through to the MDS. In practice, ganesha and samba will almost never want the client to cache the results of a setattr.
Hmm, so NFS requires that setattr requests against the backing FS be as if:
setattr(fd, ...);
fsync(fd);
? Why not just use ceph_fsetattrx followed by ceph_fsync? Do we really need another API?
ceph_ll_fsync would give us almost the semantics we need. The main problem is that we may not have the file open and all of the variants currently take a Fh *. We need one that takes an Inode *.
Maybe we should add:
ceph_ll_sync_inode(Inode *i, bool syncdataonly);
?
Ok, I have a draft patch for this now. What I don't have is a great way to test it.
Hmm...now that I look, we have no test coverage at all for functional ceph_fsync or ceph_ll_fsync. I'll have to give that some thought.
- Status changed from New to In Progress
- Status changed from In Progress to Pending Backport
- Backport set to luminous
- Copied to Backport #23474: luminous: client: allow caller to request that setattr request be synchronous added
- Subject changed from client: allow caller to request that setattr request be synchronous to client: add way to sync setattr operations to MDS
- Status changed from Pending Backport to Resolved
Also available in: Atom
PDF