Bug #1206
closedNFS reexport file creation lags 1-3 seconds
0%
Description
I'm attaching the kernel logs and mds logs for the creation of a file called scale-product/testfoo3 via a touch running on a remote NFS client.
I've looked through the logs, but they're a little difficult to track.
There are a few other file accesses in these logs for other services we're using.
However, I think it unlikely that those are causing problems, as I can create files locally on the ceph kernel mount without experiencing this lag.
Files
Updated by Brian Chrisman almost 13 years ago
If I reexport the ceph filesystem via fuse/nfs, there is no lag in creating files.
Updated by Sage Weil almost 13 years ago
It looks like after the mknod nfsd is calling write_inode via commit_metadata() in fs/nfsd/vfs.c. This triggers cap writeback, which does not flush teh mds cache immediately because normally nobody (except maybe sync(2)) waits on it.
If you export with option async and it should skip this step and be fast.
We can also defined a ->commit_metadata hook in the export_operations that does something different/smarter. What this is, I'm not sure. In this specific case, mknod is setting the mtime/atime/ctime, which is pretty useless (the mds did that too). In other cases safety might matter more.
Wanna try the async option just to verify this is in fact what's going on?
Updated by Sage Weil almost 13 years ago
- Project changed from Ceph to Linux kernel client
- Target version deleted (
v0.31)
Updated by Sage Weil almost 13 years ago
- Category set to NFS
- Target version set to v3.0
Updated by Brian Chrisman almost 13 years ago
That's faster... 0.5s for a touch create.
As a reference local filesystem and another cfs exported over nfs runs sub-0.01s for touch.
Updated by Brian Chrisman almost 13 years ago
Ahh okay... file create isn't really any slower than on native ceph kernel mount (rather than re-export)... so this isn't an issue.
I'm going to close it.
Updated by Brian Chrisman almost 13 years ago
Ahh I don't seem to have access to close it...
Updated by Sage Weil almost 13 years ago
ok, but are you saying that a native mount is also .5s for touch? that's no good either!
Updated by Brian Chrisman almost 13 years ago
0.3 seconds for kernel mount... 0.01s for cfuse.. I'm going to work on tracking this down..
Updated by Sage Weil almost 13 years ago
- Target version changed from v3.0 to v3.1
Updated by Brian Chrisman over 12 years ago
- File mds.scale-192-168-98-109.log mds.scale-192-168-98-109.log added
- File messages messages added
I've reproduced this half-second-NFS-create bug (f3de1a506d6c2debb399a1ae71f8f50714a31c8a), and it only appears through NFS (and occurs with async export and mount options set).
With local ceph mount, I get < 0.02s file creates, so there's something mucked up with the nfs side of this.
I'm going to attach mds and messages logs... (and move on to testing for stale fhs).
Here I'm creating a sequence of files, foo42, foo43 ... there's are being created manually (not being run as soon as last 'touch' returns) via nfs.
(Note, I previously exported non-ceph filesystems from the same server/node without any significant file create lag)
Example:
[root@buildrhel6 ~]# time touch /root/testmnt/foo42
real 0m0.603s
user 0m0.000s
sys 0m0.005s
[root@buildrhel6 ~]# time touch /root/testmnt/foo43
real 0m0.463s
user 0m0.000s
sys 0m0.005s
[root@buildrhel6 ~]# time touch /root/testmnt/foo44
real 0m0.465s
user 0m0.000s
sys 0m0.004s
[root@buildrhel6 ~]#
Updated by Brian Chrisman over 12 years ago
Looks like that commit-id may be an internal reference (we host a local repo)
This is a recent master.. 0.31 is the latest tag.
I'm working on archiving exactly which commit we build rpms from.
Updated by Brian Chrisman over 12 years ago
actual commit is:
3e9a982a88ed16e5b47136f617193d6056b32ab4
I still need to get the kernel commit.. but they were built at the same time (7/26)
Updated by Sage Weil over 12 years ago
- Target version changed from v3.1 to v3.2
Updated by Sage Weil over 12 years ago
- Target version deleted (
v3.2) - Translation missing: en.field_position set to 1
Updated by Sage Weil over 12 years ago
- Translation missing: en.field_position deleted (
2) - Translation missing: en.field_position set to 1
- Translation missing: en.field_position changed from 1 to 995
Updated by Sage Weil over 12 years ago
- Translation missing: en.field_position deleted (
1017) - Translation missing: en.field_position set to 438
Updated by Ian Colle about 11 years ago
- Project changed from Linux kernel client to CephFS
- Category deleted (
NFS)
Updated by John Spray over 7 years ago
- Status changed from New to Closed
Closing this because it's ancient (and if NFS creates were super-slow we'd notice on the knfs suite)