Bug #15502
Closed
Files read or written with cephfs (fuse or kernel) on client drop all their page cache pages on close
Description
Testing CephFS file system I/O with early Jewel bits (ceph-10.0.4-1.el7cp.x86_64) on a RHEL 7.2 client mounting a CephFS file system (FUSE or kernel) from a four-node RHEL 7.2 server cluster shows that every time we close a file on the client, all of the page cache pages associated with that file are dropped.
A simple test case is attached.
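The attached script is not reproduced in this issue. The sketch below is a hypothetical reproducer in the same spirit, reconstructed only from the messages visible in the sample output (vmstat running in the background, a deliberate cache drop, a dd write with conv=fsync, then a dd read). It assumes it runs as root against a file on a CephFS mount, and the block sizes follow what the dd record counts in the log imply rather than the echoed command text. The `cached_kb` and `reproduce` helpers are illustrative names, not part of the original script.

```bash
#!/bin/bash
# Hypothetical reproducer in the spirit of the attached cephdropcache.sh.
# Assumptions: run as root; FILE lives on a CephFS mount (FUSE or kernel).

# Print the "Cached:" value (kB) from a meminfo-format file
# (defaults to /proc/meminfo) as a second view of page-cache size.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

reproduce() {
    local file=$1
    vmstat 1 &
    local vmstat_pid=$!
    sleep 5

    echo "Dropping Cache on purpose"
    sync
    echo 3 > /proc/sys/vm/drop_caches    # needs root
    echo "Sleeping for 5 seconds"
    sleep 5

    # 10240 x 1 MiB = 10 GiB; conv=fsync flushes data before dd exits.
    dd if=/dev/zero of="$file" bs=1M count=10240 conv=fsync
    echo "After write and close: Cached = $(cached_kb) kB"
    sleep 5

    # 163840 x 64 KiB = 10 GiB read back through the page cache.
    dd if="$file" of=/dev/null bs=64k
    echo "After read and close: Cached = $(cached_kb) kB"
    sleep 5

    kill "$vmstat_pid"
}

if [ $# -ge 1 ]; then
    reproduce "$1"
else
    echo "usage: $0 FILE-ON-CEPHFS (run as root)" >&2
fi
```

If the bug is present, the "Cached" value printed after each close should fall back to roughly its pre-dd level instead of staying about 10 GiB higher.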
Below is some sample output:
procs -----------memory---------- ---swap-- -----io---- system- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 106428 48332732 0 234132 0 0 30 106 1 1 1 0 98 0 0
0 0 106428 48332312 0 234212 0 0 0 0 82 134 0 0 100 0 0
0 0 106428 48332572 0 234212 0 0 0 0 121 124 0 0 100 0 0
0 0 106428 48332572 0 234212 0 0 0 0 83 122 0 0 100 0 0
0 0 106428 48332572 0 234212 0 0 0 0 143 173 0 0 100 0 0
Dropping Cache on purpose
0 0 106428 48332572 0 234212 0 0 32 10 91 204 0 0 100 0 0
Sleeping for 5 seconds
0 0 106428 48383008 0 185880 0 0 1240 51 261 368 0 0 100 0 0
0 0 106428 48383624 0 185104 0 0 0 0 53 91 0 0 100 0 0
0 0 106428 48383624 0 185104 0 0 0 0 124 141 0 0 100 0 0
0 0 106428 48383632 0 185104 0 0 0 0 40 63 0 0 100 0 0
0 0 106428 48383632 0 185104 0 0 0 0 92 69 0 0 100 0 0
Create a 10GB file with : dd if=/dev/zero of=/sasdata/ddfile.log bs=128k count=102400 conv=fsync
0 0 106428 48382888 0 185104 0 0 128 0 111 233 0 0 100 0 0
1 0 106428 46386068 0 2182464 0 0 0 0 1055 184 0 4 96 0 0
1 0 106428 44044976 0 4524148 0 0 0 0 1072 92 0 4 96 0 0
2 0 106428 41883136 0 6684856 0 0 0 0 30939 5315 0 6 94 0 0
3 0 106428 39729552 0 8838996 0 0 0 0 29685 5536 0 7 93 0 0
1 0 106428 37863336 0 10705224 0 0 0 0 30309 5337 0 7 93 0 0
0 1 106428 37855084 0 10714604 0 0 0 0 18560 6098 0 2 94 4 0
0 1 106428 37855332 0 10714356 0 0 0 0 18490 5969 0 2 94 4 0
2 1 106428 37856164 0 10713548 0 0 0 0 17895 6318 0 2 94 4 0
1 1 106428 37857660 0 10712308 0 0 0 0 11395 5344 0 2 94 4 0
1 1 106428 37859740 0 10710196 0 0 0 0 8840 4560 0 2 94 4 0
0 1 106428 37862488 0 10707604 0 0 0 0 8160 3117 0 1 94 4 0
0 1 106428 37867152 0 10702940 0 0 0 0 1317 631 0 0 96 4 0
0 1 106428 37873184 0 10696908 0 0 0 1 924 848 0 0 96 4 0
0 1 106428 37880596 0 10689620 0 0 0 0 504 1157 0 0 96 4 0
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.9595 s, 718 MB/s
Note page cache usage now ...
sleeping 5 seconds
0 0 106428 48383240 0 185932 0 0 96 0 1301 916 0 3 96 0 0 <======= FILES CACHE DROPPED
0 0 106428 48383428 0 185772 0 0 0 0 54 97 0 0 100 0 0
0 0 106428 48383676 0 185772 0 0 0 0 120 122 0 0 100 0 0
0 0 106428 48383800 0 185772 0 0 0 0 43 63 0 0 100 0 0
0 0 106428 48383924 0 185772 0 0 0 0 94 68 0 0 100 0 0
Read the 10GB file with -> dd if=/sasdata/ddfile.log of=/dev/null bs=16k
0 1 106428 48309964 0 260880 0 0 0 0 1442 2055 0 0 99 0 0
0 1 106428 47605496 0 968972 0 0 0 0 11158 17486 0 3 94 3 0
0 1 106428 46897808 0 1676576 0 0 0 0 11903 18367 0 2 94 3 0
0 1 106428 46172880 0 2400252 0 0 0 0 11966 18425 0 2 94 3 0
0 1 106428 45458936 0 3115524 0 0 0 0 12346 19059 0 3 94 3 0
0 1 106428 44729800 0 3845044 0 0 0 16 11846 18127 0 3 94 3 0
2 1 106428 44006744 0 4567652 0 0 0 0 12050 17993 0 3 94 3 0
1 1 106428 43284408 0 5291036 0 0 0 0 11900 18420 0 2 94 3 0
0 1 106428 42558892 0 6016704 0 0 0 0 12650 18308 0 2 94 3 0
0 1 106428 41848088 0 6726356 0 0 0 0 12678 18419 0 3 94 3 0
1 1 106428 41125968 0 7449704 0 0 0 4 12493 18905 0 3 94 3 0
1 1 106428 40416464 0 8159692 0 0 0 0 12446 18430 0 2 94 3 0
0 1 106428 39701936 0 8873680 0 0 0 0 12691 18567 0 2 94 3 0
0 1 106428 38995404 0 9580452 0 0 0 0 13141 18703 0 3 94 3 0
1 1 106428 38242380 0 10335368 0 0 0 0 12206 18146 0 3 94 3 0
1 0 106428 44433100 0 4142496 0 0 0 0 7367 9652 0 3 95 2 0
163840+0 records in
163840+0 records out
10737418240 bytes (11 GB) copied, 15.4091 s, 697 MB/s
Note page cache usage now ...
sleeping 5 seconds
0 0 106428 48389876 0 186476 0 0 0 0 521 358 0 1 99 0 0 <======= FILES CACHE DROPPED
0 0 106428 48389980 0 186476 0 0 0 0 111 162 0 0 100 0 0
0 0 106428 48389980 0 186476 0 0 0 0 67 65 0 0 100 0 0
0 0 106428 48389980 0 186476 0 0 0 0 69 68 0 0 100 0 0
0 0 106428 48389980 0 186476 0 0 0 0 80 81 0 0 100 0 0
cephdropcache.sh: line 29: 22294 Terminated vmstat 1
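The echoed dd command lines in the log do not match the record counts dd itself reports: 10240 records for 10737418240 bytes implies a 1 MiB write block (not bs=128k count=102400), and 163840 records for the same byte count implies a 64 KiB read block (not bs=16k). This suggests the script's echo strings are out of date, although the file really is 10 GiB. A quick cross-check of those numbers, plus the size of the vmstat "cache" swing, using only values copied from the log:

```bash
#!/bin/bash
# Cross-check dd and vmstat figures from the log (bash integer arithmetic).
mib=$((1024 * 1024))

# Write: "10240+0 records out", "10737418240 bytes" reported.
write_bytes=$((10240 * mib))
echo "write: $write_bytes bytes"                       # 10737418240 -> bs=1M count=10240

# Read: 163840 records for the same byte count.
read_bs=$((10737418240 / 163840))
echo "read block size: $read_bs bytes"                 # 65536 -> bs=64k
echo "bs=16k would move: $((163840 * 16384)) bytes"    # 2684354560 (only 2.5 GiB)

# vmstat "cache" column (kB): just before the write vs. its peak during the write.
cache_growth_kb=$((10714604 - 185104))
echo "cache growth: $cache_growth_kb kB"               # 10529500
echo "10 GiB in kB: $((10 * 1024 * 1024))"             # 10485760
```

The cache grows by roughly the file size while dd runs, then falls back to about 186 MB (185932 kB) right after the file is closed. That is the drop the report is about.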
Updated by Greg Farnum about 8 years ago
- Assignee set to Zheng Yan
Zheng, can you look at this? Hopefully we just have a bad cap transition on the server or something.
Updated by John Spray about 8 years ago
Updated by Zheng Yan about 8 years ago
This should be fixed upstream. Barry, which version of the RHEL kernel are you using?
Updated by Greg Farnum about 8 years ago
This was on both ceph-fuse and a recent-ish RHEL (7.2? 7.3 prerelease?) kernel. Unless there's some extra thing in the FUSE interfaces that we need to deal with (I don't think I've heard of one?), this is something we're doing explicitly, and all the patches from #13640 are definitely included. That makes me think it's caps.
Updated by Barry Marson about 8 years ago
The kernel is 3.10.0-327.el7.x86_64, i.e. the GA kernel for RHEL 7.2.
Updated by Zheng Yan about 8 years ago
I can't reproduce this with a 3.10.0-327.el7 kernel mount. To make ceph-fuse keep the kernel page cache, you need to set "fuse_use_invalidate_cb" to true.
Updated by Zheng Yan about 8 years ago
- Status changed from New to Need More Info
Updated by Barry Marson about 8 years ago
I was able to make this happen with a kernel mount as well. Does the tunable noted in #6 apply to both FUSE and kernel mode?
Why is caching not the default?
I'm out of town for another 2 days. I'll follow up with more testing when I return.
Barry
Updated by Zheng Yan about 8 years ago
Here is my test result:
[root@zhyan-kvm1 ceph]# ./cephdropcache.sh testfile
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 1837396 21904 116060 0 0 321 24 49 84 0 0 91 8 0
0 0 0 1837256 21904 116104 0 0 16 0 33 55 0 0 100 1 0
0 0 0 1837256 21904 116104 0 0 0 0 16 26 0 0 100 0 0
0 0 0 1837256 21904 116104 0 0 0 0 16 26 0 0 100 0 0
0 0 0 1837256 21904 116104 0 0 0 0 15 31 0 0 100 0 0
0 0 0 1837256 21904 116104 0 0 0 0 16 26 0 0 100 0 0
Dropping Cache on purpose
0 1 0 1837160 21912 116232 0 0 32 672 42 77 0 0 50 50 0
0 1 0 1837160 21912 116232 0 0 0 0 14 26 0 0 50 50 0
Sleeping for 5 seconds
0 0 0 1926432 684 48264 0 0 824 36 93 126 0 1 76 23 0
0 0 0 1926432 684 48264 0 0 0 0 17 30 0 0 100 0 0
0 0 0 1926432 684 48264 0 0 0 0 13 22 0 0 100 0 0
0 0 0 1926432 684 48264 0 0 0 0 11 22 0 0 100 0 0
0 0 0 1926432 684 48264 0 0 0 0 16 26 0 0 100 0 0
Create a 1GB file with : dd if=/dev/zero of=testfile bs=1024k count=1024 conv=fsync
0 2 0 1540708 840 432748 0 0 224 24 813 514 0 12 70 15 3
1 1 0 1227188 840 746252 0 0 0 0 825 807 0 9 47 42 3
2 0 0 873508 840 1099772 0 0 0 0 1032 853 0 14 45 39 3
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.35954 s, 320 MB/s
Note page cache usage now ...
sleeping 5 seconds
0 0 0 874308 988 1099736 0 0 152 0 439 367 0 4 63 31 1
0 0 0 874308 988 1099736 0 0 0 0 23 27 0 0 100 0 0
0 0 0 874308 988 1099736 0 0 0 84 25 33 0 0 100 0 0
0 0 0 874308 988 1099736 0 0 0 0 13 23 0 0 100 0 0
0 0 0 874308 988 1099736 0 0 0 0 13 23 0 0 100 0 0
Read the 10GB file with -> dd if=testfile of=/dev/null bs=16k
1 0 0 874184 988 1099736 0 0 0 0 127 55 0 5 95 0 0
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 0.12942 s, 8.3 GB/s
Note page cache usage now ...
sleeping 5 seconds
0 0 0 874308 988 1099740 0 0 0 0 54 70 1 1 99 0 0
0 1 0 874308 992 1099736 0 0 0 4 19 29 0 0 100 1 0
0 0 0 874300 992 1099748 0 0 0 0 20 36 0 0 92 9 0
0 0 0 874300 992 1099748 0 0 0 0 13 27 0 0 100 0 0
1 0 0 874300 992 1099748 0 0 0 0 18 23 0 0 100 0 0
./cephdropcache.sh: line 31: 681 Terminated vmstat 1
"fuse_use_invalidate_cb" controls whether ceph-fuse notifies the kernel to invalidate the page cache. We don't enable it by default because the FUSE invalidate callback used to cause deadlocks.
Updated by Greg Farnum about 8 years ago
That was just because our callback logic was broken though, right? I think it's time to enable by default (I think it's been on in our testing this whole time?).
Updated by Barry Marson about 8 years ago
I've verified that adding:
fuse_use_invalidate_cb=true
to the /etc/ceph/ceph.conf file makes FUSE mounts keep the file cached after the file is closed.
So the next question is: what about bypassing FUSE and mounting in kernel mode? That still drops the pages on close.
Thanks
Barry
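A small sketch for checking that the option is actually present in a config file before remounting. It writes an example fragment to a scratch file rather than touching /etc/ceph/ceph.conf. Putting the option under a [client] section is an assumption (the comment above does not say which section was used), and `conf_get` is a hypothetical helper, not a Ceph tool: it ignores section headers, keeps only the text up to the next "=", and treats spaces, dashes and underscores in key names as equivalent.

```bash
#!/bin/bash
# Hypothetical helper: print the value of KEY from a ceph.conf-style FILE.
# Simplifications: section headers are ignored; "key = value" lines only.
conf_get() {
    local key=${1//[- ]/_} file=$2
    awk -v key="$key" -F '=' '
        /^[ \t]*[#;]/ { next }                    # skip comment lines
        NF >= 2 {
            k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim key
            gsub(/[- ]/, "_", k)                    # normalise separators
            if (k == key) {
                v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
            }
        }' "$file"
}

# Example fragment (assumed [client] section) in a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[client]
    fuse_use_invalidate_cb = true
EOF

conf_get fuse_use_invalidate_cb "$conf"   # prints: true
rm -f "$conf"
```

The option only affects ceph-fuse, so on the pre-fix RHEL kernel client the kernel mount still drops pages on close regardless of this setting.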
Updated by Zheng Yan about 8 years ago
Greg Farnum wrote:
That was just because our callback logic was broken though, right? I think it's time to enable by default (I think it's been on in our testing this whole time?).
I think it's OK to turn it on.
Updated by Zheng Yan about 8 years ago
I checked again: 3.10.0-327.el7.x86_64 does not contain the fix, but the 3.10.0-375.el7 kernel does.
Updated by Barry Marson about 8 years ago
It's interesting. The kernel changelogs reference the page cache invalidation change as
- Fri Mar 04 2016 Rafael Aquini <aquini@redhat.com> [3.10.0-358.el7]
- [fs] ceph: don't invalidate page cache when inode is no longer used (Zheng Yan) [1291193]
But the 'Fixed in Kernel' field of BZ 1291193 says kernel-3.10.0-362.el7.
Fortunately, I need to move to kernel-3.10.0-382.el7 anyway, because I need a fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1320427
I shall proceed with the new bits.
Thanks
Barry
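Since the thread cites several candidate builds (the changelog entry in 358, the BZ's "Fixed in Kernel" 362, Zheng's verified 375, and the planned 382), a quick client-side check can help. This is a minimal sketch, assuming RHEL 7 release strings of the form 3.10.0-NNN.el7.x86_64. It compares only the main build number, so z-stream kernels (e.g. 327.x.y) that may carry a backport are not detected, and the 358 default is an assumption taken from the changelog line; pass a different threshold if you trust another build. Both function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: extract the el7 build number from a kernel release.
el7_build() {
    local rel=${1#*-}      # "3.10.0-327.el7.x86_64" -> "327.el7.x86_64"
    echo "${rel%%.*}"      # -> "327"
}

# Succeeds if RELEASE (default: running kernel) is at or after build MIN
# (default 358, the build whose changelog lists the ceph page-cache fix).
has_ceph_pagecache_fix() {
    local rel=${1:-$(uname -r)} min=${2:-358}
    local build
    build=$(el7_build "$rel")
    [ "$build" -ge "$min" ]
}
```

For example, `has_ceph_pagecache_fix 3.10.0-327.el7.x86_64 || echo "no fix"` prints "no fix", while 3.10.0-382.el7.x86_64 passes with either the 358 or the 375 threshold.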
Updated by Greg Farnum almost 8 years ago
Created #15634 to enable the config value.
Updated by Greg Farnum almost 8 years ago
- Category set to Performance/Resource Usage
- Status changed from Need More Info to Resolved
I think this is all cleaned up now.