Bug #15502

closed

files read or written with cephfs (fuse or kernel) on client drop all their page cache pages on close

Added by Barry Marson about 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Performance/Resource Usage
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Testing cephfs file system I/O with early jewel bits (ceph-10.0.4-1.el7cp.x86_64) on:

A RHEL 7.2 client mounting a cephfs (fuse or kernel) file system against a four-node RHEL 7.2 server cluster shows that every time we close a file on the client, all of the page cache pages associated with that file get dropped.

A simple test case is attached ...
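
For reference, a minimal sketch of what the attached script appears to do, reconstructed from the messages echoed in the sample output below (the file path, sizes, and exact step ordering are assumptions; the real cephdropcache.sh may differ):

#!/bin/bash
# Hypothetical reconstruction of cephdropcache.sh, pieced together from the echoed messages.
FILE=${1:-/sasdata/ddfile.log}      # path on the cephfs mount (assumed)

vmstat 1 &                          # watch the "cache" column while the test runs
VMSTAT_PID=$!
sleep 5

echo "Dropping Cache on purpose"
sync
echo 3 > /proc/sys/vm/drop_caches   # start from an empty page cache (needs root)
echo "Sleeping for 5 seconds"
sleep 5

echo "Create a 10GB file with : dd if=/dev/zero of=$FILE bs=128k count=102400 conv=fsync"
dd if=/dev/zero of=$FILE bs=128k count=102400 conv=fsync
echo "Note page cache usage now ..."
echo "sleeping 5 seconds"
sleep 5                             # the file's pages should still be cached here

echo "Read the 10GB file with -> dd if=$FILE of=/dev/null bs=16k"
dd if=$FILE of=/dev/null bs=16k
echo "Note page cache usage now ..."
echo "sleeping 5 seconds"
sleep 5                             # instead, the cache drops back to its baseline after the close

kill $VMSTAT_PID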

Below is some sample output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 106428 48332732 0 234132 0 0 30 106 1 1 1 0 98 0 0
0 0 106428 48332312 0 234212 0 0 0 0 82 134 0 0 100 0 0
0 0 106428 48332572 0 234212 0 0 0 0 121 124 0 0 100 0 0
0 0 106428 48332572 0 234212 0 0 0 0 83 122 0 0 100 0 0
0 0 106428 48332572 0 234212 0 0 0 0 143 173 0 0 100 0 0
Dropping Cache on purpose
0 0 106428 48332572 0 234212 0 0 32 10 91 204 0 0 100 0 0
Sleeping for 5 seconds
0 0 106428 48383008 0 185880 0 0 1240 51 261 368 0 0 100 0 0
0 0 106428 48383624 0 185104 0 0 0 0 53 91 0 0 100 0 0
0 0 106428 48383624 0 185104 0 0 0 0 124 141 0 0 100 0 0
0 0 106428 48383632 0 185104 0 0 0 0 40 63 0 0 100 0 0
0 0 106428 48383632 0 185104 0 0 0 0 92 69 0 0 100 0 0
Create a 10GB file with : dd if=/dev/zero of=/sasdata/ddfile.log bs=128k count=102400 conv=fsync
0 0 106428 48382888 0 185104 0 0 128 0 111 233 0 0 100 0 0
1 0 106428 46386068 0 2182464 0 0 0 0 1055 184 0 4 96 0 0
1 0 106428 44044976 0 4524148 0 0 0 0 1072 92 0 4 96 0 0
2 0 106428 41883136 0 6684856 0 0 0 0 30939 5315 0 6 94 0 0
3 0 106428 39729552 0 8838996 0 0 0 0 29685 5536 0 7 93 0 0
1 0 106428 37863336 0 10705224 0 0 0 0 30309 5337 0 7 93 0 0
0 1 106428 37855084 0 10714604 0 0 0 0 18560 6098 0 2 94 4 0
0 1 106428 37855332 0 10714356 0 0 0 0 18490 5969 0 2 94 4 0
2 1 106428 37856164 0 10713548 0 0 0 0 17895 6318 0 2 94 4 0
1 1 106428 37857660 0 10712308 0 0 0 0 11395 5344 0 2 94 4 0
1 1 106428 37859740 0 10710196 0 0 0 0 8840 4560 0 2 94 4 0
0 1 106428 37862488 0 10707604 0 0 0 0 8160 3117 0 1 94 4 0
0 1 106428 37867152 0 10702940 0 0 0 0 1317 631 0 0 96 4 0
0 1 106428 37873184 0 10696908 0 0 0 1 924 848 0 0 96 4 0
0 1 106428 37880596 0 10689620 0 0 0 0 504 1157 0 0 96 4 0
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 14.9595 s, 718 MB/s
Note page cache usage now ...
sleeping 5 seconds
0 0 106428 48383240 0 185932 0 0 96 0 1301 916 0 3 96 0 0 <======= FILES CACHE DROPPED
0 0 106428 48383428 0 185772 0 0 0 0 54 97 0 0 100 0 0
0 0 106428 48383676 0 185772 0 0 0 0 120 122 0 0 100 0 0
0 0 106428 48383800 0 185772 0 0 0 0 43 63 0 0 100 0 0
0 0 106428 48383924 0 185772 0 0 0 0 94 68 0 0 100 0 0
Read the 10GB file with -> dd if=/sasdata/ddfile.log of=/dev/null bs=16k
0 1 106428 48309964 0 260880 0 0 0 0 1442 2055 0 0 99 0 0
0 1 106428 47605496 0 968972 0 0 0 0 11158 17486 0 3 94 3 0
0 1 106428 46897808 0 1676576 0 0 0 0 11903 18367 0 2 94 3 0
0 1 106428 46172880 0 2400252 0 0 0 0 11966 18425 0 2 94 3 0
0 1 106428 45458936 0 3115524 0 0 0 0 12346 19059 0 3 94 3 0
0 1 106428 44729800 0 3845044 0 0 0 16 11846 18127 0 3 94 3 0
2 1 106428 44006744 0 4567652 0 0 0 0 12050 17993 0 3 94 3 0
1 1 106428 43284408 0 5291036 0 0 0 0 11900 18420 0 2 94 3 0
0 1 106428 42558892 0 6016704 0 0 0 0 12650 18308 0 2 94 3 0
0 1 106428 41848088 0 6726356 0 0 0 0 12678 18419 0 3 94 3 0
1 1 106428 41125968 0 7449704 0 0 0 4 12493 18905 0 3 94 3 0
1 1 106428 40416464 0 8159692 0 0 0 0 12446 18430 0 2 94 3 0
0 1 106428 39701936 0 8873680 0 0 0 0 12691 18567 0 2 94 3 0
0 1 106428 38995404 0 9580452 0 0 0 0 13141 18703 0 3 94 3 0
1 1 106428 38242380 0 10335368 0 0 0 0 12206 18146 0 3 94 3 0
1 0 106428 44433100 0 4142496 0 0 0 0 7367 9652 0 3 95 2 0
163840+0 records in
163840+0 records out
10737418240 bytes (11 GB) copied, 15.4091 s, 697 MB/s
Note page cache usage now ...
sleeping 5 seconds
0 0 106428 48389876 0 186476 0 0 0 0 521 358 0 1 99 0 0 <======= FILES CACHE DROPPED
0 0 106428 48389980 0 186476 0 0 0 0 111 162 0 0 100 0 0
0 0 106428 48389980 0 186476 0 0 0 0 67 65 0 0 100 0 0
0 0 106428 48389980 0 186476 0 0 0 0 69 68 0 0 100 0 0
0 0 106428 48389980 0 186476 0 0 0 0 80 81 0 0 100 0 0
cephdropcache.sh: line 29: 22294 Terminated vmstat 1


Files

cephdropcache.sh (570 Bytes) cephdropcache.sh Script demonstrates files being dropped from cache on close Barry Marson, 04/14/2016 05:34 PM
Actions #1

Updated by Greg Farnum about 8 years ago

  • Assignee set to Zheng Yan

Zheng, can you look at this? Hopefully we just have a bad cap transition on the server or something.

Actions #3

Updated by Zheng Yan about 8 years ago

This should be fixed upstream. Barry, which version of the RHEL kernel are you using?

Actions #4

Updated by Greg Farnum about 8 years ago

This was on both ceph-fuse and a recent-ish rhel (7.2? 7.3-prerelease?) kernel. Unless there's some extra thing in the FUSE interfaces that we need to deal with (I don't think I heard of one?), this is something we're doing explicitly, and all the patches from #13640 are definitely included. Which makes me think it's caps.

Actions #5

Updated by Barry Marson about 8 years ago

The kernel is 3.10.0-327.el7.x86_64, i.e. the GA kernel for RHEL 7.2.

Actions #6

Updated by Zheng Yan about 8 years ago

I can't reproduce this on a 3.10.0-327.el7 kernel mount. To make ceph-fuse keep the kernel page cache, you need to set "fuse_use_invalidate_cb" to true.
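
For example, a minimal ceph.conf sketch (assuming the option belongs in the client section on the client node, and that ceph-fuse has to be restarted/remounted for it to take effect):

[client]
    fuse_use_invalidate_cb = true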

Actions #7

Updated by Zheng Yan about 8 years ago

  • Status changed from New to Need More Info
Actions #8

Updated by Barry Marson about 8 years ago

I was able to make this happen with the kernel mount as well. Does the tunable noted in #6 need to be set for both fuse and kernel mode?

Why is caching not the default?

I'm out of town for another 2 days. I'll follow up with more testing when I return.

Barry

Actions #9

Updated by Zheng Yan about 8 years ago

Here is my test result.

[root@zhyan-kvm1 ceph]# ./cephdropcache.sh testfile 
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1      0 1837396  21904 116060    0    0   321    24   49   84  0  0 91  8  0
 0  0      0 1837256  21904 116104    0    0    16     0   33   55  0  0 100  1  0
 0  0      0 1837256  21904 116104    0    0     0     0   16   26  0  0 100  0  0
 0  0      0 1837256  21904 116104    0    0     0     0   16   26  0  0 100  0  0
 0  0      0 1837256  21904 116104    0    0     0     0   15   31  0  0 100  0  0
 0  0      0 1837256  21904 116104    0    0     0     0   16   26  0  0 100  0  0
Dropping Cache on purpose
 0  1      0 1837160  21912 116232    0    0    32   672   42   77  0  0 50 50  0
 0  1      0 1837160  21912 116232    0    0     0     0   14   26  0  0 50 50  0
Sleeping for 5 seconds
 0  0      0 1926432    684  48264    0    0   824    36   93  126  0  1 76 23  0
 0  0      0 1926432    684  48264    0    0     0     0   17   30  0  0 100  0  0
 0  0      0 1926432    684  48264    0    0     0     0   13   22  0  0 100  0  0
 0  0      0 1926432    684  48264    0    0     0     0   11   22  0  0 100  0  0
 0  0      0 1926432    684  48264    0    0     0     0   16   26  0  0 100  0  0
Create a 1GB file with : dd if=/dev/zero of=testfile bs=1024k count=1024 conv=fsync
 0  2      0 1540708    840 432748    0    0   224    24  813  514  0 12 70 15  3
 1  1      0 1227188    840 746252    0    0     0     0  825  807  0  9 47 42  3
 2  0      0 873508    840 1099772    0    0     0     0 1032  853  0 14 45 39  3
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.35954 s, 320 MB/s
Note page cache usage now ...
sleeping 5 seconds
 0  0      0 874308    988 1099736    0    0   152     0  439  367  0  4 63 31  1
 0  0      0 874308    988 1099736    0    0     0     0   23   27  0  0 100  0  0
 0  0      0 874308    988 1099736    0    0     0    84   25   33  0  0 100  0  0
 0  0      0 874308    988 1099736    0    0     0     0   13   23  0  0 100  0  0
 0  0      0 874308    988 1099736    0    0     0     0   13   23  0  0 100  0  0
Read the 10GB file with -> dd if=testfile of=/dev/null bs=16k
 1  0      0 874184    988 1099736    0    0     0     0  127   55  0  5 95  0  0
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 0.12942 s, 8.3 GB/s
Note page cache usage now ...
sleeping 5 seconds
 0  0      0 874308    988 1099740    0    0     0     0   54   70  1  1 99  0  0
 0  1      0 874308    992 1099736    0    0     0     4   19   29  0  0 100  1  0
 0  0      0 874300    992 1099748    0    0     0     0   20   36  0  0 92  9  0
 0  0      0 874300    992 1099748    0    0     0     0   13   27  0  0 100  0  0
 1  0      0 874300    992 1099748    0    0     0     0   18   23  0  0 100  0  0
./cephdropcache.sh: line 31:   681 Terminated              vmstat 1

"fuse_use_invalidate_cb" controls if ceph-fuse notify kernel to invalidate pagecache. The reason we don't enable it by default is that the fuse invalidate callback used to cause deadlock.

Actions #10

Updated by Greg Farnum about 8 years ago

That was just because our callback logic was broken though, right? I think it's time to enable it by default (I think it's been on in our testing this whole time?).

Actions #11

Updated by Barry Marson about 8 years ago

I've verified that adding:

fuse_use_invalidate_cb = true

to /etc/ceph/ceph.conf makes a fuse mount indeed keep the file cached after the file is closed.

So the next question is: what about when bypassing fuse and mounting in kernel mode? That still drops the pages on close.

Thanks
Barry

Actions #12

Updated by Zheng Yan about 8 years ago

Greg Farnum wrote:

That was just because our callback logic was broken though, right? I think it's time to enable by default (I think it's been on in our testing this whole time?).

I think it's OK to turn it on.

Actions #13

Updated by Zheng Yan about 8 years ago

I checked again. 3.10.0-327.el7.x86_64 does not contain the fix. 3.10.0-375.el7 kernel does.

Actions #14

Updated by Barry Marson about 8 years ago

It's interesting. The kernel changelogs reference the page cache invalidation change as:

  • Fri Mar 04 2016 Rafael Aquini <> [3.10.0-358.el7]
    - [fs] ceph: don't invalidate page cache when inode is no longer used (Zheng Yan) [1291193]

But the 'Fixed in Kernel' associated with bz 1291193 claims kernel-3.10.0-362.el7.

Fortunately I need to move to kernel-3.10.0-382.el7 because I need a fix for

https://bugzilla.redhat.com/show_bug.cgi?id=1320427

I shall proceed with the new bits.

Thanks
Barry

Actions #15

Updated by Greg Farnum almost 8 years ago

Created #15634 to enable the config value.

Actions #16

Updated by Greg Farnum almost 8 years ago

  • Category set to Performance/Resource Usage
  • Status changed from Need More Info to Resolved

I think this is all cleaned up now.
