Bug #20467
CephFS kernel client directory listings are inconsistent
Description
I used 'ls' to list the files in one directory; two clients return different results:
ruitian@rndcl9:/data/hft/data_v1/hftdata/WIND/market/stock/2016/10$ ls 21 | wc -l
5670
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/10$ ls 21 | wc -l
3409
But I can access the missing file directly and read it on both clients:
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/10$ ls 21 | grep 603999-SH-stock.book.csv.gz
"No Such File"
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/10$ ls ./21/603999-SH-stock.book.csv.gz
./21/603999-SH-stock.book.csv.gz
It seems something is wrong with directory handling in the Ceph kernel client.
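For reference, the reported symptom can be checked in a script. A minimal sketch (the `check_listing` helper is hypothetical, not part of the report) that compares a directory listing against a direct stat of the same name:

```shell
# Hypothetical helper: report "inconsistent" when a file is directly
# accessible via stat but absent from the directory listing -- the
# symptom described in this issue. Prints "consistent" otherwise.
check_listing() {
    dir=$1; name=$2
    if stat "$dir/$name" >/dev/null 2>&1 && ! ls "$dir" | grep -qx "$name"; then
        echo "inconsistent"
    else
        echo "consistent"
    fi
}
```

On a healthy filesystem this always prints "consistent"; on the affected mounts it would print "inconsistent" for names like 603999-SH-stock.book.csv.gz.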
History
#1 Updated by Zheng Yan over 6 years ago
Kernel version?
#2 Updated by Yunzhi Cheng over 6 years ago
kernel version is 4.4.0-46
#3 Updated by Yunzhi Cheng over 6 years ago
ruitian@rndcl3:~$ uname -a
Linux rndcl3 4.4.0-46-generic #67~14.04.1-Ubuntu SMP Fri Oct 21 16:04:40 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
ruitian@rndcl9:~$ uname -a
Linux rndcl9 4.4.0-46-generic #67~14.04.1-Ubuntu SMP Fri Oct 21 16:04:40 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
#4 Updated by Zheng Yan over 6 years ago
I checked the Ubuntu kernel. It does not contain the following commit, which I suspect is the cause. Could you please try the newest upstream kernel?
commit af5e5eb574776cdf1b756a27cc437bff257e22fe
Author: Yan, Zheng <zyan@redhat.com>
Date:   Fri Feb 26 16:27:13 2016 +0800

    ceph: fix race during filling readdir cache

    Readdir cache uses page cache to save dentry pointers. When adding
    dentry pointers to middle of a page, we need to make sure the page
    already exists. Otherwise the beginning part of the page will be
    invalid pointers.

    Signed-off-by: Yan, Zheng <zyan@redhat.com>
#5 Updated by Yunzhi Cheng over 6 years ago
Zheng Yan wrote:
I checked ubuntu kenrel. It does not contain following commit. I suspect it's the cause. Could you please try newest upstream kernel.
[...]
I am using the mainline kernel now and still hit this issue.
ruitian@rndcl3:~$ uname -r
4.11.8-041108-generic
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/07$ ls 06 | grep 002319
"No Such File"
ruitian@rndcl9:/data/hft/data_v1/hftdata/WIND/market/stock/2016/07$ ls ./06/002319-SZ-stock.book.csv.gz
./06/002319-SZ-stock.book.csv.gz
And I find that if I create a temporary file in the affected directory, the listing becomes correct:
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/07$ ls 06 | grep 002319
"No Such File"
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/07$ touch ./06/a.tmp
ruitian@rndcl3:/data/hft/data_v1/hftdata/WIND/market/stock/2016/07$ ls 06 | grep 002319
./06/002319-SZ-stock.book.csv.gz
This is a serious problem: I cannot trust any result of 'ls'.
#6 Updated by Zheng Yan over 6 years ago
What are the ceph-mds version and configuration? Are dirfrag or multi-MDS enabled?
Please describe the workload you put on CephFS. How many clients are there? Do they modify the same directory at the same time?
How difficult is this to reproduce? If it's easy, please set debug_mds=10 and enable kernel debugging:
On the machine that runs ceph-mds, run "ceph daemon mds.x config set debug_mds 10".
On the CephFS client machine, run "echo module ceph +p > /sys/kernel/debug/ceph/*/mdsc".
Reproduce this bug.
Disable debugging and upload the logs:
"ceph daemon mds.x config set debug_mds 0"
"echo module ceph -p > /sys/kernel/debug/ceph/*/mdsc"
Thanks.
#7 Updated by Zheng Yan over 6 years ago
- Assignee set to Zheng Yan
#8 Updated by Yunzhi Cheng over 6 years ago
- File current_mds_config.txt added
Zheng Yan wrote:
[...]
The ceph-mds version is 10.2.7, and I only changed mds_cache_size to 5000000; the other settings are the defaults.
I haven't enabled dirfrag or multi-MDS.
There are 6 clients per host, and there are 70 hosts.
We have 3 Ceph clusters; on every host, 3 clients point to A, 2 clients point to B, and the last one points to C.
I changed 2 nodes to kernel 4.11.8-041108-generic; the others run 4.4.0-46-generic.
As you can see above, rndcl3, where I reproduced this issue, runs kernel 4.11.8.
Here are the mount options:
10.0.0.2,10.0.0.12,10.0.0.28:/hft on /data/hft type ceph (name=admin,noshare,key=client.admin)
10.0.0.2,10.0.0.12,10.0.0.28:/rtt on /data/rtt type ceph (name=admin,noshare,key=client.admin)
10.0.0.26,10.0.0.38,10.0.0.62:/rnd on /data/rnd2 type ceph (name=admin,noshare,key=client.admin)
10.0.0.27,10.0.0.39,10.0.0.63:/rnd on /data/rnd type ceph (name=admin,noshare,key=client.admin)
10.0.0.2,10.0.0.12,10.0.0.28:/share on /data/share type ceph (name=admin,noshare,key=client.admin)
10.0.0.26,10.0.0.38,10.0.0.62:/ on /old_data type ceph (name=admin,noshare,key=client.admin)
The client that reproduced the issue is '10.0.0.2,10.0.0.12,10.0.0.28:/hft on /data/hft type ceph (name=admin,noshare,key=client.admin)'.
We only read data from the directory and do not modify it. Most of the time, about 10 processes read the same directory concurrently.
I will upload the debug log once I reproduce the bug.
root@rndcl3:/sys/kernel/debug/ceph/8126fa27-e3b1-49e6-8a59-9ad337074533.client3143441# echo 'module ceph +p' > mdsc
-bash: echo: write error: Invalid argument
That command seems to error out. Should I do echo module ceph +p > /sys/kernel/debug/dynamic_debug instead?
By the way, if I add the mount option noasyncreaddir, will the bug be avoided?
#9 Updated by Zheng Yan over 6 years ago
Yunzhi Cheng wrote:
That command seems to error out. Should I do echo module ceph +p > /sys/kernel/debug/dynamic_debug instead?
Yes, it should be "echo module ceph +p > /sys/kernel/debug/dynamic_debug/control".
By the way, if I add the mount option noasyncreaddir, will the bug be avoided?
Yes, noasyncreaddir should avoid this problem.
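Putting the corrected pieces together, the debug cycle plus workaround might look like this. This is a sketch, not from the thread: it assumes debugfs is mounted at /sys/kernel/debug, reuses the /hft mount source and options quoted in comment #8, and must run as root.

```shell
# Enable verbose ceph kernel-client messages via dynamic debug
# (corrected path from comment #9).
echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control

# ... reproduce the bug, then collect kern.log / dmesg ...

# Disable debugging again.
echo 'module ceph -p' > /sys/kernel/debug/dynamic_debug/control

# Workaround: remount the filesystem with noasyncreaddir so readdir
# bypasses the client-side readdir cache (mount source as in comment #8).
umount /data/hft
mount -t ceph 10.0.0.2,10.0.0.12,10.0.0.28:/hft /data/hft \
      -o name=admin,noshare,noasyncreaddir
```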
#10 Updated by Zheng Yan over 6 years ago
- Status changed from New to 7
#11 Updated by Yunzhi Cheng over 6 years ago
Zheng Yan wrote:
Please try
https://github.com/ceph/ceph-client/commit/6c51866e67dda544b9be524185a526cf4ace865a
Sorry, I can't deploy a custom kernel to our cluster at the moment; I can only try it once it has been backported to the mainline kernel :(
It will be a long time before I can test a custom kernel in a test environment.
I uploaded the logs from when I reproduced the bug; it happened between Jul 6 16:58:00 and 17:00:30.
https://www.dropbox.com/s/brtd46l7o9kklgo/ceph-data-mds.rndcl9.log.gz?dl=0
https://www.dropbox.com/s/vfbgxfmgvn7qo8s/kern.log.gz?dl=0
And I reproduced it with:
ls /data/hft/data_v1/hftdata/WIND/market/stock/2016/01/18 | grep 601808-SH-stock.trade.csv.gz
Maybe you can find something in the logs, but they are really a mess.
I am testing the noasyncreaddir mount option now.
#12 Updated by Zheng Yan over 6 years ago
Yunzhi Cheng wrote:
Zheng Yan wrote:
Please try
https://github.com/ceph/ceph-client/commit/6c51866e67dda544b9be524185a526cf4ace865a
[...]
I have reproduced this issue locally. The patch works.
Thanks
#13 Updated by Zheng Yan over 6 years ago
- Status changed from 7 to Resolved