Bug #11989
CephFS kernel client data corruption (closed)
Description
Hi. I'm getting random data corruption with the CephFS kernel client. I stream from a non-Ceph machine using "cat <file> | nc -l -p 444 -q" into CephFS using "nc <ip> 444 | pv -r -t -a > /mnt/cephfs/testfile". The file is 30GB in size and contains ASCII CSV data. When I do this, there is a high chance of file corruption. The corruption manifests as a block of 0x00 bytes with a random size between roughly 500 and 4000 bytes.
After testing for 80+ hours, here are the things I know for sure:
- I was able to reproduce this with both the giant (0.87.2-1~bpo70+1) and hammer (0.94.1-1~bpo70+1, 0.94.2-1~bpo70+1) Ceph versions.
- I tried kernel versions 4.0.4, 3.16.0-4-amd64 and 3.14.43-031443.
- It also happens if the node writing to CephFS is NOT running any OSDs, but it happens more often when a server that runs OSDs mounts CephFS locally, so it may also be traffic-related.
- If I checksum the file directly after writing, the md5sum matches, which makes me think it was handed over to CephFS correctly. If I then run "echo 3 > /proc/sys/vm/drop_caches" locally to clear the buffers and checksum again, it no longer matches the previous checksum.
- With a 30GB file, I get 1-4 corrupt blocks in the file.
- I was also not able to reproduce this with the ceph-fuse client.
- I was not able to reproduce it when copying the file from a local tmpfs location; it seems somehow related to the network traffic.
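The write-then-drop-caches verification described above can be sketched as a small script. This is only an illustration: the file path and the demo file are placeholders (on the real cluster it would be the 30GB testfile on the cephfs mount), and the drop_caches step needs root and is only meaningful when the file lives on cephfs.

```shell
# Sketch of the cached-vs-on-disk checksum comparison from the report.
verify_file() {
    file=$1
    sum_cached=$(md5sum "$file" | awk '{print $1}')   # read served from the page cache
    # Drop the page cache so the next read comes back from the OSDs (root only;
    # skipped silently where /proc is not writable).
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
    fi
    sum_disk=$(md5sum "$file" | awk '{print $1}')     # re-read after the cache drop
    if [ "$sum_cached" = "$sum_disk" ]; then
        echo "OK $file"
    else
        echo "CORRUPT $file cached=$sum_cached disk=$sum_disk"
    fi
}

# Demo on a small throwaway local file (placeholder for the cephfs testfile).
dd if=/dev/urandom of=demo.bin bs=64k count=1 2>/dev/null
verify_file demo.bin
```

On the affected setup, running this against the freshly written file on the kernel mount would print the CORRUPT line whenever one of the 0x00 blocks appears after the cache drop.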
mount:
172.16.0.101:6789:/ on /ceph-kernel type ceph (rw,relatime,name=admin,secret=<hidden>,nodcache,nofsc,acl)
root@idbi5:/ceph-kernel/kernel# nc 172.16.0.62 445 | pv -r -a -t > testfile1
0:01:11 [ 432MB/s] [ 432MB/s]
root@idbi5:/ceph-kernel/kernel# md5sum testfile1
801a0ec20f59aa4e4da51a8337cb722f testfile1 #this is the correct checksum
root@idbi5:/ceph-kernel/kernel# echo 3 > /proc/sys/vm/drop_caches
root@idbi5:/ceph-kernel/kernel# md5sum testfile1
8558af60da1f3c062fea18f0ce81ef24 testfile1 #this checksum is wrong
- I have already reinstalled all servers with Debian 7.8 (wheezy).
- I reinstalled Ceph multiple times with the stock configuration and CRUSH map.
- I also removed every server from the cluster in turn to exclude hardware-related problems like bad RAM.
- I have a 5-node cluster with 20 OSDs on XFS: 10 OSDs on HDDs and 10 on SSDs. I tested this with both HDD-only and SSD-only CephFS pools.
- I issued a deep scrub on all PGs; no errors showed up.
- Hardware used: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 128GB ECC RAM, 10GBit Ethernet.
Is any more information required? Thank you.
Updated by Greg Farnum almost 9 years ago
- Project changed from Ceph to CephFS
- Category changed from 26 to 53
- Assignee set to Zheng Yan
I imagine this is a result of some kind of memory exhaustion, but I'm not sure how best to diagnose it or if there are other possibilities. Zheng?
Updated by Zheng Yan almost 9 years ago
Are there any suspicious messages when this happens?
Updated by Bernd Helm almost 9 years ago
Zheng Yan wrote:
Are there any suspicious messages when this happens?
dmesg is silent, and the Ceph logs in /var/log/ceph also show nothing on the system that mounts CephFS.
Updated by Zheng Yan almost 9 years ago
Could you please provide a list of corrupt blocks (offset and size of each corrupt block)? Also, could you please try using direct IO (with a 1M or larger block size) to do the write?
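One way the requested direct-IO write could look, as a sketch (the nc source and cephfs path are taken from the report; oflag=direct bypasses the page cache, and iflag=fullblock keeps dd from writing short blocks when its input is a pipe):

```shell
# Direct-IO variant of the original write (illustrative, not the exact command used):
#   nc 172.16.0.62 445 | dd of=/ceph-kernel/kernel/testfile1 bs=1M oflag=direct iflag=fullblock
#
# Local demonstration of the same pipe-into-dd pattern, without oflag=direct
# (many scratch filesystems reject O_DIRECT):
dd if=/dev/urandom bs=1M count=2 2>/dev/null |
    dd of=direct-demo.bin bs=1M iflag=fullblock 2>/dev/null
wc -c < direct-demo.bin
```

If the corruption disappears with direct IO, that would point at the page-cache writeback path rather than the network or the OSDs.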
Updated by Zheng Yan almost 9 years ago
- File 0001-ceph-fix-ceph_writepages_start.patch 0001-ceph-fix-ceph_writepages_start.patch added
- Status changed from 12 to 7
Please try the attached patch.
Updated by Bernd Helm almost 9 years ago
Zheng Yan wrote:
Please try the attached patch.
I have tried your patch with my test case with 300GB of data and got no corruptions. The problem seems to be resolved.
Thank you!