Bug #11989


Cephfs Kernel Client data corruption

Added by Bernd Helm almost 9 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi. I get random data corruption with the CephFS kernel client. I stream from a non-Ceph machine using "cat <file> | nc -l -p 444 -q" into CephFS using "nc <ip> 444 | pv -r -t -a > /mnt/cephfs/testfile". The file is 30 GB in size and contains ASCII CSV data. When I do this, there is a high chance of file corruption. The corruption manifests as a block of 0x00 bytes with a random size between ~500 and 4000 bytes.

After testing for 80+ hours, here is what I know for sure:

  1. I was able to reproduce this with the giant (0.87.2-1~bpo70+1) and hammer (0.94.1-1~bpo70+1, 0.94.2-1~bpo70+1) Ceph versions.
  2. I tried kernel versions 4.0.4, 3.16.0-4-amd64 and 3.14.43-031443.
  3. It also happens if the node writing to CephFS is NOT running any OSDs, but it happens more often when a server that runs OSDs mounts CephFS locally, which may also be traffic related.
  4. If I check the file directly after writing, the md5 sum matches, which makes me think it was handed over to CephFS correctly. If I then issue "echo 3 > /proc/sys/vm/drop_caches" locally to clear the buffers and run md5sum again, it no longer matches the previous checksum.
  5. With a 30 GB file, I get 1-4 corrupt blocks in the file.
  6. I was not able to reproduce this with the ceph-fuse client.
  7. I was not able to reproduce it when copying the file from a local tmpfs location; it seems somehow related to the network traffic.
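
Observation 4 can be turned into a small repeatable check. A minimal sketch, assuming md5sum is available; the cache-dropping command is passed in as arguments (it needs root on a real system), so the helper itself carries no privileged steps, and all paths shown are illustrative:

```shell
# Sketch: checksum a file, drop the page cache, checksum again; returns
# non-zero when the cached and on-disk contents diverged. The paths and
# the drop-cache command in the example below are illustrative.
check_coherency() {
  f=$1; shift
  before=$(md5sum "$f" | cut -d' ' -f1)
  "$@"                                  # e.g. echo 3 > /proc/sys/vm/drop_caches
  after=$(md5sum "$f" | cut -d' ' -f1)
  [ "$before" = "$after" ]
}
# e.g. check_coherency /ceph-kernel/kernel/testfile1 \
#        sh -c 'echo 3 > /proc/sys/vm/drop_caches'
```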

mount:
172.16.0.101:6789:/ on /ceph-kernel type ceph (rw,relatime,name=admin,secret=<hidden>,nodcache,nofsc,acl)

root@idbi5:/ceph-kernel/kernel# nc 172.16.0.62 445 | pv -r -a -t > testfile1
0:01:11 [ 432MB/s] [ 432MB/s]
root@idbi5:/ceph-kernel/kernel# md5sum testfile1
801a0ec20f59aa4e4da51a8337cb722f testfile1 #this is the correct checksum
root@idbi5:/ceph-kernel/kernel# echo 3 > /proc/sys/vm/drop_caches
root@idbi5:/ceph-kernel/kernel# md5sum testfile1
8558af60da1f3c062fea18f0ce81ef24 testfile1 #this checksum is wrong

  • I have already reinstalled all servers with Debian 7.8 wheezy.
  • I reinstalled Ceph multiple times with the stock configuration and crush map.
  • I also removed each server from the cluster in turn to exclude hardware-related problems like bad RAM.
  • I have a 5-node cluster with 20 OSDs on XFS: 10 OSDs on HDD and 10 on SSD. I tested this in HDD-only and SSD-only CephFS pools.
  • I issued a deep scrub on all PGs; no errors showed up.
  • Hardware used: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 128GB ECC RAM, 10GBit Ethernet.
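
For reference, a hedged sketch of how a deep scrub of every PG can be driven from the hammer-era CLI. The pgid pattern in the awk filter is an assumption and may need adjusting to the exact `ceph pg dump` output format; the CLI is passed in as arguments so the loop can be exercised without a live cluster:

```shell
# Sketch: list pgids from `pg dump` (they look like "<pool>.<hex>") and
# issue a deep scrub for each. Invoke as `scrub_all_pgs ceph`.
scrub_all_pgs() {
  "$@" pg dump | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }' |
    while read -r pgid; do
      "$@" pg deep-scrub "$pgid"
    done
}
```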

Is more information required? Thank you.


Files

Actions #1

Updated by Greg Farnum almost 9 years ago

  • Project changed from Ceph to CephFS
  • Category changed from 26 to 53
  • Assignee set to Zheng Yan

I imagine this is a result of some kind of memory exhaustion, but I'm not sure how best to diagnose it or if there are other possibilities. Zheng?

Actions #2

Updated by Zheng Yan almost 9 years ago

Are there any suspicious messages when this happens?

Actions #3

Updated by Bernd Helm almost 9 years ago

Zheng Yan wrote:

Are there any suspicious messages when this happens?

dmesg is silent, and the Ceph logs in /var/log/ceph also show nothing on the system that mounts CephFS.

Actions #4

Updated by Zheng Yan almost 9 years ago

Could you please provide me with a list of corrupt blocks (offset and size of each corrupt block)? Also, could you please try using direct I/O (with a 1M or larger block size) for the write?
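
A hedged sketch of how such a list could be produced, assuming a known-good copy of the file is still available (e.g. on the sending host); the file names are placeholders. `cmp -l` prints one line per differing byte ("<offset> <octal-a> <octal-b>", 1-based), and the awk script collapses consecutive offsets into "<0-based offset> <length>" pairs:

```shell
# Hypothetical helper: print "<0-based offset> <length>" for each corrupt
# run, given a known-good copy and the copy read back from CephFS.
corrupt_runs() {
  cmp -l "$1" "$2" | awk '
    start == "" || $1 != prev + 1 {
      if (start != "") print start - 1, prev - start + 1
      start = $1
    }
    { prev = $1 }
    END { if (start != "") print start - 1, prev - start + 1 }'
}
# e.g. corrupt_runs good.csv /ceph-kernel/kernel/testfile1
```

The direct-I/O write could likewise be tried with something along the lines of `nc <ip> 445 | dd of=testfile1 bs=1M iflag=fullblock oflag=direct` (iflag=fullblock keeps GNU dd from issuing short, unaligned writes when reading from a pipe).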

Actions #5

Updated by Zheng Yan almost 9 years ago

  • Status changed from New to 12

I reproduced this locally.

Actions #6

Updated by Zheng Yan almost 9 years ago

Please try the attached patch.

Actions #7

Updated by Bernd Helm almost 9 years ago

Zheng Yan wrote:

Please try the attached patch.

I have tried your patch with my test case with 300 GB of data and got no corruption. The problem seems to be resolved.

Thank you!

Actions #8

Updated by Zheng Yan almost 9 years ago

  • Status changed from 7 to Resolved
Actions #9

Updated by Greg Farnum almost 8 years ago

  • Component(FS) kceph added
