Bug #17620

closed

Data Integrity Issue with kernel client vs fuse client

Added by Aaron Bassett over 7 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
kernel: 4.4.0-42-generic #62~14.04.1-Ubuntu SMP
I have a cluster with 10 osd nodes with 5 platters each and 1 mds. I am mounting cephfs from 21 compute nodes and using mesos to schedule jobs across them. One of my jobs uses an s3 client that supports multipart downloads in an attempt to speed up transfers. This code can be seen here:

https://github.com/bibby/radula/blob/v0.7.5/radula/rad.py#L1774
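For context, the access pattern this produces on the cephfs mount looks roughly like the sketch below. This is not radula's actual code; `fetch_range`, `PART_SIZE`, and the URL handling are illustrative. The point is that several threads fetch byte ranges of the same object and write them at their offsets into a single preallocated file.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 64 * 1024 * 1024   # hypothetical part size
THREADS = 3                    # matches the 3 download threads per job

def fetch_range(url, start, end):
    # HTTP Range GET for bytes start..end (inclusive)
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def multipart_download(url, total_size, dest):
    # Preallocate the destination file, then let several threads write their
    # parts at the correct offsets -- concurrent writes into one cephfs file.
    fd = os.open(dest, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.ftruncate(fd, total_size)

        def worker(offset):
            end = min(offset + PART_SIZE, total_size) - 1
            os.pwrite(fd, fetch_range(url, offset, end), offset)

        with ThreadPoolExecutor(max_workers=THREADS) as pool:
            list(pool.map(worker, range(0, total_size, PART_SIZE)))
    finally:
        os.close(fd)
```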

This tool also includes a `verify` command which computes an e-tag from a file on disk and compares it to the object's e-tag in the object store.
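Assuming radula follows the usual S3 multipart convention (the e-tag of a multipart object is the md5 of the concatenated per-part md5 digests, suffixed with the part count), the local side of that verification looks roughly like this hypothetical helper:

```python
import hashlib

def multipart_etag(path, part_size):
    # md5 each part, then md5 the concatenation of those digests and append
    # "-<number of parts>" -- the usual S3 multipart e-tag form (assumed here).
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).digest())
    if len(digests) == 1:
        return digests[0].hex()   # single-part objects use a plain md5
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return "%s-%d" % (combined, len(digests))
```

The part size used for verification has to match the part size used for the original multipart upload, otherwise the computed e-tag will not line up even for a good file.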

When mounting cephfs with the kernel client and running many downloads at once, with 3 threads each, the verify step occasionally fails. These files' md5sums also do not match a copy downloaded to a more traditional (local) filesystem. The file size matches the good file. When compared with `cmp`, the output looks like:

50498910209 174 |    0 ^@
50498910210 202 M-^B 0 ^@
50498910211  22 ^R   0 ^@
50498910212 154 l    0 ^@
50498910213 262 M-2  0 ^@
50498910214 374 M-|  0 ^@
50498910215 105 E    0 ^@

In this example, I have 59594209 zeroed-out bytes in a 96G file.
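A small hypothetical helper like the one below can reproduce that kind of count by comparing the cephfs copy against the known-good copy; it only counts differing bytes that are NUL in the bad copy, mirroring the `cmp` output above.

```python
import sys

def count_zeroed(good_path, bad_path, block_size=1 << 20):
    # Count positions where the two copies differ and the bad copy holds NUL.
    zeroed = 0
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            g = good.read(block_size)
            b = bad.read(block_size)
            if not g and not b:
                break
            for x, y in zip(g, b):
                if x != y and y == 0:
                    zeroed += 1
    return zeroed

if __name__ == "__main__":
    print(count_zeroed(sys.argv[1], sys.argv[2]))
```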

When the compute nodes mount cephfs with the fuse client, I do not see any data integrity issues; however, my maximum throughput is more than 50% lower, so I'd really like to sort out the issue with the kernel client.

I've been watching the logs of the mds daemon and have not seen it complain about anything other than blocked requests during heavy writes. I haven't seen them go over 32s, so they seem to be transient. I'm not sure if that's known/expected with cephfs or if it may be indicative of a problem. I see the blocked requests with both the kernel and fuse clients. I'll note that all my osds and clients have dual 10G nics, so it's very easy for a heavily loaded disk to become a bottleneck. I also occasionally get "client failing to respond to cache pressure" warnings during these heavy write periods.

As a bit of a side note, in testing, a direct single-threaded download is much faster when writing to cephfs, so I will probably eventually move most of the jobs to that technique for this environment. However, any data integrity issue with the kernel client prevents me from using it at all, regardless of whether I change the jobs to be easier on cephfs.
