Bug #39938

closed

Issues with CephFS kernel driver

Added by Patrik Martinsson almost 5 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: fs/ceph
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I described the problem pretty thoroughly at https://marc.info/?l=ceph-devel&m=155786104524387&w=2, but I'm not sure it reached the right audience, so I'm opening a bug here as well.

// Patrik


Files

upload_file (872 KB): trace from the kernel driver when GitLab saves the actual PNG. Patrik Martinsson, 05/15/2019 10:38 AM
stat_file (15.4 KB): trace from the kernel driver when statting the file from another server (0 bytes). Patrik Martinsson, 05/15/2019 10:38 AM
overwriting_file (40 KB): trace from the kernel driver when I overwrite the file from the GitLab server. Patrik Martinsson, 05/15/2019 10:38 AM
stat_file_2 (17.6 KB): trace from the kernel driver when statting the file from another server (correct content is now available). Patrik Martinsson, 05/15/2019 10:39 AM
Actions #1

Updated by Ilya Dryomov almost 5 years ago

  • Project changed from Ceph to Linux kernel client
  • Category changed from ceph cli to fs/ceph
  • Assignee set to Zheng Yan
  • Priority changed from Normal to Urgent
Actions #2

Updated by Zheng Yan almost 5 years ago

Thanks for reporting this. This seems like a splice write bug.


ssize_t
ceph_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
                        loff_t *ppos, size_t len, unsigned int flags)
{
        ssize_t ret;
        struct inode *inode = file_inode(out);
        struct ceph_inode_info *ci = ceph_inode(inode);
        struct ceph_file_info *fi = out->private_data;
        int got, want;

        if (fi->fmode & CEPH_FILE_MODE_LAZY)
                want = CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO;
        else
                want = CEPH_CAP_FILE_BUFFER;

        ret = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, *ppos + len, &got, NULL);
        if (ret < 0)
                return ret;

        if (!(got & want)) {
                ceph_put_cap_refs(ci, got);
                return default_file_splice_write(pipe, out, ppos, len, flags);
        }

        ret = generic_file_splice_write(pipe, out, ppos, len, flags);
        ceph_put_cap_refs(ci, got);
        return ret;
}

It needs to call __ceph_mark_dirty_caps() before ceph_put_cap_refs(). Try configuring GitLab not to use splice write, or use an upstream kernel.

Actions #3

Updated by Patrik Martinsson almost 5 years ago

Zheng Yan wrote:

Thanks for reporting this. This seems like a splice write bug.

[...]

It needs to call __ceph_mark_dirty_caps() before ceph_put_cap_refs(). Try configuring GitLab not to use splice write, or use an upstream kernel.

I see. Hm, seems quite serious.

Try configuring GitLab not to use splice write

That seems like a "low level" thing. I'm pretty sure you can't configure the application to do so. Nonetheless, splice() will need to work correctly, no?

use an upstream kernel.

We are using RHEL 7.6, and I've tried the ELRepo kernels (kernel-lt-4.4.178-1.el7.elrepo.x86_64.rpm, kernel-ml-5.1.1-1.el7.elrepo.x86_64.rpm), but then I get the following on the client: "mon0 xx: feature set mismatch, my XXXXXX < server's XXXXXX, missing 40000000".

Are you saying that this is fixed in a newer kernel driver than the one included in RHEL 7.6?

Actions #4

Updated by Zheng Yan almost 5 years ago

That's strange. The 5.1 kernel is much newer than Luminous, so it should support all Ceph features.

Actions #5

Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Zheng Yan)
Actions #6

Updated by Jeff Layton over 2 years ago

  • Status changed from New to Resolved

I believe this has been fixed upstream for quite some time.
