
Bug #39938

Issues with CephFS kernel driver

Added by Patrik Martinsson about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Urgent
Assignee:
Category: fs/ceph
Target version:
Start date: 05/15/2019
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

I described the problem in detail on the ceph-devel mailing list (https://marc.info/?l=ceph-devel&m=155786104524387&w=2), but I'm not sure it reached the right audience, so I'm opening a bug here as well.

// Patrik

upload_file - trace from the kernel driver when GitLab saves the actual PNG (872 KB) Patrik Martinsson, 05/15/2019 10:38 AM

stat_file - trace from the kernel driver when statting the file from another server (0 bytes) (15.4 KB) Patrik Martinsson, 05/15/2019 10:38 AM

overwriting_file - trace from the kernel driver when I overwrite the file from the GitLab server (40 KB) Patrik Martinsson, 05/15/2019 10:38 AM

stat_file_2 - trace from the kernel driver when statting the file from another server (correct content is now available) (17.6 KB) Patrik Martinsson, 05/15/2019 10:39 AM

History

#1 Updated by Ilya Dryomov about 1 month ago

  • Project changed from Ceph to Linux kernel client
  • Category changed from ceph cli to fs/ceph
  • Assignee set to Zheng Yan
  • Priority changed from Normal to Urgent

#2 Updated by Zheng Yan about 1 month ago

Thanks for reporting this. This seems like a splice write bug.


ssize_t
ceph_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
                        loff_t *ppos, size_t len, unsigned int flags)
{
        ssize_t ret;
        struct inode *inode = file_inode(out);
        struct ceph_inode_info *ci = ceph_inode(inode);
        struct ceph_file_info *fi = out->private_data;
        int got, want;

        if (fi->fmode & CEPH_FILE_MODE_LAZY)
                want = CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO;
        else
                want = CEPH_CAP_FILE_BUFFER;

        ret = ceph_get_caps(ci, CEPH_CAP_FILE_WR, want, *ppos + len, &got, NULL);
        if (ret < 0)
                return ret;

        if (!(got & want)) {
                ceph_put_cap_refs(ci, got);
                return default_file_splice_write(pipe, out, ppos, len, flags);
        }

        ret = generic_file_splice_write(pipe, out, ppos, len, flags);
        ceph_put_cap_refs(ci, got);
        return ret;
}

It needs to call __ceph_mark_dirty_caps() before ceph_put_cap_refs(). Try configuring GitLab not to use splice write, or use an upstream kernel.
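To illustrate why the missing call matters, here is a deliberately simplified user-space model. It is not kernel code, and it assumes (an assumption, not a description of the kernel's real, asynchronous flushing logic) that the MDS learns a file's new size only when a dirty Fw cap is flushed as the last write reference is dropped. All names (toy_inode, toy_write, toy_mark_dirty, toy_put_cap_refs) are hypothetical stand-ins for ceph_get_caps(), __ceph_mark_dirty_caps() and ceph_put_cap_refs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT the kernel's data structures. */
struct toy_mds {
    size_t size;            /* size the metadata server believes the file has */
};

struct toy_inode {
    size_t local_size;      /* size known to the writing client */
    bool   dirty_fw;        /* stands in for the Fw bit in the dirty caps */
    int    wr_refs;         /* outstanding Fw cap references */
    struct toy_mds *mds;
};

/* Stand-in for __ceph_mark_dirty_caps(): note that metadata must be flushed. */
static void toy_mark_dirty(struct toy_inode *in)
{
    in->dirty_fw = true;
}

/* Stand-in for ceph_put_cap_refs(): in this model, dropping the last Fw
 * reference flushes the size to the MDS only if the caps were marked dirty. */
static void toy_put_cap_refs(struct toy_inode *in)
{
    assert(in->wr_refs > 0);
    if (--in->wr_refs == 0 && in->dirty_fw) {
        in->mds->size = in->local_size;
        in->dirty_fw = false;
    }
}

/* A buffered write of len bytes at pos. mark_dirty == false models the
 * splice path quoted above; true models a write path that marks caps dirty. */
static void toy_write(struct toy_inode *in, size_t pos, size_t len, bool mark_dirty)
{
    in->wr_refs++;                      /* like taking a CEPH_CAP_FILE_WR ref */
    if (pos + len > in->local_size)
        in->local_size = pos + len;     /* size updated on this client only */
    if (mark_dirty)
        toy_mark_dirty(in);             /* the step the splice path omits */
    toy_put_cap_refs(in);
}
```

In this model, a write with mark_dirty set to false grows the writer's local size while the MDS copy stays at 0; a later write that marks the caps dirty publishes the correct size. That would be consistent with the attached traces (0 bytes seen from another server until the file is overwritten), though the real kernel behaviour is more involved.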

#3 Updated by Patrik Martinsson about 1 month ago

Zheng Yan wrote:

Thanks for reporting this. This seems like a splice write bug.

[...]

It needs to call __ceph_mark_dirty_caps() before ceph_put_cap_refs(). Try configuring GitLab not to use splice write, or use an upstream kernel.

I see. Hmm, that seems quite serious.

Try configuring GitLab not to use splice write

That seems like a low-level detail; I'm pretty sure you can't configure the application to do that. Nonetheless, splice() needs to work correctly, doesn't it?
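An application normally does not opt into splice write through configuration; it reaches ceph_file_splice_write() when it (or a library or runtime it uses) calls splice(2) with a CephFS file as the output, and, as far as I know, sendfile(2) to a regular file goes through the same splice machinery. The sketch below is a hypothetical minimal reproducer, not something taken from this report: it writes a buffer to a file through a pipe with splice(2), then stats the file. The helper names splice_write_file() and file_size() and the 4 KiB chunk size are illustrative choices.

```c
#define _GNU_SOURCE            /* for splice(2) */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Stay below the pipe capacity so a blocking write() into the pipe never
 * waits for a reader (this function is the only reader). */
#define CHUNK 4096

/* Write len bytes to path via a pipe and splice(2), so the kernel uses the
 * target filesystem's splice-write path rather than write(2).
 * Returns the number of bytes written to the file, or -1 on error. */
static ssize_t splice_write_file(const char *path, const char *data, size_t len)
{
    int pfd[2];
    if (pipe(pfd) < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    ssize_t total = 0;
    while (total >= 0 && (size_t)total < len) {
        size_t chunk = len - (size_t)total;
        if (chunk > CHUNK)
            chunk = CHUNK;

        /* Stage the next chunk in the pipe. */
        ssize_t w = write(pfd[1], data + total, chunk);
        if (w <= 0) {
            total = -1;
            break;
        }

        /* Move the staged bytes from the pipe into the file. */
        ssize_t moved = 0;
        while (moved < w) {
            ssize_t s = splice(pfd[0], NULL, fd, NULL, (size_t)(w - moved), 0);
            if (s <= 0) {
                total = -1;
                break;
            }
            moved += s;
        }
        if (total >= 0)
            total += w;
    }

    close(fd);
    close(pfd[0]);
    close(pfd[1]);
    return total;
}

/* Size as reported by stat(2), i.e. what a second client would check. */
static off_t file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return st.st_size;
}
```

On an affected client, one would point path at a file on the CephFS mount and then compare the size seen by stat from a second client; this assumes the problem is reachable through plain splice(2), as suggested in #2.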

or use an upstream kernel.

We are using RHEL 7.6. I've tried ELRepo kernels (kernel-lt-4.4.178-1.el7.elrepo.x86_64.rpm, kernel-ml-5.1.1-1.el7.elrepo.x86_64.rpm), but with those I get the following error on the client: "mon0 xx: feature set mismatch, my XXXXXX < server's XXXXXX, missing 40000000".

Are you saying that this is fixed in a newer kernel driver than the one included in RHEL 7.6?

#4 Updated by Zheng Yan about 1 month ago

That's strange. The 5.1 kernel is much newer than Luminous, so it should support all Ceph features.
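The "missing" value in the error quoted in #3 is a bitmask of feature bits that the monitor requires but the client did not advertise. Assuming the client computes it as server_features & ~client_features (the actual masks were redacted as XXXXXX, so the test values are made up), a small helper can list which bits are set. 0x40000000 is a single bit, bit 30, which can then be looked up in the kernel's include/linux/ceph/ceph_features.h. The helper names missing_features() and feature_bits() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits the server requires but the client lacks
 * (assumed to match how the "missing" value is computed). */
static uint64_t missing_features(uint64_t client, uint64_t server)
{
    return server & ~client;
}

/* Store the indices of the set bits of mask in out[]; return how many. */
static int feature_bits(uint64_t mask, int out[64])
{
    int n = 0;
    for (int bit = 0; bit < 64; bit++)
        if (mask & (UINT64_C(1) << bit))
            out[n++] = bit;
    return n;
}
```

For the reported mask, feature_bits(0x40000000, bits) yields a single entry, 30.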
