Bug #20528

CephFS client I/O hangs when reading a file with a specified layout

Added by Michael Yang over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: fs/ceph
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

In my test, I configured a file with the following layout attribute:
  1. getfattr -n ceph.file.layout tstfile
    ceph.file.layout="stripe_unit=67108864 stripe_count=1 object_size=67108864 pool=cephfs_data"

I then wrote some data to the file and read it back. The write works, but the read hangs and never returns.

The issue is easy to reproduce with the following commands:
  1. mount -t ceph <cluster ip>:6789 /mnt
  2. cd /mnt
  3. touch tstfile
  4. setfattr -n ceph.file.layout -v "stripe_unit=67108864 stripe_count=1 object_size=67108864" tstfile
  5. dd if=/dev/zero of=tstfile bs=64M count=1
  6. dd if=tstfile of=/dev/null bs=64M count=1

The dd read command then hangs and never returns, and the process is in state "D". Even if you manage to kill the process, a subsequent "umount /mnt" hangs as well.

I traced the CephFS kernel client and the Ceph OSD logs. The OSD processes the read request as usual and returns a reply, but the client gets a null message and treats it as an I/O error. The client then keeps retrying the read and gets a null message each time, which is why the dd read command hangs and never returns.

My test environment is as follows:
Ceph version: Jewel 10.2.7
CephFS client OS: CentOS Linux release 7.2.1511 (Core)
CephFS client kernel version: 4.11.3-1.el7.elrepo.x86_64

The CephFS client log and the OSD log are attached.

cephfs-kernel-client.log (83.8 KB) Michael Yang, 07/06/2017 07:38 AM

ceph-osd.log (79.3 KB) Michael Yang, 07/06/2017 07:38 AM

History

#1 Updated by Zheng Yan over 5 years ago

  • Status changed from New to 7

In net/ceph/messenger.c:

static int read_partial_message(struct ceph_connection *con)
{
...
        front_len = le32_to_cpu(con->in_hdr.front_len);
        if (front_len > CEPH_MSG_MAX_FRONT_LEN)
                return -EIO;
        middle_len = le32_to_cpu(con->in_hdr.middle_len);
        if (middle_len > CEPH_MSG_MAX_MIDDLE_LEN)
                return -EIO;
        data_len = le32_to_cpu(con->in_hdr.data_len);
        if (data_len > CEPH_MSG_MAX_DATA_LEN)
                return -EIO;

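CEPH_MSG_MAX_DATA_LEN is 16M, so a read reply whose data_len exceeds 16M (as can happen with the 64M object layout in this report) is rejected with -EIO.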

Fixed by https://github.com/ceph/ceph-client/commit/80b00f5a91830b7ef41390ae84e88f4042a35beb

#2 Updated by Zheng Yan over 5 years ago

  • Status changed from 7 to Resolved
