Bug #20528
closed
cephfs client io hang when read file with specified layout
Description
- getfattr -n ceph.file.layout tstfile
ceph.file.layout="stripe_unit=67108864 stripe_count=1 object_size=67108864 pool=cephfs_data"
Then I wrote some data into the file and read it back. The write worked well, but the read hung and never returned.
The issue is easy to reproduce with the commands below.
- mount -t ceph <cluster ip>:6789 /mnt
- cd /mnt
- touch tstfile
- setfattr -n ceph.file.layout -v "stripe_unit=67108864 stripe_count=1 object_size=67108864" tstfile
- dd if=/dev/zero of=tstfile bs=64M count=1
- dd if=tstfile of=/dev/null bs=64M count=1
The dd read command then hangs and never returns, and the process is in state "D". You can kill the process, but a subsequent "umount /mnt" hangs as well.
I traced the cephfs kernel client and the ceph osd log and found that the osd processes the read request as usual and returns, but the client gets a null message and treats it as an io error. The client then keeps retrying the read and gets a null message each time, which is why the dd read command hangs and never returns.
My test environment is as below:
Ceph version: Jewel 10.2.7
Cephfs client os: CentOS Linux release 7.2.1511 (Core)
Cephfs client kernel version: 4.11.3-1.el7.elrepo.x86_64
The cephfs client log and osd log are in the attachment.
Files
Updated by Zheng Yan almost 7 years ago
- Status changed from New to 7
in net/ceph/messenger.c
static int read_partial_message(struct ceph_connection *con)
{
	...
	front_len = le32_to_cpu(con->in_hdr.front_len);
	if (front_len > CEPH_MSG_MAX_FRONT_LEN)
		return -EIO;
	middle_len = le32_to_cpu(con->in_hdr.middle_len);
	if (middle_len > CEPH_MSG_MAX_MIDDLE_LEN)
		return -EIO;
	data_len = le32_to_cpu(con->in_hdr.data_len);
	if (data_len > CEPH_MSG_MAX_DATA_LEN)
		return -EIO;
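CEPH_MSG_MAX_DATA_LEN is 16M. With object_size=67108864 (64M), a reply to a 64M read can carry a data_len above that limit, so the check above returns -EIO.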
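To make the arithmetic concrete, here is a minimal user-space sketch of the data_len check, using the 16M limit and the 64M size from the layout in this report. MSG_MAX_DATA_LEN and check_data_len() are hypothetical stand-ins for illustration only, not the kernel code itself.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed value, taken from the comment above: the limit is 16M. */
#define MSG_MAX_DATA_LEN (16u * 1024 * 1024)

/*
 * Hypothetical stand-in for the data_len check in read_partial_message():
 * a message whose data payload exceeds the limit is rejected with -EIO.
 */
static int check_data_len(uint32_t data_len)
{
	if (data_len > MSG_MAX_DATA_LEN)
		return -EIO;
	return 0;
}
```

With these assumptions, check_data_len(67108864) returns -EIO, while a payload of exactly 16M (16777216 bytes) still passes.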
Fixed by https://github.com/ceph/ceph-client/commit/80b00f5a91830b7ef41390ae84e88f4042a35beb
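The commit itself is not reproduced here. As a rough illustration of one way to stay under the limit, the sketch below caps each read request at 16M and counts how many requests a 64M read would then need. The names MSG_MAX_DATA_LEN, next_read_len() and count_read_requests() are hypothetical, and this is an assumption about the general approach, not a description of what the commit does.

```c
#include <stdint.h>

/* Assumed limit, matching the 16M value mentioned in this report. */
#define MSG_MAX_DATA_LEN (16ull * 1024 * 1024)

/* Hypothetical helper: size of the next request when `remaining` bytes are left. */
static uint64_t next_read_len(uint64_t remaining)
{
	return remaining < MSG_MAX_DATA_LEN ? remaining : MSG_MAX_DATA_LEN;
}

/* Hypothetical helper: number of capped requests needed to read `total` bytes. */
static unsigned count_read_requests(uint64_t total)
{
	unsigned n = 0;

	while (total > 0) {
		total -= next_read_len(total);
		n++;
	}
	return n;
}
```

Under these assumptions, count_read_requests(67108864) is 4, so the 64M object from the reproducer would be read in four 16M pieces, each within the message limit.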