Bug #19189
cephfs kernel 4.9.13 file read hangs
Status: closed
Description
Using cephfs 10.2.5 on a compute cluster with 4k cores and kernel 4.8.17 works like a charm. However, upgrading 3 nodes to kernel 4.9.13 results in reads from cephfs hanging after ~1h of activity.
One example:
dmesg:
[ 961.407146] libceph: get_reply osd174 tid 10093 data 20480 > preallocated 16384, skipping
f9nd104 ~ # cat /sys/kernel/debug/ceph/161863fa-917d-497c-aa88-ec2772c576ef.client2118648/osdc
REQUESTS 1 homeless 0
10093 osd174 3.5c062ee [174,91]/174 [174,91]/174 100071c1117.0000016b 0x400011 1 0'0 read
LINGER REQUESTS
This request hangs indefinitely.
The rest of the cephfs filesystem seems to work. Also, reading the same file that hangs on the 4.9.13 node works without problems on any 4.8.17 node.
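To check other nodes for the same symptom, the two outputs above can be cross-referenced: a tid that appears in a libceph "skipping" message and is still listed under REQUESTS in osdc is a likely stuck read. The following is only a rough sketch, assuming the dmesg and osdc formats are exactly as quoted in this report (other kernel versions may print them differently). The function names are made up for illustration.

```python
import re

# Matches the libceph message quoted in this report, e.g.
# "libceph: get_reply osd174 tid 10093 data 20480 > preallocated 16384, skipping"
SKIP_RE = re.compile(
    r"libceph: get_reply osd(?P<osd>\d+) tid (?P<tid>\d+) "
    r"data (?P<data>\d+) > preallocated (?P<prealloc>\d+), skipping"
)


def skipped_replies(dmesg_text):
    """Return {tid: (osd, data_len, preallocated_len)} for skipped replies."""
    found = {}
    for m in SKIP_RE.finditer(dmesg_text):
        found[int(m.group("tid"))] = (
            int(m.group("osd")),
            int(m.group("data")),
            int(m.group("prealloc")),
        )
    return found


def pending_tids(osdc_text):
    """Return the tids listed between 'REQUESTS' and 'LINGER REQUESTS'.

    Assumes the osdc layout shown in this report: a 'REQUESTS ...' header,
    one line per in-flight request starting with its tid, then
    'LINGER REQUESTS'.
    """
    tids = set()
    in_requests = False
    for raw in osdc_text.splitlines():
        line = raw.strip()
        if line.startswith("LINGER REQUESTS"):
            in_requests = False
        elif line.startswith("REQUESTS"):
            in_requests = True
        elif in_requests and line:
            first = line.split()[0]
            if first.isdigit():
                tids.add(int(first))
    return tids


def stuck_requests(dmesg_text, osdc_text):
    """Return skipped replies whose tid is still pending in osdc."""
    skipped = skipped_replies(dmesg_text)
    return {tid: skipped[tid] for tid in pending_tids(osdc_text) if tid in skipped}
```

Fed the dmesg line and the osdc dump from this report, stuck_requests() returns a single entry for tid 10093 on osd174, with 20480 bytes of reply data against a 16384-byte preallocated buffer.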
Updated by Zheng Yan about 7 years ago
- Status changed from New to 12
The bug was introduced in the 4.9 kernel by commit https://github.com/ceph/ceph-client/commit/1afe478569ba7414dde8a874dda9c1ea621c0c63
It was fixed in the 4.10 kernel by commit https://github.com/ceph/ceph-client/commit/d641df819db8b80198fd85d9de91137e8a823b07
I just sent the fix to stable@vger.kernel.org
Updated by Zheng Yan about 7 years ago
- Status changed from 12 to Resolved
The fix has been merged into the stable-4.9 tree.