Bug #24074
Read ahead in fuse client is broken with large buffer size
Description
If a read is larger than 128K (e.g. 4M, our object size), the fuse client receives it as multiple 128K ll_read requests (due to the kernel's 32-page limit on FUSE requests).
The fuse client then actually sends multiple 128K requests to the OSDs instead of a single 4M request, and read ahead is not happening either.
Could we either increase the maximum read size for ll_read, or fix the readahead so that it actually sends a 4M request?
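The arithmetic behind the 128K figure can be sketched as follows (the constant names here are illustrative, not the kernel's actual identifiers):

```python
# Why ceph-fuse sees 128 KiB reads: the FUSE kernel module caps a single
# read request at 32 pages of 4 KiB each.
PAGE_SIZE = 4096          # x86-64 page size in bytes
FUSE_MAX_PAGES = 32       # kernel limit on pages per FUSE read request

max_read = PAGE_SIZE * FUSE_MAX_PAGES   # 131072 bytes = 128 KiB
object_size = 4 * 1024 * 1024           # 4 MiB CephFS object size

# Without readahead, reading one 4 MiB object costs this many OSD round trips:
round_trips = object_size // max_read
print(max_read, round_trips)            # 131072 32
```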
Here is the debug_ms history with a single OSD.
-972> 2018-05-09 23:18:23.583724 7fa7a0751700 1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4927 ==== osd_op_reply(3707530 1000024e6f1.0000003e [read 1179648~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (1750938965 0 1213602604) 0x5574f9e9d400 con 0x55751efd2800
-893> 2018-05-09 23:18:23.997039 7fa78bf28700 1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707548 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1310720~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x5574f9e9b380 con 0
-879> 2018-05-09 23:18:24.212921 7fa7a0751700 5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4928 0x5574f9e9b380 osd_op_reply(3707548 1000024e6f1.0000003e [read 1310720~131072] v0'0 uv317663 ondisk = 0) v8
-878> 2018-05-09 23:18:24.212962 7fa7a0751700 1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4928 ==== osd_op_reply(3707548 1000024e6f1.0000003e [read 1310720~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (3003036829 0 3756573223) 0x5574f9e9b380 con 0x55751efd2800
-763> 2018-05-09 23:18:24.999942 7fa78e72d700 1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707572 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1441792~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x55752ae4b040 con 0
-742> 2018-05-09 23:18:25.425097 7fa7a0751700 5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4929 0x5574fdac9d40 osd_op_reply(3707572 1000024e6f1.0000003e [read 1441792~131072] v0'0 uv317663 ondisk = 0) v8
-741> 2018-05-09 23:18:25.425127 7fa7a0751700 1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4929 ==== osd_op_reply(3707572 1000024e6f1.0000003e [read 1441792~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (4217386021 0 2545823479) 0x5574fdac9d40 con 0x55751efd2800
-665> 2018-05-09 23:18:25.816468 7fa789f24700 1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707590 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1572864~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x557510f6f6c0 con 0
-547> 2018-05-09 23:18:26.665076 7fa7a0751700 5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4930 0x55751ee4ea40 osd_op_reply(3707590 1000024e6f1.0000003e [read 1572864~131072] v0'0 uv317663 ondisk = 0) v8
-546> 2018-05-09 23:18:26.665106 7fa7a0751700 1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4930 ==== osd_op_reply(3707590 1000024e6f1.0000003e [read 1572864~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (39174652 0 2598685363) 0x55751ee4ea40 con 0x55751efd2800
-419> 2018-05-09 23:18:27.770692 7fa78af26700 1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707635 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1703936~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x5574ef1563c0 con 0
-400> 2018-05-09 23:18:27.798510 7fa7a0751700 5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4931 0x557515f6ea40 osd_op_reply(3707635 1000024e6f1.0000003e [read 1703936~131072] v0'0 uv317663 ondisk = 0) v8
-399> 2018-05-09 23:18:27.798540 7fa7a0751700 1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4931 ==== osd_op_reply(3707635 1000024e6f1.0000003e [read 1703936~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (1271611716 0 1990543321) 0x557515f6ea40 con 0x55751efd2800
-298> 2018-05-09 23:18:28.068584 7fa790731700 1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707656 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1835008~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x5575140b8680 con 0
-162> 2018-05-09 23:18:28.897127 7fa7a0751700 5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4932 0x55752973d040 osd_op_reply(3707656 1000024e6f1.0000003e [read 1835008~131072] v0'0 uv317663 ondisk = 0) v8
-161> 2018-05-09 23:18:28.897174 7fa7a0751700 1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4932 ==== osd_op_reply(3707656 1000024e6f1.0000003e [read 1835008~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (2439561356 0 416711656) 0x55752973d040 con 0x55751efd2800
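The read extents in the log above can be checked programmatically. A small sketch (offsets copied from the osd_op lines) shows that every request is exactly 128 KiB and that the offsets advance by exactly 128 KiB each time, i.e. strictly sequential access of exactly the kind readahead is supposed to coalesce:

```python
import re

# Read extents taken verbatim from the osd_op lines in the log above.
log_lines = [
    "[read 1179648~131072]",
    "[read 1310720~131072]",
    "[read 1441792~131072]",
    "[read 1572864~131072]",
    "[read 1703936~131072]",
    "[read 1835008~131072]",
]
# Parse each "offset~length" extent into integers.
extents = [tuple(map(int, re.search(r"read (\d+)~(\d+)", line).groups()))
           for line in log_lines]

# Every request is exactly 128 KiB ...
assert all(length == 131072 for _, length in extents)
# ... and consecutive offsets differ by exactly 128 KiB: sequential I/O.
strides = [b - a for (a, _), (b, _) in zip(extents, extents[1:])]
print(strides)   # [131072, 131072, 131072, 131072, 131072]
```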
Updated by Patrick Donnelly almost 6 years ago
- Status changed from New to Need More Info
- Target version set to v14.0.0
- Source set to Community (user)
- Backport set to mimic,luminous
- Component(FS) Client added
- Labels (FS) task(intern) added
Chuan Qiu wrote:
If a read is larger than 128K (e.g. 4M, our object size), the fuse client receives it as multiple 128K ll_read requests (due to the kernel's 32-page limit on FUSE requests).
The fuse client then actually sends multiple 128K requests to the OSDs instead of a single 4M request, and read ahead is not happening either.
Could we either increase the maximum read size for ll_read, or fix the readahead so that it actually sends a 4M request?
Here is the debug_ms history with a single OSD.
[...]
Have you tried changing the file layouts?
http://docs.ceph.com/docs/master/cephfs/file-layouts/
http://docs.ceph.com/docs/master/dev/file-striping/
Please let us know what configuration you have for your client and a snippet of a `debug client = 20` log to help diagnose.
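For reference, one way to capture such a log is to raise the client debug level in the `[client]` section of ceph.conf (a sketch using standard Ceph options; the log file path is an assumption for illustration):

```ini
[client]
    debug client = 20
    log file = /var/log/ceph/ceph-fuse.log
```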
Updated by Zheng Yan almost 6 years ago
Try passing the '--client_readahead_max_bytes=4194304' option to ceph-fuse.
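Equivalently, the option can be set persistently in the `[client]` section of ceph.conf rather than passed on the ceph-fuse command line (a sketch; 4194304 bytes = the 4 MiB object size mentioned above):

```ini
[client]
    client readahead max bytes = 4194304
```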
Updated by Patrick Donnelly about 5 years ago
- Target version changed from v14.0.0 to v15.0.0