
Bug #24074

Read ahead in fuse client is broken with large buffer size

Added by Chuan Qiu 7 months ago. Updated 7 months ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: v14.0.0
Start date: 05/10/2018
Due date: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): Client
Labels (FS): task(intern)
Pull request ID: -

Description

If a read is larger than 128K (e.g. 4M, our object size), the fuse client receives it as multiple 128K ll_read requests (due to the 32-page kernel limit).

The fuse client then issues multiple 128K reads to the OSDs instead of a single 4M request, and readahead is not happening either.

Could we either increase the maximum read size for ll_read, or fix readahead so that it actually sends the request with size 4M?
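As a back-of-the-envelope check on those numbers (assuming 4 KiB pages and the 32-page kernel cap mentioned above), the per-request ceiling and the resulting request count per object work out as follows; this matches the 131072-byte reads visible in the log below:

```python
PAGE_SIZE = 4096               # typical x86-64 page size (assumption)
FUSE_MAX_PAGES = 32            # per-request page cap from the report
OBJECT_SIZE = 4 * 1024 * 1024  # 4M object size from the report

max_ll_read = PAGE_SIZE * FUSE_MAX_PAGES        # 131072 bytes = 128K
requests_per_object = OBJECT_SIZE // max_ll_read  # ll_read calls per 4M read

print(max_ll_read, requests_per_object)  # 131072 32
```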

Here is the debug_ms history with a single OSD.


  -972> 2018-05-09 23:18:23.583724 7fa7a0751700  1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4927 ==== osd_op_reply(3707530 1000024e6f1.0000003e [read 1179648~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (1750938965 0 1213602604) 0x5574f9e9d400 con 0x55751efd2800
  -893> 2018-05-09 23:18:23.997039 7fa78bf28700  1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707548 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1310720~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x5574f9e9b380 con 0
  -879> 2018-05-09 23:18:24.212921 7fa7a0751700  5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4928 0x5574f9e9b380 osd_op_reply(3707548 1000024e6f1.0000003e [read 1310720~131072] v0'0 uv317663 ondisk = 0) v8
  -878> 2018-05-09 23:18:24.212962 7fa7a0751700  1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4928 ==== osd_op_reply(3707548 1000024e6f1.0000003e [read 1310720~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (3003036829 0 3756573223) 0x5574f9e9b380 con 0x55751efd2800
  -763> 2018-05-09 23:18:24.999942 7fa78e72d700  1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707572 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1441792~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x55752ae4b040 con 0
  -742> 2018-05-09 23:18:25.425097 7fa7a0751700  5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4929 0x5574fdac9d40 osd_op_reply(3707572 1000024e6f1.0000003e [read 1441792~131072] v0'0 uv317663 ondisk = 0) v8
  -741> 2018-05-09 23:18:25.425127 7fa7a0751700  1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4929 ==== osd_op_reply(3707572 1000024e6f1.0000003e [read 1441792~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (4217386021 0 2545823479) 0x5574fdac9d40 con 0x55751efd2800
  -665> 2018-05-09 23:18:25.816468 7fa789f24700  1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707590 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1572864~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x557510f6f6c0 con 0
  -547> 2018-05-09 23:18:26.665076 7fa7a0751700  5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4930 0x55751ee4ea40 osd_op_reply(3707590 1000024e6f1.0000003e [read 1572864~131072] v0'0 uv317663 ondisk = 0) v8
  -546> 2018-05-09 23:18:26.665106 7fa7a0751700  1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4930 ==== osd_op_reply(3707590 1000024e6f1.0000003e [read 1572864~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (39174652 0 2598685363) 0x55751ee4ea40 con 0x55751efd2800
  -419> 2018-05-09 23:18:27.770692 7fa78af26700  1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707635 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1703936~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x5574ef1563c0 con 0
  -400> 2018-05-09 23:18:27.798510 7fa7a0751700  5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4931 0x557515f6ea40 osd_op_reply(3707635 1000024e6f1.0000003e [read 1703936~131072] v0'0 uv317663 ondisk = 0) v8
  -399> 2018-05-09 23:18:27.798540 7fa7a0751700  1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4931 ==== osd_op_reply(3707635 1000024e6f1.0000003e [read 1703936~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (1271611716 0 1990543321) 0x557515f6ea40 con 0x55751efd2800
  -298> 2018-05-09 23:18:28.068584 7fa790731700  1 -- 10.1.1.65:0/1835558936 --> 10.1.0.161:6828/27121 -- osd_op(unknown.0.0:3707656 7.14e2s0 7:472cd5dd:::1000024e6f1.0000003e:head [read 1835008~131072] snapc 0=[] ondisk+read+known_if_redirected e126144) v8 -- 0x5575140b8680 con 0
  -162> 2018-05-09 23:18:28.897127 7fa7a0751700  5 -- 10.1.1.65:0/1835558936 >> 10.1.0.161:6828/27121 conn(0x55751efd2800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8155 cs=1 l=1). rx osd.288 seq 4932 0x55752973d040 osd_op_reply(3707656 1000024e6f1.0000003e [read 1835008~131072] v0'0 uv317663 ondisk = 0) v8
  -161> 2018-05-09 23:18:28.897174 7fa7a0751700  1 -- 10.1.1.65:0/1835558936 <== osd.288 10.1.0.161:6828/27121 4932 ==== osd_op_reply(3707656 1000024e6f1.0000003e [read 1835008~131072] v0'0 uv317663 ondisk = 0) v8 ==== 164+0+131072 (2439561356 0 416711656) 0x55752973d040 con 0x55751efd2800

History

#1 Updated by Greg Farnum 7 months ago

  • Project changed from Ceph to fs

#2 Updated by Patrick Donnelly 7 months ago

  • Status changed from New to Need More Info
  • Target version set to v14.0.0
  • Source set to Community (user)
  • Backport set to mimic,luminous
  • Component(FS) Client added
  • Labels (FS) task(intern) added

Chuan Qiu wrote:

If a read is larger than 128K (e.g. 4M, our object size), the fuse client receives it as multiple 128K ll_read requests (due to the 32-page kernel limit).

The fuse client then issues multiple 128K reads to the OSDs instead of a single 4M request, and readahead is not happening either.

Could we either increase the maximum read size for ll_read, or fix readahead so that it actually sends the request with size 4M?

Here is the debug_ms history with a single OSD.
[...]

Have you tried changing the file layouts?

http://docs.ceph.com/docs/master/cephfs/file-layouts/
http://docs.ceph.com/docs/master/dev/file-striping/

Please let us know what configuration you have for your client and a snippet of a `debug client = 20` log to help diagnose.
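For reference, file layouts can be inspected and adjusted via virtual extended attributes, per the file-layouts documentation linked above. A sketch (mount point and file names are hypothetical; a file's layout can only be changed while it is still empty):

```shell
# Inspect the current layout of a file on a CephFS mount
getfattr -n ceph.file.layout /mnt/cephfs/somefile

# Set a 4M stripe unit on a new, still-empty file
touch /mnt/cephfs/newfile
setfattr -n ceph.file.layout.stripe_unit -v 4194304 /mnt/cephfs/newfile

# Or set a default layout on a directory for files created under it
setfattr -n ceph.dir.layout.stripe_unit -v 4194304 /mnt/cephfs/somedir
```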

#3 Updated by Zheng Yan 7 months ago

Try passing the '--client_readahead_max_bytes=4194304' option to ceph-fuse.
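For example (a sketch; the mount point is hypothetical), the option can be given on the command line or set persistently in ceph.conf:

```shell
# On the ceph-fuse command line
ceph-fuse --client_readahead_max_bytes=4194304 /mnt/cephfs

# Or in the [client] section of ceph.conf:
#   [client]
#       client readahead max bytes = 4194304
```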
