Bug #63634


[RFC] limit iov structures to 1024 while performing async I/O

Added by Dhairya Parmar 6 months ago. Updated 9 days ago.

Status:
Triaged
Priority:
Normal
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
quincy,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This tracker is to gather initial thoughts/opinions on limiting the number of iovec structures that can be read/written in one call while performing async I/O. According to the man page of pwritev:

NOTES
       POSIX.1  allows  an  implementation to place a limit on the number of items that can be passed in iov.  An implementation can
       advertise its limit by defining IOV_MAX in <limits.h> or at run time via the return value from sysconf(_SC_IOV_MAX).  On mod‐
       ern Linux systems, the limit is 1024.  Back in Linux 2.0 days, this limit was 16.

So the standard does not strictly impose the limit; it merely "allows" an implementation to set one. While testing the libcephfs async I/O call, I found that we can happily breach the limit of 1024: there is no check against IOV_MAX in the client code, although I do see a lot of code paths within Ceph that handle IOV_MAX:

blk/kernel/KernelDevice.cc:  if ((!buffered || bl.get_num_buffers() >= IOV_MAX) &&
blk/kernel/KernelDevice.cc:      bl.rebuild_aligned_size_and_memory(block_size, block_size, IOV_MAX)) {
blk/kernel/KernelDevice.cc:  if ((!buffered || bl.get_num_buffers() >= IOV_MAX) &&
blk/kernel/KernelDevice.cc:      bl.rebuild_aligned_size_and_memory(block_size, block_size, IOV_MAX)) {
client/fuse_ll.cc:    if (bl.get_num_buffers() > IOV_MAX)
common/buffer.cc:  iovec iov[IOV_MAX];
common/buffer.cc:    if (iovlen == IOV_MAX ||
common/buffer.cc:  iovec iov[IOV_MAX];
common/buffer.cc:    uint64_t size = std::min<uint64_t>(left_pbrs, IOV_MAX);
common/buffer.cc:  iov_vec_t iovs{_num / IOV_MAX + 1};
common/buffer.cc:       std::min(_num - IOV_MAX * nr_iov_created, (size_t)IOV_MAX));
common/buffer.cc:    if (++index == IOV_MAX) {
include/buffer.h:      ceph_assert(_num <= IOV_MAX);
include/buffer.h:      assert(_num <= IOV_MAX);
include/compat.h:#define IOV_MAX 1024
msg/async/AsyncConnection.h:static const int ASYNC_IOV_MAX = (IOV_MAX >= 1024 ? IOV_MAX / 4 : IOV_MAX);
msg/async/PosixStack.cc:      struct iovec msgvec[IOV_MAX];
msg/async/PosixStack.cc:      uint64_t size = std::min<uint64_t>(left_pbrs, IOV_MAX);
msg/async/PosixStack.cc:      WSABUF msgvec[IOV_MAX];
msg/async/PosixStack.cc:      uint64_t size = std::min<uint64_t>(left_pbrs, IOV_MAX);
os/bluestore/BlueStore.cc:  bl.rebuild_aligned_size_and_memory(BDEV_LABEL_BLOCK_SIZE, BDEV_LABEL_BLOCK_SIZE, IOV_MAX);
test/bufferlist.cc:  for (unsigned i = 0; i < IOV_MAX * 2; i++) {
test/bufferlist.cc:  EXPECT_EQ(IOV_MAX * 2, st.st_size);
test/bufferlist.cc:  for (unsigned i = 0; i < IOV_MAX * 2; i++) {
test/bufferlist.cc:  EXPECT_EQ(IOV_MAX * 2 + offset, (unsigned)st.st_size);
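A client-side guard in the same spirit as the fuse_ll.cc check above could look like the sketch below. This is a hypothetical helper, not existing Ceph code; the name `validate_iovcnt` and the choice of EINVAL (matching what the kernel's readv/writev return for oversized vectors) are assumptions:

```cpp
#include <cerrno>    // EINVAL
#include <climits>   // IOV_MAX

#ifndef IOV_MAX
#define IOV_MAX 1024  // fallback, mirroring include/compat.h
#endif

// Hypothetical pre-check for the async read/write entry points:
// reject oversized vectors the same way the kernel's readv/writev
// would, returning -EINVAL before any buffers are touched.
static int validate_iovcnt(int iovcnt) {
  if (iovcnt < 0 || iovcnt > IOV_MAX)
    return -EINVAL;
  return 0;
}
```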

The kernel allocates resources and performs operations based on assumptions tied to IOV_MAX; exceeding this limit might lead to resource-allocation problems, such as exhausting kernel memory or overflowing buffers, impacting system stability and performance. My feeling is that, if not now, then eventually we will have to handle this in the client code too.
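An alternative to rejecting oversized vectors would be to split them client-side into runs of at most IOV_MAX entries and submit each run separately, similar to how common/buffer.cc chunks its iovec arrays. A rough sketch under that assumption (`split_iov` is a hypothetical helper):

```cpp
#include <climits>   // IOV_MAX
#include <sys/uio.h> // struct iovec
#include <algorithm>
#include <utility>
#include <vector>

#ifndef IOV_MAX
#define IOV_MAX 1024  // fallback, mirroring include/compat.h
#endif

// Hypothetical helper: break an iovec array into (pointer, count) runs
// of at most IOV_MAX entries each, suitable for sequential submission.
std::vector<std::pair<const iovec*, int>>
split_iov(const iovec* iov, int iovcnt) {
  std::vector<std::pair<const iovec*, int>> runs;
  for (int off = 0; off < iovcnt; off += IOV_MAX)
    runs.emplace_back(iov + off, std::min(iovcnt - off, (int)IOV_MAX));
  return runs;
}
```

Note the trade-off: splitting one vectored write into several requests loses the single-call atomicity that writev/pwritev guarantee, so rejecting with an error may be the safer default for correctness.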
