Bug #63634
[RFC] limit iov structures to 1024 while performing async I/O
Status: open
Description
This tracker is just to gather initial thoughts/opinions on limiting the number of iovec structures that can be read or written in one go when performing async I/O. According to the pwritev man page:
NOTES
POSIX.1 allows an implementation to place a limit on the number of items that can be passed in iov. An implementation can advertise its limit by defining IOV_MAX in <limits.h> or at run time via the return value from sysconf(_SC_IOV_MAX). On modern Linux systems, the limit is 1024. Back in Linux 2.0 days, this limit was 16.
So the limit is not strictly required; POSIX merely "allows" an implementation to impose one. While testing the async I/O libcephfs call, I found that we can happily breach the limit of 1024: there is no check for IOV_MAX in the client code. However, I do see a lot of code paths within Ceph handling IOV_MAX:
blk/kernel/KernelDevice.cc: if ((!buffered || bl.get_num_buffers() >= IOV_MAX) &&
blk/kernel/KernelDevice.cc: bl.rebuild_aligned_size_and_memory(block_size, block_size, IOV_MAX)) {
blk/kernel/KernelDevice.cc: if ((!buffered || bl.get_num_buffers() >= IOV_MAX) &&
blk/kernel/KernelDevice.cc: bl.rebuild_aligned_size_and_memory(block_size, block_size, IOV_MAX)) {
client/fuse_ll.cc: if (bl.get_num_buffers() > IOV_MAX)
common/buffer.cc: iovec iov[IOV_MAX];
common/buffer.cc: if (iovlen == IOV_MAX ||
common/buffer.cc: iovec iov[IOV_MAX];
common/buffer.cc: uint64_t size = std::min<uint64_t>(left_pbrs, IOV_MAX);
common/buffer.cc: iov_vec_t iovs{_num / IOV_MAX + 1};
common/buffer.cc: std::min(_num - IOV_MAX * nr_iov_created, (size_t)IOV_MAX));
common/buffer.cc: if (++index == IOV_MAX) {
include/buffer.h: ceph_assert(_num <= IOV_MAX);
include/buffer.h: assert(_num <= IOV_MAX);
include/compat.h:#define IOV_MAX 1024
msg/async/AsyncConnection.h:static const int ASYNC_IOV_MAX = (IOV_MAX >= 1024 ? IOV_MAX / 4 : IOV_MAX);
msg/async/PosixStack.cc: struct iovec msgvec[IOV_MAX];
msg/async/PosixStack.cc: uint64_t size = std::min<uint64_t>(left_pbrs, IOV_MAX);
msg/async/PosixStack.cc: WSABUF msgvec[IOV_MAX];
msg/async/PosixStack.cc: uint64_t size = std::min<uint64_t>(left_pbrs, IOV_MAX);
os/bluestore/BlueStore.cc: bl.rebuild_aligned_size_and_memory(BDEV_LABEL_BLOCK_SIZE, BDEV_LABEL_BLOCK_SIZE, IOV_MAX);
test/bufferlist.cc: for (unsigned i = 0; i < IOV_MAX * 2; i++) {
test/bufferlist.cc: EXPECT_EQ(IOV_MAX * 2, st.st_size);
test/bufferlist.cc: for (unsigned i = 0; i < IOV_MAX * 2; i++) {
test/bufferlist.cc: EXPECT_EQ(IOV_MAX * 2 + offset, (unsigned)st.st_size);
The kernel allocates resources and performs operations based on assumptions tied to IOV_MAX, so exceeding this limit might lead to resource-allocation issues, such as exhausting available kernel memory or encountering buffer overflows, impacting system stability and performance. What I feel is that, if not now, then we may eventually have to handle this in the client code too.