Bug #4603

ceph: writeback pagevec pool is created incorrectly

Added by Alex Elder about 8 years ago. Updated almost 8 years ago.

Target version:
% Done:


3 - minor
Affected Versions:
Pull request ID:
Crash signature (v1):
Crash signature (v2):


In ceph_writepages_start() if there are any pages to write back
an array of page pointers is needed, sufficient to hold pointes
to as many pages as are required to satisfy the minimum I/O size
for the inode (or the file system containing it).

If a "normal" kmalloc() request fails to allocate the array,
a memory pool associated with the file system client is used.

The memory pool is created in create_fs_client(), and its size
parameter is computed incorrectly in two ways:
- it uses fsc->mount_options->wsize >> PAGE_CACHE_SHIFT, to
determine how many pages should be supported for a request; and
- the size supplied is supposed to be a multiple of the size
of a page pointer, but it's the count of pages.

The first can be a problem because the wsize parameter could
be less than PAGE_SIZE (including 0), but we will always need
to support writing at least one page. This might not be a
real problem in practice.

The second problem is serious though. The argument value needs
to be the product of the page count and the size of a page pointer.

I have a fix for this, I'm just documenting the problem.


#1 Updated by Alex Elder about 8 years ago

I forgot to mention, the other problem is that the number
of pages required in ceph_writepages_start() is computed
as (essentially):

    max_size = 1 << inode->i_blkbits;
    if (fsc->mount_options->wsize && fsc->mount_options->wsize < max_size)
        max_size = fsc->mount_options->wsize;
    if (max_size < PAGE_CACHE_SIZE)
        max_size = PAGE_CACHE_SIZE;
    pages_required = max_size >> PAGE_CACHE_SHIFT;

So as mentioned above, we need at least one page. But in addition
if the "wsize" mount option has value 0 (which is the default), then
the number of pages required could be as much as

    inode->i_blkbits >> PAGE_CACHE_SHIFT

And that is initialized in fs/ceph/inode.c:fill_inode() as:

        inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;

The layout comes from the mds, but it looks to me like the
stripe unit therein can be set without limit by an ioctl() via

So bottom line of all this is that inside ceph_writepages_start()
if we are going to fall back to using the memory pool to satisfy
our request for a page pointer array, the size of the array in
the pool needs to be sufficient to hold the maximum sized stripe
unit value (which is apparently 0xffffffff).

My fix addresses the more simple issues described initially. I
do not intend to address this broader problem, I'll leave it to
someone else (or at least some other time) to fix that.

#2 Updated by Alex Elder about 8 years ago

  • Status changed from In Progress to Fix Under Review

The following patch has been posted for review, to
address the two simple problems initially described
in this issue:
[PATCH] ceph: set up page array mempool with correct size

#3 Updated by Alex Elder about 8 years ago

It's possible the maximum number of pages we'll ever need
is PAGEVEC_SIZE. But I'm not going to spend any time right
now trying to verify. It looks a bit like that might be
an upper bound though, so thought I'd mention that here.

#4 Updated by Alex Elder about 8 years ago

  • Status changed from Fix Under Review to Resolved
  • Target version set to v0.62a

The following has been committed to the ceph-client
"testing" branch:
df39ef2 ceph: set up page array mempool with correct size

#5 Updated by Alex Elder almost 8 years ago

On the osd, it looks to me like CEPH_MDS_OP_SETLAYOUT uses
ceph_file_layout_is_valid() to verify the layout supplied
is valid. The constraints defined for the stripe unit
there are:
- it must be non-zero; and
- it must be a multiple of CEPH_MIN_STRIPE_UNIT=65536

Since the stripe unit is represented in 32 bits, that
means the maximum valid stripe unit is 65535 or 0xffff.

#6 Updated by Alex Elder almost 8 years ago

Whoops, mean to update

Also available in: Atom PDF