Bug #3793

wrong size reported in some distributions/toolchains

Added by Greg Farnum over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature:

Description

In ceph_statfs we set f_bsize to 1MB in order to report very large available spaces. However, nowadays it is apparently correct to set f_bsize to the "optimal transfer size" and to use f_frsize (the "fundamental block size", as opposed to f_bsize's "block size") as the actual block size alongside the blocks free/used/available reporting.
In some newer toolchains this means our free space is reported incorrectly; see e.g. df from coreutils 8.20 versus that in coreutils 8.13 (from Debian; thanks dwm37!).

Presumably that means we want to set f_frsize to 1MB — but do we then want to reduce the f_bsize since 1MB transfers really aren't necessary? I'd guess that older tools will break themselves if we reduce f_bsize since they appear to be using it right now, so we probably shouldn't do that. And hopefully simply updating the f_frsize won't break anybody, but we should check this on a range of distros before distributing it.
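To make the distinction concrete, here is a minimal sketch of how a consumer that follows the newer interpretation might turn the block counts into bytes. The helper names `block_unit` and `avail_bytes` are hypothetical and only for illustration; the fallback to f_bsize when f_frsize is zero is an assumption about how a cautious tool could behave, not a description of any particular df version.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: the unit in which f_blocks/f_bfree/f_bavail are
 * counted. Uses f_frsize, falling back to f_bsize if f_frsize is unset. */
static uint64_t block_unit(const struct statvfs *sv)
{
    assert(sv->f_frsize != 0 || sv->f_bsize != 0);
    return sv->f_frsize ? sv->f_frsize : sv->f_bsize;
}

/* Hypothetical helper: bytes available to unprivileged users. */
static uint64_t avail_bytes(const struct statvfs *sv)
{
    return (uint64_t)sv->f_bavail * block_unit(sv);
}
```

A tool that instead multiplies f_bavail by f_bsize gets a different answer whenever the two fields differ, which is the situation described above.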

History

#2 Updated by David McBride over 7 years ago

I spent a bit of time with gregaf trying to find authoritative sources for what the different values denote. While `man statfs` is unfortunately non-specific, `man statvfs` is more clear:

    struct statvfs {
        unsigned long  f_bsize;    /* file system block size */
        unsigned long  f_frsize;   /* fragment size */
        fsblkcnt_t     f_blocks;   /* size of fs in f_frsize units */
        fsblkcnt_t     f_bfree;    /* # free blocks */
        fsblkcnt_t     f_bavail;   /* # free blocks for unprivileged users */
        fsfilcnt_t     f_files;    /* # inodes */
        fsfilcnt_t     f_ffree;    /* # free inodes */
        fsfilcnt_t     f_favail;   /* # free inodes for unprivileged users */
        unsigned long  f_fsid;     /* file system ID */
        unsigned long  f_flag;     /* mount flags */
        unsigned long  f_namemax;  /* maximum filename length */
    };

At present, Ceph is reporting an f_frsize of PAGE_CACHE_SIZE (read: 4kb) and an f_bsize of 1 << CEPH_BLOCK_SHIFT (read: 1MB). It is then reporting the number of free blocks in multiples of 1MB.

See: http://lxr.free-electrons.com/source/fs/ceph/super.c?a=x86#L55

When using an older 'stat' utility from coreutils, this results in:

Block size: 1048576    Fundamental block size: 1048576

... and 'df' returns the correct capacity for a mounted CephFS filesystem.

However, modern versions of 'stat' show:

Block size: 1048576    Fundamental block size: 4096

... and 'df' dramatically under-reports the CephFS filesystem size (by a factor of 2^8 = 256, the ratio of 1MB to 4KB.)
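The size of the discrepancy follows directly from the two block sizes quoted above. The sketch below works through that arithmetic. The function names are hypothetical, and the block count used in the note that follows is a made-up sample value, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a tool reports when it multiplies the block count by `unit`. */
static uint64_t reported_bytes(uint64_t block_count, uint64_t unit)
{
    return block_count * unit;
}

/* Ratio between the size computed with f_bsize and the size computed
 * with f_frsize, for the same block count. */
static uint64_t underreport_factor(uint64_t bsize, uint64_t frsize)
{
    assert(frsize != 0 && bsize % frsize == 0);
    return bsize / frsize;
}
```

With f_bsize = 1048576 and f_frsize = 4096, `underreport_factor` gives 256: a sample count of 1000 blocks is read as about 1 GB by a tool using f_bsize, but as about 4 MB by a tool using f_frsize.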

I think this may affect all programs using the 'statvfs' call to determine filesystem capacities, such as Samba; see: http://code.metager.de/source/xref/samba/source3/smbd/statvfs.c#118

(I note that Samba apparently considers the f_frsize value to be the optimal transfer block size for the filesystem, and not the f_bsize value as for the statfs syscall... In any case, Samba misreports the capacity of a kernel-mounted CephFS filesystem in my testing.)

This LKML discussion thread may be useful for supplying some wider context: https://lkml.org/lkml/2010/6/24/356

#3 Updated by Sage Weil over 7 years ago

That makes this sound like a simple fix... we need to swap the frsize and bsize fields. Except that right now we are only getting correct results because the userland tools are old. Once we make this change, we will get bad results for old tools and good results for new tools.

I suggest we change both frsize and bsize to 4MB (instead of 1MB), since 4MB is the default block size.
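As a rough illustration of why equal values sidestep the problem, the sketch below fills a `struct statvfs` with both sizes set to 4MB and the block counts expressed in that same unit. This is not the actual Ceph change: `EXAMPLE_BLOCK_SHIFT` and `fill_statvfs_sketch` are hypothetical names, and taking the totals as byte counts is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical constant: 1 << 22 is 4MB. */
#define EXAMPLE_BLOCK_SHIFT 22

/* Hypothetical helper: report f_bsize and f_frsize as the same value so a
 * tool multiplying by either field computes the same capacity. */
static void fill_statvfs_sketch(struct statvfs *sv,
                                uint64_t total_bytes, uint64_t free_bytes)
{
    assert(free_bytes <= total_bytes);
    sv->f_bsize  = 1UL << EXAMPLE_BLOCK_SHIFT;
    sv->f_frsize = 1UL << EXAMPLE_BLOCK_SHIFT;
    /* Counts are in 4MB units; any remainder below 4MB is dropped. */
    sv->f_blocks = (fsblkcnt_t)(total_bytes >> EXAMPLE_BLOCK_SHIFT);
    sv->f_bfree  = (fsblkcnt_t)(free_bytes >> EXAMPLE_BLOCK_SHIFT);
    sv->f_bavail = sv->f_bfree;
}
```

Because the two fields match, older tools that read f_bsize and newer tools that read f_frsize arrive at the same size.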

#4 Updated by Sage Weil over 7 years ago

  • Project changed from Linux kernel client to fs
  • Category deleted (fs/ceph)

#5 Updated by Ian Colle over 7 years ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to High

#6 Updated by Sage Weil over 7 years ago

I pushed a wip-statvfs which fixes this for ceph-fuse.

#7 Updated by Sage Weil over 7 years ago

  • Status changed from New to Resolved

'ceph: fix statvfs fr_size' in kernel tree.
