
Bug #38482

Quotas: mounting quotas subdir doesn't respect quotas in kernel client

Added by Luis Henriques almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Category: fs/ceph
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

The kernel client doesn't respect quotas that are set on directories it cannot see from its mount point. Here's a report from the ceph-users mailing list:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-February/033357.html

The fuse client seems to be able to handle these scenarios correctly.

History

#1 Updated by Luis Henriques almost 3 years ago

After an initial analysis, it looks like this could be solved without any major drawbacks. The relevant code is in get_quota_realm:

    while (realm) {
        spin_lock(&realm->inodes_with_caps_lock);
        in = realm->inode ? igrab(realm->inode) : NULL;
        spin_unlock(&realm->inodes_with_caps_lock);
        if (!in)
            break;
    ...

The realm->inode may be NULL when the filesystem isn't mounted at the real CephFS root, and thus the realm inode isn't visible to the client.

Here's a possible solution: when realm->inode is NULL, we try to get that inode (sending a CEPH_MDS_OP_LOOKUPINO, something similar to what function __fh_to_dentry does). If we succeed, the realm needs to igrab() it (and iput() it again when the realm is deleted). A possible issue is that we need to drop snap_rwsem when getting the inode, otherwise we'll have a deadlock.
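The lookup-on-demand idea above can be sketched in plain C, modeled outside the kernel. The struct layouts and the lookup_ino() helper here are hypothetical stand-ins for illustration, not the real kernel API; in the actual client the lookup would be a CEPH_MDS_OP_LOOKUPINO request, and snap_rwsem would have to be dropped around it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct inode {
    unsigned long ino;
    int refcount;
};

struct snap_realm {
    unsigned long ino;      /* inode number of the realm root */
    struct inode *inode;    /* cached realm inode, NULL if never seen */
    struct snap_realm *parent;
};

/* Stand-in for sending a LOOKUPINO request to the MDS.  In the
 * kernel this is the step that requires dropping snap_rwsem first,
 * otherwise the request path would deadlock on it. */
static struct inode *lookup_ino(unsigned long ino)
{
    struct inode *in = malloc(sizeof(*in));
    in->ino = ino;
    in->refcount = 1;   /* the realm keeps this reference (igrab) */
    return in;
}

/* Return the realm's inode, fetching it on demand when the realm
 * root is not visible below the client's mount point.  The cached
 * reference would be iput() when the realm is deleted. */
static struct inode *get_realm_inode(struct snap_realm *realm)
{
    if (!realm->inode)
        realm->inode = lookup_ino(realm->ino);
    return realm->inode;
}
```

With this, the loop in get_quota_realm no longer has to bail out when realm->inode is NULL; it can keep walking up the realm hierarchy.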

And that should be enough. Or almost... because the optimization we have with ceph_has_realms_with_quotas(inode) needs to be modified to only trust the info in mdsc if we're mounting the real CephFS root.

Does this make sense?

#2 Updated by Yanhu Cao almost 3 years ago

 0xffffffffc099eed0 : ceph_adjust_quota_realms_count+0x0/0x30 [ceph]
 0xffffffffc0980447 : fill_inode.isra.23+0x187/0xdf0 [ceph]
 0xffffffffc098139e : ceph_fill_trace+0x2ee/0x970 [ceph]
 0xffffffffc09a4a46 : handle_reply+0x4c6/0xc90 [ceph]
 0xffffffffc09a6d17 : dispatch+0x107/0xa60 [ceph]
 0xffffffffc0912f81 : try_read+0x801/0x11f0 [libceph]
 0xffffffffc0914848 : ceph_con_workfn+0xa8/0x5c0 [libceph]
 0xffffffffae4aeeb1 : process_one_work+0x171/0x370 [kernel]
 0xffffffffae4af5d9 : worker_thread+0x49/0x3f0 [kernel]
 0xffffffffae4b4b58 : kthread+0xf8/0x130 [kernel]
 0xffffffffaec00215 : ret_from_fork+0x35/0x40 [kernel]

When the kernel client mounts a directory that has quotas, it calls ceph_adjust_quota_realms_count to increase quotarealms_count (default 0). If a subdirectory of the quota directory is mounted instead, it does not.
When writing data, the client decides whether any quota applies based on quotarealms_count (via ceph_has_realms_with_quotas).
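The counter-based short-circuit amounts to something like the following simplified userspace sketch (field and function names loosely follow the kernel's; in the real client the counter is an atomic in struct ceph_mds_client):

```c
#include <assert.h>

/* Simplified per-client state. */
struct mds_client {
    long quotarealms_count;
};

/* Called while filling an inode: bump the counter when the inode
 * gains quota info, drop it when its quota is cleared. */
static void adjust_quota_realms_count(struct mds_client *mdsc,
                                      int has_quota, int had_quota)
{
    if (has_quota && !had_quota)
        mdsc->quotarealms_count++;
    else if (!has_quota && had_quota)
        mdsc->quotarealms_count--;
}

/* Fast path: skip the realm walk entirely when no inode seen by
 * this client carries quota info.  This is exactly what goes wrong
 * when the quota realm sits above the mount point: the client never
 * fills that inode, the counter stays 0, and quotas are ignored. */
static int has_realms_with_quotas(const struct mds_client *mdsc)
{
    return mdsc->quotarealms_count > 0;
}
```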

The ceph-fuse client, by contrast, checks whether each ancestor directory has quotas, walking up to the root directory:

bool Client::is_quota_files_exceeded(Inode *in, const UserPerm& perms)
{
  return check_quota_condition(in, perms,
      [](const Inode &in) {
        return in.quota.max_files && in.rstat.rsize() >= in.quota.max_files;
      });
}

bool Client::check_quota_condition(Inode *in, const UserPerm& perms,
                   std::function<bool (const Inode &in)> test)
{
  while (true) {
    ceph_assert(in != NULL);
    if (test(*in)) {
      return true;
    }

    if (in == root_ancestor) {
      // We're done traversing, drop out
      return false;
    } else {
      // Continue up the tree
      in = get_quota_root(in, perms);
    }
  }

  return false;
}

Compared to ceph-fuse, the kernel client has one extra short-circuit: it first checks quotarealms_count to decide whether there can be any quota at all.

#3 Updated by Luis Henriques almost 3 years ago

You're absolutely right, Yanhu. But the quotarealms_count is just part of the story, because we actually need to be able to access the realm inodes that we don't see in our mountpoint.

I've just sent an RFC [1] with a possible fix for this issue. It's still under test, but it should be enough to get some feedback to validate the approach.

[1] https://lkml.kernel.org/r/20190301175752.17808-1-lhenriques@suse.com

#4 Updated by Ilya Dryomov over 2 years ago

  • Status changed from New to Resolved
  • Assignee set to Luis Henriques

Merged into 5.2-rc1.
