Bug #48203: qa: quota failure

Added by Patrick Donnelly over 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): Client, kceph, qa-suite
Labels (FS): -
Pull request ID: 38112
Crash signature (v1): -
Crash signature (v2): -

Description

2020-11-04T18:03:28.550 INFO:tasks.workunit.client.1.smithi096.stderr:+ mv files limit/
2020-11-04T18:03:28.559 INFO:tasks.workunit.client.1.smithi096.stderr:+ return 1
2020-11-04T18:03:28.561 DEBUG:teuthology.orchestra.run:got remote process result: 1
...
2020-11-04T18:03:29.472 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 69, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20201103.210407/qa/tasks/workunit.py", line 147, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20201103.210407/qa/tasks/workunit.py", line 297, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20201103.210407/qa/tasks/workunit.py", line 425, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed (workunit test fs/quota/quota.sh) on smithi096 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=d2e2c4f1d55b90d4d72fec898522c82d26aa11c4 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="1" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.1 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.1/qa/workunits/fs/quota/quota.sh'

From: /ceph/teuthology-archive/pdonnell-2020-11-04_17:39:34-fs-wip-pdonnell-testing-20201103.210407-distro-basic-smithi/5590496/teuthology.log

See also discussion here: https://tracker.ceph.com/issues/36593#note-13

Actions #1

Updated by Patrick Donnelly over 3 years ago

  • Description updated
Actions #2

Updated by Patrick Donnelly over 3 years ago

In response to: https://tracker.ceph.com/issues/36593#note-14

Yes, there is no easy solution here. I guess we have to transmit the truncate as soon as it occurs.
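
To make the race concrete, here is a toy model in C; it is not Ceph code, and every name in it is invented for the example. The point is only that a truncate buffered on one client leaves the size seen by a rename-time quota check stale until the truncate is transmitted:

    #include <stdio.h>

    /* Toy model of the race (not Ceph code; all names are invented).
     * A client truncates a file but only buffers the size change, so a
     * quota check at rename time still sees the old, acknowledged size. */
    struct realm { long limit; long usage; };  /* usage as acked by the MDS */

    int main(void)
    {
        struct realm dst = { .limit = 100, .usage = 90 };
        long acked_size = 50;   /* size the MDS still believes */
        long local_size = 5;    /* size after the buffered client truncate */

        /* Rename-time check against the stale, acknowledged size... */
        int fits_stale = (dst.usage + acked_size) <= dst.limit;  /* 140 > 100 */
        /* ...versus what the check would see if the truncate were sent. */
        int fits_real  = (dst.usage + local_size) <= dst.limit;  /*  95 <= 100 */

        printf("stale size: %s\n", fits_stale ? "fits" : "rejected");
        printf("real size:  %s\n", fits_real ? "fits" : "rejected");
        return 0;
    }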

Actions #3

Updated by Luis Henriques over 3 years ago

After discussing this with Jeff on the mailing list [1], we agreed that the best thing to do is to simply revert to returning -EXDEV when doing renames across quota realms. Any other solution would have a big performance impact. This effectively means reverting commit dffdcd71458e ("ceph: allow rename operation under different quota realms").

[1] https://www.spinics.net/lists/ceph-devel/msg49850.html
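
For context, a minimal sketch of the behaviour this revert restores, with stand-in types and helper names rather than the actual fs/ceph code: the rename path refuses to cross quota realms, and userspace tools such as mv react to EXDEV by falling back to copy + unlink, which goes through the normal write path and therefore through normal quota enforcement:

    #include <errno.h>
    #include <stdbool.h>

    /* Sketch only: the struct and the realm lookup are stand-ins. */
    struct inode { int quota_realm_id; };

    /* Assumed semantics: true iff both inodes resolve to the same nearest
     * ancestor directory that has a quota configured. */
    static bool quota_is_same_realm(const struct inode *a, const struct inode *b)
    {
        return a->quota_realm_id == b->quota_realm_id;
    }

    /* Refusing cross-realm renames with -EXDEV makes mv(1) fall back to
     * copy + unlink: slower, but quotas are enforced on every write. */
    static int rename_quota_check(const struct inode *src,
                                  const struct inode *dst_dir)
    {
        if (!quota_is_same_realm(src, dst_dir))
            return -EXDEV;
        return 0;  /* same realm: a plain rename cannot change realm usage */
    }

    int main(void)
    {
        struct inode file = { .quota_realm_id = 1 };
        struct inode dir  = { .quota_realm_id = 2 };
        return rename_quota_check(&file, &dir) == -EXDEV ? 0 : 1;
    }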

Actions #4

Updated by Patrick Donnelly over 3 years ago

Luis Henriques wrote:

> After discussing this with Jeff on the mailing list [1], we agreed that the best thing to do is to simply revert to returning -EXDEV when doing renames across quota realms. Any other solution would have a big performance impact. This effectively means reverting commit dffdcd71458e ("ceph: allow rename operation under different quota realms").
>
> [1] https://www.spinics.net/lists/ceph-devel/msg49850.html

The userspace change also needs to be reverted; it was made for issue #39715.

To avoid forgetting why this is not a good idea, let's also add some comments explaining why we don't allow cross-quota renames.

Actions #5

Updated by Luis Henriques over 3 years ago

Patrick Donnelly wrote:

> Luis Henriques wrote:
>
> > After discussing this with Jeff on the mailing list [1], we agreed that the best thing to do is to simply revert to returning -EXDEV when doing renames across quota realms. Any other solution would have a big performance impact. This effectively means reverting commit dffdcd71458e ("ceph: allow rename operation under different quota realms").
> >
> > [1] https://www.spinics.net/lists/ceph-devel/msg49850.html
>
> The userspace change also needs to be reverted; it was made for issue #39715.

I can submit the revert for the fuse client, but I wonder if that's really necessary (other than for consistency between the two clients, of course). AFAICS the fuse client does not have the same problem because its truncate operations are synchronous. Or did you see a similar failure?

> To avoid forgetting why this is not a good idea, let's also add some comments explaining why we don't allow cross-quota renames.

The kernel client revert is currently queued on the testing branch, and I tried to describe the problem with the commit there.

Actions #6

Updated by Patrick Donnelly over 3 years ago

Luis Henriques wrote:

> Patrick Donnelly wrote:
>
> > Luis Henriques wrote:
> >
> > > After discussing this with Jeff on the mailing list [1], we agreed that the best thing to do is to simply revert to returning -EXDEV when doing renames across quota realms. Any other solution would have a big performance impact. This effectively means reverting commit dffdcd71458e ("ceph: allow rename operation under different quota realms").
> > >
> > > [1] https://www.spinics.net/lists/ceph-devel/msg49850.html
> >
> > The userspace change also needs to be reverted; it was made for issue #39715.
>
> I can submit the revert for the fuse client, but I wonder if that's really necessary (other than for consistency between the two clients, of course). AFAICS the fuse client does not have the same problem because its truncate operations are synchronous. Or did you see a similar failure?

The issue is not confined to a single client's consistent view of the quotas. We could have another client truncating files during the rename.

> > To avoid forgetting why this is not a good idea, let's also add some comments explaining why we don't allow cross-quota renames.
>
> The kernel client revert is currently queued on the testing branch, and I tried to describe the problem with the commit there.

Thanks!

Actions #7

Updated by Luis Henriques over 3 years ago

  • Pull request ID set to 38112

Created pull request https://github.com/ceph/ceph/pull/38112, which simply reverts the fuse client commit b8954e5734b3 ("client: optimize rename operation under different quota root").

Actions #8

Updated by Patrick Donnelly over 3 years ago

  • Status changed from New to Resolved
  • Component(FS) Client added
Actions #9

Updated by Patrick Donnelly over 3 years ago

  • Status changed from Resolved to Need More Info

Hey Luis, I think this is still broken; the revert didn't work: https://pulpito.ceph.com/teuthology-2021-01-03_03:15:02-fs-master-distro-basic-smithi/5751916/

Can you take a look?

Actions #10

Updated by Luis Henriques over 3 years ago

Patrick Donnelly wrote:

> Hey Luis, I think this is still broken; the revert didn't work: https://pulpito.ceph.com/teuthology-2021-01-03_03:15:02-fs-master-distro-basic-smithi/5751916/
>
> Can you take a look?

Hmm... It looks like a distro kernel (kernel-4.18.0-240.1.1.el8_3.x86_64) is being used. Maybe the fix hasn't been backported to it? Or am I not looking at it right?

Actions #11

Updated by Patrick Donnelly over 3 years ago

  • Status changed from Need More Info to Resolved

Luis Henriques wrote:

> Patrick Donnelly wrote:
>
> > Hey Luis, I think this is still broken; the revert didn't work: https://pulpito.ceph.com/teuthology-2021-01-03_03:15:02-fs-master-distro-basic-smithi/5751916/
> >
> > Can you take a look?
>
> Hmm... It looks like a distro kernel (kernel-4.18.0-240.1.1.el8_3.x86_64) is being used. Maybe the fix hasn't been backported to it? Or am I not looking at it right?

Ah, good catch. That's why. Thanks for looking!

Actions #12

Updated by Frank Schilder about 2 years ago

Dear all, I went through the discussion about the failed test and would like to ask you to consider an alternative solution to this problem:

Please introduce a mount option "fast_move" (or similar) that re-enables the code path doing a proper move instead of falling back to cp+rm, and document the implications of this option so that operators can either prevent its use or be prepared for accidents.
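
Purely to illustrate the request (fast_move does not exist today; it is the option being proposed here, and the monitor address, paths and credentials below are placeholders), usage could look like passing it in the mount data string:

    #include <stdio.h>
    #include <sys/mount.h>

    /* Illustration of the *proposed* option only; "fast_move" is not an
     * existing mount option and this call will fail on current kernels. */
    int main(void)
    {
        if (mount("192.168.0.1:6789:/", "/mnt/cephfs", "ceph", 0,
                  "name=admin,secretfile=/etc/ceph/admin.secret,fast_move") != 0) {
            perror("mount");
            return 1;
        }
        return 0;
    }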

To be honest, the "corner case" that was discovered is certainly of academic interest, and I understand the desire to have a file system pass all sorts of tests. However, there is also the practical side: some operators face situations where users need to move large amounts of data (I'm talking hundreds of TB) across quota realms, and the "fix" of temporarily removing quotas for this purpose does not really work.

"Moving" a large amount of data with cp+rm is really harmful. Several hundred TB usually take long enough that the transfer spans a service window, and users end up with interrupted operations even though a move is expected to be atomic.

I would like to add that on our system (and, I would assume, on any properly configured system) an ordinary user cannot even execute a truncate+mv the way the fs test suite does. Root of course can, but root always has the power to break a system and to fix it again. I don't think this is reason enough to disallow fast moves for everybody.

In my specific situation, I need to deploy a server where users see a higher-level view of our file systems (a server where /home and another directory do not show up as separate file systems, but as subdirectories of the same file system) and where they can execute moves between different quota realms within this file system. For the mount on this machine I would need an option like "fast_move".

In general, whenever a decision between "clean theory" and the "real life of operators" leads to such a conflicting outcome, I would prefer an implementation option along the lines of an additional mount flag: a safe default, plus performance-oriented expert settings with known and documented risks.

At this very moment I'm exceedingly happy that I still have a server with a client that still does fast moves. However, I need to update, and I really need the fast-move functionality. We are migrating a huge amount of data, and this cp+rm default is a serious bottleneck, if not a real danger.

Thank you very much for your considerations.
