Documentation #61902

Recommend pinning _deleting directory to another rank for certain use-cases

Added by Patrick Donnelly 8 months ago. Updated 3 months ago.

Status: New
Priority: High
Assignee:
Category: Performance/Resource Usage
Target version:
% Done: 0%
Tags:
Backport: reef,quincy
Reviewed:
Affected Versions:
Labels (FS): task(easy), task(intern)
Pull request ID:

Description

The _deleting directory often receives sudden, large volumes of data to unlink recursively. Rank 0 is not an ideal default target for this extra workload. We should select rank 1 by default and introduce a configuration option to change that setting.

If max_mds==1, of course, the _deleting directory still stays on rank 0.

As an add-on to this, add a file to the directory which persists (i.e. not deleted by the async deleter threads) so that the directory is not exported back to rank 0 whenever it is empty.
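
A minimal sketch of the proposal above, assuming a mounted CephFS file system and the ceph.dir.pin export-pin xattr. The helper names, the `.keep` file name, and the fallback to rank 0 for an out-of-range rank are illustrative assumptions, not the mgr/volumes implementation. The sketch only creates the persistent file; the async deleter threads would still need to be taught to skip it.

```python
import os

PIN_XATTR = "ceph.dir.pin"   # CephFS export-pin extended attribute
KEEP_FILE = ".keep"          # hypothetical name for the persistent placeholder


def choose_deleting_rank(max_mds: int, preferred_rank: int = 1) -> int:
    """Pick the rank for _deleting; stay on rank 0 when only one MDS is active."""
    if max_mds <= 1:
        return 0
    # Assumption: an out-of-range preference falls back to rank 0.
    if 0 <= preferred_rank < max_mds:
        return preferred_rank
    return 0


def pin_deleting_dir(path: str, rank: int, setxattr=None) -> str:
    """Pin `path` to `rank` and create a placeholder so the directory is never empty.

    `setxattr` is injectable so the logic can be exercised without a CephFS mount.
    """
    if setxattr is None:
        setxattr = os.setxattr  # Linux only
    setxattr(path, PIN_XATTR, str(rank).encode())
    keep_path = os.path.join(path, KEEP_FILE)
    # Open in append mode so an existing placeholder is left untouched.
    with open(keep_path, "a"):
        pass
    return keep_path
```

On a cluster with max_mds set to 2 and the file system mounted at a hypothetical /mnt/cephfs, this could be called as `pin_deleting_dir("/mnt/cephfs/volumes/_deleting", choose_deleting_rank(2))`.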

History

#1 Updated by Venky Shankar 8 months ago

Patrick Donnelly wrote:

> The _deleting directory often receives sudden, large volumes of data to unlink recursively. Rank 0 is not an ideal default target for this extra workload. We should select rank 1 by default and introduce a configuration option to change that setting.
>
> If max_mds==1, of course, the _deleting directory still stays on rank 0.
>
> As an add-on to this, add a file to the directory which persists (i.e. not deleted by the async deleter threads) so that the directory is not exported back to rank 0 whenever it is empty.

Is that how the balancer works? I thought the dummy entry was required for the balancer to kick off the export, and that without it the balancer would ignore the pin. I didn't know that the balancer would re-export to rank-0 (from rank-N) if a directory becomes empty irrespective of the pin.

#2 Updated by Venky Shankar 8 months ago

Also, I think there is a catch to this feature. Commit aae7a70ed2cf9c32684cfdaf701778a05f229e09 introduces a per-subvolume trash directory, with /volumes/_deleting/ holding a symlink entry that points to the deleted subvolume. The reason for this is that the subvolume is marked with the ceph.dir.subvolume xattr, which disables nested snapshots and renames across subvolumes.

#3 Updated by Patrick Donnelly 8 months ago

Venky Shankar wrote:

> I didn't know that the balancer would re-export to rank-0 (from rank-N) if a directory becomes empty irrespective of the pin.

Yes, it will. This is part of the subtree merging logic.

#4 Updated by Patrick Donnelly 8 months ago

Venky Shankar wrote:

> Also, I think there is a catch to this feature. Commit aae7a70ed2cf9c32684cfdaf701778a05f229e09 introduces a per-subvolume trash directory, with /volumes/_deleting/ holding a symlink entry that points to the deleted subvolume. The reason for this is that the subvolume is marked with the ceph.dir.subvolume xattr, which disables nested snapshots and renames across subvolumes.

Ah, good point. In that case, this feature may be undesirable, as it will work poorly with distributed ephemeral pins on the subvolume group.

#5 Updated by Venky Shankar 7 months ago

  • Tracker changed from Bug to Documentation
  • Subject changed from "pybind/mgr/volumes: pin _deleting directory" to "Recommend pinning _deleting directory to another rank for certain use-cases"
  • Priority changed from High to Normal

Patrick Donnelly wrote:

> Venky Shankar wrote:
>
> > Also, I think there is a catch to this feature. Commit aae7a70ed2cf9c32684cfdaf701778a05f229e09 introduces a per-subvolume trash directory, with /volumes/_deleting/ holding a symlink entry that points to the deleted subvolume. The reason for this is that the subvolume is marked with the ceph.dir.subvolume xattr, which disables nested snapshots and renames across subvolumes.
>
> Ah, good point. In that case, this feature may be undesirable, as it will work poorly with distributed ephemeral pins on the subvolume group.

Yeah. That would require inspecting _deleting for symlinks or directories, and which of the two it contains can change over time depending on the subvolume version in use. We could document this in our docs under a recommended-settings section and pull it downstream too.
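
The inspection mentioned here could look roughly like the sketch below, which sorts the entries of a _deleting directory into symlinks and real directories. The function name and the returned dictionary layout are hypothetical; only standard-library calls are used, and nothing here is taken from the mgr/volumes code.

```python
import os


def classify_deleting_entries(deleting_dir: str) -> dict:
    """Group the entries of a _deleting directory by type.

    Symlinks are checked first so that a symlink pointing at a
    per-subvolume trash directory is not counted as a directory.
    """
    symlinks, directories, other = [], [], []
    with os.scandir(deleting_dir) as entries:
        for entry in entries:
            if entry.is_symlink():
                symlinks.append(entry.name)
            elif entry.is_dir(follow_symlinks=False):
                directories.append(entry.name)
            else:
                other.append(entry.name)
    return {
        "symlinks": sorted(symlinks),
        "directories": sorted(directories),
        "other": sorted(other),
    }
```

A non-empty "symlinks" list would indicate that the unlink work happens inside the subvolumes themselves rather than under _deleting, which is the case where pinning _deleting alone would not move the load.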

#6 Updated by Patrick Donnelly 4 months ago

  • Priority changed from Normal to High

#7 Updated by Venky Shankar 3 months ago

  • Category set to Performance/Resource Usage
  • Assignee changed from Patrick Donnelly to Rishabh Dave

#8 Updated by Venky Shankar 3 months ago

  • Backport changed from reef,quincy,pacific to reef,quincy
