Bug #39987

mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps

Added by Zheng Yan 6 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

A user reported that the MDS could not finish freezing a dirfrag. The cache dump includes the following entries.

[inode 0x1000056a9ad [112,head] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478173 s=2319190 n(v0 rc2019-05-18 07:15:27.742396 b2319190 1=1+0) (iversion lock) caps={62472=pAsLsXsFscr/-@115} | ptrwaiter=0 request=0 lock=0 caps=1 truncating=0 needsnapflush=0 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55c2475ed800]
...
[inode 0x1000056a9ad [10c,10d] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478065 s=2319190 n(v0 rc2019-05-17 19:36:50.430298 b2319190 1=1+0) (iversion lock) 0x55c24feeae00]
[inode 0x1000056a9ad [10e,111] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478171 ap=3+0 s=2319190 n(v0 rc2019-05-18 07:15:27.742396 b2319190 1=1+0)/n(v0 rc2019-05-17 19:36:50.430298 b2319190 1=1+0) (iauth snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) | ptrwaiter=0 request=0 lock=3 dirty=0 authpin=1 0x55c24c07d800]

It's likely that inode 0x1000056a9ad [10c,111] was COWed and that head_in->split_need_snapflush found no items in the range [10e,111].
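The range check suspected above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the MDS code: the function names, the sorted list of pending snapids, and the dictionary of snap caps are all hypothetical stand-ins chosen for illustration. It shows the idea from the ticket title: if no snapflush is expected for any snapid covered by the COWed inode, the snap caps tracked for it are unneeded and can be dropped.

```python
from bisect import bisect_left


def has_pending_snapflush(pending_snapids, first, last):
    """Return True if any snapid in the sorted list lies within [first, last].

    pending_snapids is a hypothetical stand-in for the snapids that still
    expect a snapflush from some client.
    """
    i = bisect_left(pending_snapids, first)
    return i < len(pending_snapids) and pending_snapids[i] <= last


def cleanup_snap_caps(snap_caps, pending_snapids, first, last):
    """Return the snap caps to keep for a COWed inode covering [first, last].

    Hypothetical helper: when nothing in the range awaits a snapflush,
    the tracked caps serve no purpose, so an empty mapping is returned.
    """
    if has_pending_snapflush(pending_snapids, first, last):
        return dict(snap_caps)
    return {}


# Snapid ranges taken from the cache dump; the pending list is made up.
print(has_pending_snapflush([0x10c], 0x10e, 0x111))
print(cleanup_snap_caps({"ifile": {62472}}, [0x10c], 0x10e, 0x111))
```

With a pending list that contains nothing in [10e,111], the model reports no pending snapflush and drops the caps. In the dump above, the [10e,111] inode instead stays in snap->sync with wrlocks held, which is consistent with caps that were never cleaned up.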


Related issues

Duplicated by fs - Bug #42338: file system keeps on deadlocking with unresolved slow requests (failed to authpin, subtree is being exported) Duplicate 10/16/2019
Copied to fs - Backport #40444: mimic: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps Resolved
Copied to fs - Backport #40445: nautilus: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps Resolved

History

#1 Updated by Patrick Donnelly 6 months ago

  • Priority changed from Normal to High
  • Target version set to v15.0.0
  • Start date deleted (05/21/2019)
  • Source set to Community (user)
  • Component(FS) MDS added

#2 Updated by Zheng Yan 6 months ago

  • Status changed from New to Need Review
  • Pull request ID set to 28190

#3 Updated by Patrick Donnelly 5 months ago

  • Status changed from Need Review to Pending Backport
  • Assignee set to Zheng Yan

#4 Updated by Zheng Yan 5 months ago

  • Status changed from Pending Backport to Need Review

#6 Updated by Patrick Donnelly 5 months ago

  • Status changed from Need Review to Pending Backport

#7 Updated by Nathan Cutler 5 months ago

  • Copied to Backport #40444: mimic: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps added

#8 Updated by Nathan Cutler 5 months ago

  • Copied to Backport #40445: nautilus: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps added

#9 Updated by Nathan Cutler about 2 months ago

When there are multiple PRs fixing a single tracker, it's a good idea to "unset" (depopulate) the Pull request ID field. When that field is populated, it's easy for the backporter to miss the follow-on PR (PRs).

#10 Updated by Patrick Donnelly about 2 months ago

Nathan Cutler wrote:

When there are multiple PRs fixing a single tracker, it's a good idea to "unset" (depopulate) the Pull request ID field. When that field is populated, it's easy for the backporter to miss the follow-on PR (PRs).

FWIW, I don't blame you for this Nathan. I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

#11 Updated by Nathan Cutler about 2 months ago

Patrick Donnelly wrote:

I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

@Patrick Sometimes it's necessary to create backport issues in advance (before the issue enters PB status), and I had been doing this by setting PB temporarily. With the new, stricter workflow you just described (which I fully support BTW) it will no longer be possible to set PB "temporarily", so I added a --force option to backport-create-issue which will make it create backport issues regardless of the issue status. See https://github.com/ceph/ceph/pull/30571 for details.

#12 Updated by Nathan Cutler about 2 months ago

Oh, and one more thing: issues in Resolved status can be reverted to Need Review (or In Progress, or even New) as well.

#13 Updated by Patrick Donnelly about 2 months ago

Nathan Cutler wrote:

I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

@Patrick Sometimes it's necessary to create backport issues in advance (before the issue enters PB status), and I had been doing this by setting PB temporarily. With the new, stricter workflow you just described (which I fully support BTW) it will no longer be possible to set PB "temporarily", so I added a --force option to backport-create-issue which will make it create backport issues regardless of the issue status. See https://github.com/ceph/ceph/pull/30571 for details.

I wasn't planning a technical change to redmine to enforce this policy but if you find a way to do it I'd support it.

#14 Updated by Nathan Cutler about 1 month ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
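The rule described in that message can be modelled in a few lines. This is a hedged sketch of the condition only, not the actual backport-create-issue script: the function name and the list-of-strings input are hypothetical, and treating a parent with no backports as not resolvable is an assumption made here.

```python
# Statuses that count as finished, as named in the message above.
DONE_STATUSES = {"Resolved", "Rejected"}


def parent_can_be_resolved(backport_statuses):
    """Return True when every backport is in a finished status.

    Assumption: a parent with no backport issues at all is left alone.
    """
    return bool(backport_statuses) and all(
        status in DONE_STATUSES for status in backport_statuses
    )


# Both backports of this issue (#40444 and #40445) are listed as Resolved.
print(parent_can_be_resolved(["Resolved", "Resolved"]))
```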

#15 Updated by Patrick Donnelly 24 days ago

  • Duplicated by Bug #42338: file system keeps on deadlocking with unresolved slow requests (failed to authpin, subtree is being exported) added
