Bug #39987

closed

mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps

Added by Zheng Yan almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus,mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A user reported that the MDS could not finish freezing a dirfrag. The cache dump includes the following entries:

[inode 0x1000056a9ad [112,head] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478173 s=2319190 n(v0 rc2019-05-18 07:15:27.742396 b2319190 1=1+0) (iversion lock) caps={62472=pAsLsXsFscr/-@115} | ptrwaiter=0 request=0 lock=0 caps=1 truncating=0 needsnapflush=0 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55c2475ed800]
...
[inode 0x1000056a9ad [10c,10d] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478065 s=2319190 n(v0 rc2019-05-17 19:36:50.430298 b2319190 1=1+0) (iversion lock) 0x55c24feeae00]
[inode 0x1000056a9ad [10e,111] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478171 ap=3+0 s=2319190 n(v0 rc2019-05-18 07:15:27.742396 b2319190 1=1+0)/n(v0 rc2019-05-17 19:36:50.430298 b2319190 1=1+0) (iauth snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) | ptrwaiter=0 request=0 lock=3 dirty=0 authpin=1 0x55c24c07d800]

It's likely that inode 0x1000056a9ad [10c,111] was COWed and head_in->split_need_snapflush contained no items in the range [10e,111].
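
To spot this pattern in a larger cache dump, a small script can list the snapped (non-head) inode entries whose locks are still shown as snap->sync. The following is only a rough sketch: the helper names (parse_entry, stuck_snapped_inodes) are hypothetical, and the regular expressions are derived solely from the three entries quoted above, not from the full MDS dump format.

```python
import re

# Leading "[inode <ino> [<first>,<last>]" part of a cache dump entry.
INODE_RE = re.compile(r"^\[inode (0x[0-9a-f]+) \[([0-9a-f]+),([0-9a-f]+|head)\]")
# Lock fields shown in the "snap->sync" state, e.g. "(ifile snap->sync w=1)".
SNAP_SYNC_RE = re.compile(r"\((i\w+) snap->sync[^)]*\)")


def parse_entry(line):
    """Return snap range, snap->sync locks, and cap presence for one entry, or None."""
    m = INODE_RE.match(line)
    if m is None:
        return None
    ino, first, last = m.groups()
    return {
        "ino": ino,
        "first": int(first, 16),
        "last": None if last == "head" else int(last, 16),  # None means head
        "snap_sync_locks": SNAP_SYNC_RE.findall(line),
        "has_client_caps": "caps={" in line,
    }


def stuck_snapped_inodes(lines):
    """Snapped (non-head) entries that still have locks in snap->sync."""
    entries = [e for e in map(parse_entry, lines) if e is not None]
    return [e for e in entries if e["last"] is not None and e["snap_sync_locks"]]


# Entries abbreviated from the dump above; "..." marks fields left out here.
DUMP = [
    "[inode 0x1000056a9ad [112,head] ... caps={62472=pAsLsXsFscr/-@115} | ... authpin=0 0x55c2475ed800]",
    "...",
    "[inode 0x1000056a9ad [10c,10d] ... (iversion lock) 0x55c24feeae00]",
    "[inode 0x1000056a9ad [10e,111] ... (iauth snap->sync w=1) (ifile snap->sync w=1) "
    "(ixattr snap->sync w=1) (iversion lock) | ... authpin=1 0x55c24c07d800]",
]

for e in stuck_snapped_inodes(DUMP):
    print(f"{e['ino']} [{e['first']:x},{e['last']:x}] waiting on {', '.join(e['snap_sync_locks'])}")
```

On the abbreviated entries, only the [10e,111] inode is reported, which matches the entry that appears to be holding up the freeze.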


Related issues 3 (0 open, 3 closed)

Has duplicate CephFS - Bug #42338: file system keeps on deadlocking with unresolved slow requests (failed to authpin, subtree is being exported) (Duplicate, 10/16/2019)
Copied to CephFS - Backport #40444: mimic: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps (Resolved, Nathan Cutler)
Copied to CephFS - Backport #40445: nautilus: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps (Resolved, Prashant D)

Actions #1

Updated by Patrick Donnelly almost 5 years ago

  • Priority changed from Normal to High
  • Target version set to v15.0.0
  • Start date deleted (05/21/2019)
  • Source set to Community (user)
  • Component(FS) MDS added
Actions #2

Updated by Zheng Yan almost 5 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 28190
Actions #3

Updated by Patrick Donnelly almost 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee set to Zheng Yan
Actions #4

Updated by Zheng Yan almost 5 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #6

Updated by Patrick Donnelly almost 5 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40444: mimic: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps added
Actions #8

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40445: nautilus: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps added
Actions #9

Updated by Nathan Cutler over 4 years ago

When there are multiple PRs fixing a single tracker, it's a good idea to "unset" (depopulate) the Pull request ID field. When that field is populated, it's easy for the backporter to miss the follow-on PR (PRs).

Actions #10

Updated by Patrick Donnelly over 4 years ago

Nathan Cutler wrote:

When there are multiple PRs fixing a single tracker, it's a good idea to "unset" (depopulate) the Pull request ID field. When that field is populated, it's easy for the backporter to miss the follow-on PR (PRs).

FWIW, I don't blame you for this, Nathan. I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

Actions #11

Updated by Nathan Cutler over 4 years ago

Patrick Donnelly wrote:

I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

@Patrick Sometimes it's necessary to create backport issues in advance (before the issue enters PB status), and I had been doing this by setting PB temporarily. With the new, stricter workflow you just described (which I fully support BTW) it will no longer be possible to set PB "temporarily", so I added a --force option to backport-create-issue which will make it create backport issues regardless of the issue status. See https://github.com/ceph/ceph/pull/30571 for details.

Actions #12

Updated by Nathan Cutler over 4 years ago

Oh, and one more thing: issues in Resolved status can be reverted to Need Review (or In Progress, or even New) as well.

Actions #13

Updated by Patrick Donnelly over 4 years ago

Nathan Cutler wrote:

I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

@Patrick Sometimes it's necessary to create backport issues in advance (before the issue enters PB status), and I had been doing this by setting PB temporarily. With the new, stricter workflow you just described (which I fully support BTW) it will no longer be possible to set PB "temporarily", so I added a --force option to backport-create-issue which will make it create backport issues regardless of the issue status. See https://github.com/ceph/ceph/pull/30571 for details.

I wasn't planning a technical change to Redmine to enforce this policy, but if you find a way to do it, I'd support it.

Actions #14

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #15

Updated by Patrick Donnelly over 4 years ago

  • Has duplicate Bug #42338: file system keeps on deadlocking with unresolved slow requests (failed to authpin, subtree is being exported) added