Bug #39987: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps - CephFS - Ceph

Bug #39987

closed

mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps

Added by Zheng Yan almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee: Zheng Yan
Category:
Target version: Ceph - v15.0.0
% Done:
Source: Community (user)
Tags:
Backport: nautilus,mimic
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID: 28190
Crash signature (v1):
Crash signature (v2):

Description

A user reported that the MDS could not finish freezing a dirfrag. The cache dump includes the following entries:

[inode 0x1000056a9ad [112,head] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478173 s=2319190 n(v0 rc2019-05-18 07:15:27.742396 b2319190 1=1+0) (iversion lock) caps={62472=pAsLsXsFscr/-@115} | ptrwaiter=0 request=0 lock=0 caps=1 truncating=0 needsnapflush=0 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55c2475ed800]
...
[inode 0x1000056a9ad [10c,10d] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478065 s=2319190 n(v0 rc2019-05-17 19:36:50.430298 b2319190 1=1+0) (iversion lock) 0x55c24feeae00]
[inode 0x1000056a9ad [10e,111] /hpc/home/neda/ffpy_test/ffpy/interp_test/smooth_z0/z0meso.nc auth v5478171 ap=3+0 s=2319190 n(v0 rc2019-05-18 07:15:27.742396 b2319190 1=1+0)/n(v0 rc2019-05-17 19:36:50.430298 b2319190 1=1+0) (iauth snap->sync w=1) (ifile snap->sync w=1) (ixattr snap->sync w=1) (iversion lock) | ptrwaiter=0 request=0 lock=3 dirty=0 authpin=1 0x55c24c07d800]

It's likely that inode 0x1000056a9ad [10c,111] was COWed and head_in->split_need_snapflush contained no items in the range [10e,111].
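
To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the cleanup the ticket title asks for. It is a simplified model, not the actual MDS code or the patch: `SnapInode`, `NeedSnapflush`, `has_pending_in_range` and `cow_split` are hypothetical names, and the `snap_wrlocks` counter merely stands in for the `snap->sync w=1` locks seen in the dump. The assumption is that, after a snapped inode is split, any half whose snap range has no pending snap flush should drop its `client_snap_caps` and the locks held on their behalf.

```cpp
#include <cstdint>
#include <map>
#include <set>

using snapid_t = std::uint64_t;
using client_t = std::int64_t;

// Hypothetical stand-in for a snapped inode covering snaps [first, last].
struct SnapInode {
  snapid_t first;
  snapid_t last;
  std::set<client_t> client_snap_caps;  // clients a snap flush is still expected from
  int snap_wrlocks;                     // models the "snap->sync w=1" locks in the dump
};

// Hypothetical stand-in for the head inode's pending snap flushes, keyed by snapid.
using NeedSnapflush = std::map<snapid_t, std::set<client_t>>;

// True if any pending snap flush falls inside [first, last].
bool has_pending_in_range(const NeedSnapflush& pending,
                          snapid_t first, snapid_t last) {
  auto it = pending.lower_bound(first);
  return it != pending.end() && it->first <= last;
}

// Split `in` at `split_at`: the returned inode covers [in.first, split_at],
// and `in` is narrowed to [split_at + 1, in.last].
SnapInode cow_split(SnapInode& in, snapid_t split_at,
                    const NeedSnapflush& pending) {
  SnapInode oldin{in.first, split_at, {}, 0};
  in.first = split_at + 1;

  // The older half inherits caps and locks only if a flush is pending for it.
  if (has_pending_in_range(pending, oldin.first, oldin.last)) {
    oldin.client_snap_caps = in.client_snap_caps;
    oldin.snap_wrlocks = in.snap_wrlocks;
  }

  // The cleanup step: if nothing is pending for the newer half, drop its
  // caps and locks so that it cannot stay pinned indefinitely.
  if (!has_pending_in_range(pending, in.first, in.last)) {
    in.client_snap_caps.clear();
    in.snap_wrlocks = 0;
  }
  return oldin;
}
```

In this model, splitting an inode covering [10c,111] at 10d while the only pending flush is for snap 10c leaves the [10e,111] half with no caps and no locks. Without the second `if`, that half would keep its locks with nothing left to release them, which is the kind of state the [10e,111] entry in the dump appears to show.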

Related issues 3 (0 open — 3 closed)

Updated by Patrick Donnelly almost 5 years ago

Priority changed from Normal to High
Target version set to v15.0.0
Start date deleted (~~05/21/2019~~)
Source set to Community (user)
Component(FS) MDS added

Updated by Zheng Yan almost 5 years ago

Status changed from New to Fix Under Review
Pull request ID set to 28190

Updated by Patrick Donnelly almost 5 years ago

Status changed from Fix Under Review to Pending Backport
Assignee set to Zheng Yan

Updated by Zheng Yan almost 5 years ago

Status changed from Pending Backport to Fix Under Review

Updated by Zheng Yan almost 5 years ago

https://github.com/ceph/ceph/pull/28190 is incomplete. The follow-on PR is https://github.com/ceph/ceph/pull/28459

Updated by Patrick Donnelly almost 5 years ago

Status changed from Fix Under Review to Pending Backport

Updated by Nathan Cutler almost 5 years ago

Copied to Backport #40444: mimic: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps added

Updated by Nathan Cutler almost 5 years ago

Copied to Backport #40445: nautilus: mds: MDCache::cow_inode does not cleanup unneeded client_snap_caps added

Updated by Nathan Cutler over 4 years ago

When there are multiple PRs fixing a single tracker, it's a good idea to "unset" (depopulate) the Pull request ID field. When that field is populated, it's easy for the backporter to miss the follow-on PR (PRs).

#10

Updated by Patrick Donnelly over 4 years ago

Nathan Cutler wrote:

When there are multiple PRs fixing a single tracker, it's a good idea to "unset" (depopulate) the Pull request ID field. When that field is populated, it's easy for the backporter to miss the follow-on PR (PRs).

FWIW, I don't blame you for this Nathan. I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

#11

Updated by Nathan Cutler over 4 years ago

I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

@Patrick Sometimes it's necessary to create backport issues in advance (before the issue enters PB status), and I had been doing this by setting PB temporarily. With the new, stricter workflow you just described (which I fully support BTW) it will no longer be possible to set PB "temporarily", so I added a --force option to backport-create-issue which will make it create backport issues regardless of the issue status. See https://github.com/ceph/ceph/pull/30571 for details.

#12

Updated by Nathan Cutler over 4 years ago

Oh, and one more thing: issues in Resolved status can be reverted to Need Review (or In Progress, or even New) as well.

#13

Updated by Patrick Donnelly over 4 years ago

Nathan Cutler wrote:

I plan to go a step further and not permit a tracker ticket to go backwards like this, i.e. from PB back to NR. Instead, we'll create a new tracker ticket and note the issue in the broken backport ticket.

@Patrick Sometimes it's necessary to create backport issues in advance (before the issue enters PB status), and I had been doing this by setting PB temporarily. With the new, stricter workflow you just described (which I fully support BTW) it will no longer be possible to set PB "temporarily", so I added a --force option to backport-create-issue which will make it create backport issues regardless of the issue status. See https://github.com/ceph/ceph/pull/30571 for details.

I wasn't planning a technical change to Redmine to enforce this policy, but if you find a way to do it, I'd support it.

#14

Updated by Nathan Cutler over 4 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#15

Updated by Patrick Donnelly over 4 years ago

Has duplicate Bug #42338: file system keeps on deadlocking with unresolved slow requests (failed to authpin, subtree is being exported) added
