Project

General

Profile

Actions

Bug #41759

closed

mgr/volumes: test_async_subvolume_rm fails since purge threads did not cleanup trash directory

Added by Venky Shankar over 4 years ago. Updated about 4 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Patrick saw this recently here: http://qa-proxy.ceph.com/teuthology/pdonnell-2019-09-11_00:33:51-fs-wip-pdonnell-testing-20190910.212946-distro-basic-smithi/4297026/teuthology.log

From MDS log:

2019-09-11T03:06:23.745+0000 7fd8a3f61700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.4673:7 rename #0x10000000013/aa40b188-8269-40fd-a029-0909c6ba664a #0x10000000001/subvolume_2815 2019-09-11T03:06:23.740171+0000 caller_uid=0, caller_gid=0{}) v4
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker local_wrlock_finish  on (dversion lock w=1 last_client=4673) on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dn xlock x=1 by 0x55bffd4f6c00) (dversion lock w=1 last_client=4673) v=4129 ap=2 ino=0x10000000002 state=1610612737|new | request=1 lock=2 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker xlock_finish on (dn xlock x=1 by 0x55bffd4f6c00) [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dn xlock x=1 by 0x55bffd4f6c00) (dversion lock) v=4129 ap=2 ino=0x10000000002 state=1610612737|new | request=1 lock=1 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker eval_gather (dn xlock) on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dn xlock) (dversion lock) v=4129 ap=2 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker eval_gather finished gather on (dn xlock) on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dn xlock) (dversion lock) v=4129 ap=2 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.cache.den(0x10000000013 aa40b188-8269-40fd-a029-0909c6ba664a) auth_unpin by 0x55bffdc17380 on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ap=1 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0] now 1
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker simple_eval (dn sync) on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ap=1 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker local_wrlock_finish  on (iversion lock w=1 last_client=4673) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=3 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (isnap xlockdone x=1) (inest lock) (ifile excl) (iversion lock w=1 last_client=4673) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=3 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker xlock_finish on (isnap xlockdone x=1) [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=3 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (isnap xlockdone x=1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=2 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker eval_gather (isnap xlockdone) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=3 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (isnap xlockdone) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=1 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker eval_gather finished gather on (isnap xlockdone) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=3 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (isnap xlockdone) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=1 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.cache.ino(0x10000000002) auth_unpin by 0x55bffd452f00 on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=2 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=1 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00] now 2
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker simple_eval (isnap sync) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=2 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=1 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker xlock_finish on (ilink xlockdone x=1) [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=2 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink xlockdone x=1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=1 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.cache.ino(0x10000000002) auth_unpin by 0x55bffd452e50 on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=1 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink excl) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00] now 1
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.locker simple_eval (ilink excl) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=1 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink excl) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker simple_eval stable, syncing (ilink excl) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=1 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink excl) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker simple_sync on (ilink excl) on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=1 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (ilink excl) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 12 mds.0.server including    dn [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ap=1 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 20 mds.0.locker issue_client_lease no/null lease on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ap=1 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=1 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 12 mds.0.server including inode [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=1 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@2},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700  7 mds.0.locker issue_caps loner client.4673 allowed=pAsLsXsFsxcrwb, xlocker allowed=pAsLsXsFsxcrwb, others allowed=pAsLsXs on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 ap=1 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@3},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55bffd452a00]
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.cache.ino(0x10000000002) auth_unpin by 0x55bffd4f6c00 on [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@3},l=4673 | request=1 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x55bffd452a00] now 0
2019-09-11T03:06:23.749+0000 7fd8a3f61700 10 mds.0.cache.den(0x10000000013 aa40b188-8269-40fd-a029-0909c6ba664a) auth_unpin by 0x55bffd4f6c00 on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ino=0x10000000002 state=1610612737|new | request=1 lock=0 inodepin=1 dirty=1 authpin=0 0x55bffdc172c0] now 0
2019-09-11T03:06:23.749+0000 7fd8aa76e700 12 mds.0.server including    dn [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ino=0x10000000002 state=1610612737|new | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8aa76e700 20 mds.0.locker issue_client_lease no/null lease on [dentry #0x1/volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a [2,head] auth (dversion lock) v=4129 ino=0x10000000002 state=1610612737|new | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55bffdc172c0]
2019-09-11T03:06:23.749+0000 7fd8aa76e700 12 mds.0.server including inode [inode 0x10000000002 [...2,head] /volumes/_deleting/aa40b188-8269-40fd-a029-0909c6ba664a/ auth fragtree_t(*^6) v4129 dirtyparent f(v0 m2019-09-11T03:06:17.067189+0000 1024=1024+0) n(v0 rc2019-09-11T03:06:23.740171+0000 b1073741824 1025=1024+1) (inest lock) (ifile excl) (iversion lock) caps={4673=pAsLsXsFsx/-@3},l=4673 | request=0 lock=0 dirfrag=64 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=0 0x55bffd452a00]

`aa40b188-8269-40fd-a029-0909c6ba664a` is the trash directory that's being removed by a purge thread.

When fixing this it seems that we need more debugability w/ purge threads. This could be as simple as logging queue of pending purges before starting a new purge or an interface to query ongoing and pending purges (even if the purge queue encounter an exception and bailout, we could still query pending purges). As simple as logging thread name/id in purge thread would help in following thread execution. Changing log level to log a bit more frequently would help too (candidate: rmtree() in src/pybing/mgr/volumes/fs/subvolume.py).

Actions #1

Updated by Venky Shankar over 4 years ago

As seen from the MDS log, there are no filesystem ops after the rename ack to the client. This hints that the purge thread hit an exception of some kind (but not related to a filesystem operation).

Actions #2

Updated by Patrick Donnelly over 4 years ago

2019-09-11T03:06:23.922 INFO:tasks.ceph.mgr.y.smithi073.stderr:Exception in thread purgejob.3:
2019-09-11T03:06:23.922 INFO:tasks.ceph.mgr.y.smithi073.stderr:Traceback (most recent call last):
2019-09-11T03:06:23.922 INFO:tasks.ceph.mgr.y.smithi073.stderr:  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
2019-09-11T03:06:23.922 INFO:tasks.ceph.mgr.y.smithi073.stderr:    self.run()
2019-09-11T03:06:23.922 INFO:tasks.ceph.mgr.y.smithi073.stderr:  File "/usr/share/ceph/mgr/volumes/fs/purge_queue.py", line 24, in run
2019-09-11T03:06:23.923 INFO:tasks.ceph.mgr.y.smithi073.stderr:    trace = "".join(traceback.format_exception(None, e, e.__traceback__))
2019-09-11T03:06:23.923 INFO:tasks.ceph.mgr.y.smithi073.stderr:AttributeError: 'exceptions.TypeError' object has no attribute '__traceback__'

From the teuthology log.

Actions #3

Updated by Venky Shankar over 4 years ago

Patrick Donnelly wrote:

[...]

From the teuthology log.

yeh -- that masks the logging of the actual traceback. https://github.com/ceph/ceph/pull/29735 includes fix for that.

Actions #4

Updated by Patrick Donnelly over 4 years ago

  • Assignee set to Venky Shankar
Actions #5

Updated by Patrick Donnelly about 4 years ago

Venky, is this still a problem?

Actions #6

Updated by Venky Shankar about 4 years ago

  • Status changed from New to Can't reproduce

Patrick Donnelly wrote:

Venky, is this still a problem?

haven't seen it lately. moving to "can't reproduce"

please reopen then this is seen again.

Actions

Also available in: Atom PDF