Bug #5463

closed

mds_thrasher: sometimes doesn't stop thrashing

Added by Greg Farnum almost 11 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

This results in hung tasks, so I haven't been able to confirm that the problem is in the task rather than something in the cluster, but I haven't found any indications in the teuthology log. It looks like in some cases the thrasher simply never stops, so the main teuthology thread can never join it.

I've been told the exception in the log below (the failed rmdir) shouldn't be a problem, but maybe it's breaking the execution order a little? Or maybe it's something else entirely.

2013-06-25T22:40:57.061 INFO:teuthology.task.workunit:Deleted dir /home/ubuntu/cephtest/46383/mnt.0/client.0
2013-06-25T22:40:57.061 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: 'rmdir -- /home/ubuntu/cephtest/46383/mnt.0'
2013-06-25T22:40:57.076 INFO:teuthology.orchestra.run.err:rmdir: failed to remove `/home/ubuntu/cephtest/46383/mnt.0': Device or resource busy
2013-06-25T22:40:57.076 DEBUG:teuthology.task.workunit:Caught an execption deleting dir /home/ubuntu/cephtest/46383/mnt.0
2013-06-25T22:40:57.076 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x2448cd0>
2013-06-25T22:40:57.076 INFO:teuthology.task.ceph-fuse:Unmounting ceph-fuse clients...
2013-06-25T22:40:57.077 DEBUG:teuthology.orchestra.run:Running [10.214.132.11]: 'sudo fusermount -u /home/ubuntu/cephtest/46383/mnt.0'
2013-06-25T22:40:57.164 INFO:teuthology.task.ceph-fuse.ceph-fuse.0.err:ceph-fuse[4281]: fuse finished with error 0
2013-06-25T22:40:59.930 INFO:teuthology.task.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:reviving mds.a

(That's from /a/teuthology-2013-06-25_20:00:50-fs-cuttlefish-testing-basic/46383/teuthology.log)
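To illustrate the failure mode described above, where the main thread joins a thrasher that never stops, here is a minimal Python 3 sketch. It is not the mds_thrasher code: the Thrasher class, its stopping event, and the stop_and_join helper are hypothetical names invented for this example. It shows one way to make the loop notice a stop request promptly and to bound the join so a stuck thrasher is reported rather than hanging the run.

```python
import threading
import time


class Thrasher(threading.Thread):
    """Hypothetical stand-in for a thrasher thread (not teuthology code)."""

    def __init__(self, interval=0.01):
        super().__init__(daemon=True)
        self.stopping = threading.Event()
        self.interval = interval
        self.rounds = 0

    def run(self):
        # Check the stop flag on every iteration so a stop request
        # is never missed between thrash rounds.
        while not self.stopping.is_set():
            self.rounds += 1  # placeholder for "kill and revive an MDS"
            # Event.wait() returns early once stop() sets the flag,
            # unlike time.sleep(), which always sleeps the full interval.
            self.stopping.wait(self.interval)

    def stop(self):
        self.stopping.set()


def stop_and_join(thrasher, timeout=1.0):
    """Ask the thrasher to stop; return True if it exited within timeout."""
    thrasher.stop()
    thrasher.join(timeout)
    return not thrasher.is_alive()
```

A caller would start the thread, run the workload, and then call stop_and_join(thrasher); a False return means the thrasher is still running and can be logged instead of blocking teardown indefinitely.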

Actions #1

Updated by Zack Cerza about 10 years ago

  • Status changed from New to Resolved

Workunits have a timeout now, so this ought to be fixed. If not, please reopen or file a new ticket.
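As a rough illustration of the idea behind a workunit timeout, and not the actual teuthology change, the Python 3 sketch below bounds how long a command may run. The run_with_timeout helper is a hypothetical name; it relies only on the standard library's subprocess.run, which kills the child process when the timeout expires.

```python
import subprocess
import sys


def run_with_timeout(args, timeout):
    """Run a command; return its exit code, or None if it timed out.

    Hypothetical helper for illustration only.
    """
    try:
        return subprocess.run(args, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
```

With a bound like this, a workunit that never finishes produces a timeout the caller can act on, rather than a task that hangs forever.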
