Bug #3895
librados test hang during mon thrashing
Status: Resolved
Priority: Urgent
Assignee: Joao Eduardo Luis
Category: Monitor
Target version: -
% Done: 0%
Source: Development
Severity: 3 - minor
Description
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-01-21_19:00:03-regression-master-testing-gcov/2929
job was
kernel:
  kdb: true
  sha1: e0b49868d3629708eda593b6739cb78f33ab238a
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 3399860de2724281ee024b52f461b60f769ee0ee
  s3tests:
    branch: master
  workunit:
    sha1: 3399860de2724281ee024b52f461b60f769ee0ee
roles:
- - mon.a
  - mon.b
  - osd.0
  - osd.1
  - osd.2
- - mon.c
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock: null
- ceph: null
- mon_thrash:
    revive_delay: 20
    thrash_delay: 1
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh
and test is hung on
2013-01-21T20:42:38.430 INFO:teuthology.task.workunit.client.0.out:[ RUN ] LibRadosAio.IsSafePP
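For anyone triaging similar hangs, here is a minimal sketch of how to spot the stuck test in a teuthology log: it looks for gtest "[ RUN ]" markers that never get a matching "[ OK ]" or "[ FAILED ]". The function name unfinished_tests and the second sample log line are hypothetical and only for illustration; this is not part of teuthology.

```python
import re

# gtest prints "[ RUN      ] Name" when a test starts and
# "[       OK ] Name (N ms)" or "[  FAILED  ] Name (N ms)" when it ends.
_EVENT = re.compile(r"\[\s*(RUN|OK|FAILED)\s*\]\s+(\S+)")


def unfinished_tests(lines):
    """Return names of tests that logged RUN but no later OK/FAILED.

    Hypothetical helper for illustration; not part of teuthology.
    """
    running = []
    for line in lines:
        match = _EVENT.search(line)
        if not match:
            continue
        status, name = match.groups()
        if status == "RUN":
            running.append(name)
        elif name in running:
            # The test reported a result, so it is not hung.
            running.remove(name)
    return running


if __name__ == "__main__":
    sample = [
        # Hypothetical completed test, invented for the example.
        "[ RUN ] Example.Finished",
        "[ OK ] Example.Finished (3 ms)",
        # Line taken from this report.
        "2013-01-21T20:42:38.430 INFO:teuthology.task.workunit.client.0.out:"
        "[ RUN ] LibRadosAio.IsSafePP",
    ]
    print(unfinished_tests(sample))  # ['LibRadosAio.IsSafePP']
```

A test that is merely slow looks the same as a hung one here, so the result is only a starting point for checking timestamps.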
note that i also saw an ENOENT on a just-created pool the other day, too, so there are probably several similar bugs (or hopefully, the same pattern of bug) triggered by the mon thrashing.
yay testing!
#1
Updated by Sam Lang about 11 years ago
- File teuthology-2.log added
- File teuthology.log added
Attached log files for this from hung runs (librados and kernel untar).
#2
Updated by Sam Lang about 11 years ago
Attached mon logs from a recent run, taken after the rados test seemed to hang for a bit (100 mon elections or so). The logs were captured with debug mon = 20 and debug ms = 1.
Updated by Sage Weil about 11 years ago
- Status changed from 12 to Fix Under Review
- Priority changed from High to Urgent
tracked this down; see wip-mon-eagain
qa run against rados api tests seems to confirm that this fixes it (previously was easily reproduced)
Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved
Updated by Sage Weil about 11 years ago