Bug #7747
monthrash: thrasher keeps going even after a test fails
Status: Closed
Description
See teuthology-2014-03-14_19:00:49-rados-dumpling-testing-basic-plana/130941.
The subsequent test failed, and the thrasher kept going forever instead of stopping.
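The expected behaviour can be shown with a minimal Python sketch. It assumes a thrasher that runs in a background thread while the test workload runs in the foreground. `Thrasher`, `run_with_thrasher` and `failing_workload` are hypothetical names used only for illustration; this is not teuthology's actual `mon_thrash` code. The point is that stopping the thrasher belongs in a `finally` clause, so the thrasher stops whether the workload passes or raises.

```python
import threading
import time
from typing import Callable


class Thrasher:
    """Hypothetical stand-in for a background monitor thrasher.

    A real thrasher would kill and revive monitors on each iteration.
    This one only counts iterations so that the control flow is visible.
    """

    def __init__(self, thrash_delay: float) -> None:
        self.thrash_delay = thrash_delay
        self.iterations = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _loop(self) -> None:
        # Event.wait() returns True once stop() sets the event, which ends the loop.
        # Otherwise it times out after thrash_delay and we "thrash" again.
        while not self._stop_event.wait(self.thrash_delay):
            self.iterations += 1

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()

    def is_alive(self) -> bool:
        return self._thread.is_alive()


def run_with_thrasher(thrasher: Thrasher, workload: Callable[[], None]) -> None:
    """Run the workload while thrashing, and always stop the thrasher."""
    thrasher.start()
    try:
        workload()
    finally:
        # Executes on success and on failure, so a failing test
        # cannot leave the thrasher running forever.
        thrasher.stop()


def failing_workload() -> None:
    """Hypothetical test that fails after a short time."""
    time.sleep(0.05)
    raise RuntimeError("rados got -2")


if __name__ == "__main__":
    thrasher = Thrasher(thrash_delay=0.01)
    try:
        run_with_thrasher(thrasher, failing_workload)
    except RuntimeError as exc:
        print("test failed:", exc)
    print("thrasher still running:", thrasher.is_alive())
```

The failure still propagates to the caller, so the job is reported as failed rather than hanging.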
Updated by Sage Weil about 10 years ago
This passes against master but fails against firefly. You can tell it hangs when you grep for SUCCESS in the teuthology log and see that the test command has stopped running. Usually it fails with a "rados got -2" or similar error.
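The grep check described above can be automated. The following is a hedged Python sketch. It assumes that each completed test run writes a line containing `SUCCESS` to the teuthology log, as the comment implies, and that you take two snapshots of the log a few minutes apart. The function names and the command-line usage are hypothetical.

```python
import sys


def count_successes(log_text: str, marker: str = "SUCCESS") -> int:
    """Count log lines that contain the success marker."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def looks_stalled(before: str, after: str, expected_runs: int) -> bool:
    """Suspect a hang if runs are still outstanding but no new SUCCESS
    lines appeared between two snapshots of the same log."""
    done = count_successes(after)
    return done < expected_runs and done == count_successes(before)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical): python check_log.py <snapshot1> <snapshot2> <expected runs>
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        first, second = f1.read(), f2.read()
    expected = int(sys.argv[3])
    print("successes so far:", count_successes(second))
    print("stalled:", looks_stalled(first, second, expected))
```

This is only a heuristic. A very slow run could also look stalled if the snapshots are taken too close together.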
```yaml
roles:
- - mon.a
  - mon.b
  - mon.c
  - mon.d
  - mon.e
  - mon.f
  - mon.g
  - mon.h
  - mon.i
  - osd.0
  - osd.1
  - osd.2
  - mds.a
  - client.0
overrides:
  ceph:
    conf:
      mon:
        debug ms: 1
        debug mon: 20
        debug paxos: 20
      client:
        debug ms: 1
        debug objecter: 20
      global:
        ms inject socket failures: 2500
        ms inject delay type: mon
        ms inject delay probability: .1
        ms inject delay max: 1
        ms inject internal delays: .002
tasks:
- chef: null
- clock.check: null
- install:
    branch: firefly
- ceph:
- mon_thrash:
    revive_delay: 90
    thrash_delay: 1
    thrash_many: true
- exec:
    client.0:
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
    - ceph_test_rados_delete_pools_parallel
```
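The long `exec` list in this job is the same binary repeated many times. The hedged Python sketch below builds the roles and tasks of the job as plain data, so the repetition count becomes a single parameter instead of a hand-copied list. The `overrides` section is omitted for brevity. `REPEATS = 49` is assumed to match the number of entries listed in the job and can be adjusted. Dumping the result with `json` is only for inspection. This is not a teuthology API.

```python
import json

# Assumed to match the number of exec entries in the job; adjust as needed.
REPEATS = 49

job = {
    # One node carrying nine monitors, three OSDs, an MDS and a client.
    "roles": [[f"mon.{c}" for c in "abcdefghi"]
              + ["osd.0", "osd.1", "osd.2", "mds.a", "client.0"]],
    "tasks": [
        {"chef": None},
        {"clock.check": None},
        {"install": {"branch": "firefly"}},
        {"ceph": None},
        {"mon_thrash": {"revive_delay": 90, "thrash_delay": 1, "thrash_many": True}},
        {"exec": {"client.0": ["ceph_test_rados_delete_pools_parallel"] * REPEATS}},
    ],
}

if __name__ == "__main__":
    print(json.dumps(job, indent=2))
```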
Updated by Zack Cerza about 10 years ago
Found a case where we are completely ignoring raised exceptions. This commit will log them:
https://github.com/ceph/teuthology/commit/addfed2da8c736a18f251847bdbfd1de983255da
I don't know why we don't want to raise the exception, but I'm hoping this will shed light on the issue.
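The difference between dropping, logging, and propagating an exception from a background task can be sketched in Python. This is a minimal illustration under stated assumptions. It uses plain `threading` rather than whatever concurrency mechanism teuthology uses internally, and `PropagatingThread` is a hypothetical name. Logging makes the failure visible. Storing the exception also lets `join()` re-raise it in the caller, so the caller can stop other work such as a thrasher.

```python
import logging
import threading
from typing import Any, Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("background-task-sketch")


class PropagatingThread(threading.Thread):
    """Run fn in a thread; log any exception and re-raise it from join()."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        super().__init__(daemon=True)
        self._fn = fn
        self._fn_args = args
        self.exc: Optional[BaseException] = None

    def run(self) -> None:
        try:
            self._fn(*self._fn_args)
        except Exception as e:
            # Log it so it is visible, and keep it so the caller can react.
            log.exception("background task failed")
            self.exc = e

    def join(self, timeout: Optional[float] = None) -> None:
        super().join(timeout)
        if self.exc is not None:
            raise self.exc


def flaky() -> None:
    """Hypothetical background task that fails."""
    raise ValueError("rados got -2")


if __name__ == "__main__":
    worker = PropagatingThread(flaky)
    worker.start()
    try:
        worker.join()
    except ValueError as exc:
        print("caller saw:", exc)
```

Re-raising from `join()` keeps the failure in the control flow of the code that started the task, instead of leaving it only in the log.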
Updated by Zack Cerza about 10 years ago
- Status changed from New to Need More Info
- Assignee changed from Zack Cerza to Sage Weil
I've run the mentioned YAML a couple of times and it passed both times. Can you recommend a reliable reproducer?
Updated by Sage Weil over 9 years ago
- Status changed from Need More Info to Can't reproduce