Bug #7747

closed

monthrash: thrasher keeps going even after a test fails

Added by Sage Weil about 10 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

see teuthology-2014-03-14_19:00:49-rados-dumpling-testing-basic-plana/130941

The subsequent test failed, and the thrasher kept running indefinitely instead of stopping.

Actions #1

Updated by Sage Weil about 10 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Sage Weil about 10 years ago

  • Assignee set to Zack Cerza
Actions #3

Updated by Sage Weil about 10 years ago

This passes against master but fails against firefly. You can tell it hangs when you grep for SUCCESS in the teuthology log and see that the test command has stopped running. Usually it fails with a "rados got -2" or similar error.

roles:
- - mon.a
  - mon.b
  - mon.c
  - mon.d
  - mon.e
  - mon.f
  - mon.g
  - mon.h
  - mon.i
  - osd.0
  - osd.1
  - osd.2
  - mds.a
  - client.0
overrides:
  ceph:
    conf:
      mon:
        debug ms: 1
        debug mon: 20
        debug paxos: 20
      client:
        debug ms: 1
        debug objecter: 20
      global:
        ms inject socket failures: 2500
        ms inject delay type: mon
        ms inject delay probability: .1
        ms inject delay max: 1
        ms inject internal delays: .002
tasks:
- chef: null
- clock.check: null
- install:
    branch: firefly
- ceph:
- mon_thrash:
    revive_delay: 90
    thrash_delay: 1
    thrash_many: true
- exec:
    client.0:
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
      - ceph_test_rados_delete_pools_parallel
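
The hang check described in this comment (grep for SUCCESS, then see whether the test command is still making progress) could be scripted roughly as follows. This is only a sketch: the function name `summarize_log`, the default error pattern, and the sample log lines are assumptions for illustration, not the real teuthology log format.

```python
import re


def summarize_log(lines, error_pattern=r"got -\d+"):
    """Summarize a log: count SUCCESS lines, find the last one, collect errors.

    Returns (success_count, last_success_index, errors), where errors is a
    list of (line_index, line_text) tuples matching error_pattern.
    """
    success_count = 0
    last_success = None
    errors = []
    for i, line in enumerate(lines):
        if "SUCCESS" in line:
            success_count += 1
            last_success = i
        if re.search(error_pattern, line):
            errors.append((i, line.rstrip()))
    return success_count, last_success, errors


# Hypothetical log excerpt: two successful runs, an error, then only
# thrasher activity, which is the pattern this bug describes.
sample = [
    "INFO: SUCCESS",
    "INFO: SUCCESS",
    "rados got -2",
    "thrashing mon.a",
    "thrashing mon.b",
]
count, last, errors = summarize_log(sample)
print(count, last, errors)
```

If the last SUCCESS index sits well before the end of the log and an error line follows it, the test command has most likely stopped while the thrasher is still running.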
Actions #5

Updated by Zack Cerza about 10 years ago

Found a case where we are completely ignoring raised exceptions. This commit will log them:
https://github.com/ceph/teuthology/commit/addfed2da8c736a18f251847bdbfd1de983255da

I don't know why we don't want to raise the exception, but I'm hoping this will shed light on the issue.
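
To illustrate the general pattern (not the code in the commit above), here is a minimal sketch of a background thrasher that logs any exception it raises instead of dropping it, and that is always stopped when the foreground test finishes or fails. It uses the standard library `threading` module as a stand-in; the `Thrasher` class and `run_with_thrasher` helper are hypothetical names, not teuthology APIs.

```python
import logging
import threading

log = logging.getLogger(__name__)


class Thrasher:
    """Run `action` repeatedly in a background thread until stopped."""

    def __init__(self, action, delay=0.01):
        self.action = action
        self.delay = delay
        self.stopping = threading.Event()
        self.error = None  # last exception raised by the loop, if any
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        try:
            while not self.stopping.is_set():
                self.action()
                # Sleep, but wake immediately if stop() is called.
                self.stopping.wait(self.delay)
        except Exception as exc:
            # Log and record the exception rather than silently ignoring it.
            log.exception("thrasher raised an exception")
            self.error = exc

    def start(self):
        self.thread.start()

    def stop(self):
        self.stopping.set()
        self.thread.join()


def run_with_thrasher(thrasher, test):
    """Run `test` while thrashing; stop the thrasher even if `test` fails."""
    thrasher.start()
    try:
        test()
    finally:
        thrasher.stop()
```

The `finally` clause is the part that matters for this bug: a failing test still propagates its exception, but the thrasher thread is joined first, so it cannot keep going forever.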

Actions #6

Updated by Zack Cerza about 10 years ago

  • Status changed from New to Need More Info
  • Assignee changed from Zack Cerza to Sage Weil

I've run the mentioned yaml a couple times and it's passed both times. Can you recommend a reliable reproducer?

Actions #7

Updated by Ian Colle almost 10 years ago

  • Priority changed from High to Normal
Actions #8

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Can't reproduce