Bug #7673

closed

"reached maximum tries" in /teuthology-2014-03-09_03:00:01-rados-firefly-testing-basic-plana suite

Added by Yuri Weinstein about 10 years ago. Updated about 10 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-09_03:00:01-rados-firefly-testing-basic-plana/123828/

2014-03-09T12:20:28.624 DEBUG:teuthology.orchestra.run:Running [10.214.132.27]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
2014-03-09T12:20:28.880 INFO:teuthology.task.thrashosds.ceph_manager:{u'election_epoch': 4, u'quorum': [0, 1, 2], u'mdsmap': {u'max': 1, u'epoch': 5, u'by_rank': [{u'status': u'up:active', u'name': u'a', u'rank': 0}], u'up': 1, u'in': 1}, u'monmap': {u'epoch': 1, u'mons': [{u'name': u'b', u'rank': 0, u'addr': u'10.214.131.3:6789/0'}, {u'name': u'a', u'rank': 1, u'addr': u'10.214.132.27:6789/0'}, {u'name': u'c', u'rank': 2, u'addr': u'10.214.132.27:6790/0'}], u'modified': u'2014-03-09 11:19:40.635841', u'fsid': u'257857a3-1b65-4492-bfe3-35d5a54c5acd', u'created': u'2014-03-09 11:19:40.635841'}, u'health': {u'detail': [], u'timechecks': {u'round_status': u'finished', u'epoch': 4, u'round': 26, u'mons': [{u'latency': u'0.000000', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'b'}, {u'latency': u'0.009557', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'a'}, {u'latency': u'0.009445', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'c'}]}, u'health': {u'health_services': [{u'mons': [{u'last_updated': u'2014-03-09 12:20:05.993151', u'name': u'b', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 438195344, u'health': u'HEALTH_OK', u'kb_used': 10133640, u'store_stats': {u'bytes_total': 3999988, u'bytes_log': 983040, u'last_updated': u'0.000000', u'bytes_misc': 65552, u'bytes_sst': 2951396}}, {u'last_updated': u'2014-03-09 12:20:06.124380', u'name': u'a', u'avail_percent': 93, u'kb_total': 472345880, u'kb_avail': 441166052, u'health': u'HEALTH_OK', u'kb_used': 7162932, u'store_stats': {u'bytes_total': 5048606, u'bytes_log': 2031616, u'last_updated': u'0.000000', u'bytes_misc': 65552, u'bytes_sst': 2951438}}, {u'last_updated': u'2014-03-09 12:20:06.124467', u'name': u'c', u'avail_percent': 93, u'kb_total': 472345880, u'kb_avail': 441166052, u'health': u'HEALTH_OK', u'kb_used': 7162932, u'store_stats': {u'bytes_total': 5048606, u'bytes_log': 2031616, u'last_updated': u'0.000000', u'bytes_misc': 65552, u'bytes_sst': 2951438}}]}]}, 
u'overall_status': u'HEALTH_WARN', u'summary': [{u'severity': u'HEALTH_WARN', u'summary': u'1 pgs recovering'}, {u'severity': u'HEALTH_WARN', u'summary': u'1 pgs stuck unclean'}, {u'severity': u'HEALTH_WARN', u'summary': u'10 requests are blocked > 32 sec'}, {u'severity': u'HEALTH_WARN', u'summary': u'recovery 2691/39593 objects degraded (6.797%)'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool data pg_num 44 > pgp_num 34'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool metadata pg_num 54 > pgp_num 24'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool rbd pg_num 34 > pgp_num 24'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool unique_pool_0 has too few pgs'}]}, u'pgmap': {u'bytes_total': 1499591012352, u'recovering_bytes_per_sec': 9225307, u'degraded_objects': 2691, u'num_pgs': 143, u'recovering_keys_per_sec': 0, u'data_bytes': 55218016448, u'degraded_total': 39593, u'bytes_used': 78067666944, u'recovering_objects_per_sec': 2, u'version': 918, u'pgs_by_state': [{u'count': 142, u'state_name': u'active+clean'}, {u'count': 1, u'state_name': u'active+recovering'}], u'degraded_ratio': u'6.797', u'bytes_avail': 1421523345408}, u'quorum_names': [u'b', u'a', u'c'], u'osdmap': {u'osdmap': {u'full': u'false', u'nearfull': u'false', u'num_osds': 6, u'num_up_osds': 6, u'epoch': 61, u'num_in_osds': 3}}, u'fsid': u'257857a3-1b65-4492-bfe3-35d5a54c5acd'}
2014-03-09T12:20:28.881 DEBUG:teuthology.orchestra.run:Running [10.214.132.27]: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format=json'
2014-03-09T12:20:29.119 INFO:teuthology.orchestra.run.err:[10.214.132.27]: dumped all in format json
2014-03-09T12:20:29.350 INFO:teuthology.task.radosbench.radosbench.0.out:[10.214.132.27]: 2014-03-09 12:20:29.350115min lat: 0.11649 max lat: 2003.8 avg lat: 3.265
2014-03-09T12:20:29.351 INFO:teuthology.task.radosbench.radosbench.0.out:[10.214.132.27]:    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
2014-03-09T12:20:29.351 INFO:teuthology.task.radosbench.radosbench.0.out:[10.214.132.27]:   3600       4     11935     11931   13.2547         0         -     3.265
2014-03-09T12:20:29.533 ERROR:teuthology.run_tasks:Manager failed: radosbench
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-firefly/teuthology/run_tasks.py", line 84, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/teuthology-firefly/teuthology/task/radosbench.py", line 80, in task
    run.wait(radosbench.itervalues(), timeout=timeout)
  File "/home/teuthworker/teuthology-firefly/teuthology/orchestra/run.py", line 349, in wait
    check_time()
  File "/home/teuthworker/teuthology-firefly/teuthology/contextutil.py", line 125, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (600) after waiting for 3600 seconds
archive_path: /var/lib/teuthworker/archive/teuthology-2014-03-09_03:00:01-rados-firefly-testing-basic-plana/123828
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml
  thrashers/morepggrow.yaml workloads/ec-radosbench.yaml}
email: null
job_id: '123828'
kernel: &id001
  kdb: true
  sha1: f31a96afabfad92cb917fd52a421b23275cdb6da
last_in_suite: false
machine_type: plana
name: teuthology-2014-03-09_03:00:01-rados-firefly-testing-basic-plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject internal delays: 0.002
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug ms: 1
        debug osd: 20
        osd sloppy crc: true
    fs: xfs
    log-whitelist:
    - slow request
    sha1: a4cbb192ab9e1b2a997e3a831e58648a30e16e59
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: a4cbb192ab9e1b2a997e3a831e58648a30e16e59
  s3tests:
    branch: master
  workunit:
    sha1: a4cbb192ab9e1b2a997e3a831e58648a30e16e59
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
  - client.0
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.1
targets:
  ubuntu@plana37.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8FPsPKV1KVlb89QL2k0kNMTM3mIenC2wHxnVb9EgA7MGjC/gJFv4FoYFtTn0SadJl2hZNJ8kk7HjBsgCQG3f+LL3l7DPlqSJG8zFFXW6LCzjk0YQX/JX7X6nK33HdxzzOZVecglaQnTSWKbPDp8ofd9EQX4gN7mPb/C0/FUtT0Hjrb97QBYqDDVWEMBo7BCT4YdsisPBkCFpQ1Khl2K89e9uhfw4wvVvqveLnU3NEAULbEhMeLg0LMsSlmK2gfiyJbyxweApXo4VqfuNd6DnUqUzilAM0VJL3KgJqJGW46IYC76VPMSHPKD66kgrYiyBm12iLEy70kODNVaNe3wnX
  ubuntu@plana51.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDgNXP3p/sw2sy34ARorzUh9QvDPit80IHDKQ71BGtytSAQL5ijlSpjJRjGT0HB9xHvR6v8115ikzmot1HgVeJSnC07UQKWp3CfVIUHZOtbMgw0exON14083tSlvn2djTA/bphuwag5u9y+0XkufOXBNrY4aBlQS9vNXnsW0PQwlgJ6YqK3W2e1qirpvfMamLugAFLdycCXXmjriXFuAxvHqbFrJYVEvNbsK8Bt+cRE5l0gcBin+5wJmz4iKagwYVAVqW7i1lZM1F0QdffYwuUrQ110/iz9AcnNvu6dSU+3g7agjBKvWCA+DVEn0RWbaRJ7M+FCl2PmLULnjvK44Qsp
tasks:
- internal.lock_machines:
  - 2
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 3
    chance_pgpnum_fix: 1
    timeout: 1200
- radosbench:
    clients:
    - client.0
    ec_pool: true
    time: 1800
    unique_pool: true
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.11186
description: rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml
  thrashers/morepggrow.yaml workloads/ec-radosbench.yaml}
duration: 7009.944432973862
failure_reason: reached maximum tries (600) after waiting for 3600 seconds
flavor: basic
mon.a-kernel-sha1: f31a96afabfad92cb917fd52a421b23275cdb6da
mon.b-kernel-sha1: f31a96afabfad92cb917fd52a421b23275cdb6da
owner: scheduled_teuthology@teuthology
success: false

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #7649: ec ceph_test_rados stuck recovering (Resolved, Samuel Just, 03/07/2014)

Actions #1

Updated by Yuri Weinstein about 10 years ago

  • Severity changed from 3 - minor to 2 - major

There seem to be several of these, so 'major'.

Actions #2

Updated by Samuel Just about 10 years ago

  • Assignee set to Samuel Just
  • Priority changed from Normal to Urgent
Actions #3

Updated by Samuel Just about 10 years ago

The non-ec ones are probably just an inadequate timeout -- the cleanup is likely to take longer than the writeout. The ec ones are probably 7649.
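
For illustration only, here is a minimal sketch of the kind of bounded polling loop that produces the "reached maximum tries (600) after waiting for 3600 seconds" message in the traceback. It is not the teuthology implementation: `wait_until`, `tries_for`, and `MaxTriesReached` are hypothetical names, and the 6-second poll interval is only inferred from 3600 / 600 in the error message. It shows why a job whose cleanup outlasts the wait budget fails even though it would have finished given more time.

```python
class MaxTriesReached(Exception):
    """Raised when the polling budget is exhausted (hypothetical name)."""


def tries_for(timeout, interval):
    """Number of polls that fit into `timeout` seconds at `interval` seconds each."""
    return timeout // interval


def wait_until(done, tries, interval, sleep):
    """Poll done() up to `tries` times, calling sleep(interval) between polls.

    Returns the number of seconds slept before done() became true.
    `sleep` is passed in so the loop can be driven by a fake clock.
    """
    for attempt in range(tries):
        if done():
            return attempt * interval
        sleep(interval)
    raise MaxTriesReached(
        "reached maximum tries (%d) after waiting for %d seconds"
        % (tries, tries * interval)
    )
```

With a 3600-second budget and a 6-second interval, `tries_for(3600, 6)` gives 600 polls, so a job that needs longer than that raises the error; doubling the budget to 7200 seconds would let the same job complete. The actual timeout values used by the tests are not stated in this ticket.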

Actions #4

Updated by Samuel Just about 10 years ago

I've turned up the timeout in the tests.

Actions #5

Updated by Samuel Just about 10 years ago

  • Status changed from New to Duplicate