Bug #7702

closed

osd thrashing + rgw = timeouts

Added by Yuri Weinstein about 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-12_01:35:02-upgrade:dumpling-x:stress-split-firefly---basic-plana/127155/

2014-03-12T06:44:44.836 INFO:teuthology.orchestra.run.err:[10.214.133.37]: dumped all in format json
2014-03-12T06:44:45.206 INFO:teuthology.orchestra.run.err:[10.214.132.32]: testStackedOverwrite (test.functional.tests.TestFileUTF8) ... ok
2014-03-12T06:44:45.208 INFO:teuthology.orchestra.run.err:[10.214.132.32]: testTooLongName (test.functional.tests.TestFileUTF8) ... ok
2014-03-12T06:44:45.284 INFO:teuthology.orchestra.run.err:[10.214.132.32]: testZeroByteFile (test.functional.tests.TestFileUTF8) ... ok
2014-03-12T06:44:45.285 INFO:teuthology.orchestra.run.err:[10.214.132.32]: 
2014-03-12T06:44:45.285 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ======================================================================
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ERROR: testContainerExistenceCachingProblem (test.functional.tests.TestContainer)
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ----------------------------------------------------------------------
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]: Traceback (most recent call last):
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/tests.py", line 104, in setUp
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     cls.env.setUp()
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/tests.py", line 342, in setUp
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     cls.account.delete_containers()
2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/swift.py", line 360, in delete_containers
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     if not cont.delete_recursive():
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/swift.py", line 404, in delete_recursive
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     return self.delete_files() and self.delete()
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/swift.py", line 398, in delete_files
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     if not file.delete():
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/swift.py", line 540, in delete
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     raise ResponseError(self.conn.response)
2014-03-12T06:44:45.287 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ResponseError: 500: Internal Server Error
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]: 
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ======================================================================
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ERROR: testSerialization (test.functional.tests.TestFile)
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ----------------------------------------------------------------------
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]: Traceback (most recent call last):
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/tests.py", line 1326, in testSerialization
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     file.write_random(f['bytes'])
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/swift.py", line 739, in write_random
2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     if not self.write(data, hdrs=hdrs, parms=parms, cfg=cfg):
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]:   File "/home/ubuntu/cephtest/swift/test/functional/swift.py", line 731, in write
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]:     raise ResponseError(self.conn.response)
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ResponseError: 500: Internal Server Error
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]: 
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]: ----------------------------------------------------------------------
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]: Ran 137 tests in 384.466s
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]: 
2014-03-12T06:44:45.289 INFO:teuthology.orchestra.run.err:[10.214.132.32]: FAILED (errors=2)
2014-03-12T06:44:45.303 ERROR:teuthology.contextutil:Saw exception from nested tasks
archive_path: /var/lib/teuthworker/archive/teuthology-2014-03-12_01:35:02-upgrade:dumpling-x:stress-split-firefly---basic-plana/127155
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/readwrite.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
email: null
job_id: '127155'
last_in_suite: false
machine_type: plana
name: teuthology-2014-03-12_01:35:02-upgrade:dumpling-x:stress-split-firefly---basic-plana
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        debug filestore: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: c55da14a3d057c36a730383c3a53d8ca14e30d16
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: c55da14a3d057c36a730383c3a53d8ca14e30d16
  s3tests:
    branch: master
  workunit:
    sha1: c55da14a3d057c36a730383c3a53d8ca14e30d16
owner: scheduled_teuthology@teuthology
roles:
- - mon.a
  - mon.b
  - mds.a
  - osd.0
  - osd.1
  - osd.2
- - osd.3
  - osd.4
  - osd.5
  - mon.c
- - client.0
targets:
  ubuntu@plana41.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDO1APNwMPhJnDeyE8W4NBl4Fdsx8dBV8IzROahOlY56SZsVVthCKUZm1cHPE6nN9L4iEZw7ibM9JpqAI/cfFoSQh4HVAwe3lfIQTO3dh7EF7vPjMNowiiPEmQcby0RNi85x33Q6m+44E+5A72ZmVdmmuOLsi7ERd+m3eAnzI4GdTLL4bJuxLMfpj8X4aGdMnopICuCOmzJGCw8+ye5pC/NdX9PBmMsg2G/Yjb54SQXELTMTgSCPOt+LoemSD1CxsPgaXVM3KMSPGQ3cRaOr3n+UpnC8bpzfvqGEIYtU40zTqLeElgkrn7lTZl35+yG3y20mcvCDiL0VITU/zToTLaD
  ubuntu@plana46.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGKM4kQKdDX2q2T0mrUqeGpcBoHn3G4Ro+axOJNV1U01TDovMBBhdMM7QK4QfDeIf7JCKynfHQc7zGjHbGOHcXB5M82035i948JCVZkHizQqtEAVx17UzYy6F0yQH/TlP6c6pBMLUmzP/uTg4gdFPrJPYVh+LLSTMz1tpEsRqlOFwmBFtIH6wJDPrRMQNLIWBy933MUY31bkpr6iE91YGhU+QtpwtNBte/oXG/JPWsyXuCVtucqyTVtqE+/6hoEOIidK+p0RY6a8NcMMDQwCljfY66bw2kAgbMIX1mG2dhFt9ZNQsifreoPdKhvjC2UTbEDEjfCiZa4ijZJ8Ih0vRH
  ubuntu@plana79.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGJX9/cZFlm+ll32X716yKrmR/RE94iH9TusIFLY5vLoE8CBCupjAKdkn4mqOB5eNvwqapMG63Vww5Cl9zo0wKPEHi3jsZCwbAxByc9sBFivZeBnmTUrYNesQvAs1Izr49/4h71oCt98hfX3hl81iEhIEGjjj7XoD3blyYHlyR9LBNoYMllLfi9Pw4KD1snEikGKlR8zMyAgndlZ5ODU4usiCMXLypAa4wFKfR6w17nZI2/Q2xkUy7l59oap50bOR9mSNMOPqpN1KgsK717JCKMU4jHHw+zN0oWvQPmDZK2ckzbyftBfpOqKyidGHpLjpXfuQ1Xk4R9fgUkpE6gcD5
tasks:
- internal.lock_machines:
  - 3
  - plana
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    thrash_primary_affinity: false
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test-upgrade-firefly.sh
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
  - client.0
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: firefly
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.plana.11138
description: upgrade/dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/readwrite.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
duration: 4205.675462961197
failure_reason: 'Command failed on 10.214.132.32 with status 1: "SWIFT_TEST_CONFIG_FILE=/home/ubuntu/cephtest/archive/testswift.client.0.conf
  /home/ubuntu/cephtest/swift/virtualenv/bin/nosetests -w /home/ubuntu/cephtest/swift/test/functional
  -v -a ''!fails_on_rgw''"'
flavor: basic
owner: scheduled_teuthology@teuthology
sentry_event: http://sentry.ceph.com/inktank/teuthology/search?q=a0bfdbee0a574c24bca16026e3f00c54
success: false
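
When triaging a log like the one above, it can help to pull out only the failing test headers. The following is a minimal sketch, not part of teuthology: the `failed_tests` helper and its regular expression are hypothetical, and they assume the line shape seen in this excerpt (a `teuthology.orchestra.run.err:[<ip>]:` prefix followed by a nose `ERROR:` or `FAIL:` header).

```python
import re

# Assumed line shape, taken from the excerpt in this report:
#   <timestamp> INFO:teuthology.orchestra.run.err:[<ip>]: ERROR: <test> (<class>)
# This pattern is an illustration, not something teuthology itself provides.
FAILURE_RE = re.compile(
    r"teuthology\.orchestra\.run\.err:\[(?P<host>[\d.]+)\]: "
    r"(?P<kind>ERROR|FAIL): (?P<test>\w+) \((?P<cls>[\w.]+)\)"
)


def failed_tests(lines):
    """Return (host, kind, test, class) tuples for each nose failure header."""
    results = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            results.append(
                (match["host"], match["kind"], match["test"], match["cls"])
            )
    return results


# Three lines copied from the log above: two failure headers and one passing test.
sample = [
    "2014-03-12T06:44:45.286 INFO:teuthology.orchestra.run.err:[10.214.132.32]: "
    "ERROR: testContainerExistenceCachingProblem (test.functional.tests.TestContainer)",
    "2014-03-12T06:44:45.288 INFO:teuthology.orchestra.run.err:[10.214.132.32]: "
    "ERROR: testSerialization (test.functional.tests.TestFile)",
    "2014-03-12T06:44:45.284 INFO:teuthology.orchestra.run.err:[10.214.132.32]: "
    "testZeroByteFile (test.functional.tests.TestFileUTF8) ... ok",
]

for row in failed_tests(sample):
    print(row)
```

For this run, both failures come from the same client host (10.214.132.32) and both end in `ResponseError: 500: Internal Server Error`, once on a delete and once on a write.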

Related issues 1 (0 open, 1 closed)

Related to rgw - Bug #8988: AssertionError(s) in upgrade:firefly-x:stress-split-next---basic-plana (Resolved, Loïc Dachary, 07/31/2014)

Actions #1

Updated by Sage Weil about 10 years ago

  • Priority changed from High to Urgent
Actions #2

Updated by Sage Weil about 10 years ago

  • Status changed from New to Need More Info
Actions #3

Updated by Sage Weil about 10 years ago

  • Subject changed from "ERROR: testContainerExistenceCachingProblem" in upgrade:dumpling-x:stress-split-firefly---basic-plana suite to osd thrashing + rgw = timeouts
  • Status changed from Need More Info to 12
  • Source changed from other to Q/A

This affects the dumpling-x/stress-split tests, and would also affect the rados thrashing tests if they included an rgw workload.

Actions #4

Updated by Sage Weil about 10 years ago

  • Status changed from 12 to Resolved
Actions #5

Updated by Loïc Dachary over 9 years ago

  • Project changed from Ceph to rgw