Bug #9027
Failed "create unique_pool_0 16 16 erasure teuthologyprofile" in upgrade:dumpling-firefly-x-next-testing-basic-vps suite
Status: Closed
% Done: 100%
Description
Tests failed in upgrade:dumpling-firefly-x-next-testing-basic-vps suite with EC enabled.
2014-08-05T22:28:52.569 INFO:teuthology.task.mon_thrash.mon_thrasher:waiting for 1.0 secs before continuing thrashing
2014-08-05T22:28:53.569 INFO:teuthology.task.mon_thrash.ceph_manager:waiting for quorum size 3
2014-08-05T22:28:53.569 INFO:teuthology.orchestra.run.vpm068:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph quorum_status'
2014-08-05T22:28:53.767 INFO:teuthology.task.mon_thrash.ceph_manager:quorum_status is {"election_epoch":232,"quorum":[0,1,2],"quorum_names":["b","a","c"],"quorum_leader_name":"b","monmap":{"epoch":1,"fsid":"75400fef-e668-48f0-aae8-ecdd8751422a","modified":"2014-08-06 02:49:06.214815","created":"2014-08-06 02:49:06.214815","mons":[{"rank":0,"name":"b","addr":"10.214.138.129:6789\/0"},{"rank":1,"name":"a","addr":"10.214.138.139:6789\/0"},{"rank":2,"name":"c","addr":"10.214.138.129:6790\/0"}]}}
2014-08-05T22:28:53.768 INFO:teuthology.task.mon_thrash.ceph_manager:quorum is size 3
2014-08-05T22:28:53.768 DEBUG:teuthology.run_tasks:Unwinding manager rados
2014-08-05T22:28:53.768 INFO:teuthology.task.rados:joining rados
2014-08-05T22:28:53.768 ERROR:teuthology.run_tasks:Manager failed: rados
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_next/teuthology/run_tasks.py", line 105, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/teuthworker/src/teuthology_next/teuthology/task/rados.py", line 200, in task
    running.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
AssertionError
2014-08-05T22:28:53.769 DEBUG:teuthology.run_tasks:Unwinding manager rados
2014-08-05T22:28:53.769 INFO:teuthology.task.rados:joining rados
2014-08-05T22:28:53.769 ERROR:teuthology.run_tasks:Manager failed: rados
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_next/teuthology/run_tasks.py", line 105, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/teuthworker/src/teuthology_next/teuthology/task/rados.py", line 200, in task
    running.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
CommandFailedError: Command failed on vpm068 with status 22: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph osd pool create unique_pool_0 16 16 erasure teuthologyprofile'
archive_path: /var/lib/teuthworker/archive/teuthology-2014-08-05_19:05:01-upgrade:dumpling-firefly-x-next-testing-basic-vps/402505 branch: next description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml 6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml} email: ceph-qa@ceph.com job_id: '402505' kernel: &id001 kdb: true sha1: 967166011221589288348b893720d358150176b9 last_in_suite: false machine_type: vps name: teuthology-2014-08-05_19:05:01-upgrade:dumpling-firefly-x-next-testing-basic-vps nuke-on-error: true os_type: ubuntu os_version: '12.04' overrides: admin_socket: branch: next ceph: conf: global: osd heartbeat grace: 40 mon: debug mon: 20 debug ms: 1 debug paxos: 20 mon warn on legacy crush tunables: false osd: debug filestore: 20 debug journal: 20 debug ms: 1 debug osd: 20 log-whitelist: - slow request - scrub mismatch - ScrubResult sha1: dceab8dc49d4f458e3bc9fd40eb8b487f1e35948 ceph-deploy: branch: dev: next conf: client: log file: /var/log/ceph/ceph-$name.$pid.log mon: debug mon: 1 debug ms: 20 debug paxos: 20 osd default pool size: 2 install: ceph: sha1: dceab8dc49d4f458e3bc9fd40eb8b487f1e35948 s3tests: branch: next workunit: sha1: dceab8dc49d4f458e3bc9fd40eb8b487f1e35948 owner: scheduled_teuthology@teuthology priority: 1000 roles: - - mon.a - mds.a - osd.0 - osd.1 - - mon.b - mon.c - osd.2 - osd.3 - - client.0 - client.1 suite: upgrade:dumpling-firefly-x suite_branch: master targets: ubuntu@vpm068.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCu9XeLYF+bwlBoMJ5PvYBve6WQp7YUJefyhPy43SGxPwdep80xM3XievRLw+pT3D1jEjNhteWY5sNlwu3FbbcLJfymRcRTYMyzUEcraI9V+w1SH8Zcr0Xk/axGLRvfj0hXxc1GBlrIP5r25hROtGOl3xG2vIxvKtjUzLZ82SXDjbKEWSW2Vqu3fh+yHgGEEsbc4L+XlWLd5T3IQEQpJ3jikqqTIlJozO24TEd4SgTonA45Kn5zWD3F5KIIcwcBT8xZTKTmBiZ/mACMBjOGxrWWo1JQ9KNOYXCNPiP917OlXK51a1i225FNxP8JNl+S/iP7QZKQXtB24Zpo1Xpmo1LF ubuntu@vpm071.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC+SPcXJRlWMjKtD2N72o3M8TOKhhV3zze4se57HFyW17zvRYduLgMqTgZwoSa2HP6qs/uKxaWJhBt1W5DGagIJ9p1XZpJv8c9bukpAL+JytDWTdhmPzmW/FtLevNECmog9wGhn2APSMIySjpqXvHZzLqssgDXsko6rO7YH1auYyo1ef9lm9XM2vDoo376sA3lh4T6cajL4GoK30ikk+gBR/raMWhZhlfVvP97kFGs+0Yc/ZOOOs/fikOTMEVIxsnPgV8mlIJj5Z2HvnaOHf3ny6SHVhqmhELCCd+wgdtdjy6rOSVSnT0FhmegQV4yRnaE0Gy5YFkQAImmX0RRPoe99 ubuntu@vpm072.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEkYA0FL84PKCSYqnRqM8MafXqgSwp1sTxLU7Lbus2Xw3AKYXrrRNzHbnaGl12R5q7TfYdXCwE7gd5MT4RE5dSbZNHm361qFsRUq+9Kj7ScBjby4XY28b0Ce8Pkg84If2umsdExssdqAzvkDlJgEaO15RnxDAGKLt3N3jwCSM/y0ajqjToG9MvrRgvRfiYxfNhrU4Iwlr1nDY11nJ6eaBZIZhT1yUOsynCzRp3ubFkSHI8ZnpyInNJGdjob+DN81Up+DpXaKwRBl7tvMDyczQ8hAAiPGkpQ8Cg5mWnv4SbxTZyGzVy4FpGEpHWSh72mEsT8fFNHaZVnCxb3bUV/Bo3 tasks: - internal.lock_machines: - 3 - vps - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.serialize_remote_roles: null - internal.check_conflict: null - internal.check_ceph_data: null - internal.vm_setup: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.sudo: null - internal.syslog: null - internal.timer: null - chef: null - 
clock.check: null - install: branch: dumpling - print: '**** done dumpling install' - ceph: fs: xfs - parallel: - workload - print: '**** done parallel' - install.upgrade: client.0: branch: firefly mon.a: branch: firefly mon.b: branch: firefly - print: '**** done install.upgrade' - ceph.restart: null - print: '**** done restart' - parallel: - workload2 - upgrade-sequence - print: '**** done parallel' - install.upgrade: client.0: null - print: '**** done install.upgrade client.0 to the version from teuthology-suite arg' - rados: clients: - client.0 ec_pool: true objects: 500 op_weights: append: 45 delete: 10 read: 45 write: 0 ops: 4000 - rados: clients: - client.1 objects: 50 op_weights: delete: 50 read: 100 rollback: 50 snap_create: 50 snap_remove: 50 write: 100 ops: 4000 - workunit: clients: client.1: - rados/load-gen-mix.sh - mon_thrash: revive_delay: 20 thrash_delay: 1 - workunit: clients: client.1: - rados/test.sh - workunit: clients: client.1: - cls/test_cls_rbd.sh - workunit: clients: client.1: - rbd/import_export.sh env: RBD_CREATE_ARGS: --new-format - rgw: - client.1 - s3tests: client.1: rgw_server: client.1 - swift: client.1: rgw_server: client.1 teuthology_branch: next tube: vps upgrade-sequence: sequential: - install.upgrade: mon.a: null - print: '**** done install.upgrade mon.a to the version from teuthology-suite arg' - install.upgrade: mon.b: null - print: '**** done install.upgrade mon.b to the version from teuthology-suite arg' - ceph.restart: daemons: - mon.a - sleep: duration: 60 - ceph.restart: daemons: - mon.b - sleep: duration: 60 - ceph.restart: - mon.c - sleep: duration: 60 - ceph.restart: - osd.0 - sleep: duration: 60 - ceph.restart: - osd.1 - sleep: duration: 60 - ceph.restart: - osd.2 - sleep: duration: 60 - ceph.restart: - osd.3 - sleep: duration: 60 - ceph.restart: - mds.a - exec: mon.a: - ceph osd crush tunables firefly verbose: true worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.8709 workload: sequential: - workunit: branch: dumpling clients: client.0: - rados/test.sh - cls - print: '**** done rados/test.sh & cls' - workunit: branch: dumpling clients: client.0: - rados/load-gen-big.sh - print: '**** done rados/load-gen-big.sh' - workunit: branch: dumpling clients: client.0: - rbd/test_librbd.sh - print: '**** done rbd/test_librbd.sh' - workunit: branch: dumpling clients: client.0: - rbd/test_librbd_python.sh - print: '**** done rbd/test_librbd_python.sh' workload2: sequential: - workunit: branch: firefly clients: client.0: - rados/test.sh - cls - print: '**** done #rados/test.sh and cls 2' - workunit: branch: firefly clients: client.0: - rados/load-gen-big.sh - print: '**** done rados/load-gen-big.sh 2' - workunit: branch: firefly clients: client.0: - rbd/test_librbd.sh - print: '**** done rbd/test_librbd.sh 2' - workunit: branch: firefly clients: client.0: - rbd/test_librbd_python.sh - print: '**** done rbd/test_librbd_python.sh 2'
client.0-kernel-sha1: 967166011221589288348b893720d358150176b9
description: upgrade:dumpling-firefly-x/parallel/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-firefly-upgrade/firefly.yaml 4-workload/{rados_api.yaml rados_loadgenbig.yaml test_rbd_api.yaml test_rbd_python.yaml} 5-upgrade-sequence/upgrade-by-daemon.yaml 6-final-workload/{ec-readwrite.yaml rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_s3tests.yaml rgw_swift.yaml} distros/ubuntu_12.04.yaml}
duration: 10086.872591018677
failure_reason: ''
flavor: basic
mon.a-kernel-sha1: 967166011221589288348b893720d358150176b9
mon.b-kernel-sha1: 967166011221589288348b893720d358150176b9
owner: scheduled_teuthology@teuthology
success: false
Updated by Loïc Dachary over 9 years ago
- Status changed from New to 12
For some reason it is trying to re-create a pool that already exists, and the creation fails:
2014-08-05T21:33:16.761 INFO:teuthology.orchestra.run.vpm068.stderr:Error EINVAL: pool 'unique_pool_0' cannot change to type erasure
If it were a replicated pool, the re-creation would silently succeed because pool creation is idempotent. But trying to re-create it as a pool of a different type fails instead (which is good, IMHO ;-).
Updated by Loïc Dachary over 9 years ago
Updated by Sage Weil over 9 years ago
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
ceph osd pool create unique_pool_0 hung
Updated by Sage Weil over 9 years ago
def create_pool_with_unique_name(self, pg_num=16, ec_pool=False, ec_m=1, ec_k=2):
    """
    Create a pool named unique_pool_X where X is unique.
    """
    name = ""
    with self.lock:
        name = "unique_pool_%s" % (str(self.next_pool_id),)
        self.next_pool_id += 1
        self.create_pool(
            name,
            pg_num,
            ec_pool=ec_pool,
            ec_m=ec_m,
            ec_k=ec_k)
    return name
2014-08-05T21:33:04.619 INFO:teuthology.run_tasks:Running task rados...
2014-08-05T21:33:04.621 INFO:teuthology.task.rados:Beginning rados...
2014-08-05T21:33:04.621 INFO:teuthology.run_tasks:Running task rados...
2014-08-05T21:33:04.621 INFO:teuthology.task.rados:Beginning rados...
I think they raced, but my ignorant reading of that Python says that it should give back unique pool names?
Updated by Sage Weil over 9 years ago
- Assignee changed from Sage Weil to Loïc Dachary
Updated by Loïc Dachary over 9 years ago
The two rados tasks
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 500
    op_weights:
      append: 45
      delete: 10
      read: 45
      write: 0
    ops: 4000
- rados:
    clients:
    - client.1
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
are run by calling the task method, which spawns a gevent greenlet that creates a CephManager object. The CephManager will later be used to create a pool.
The manager is stored in the ctx object, which is common to all tasks. However, there may be a race condition since the tasks are run in parallel:
- parallel:
  - workload
  - upgrade-sequence
If the two tasks end up with different CephManager objects, each has its own next_pool_id counter, and both will create the unique_pool_0 pool.
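A minimal sketch (illustrative, not teuthology code) of why two independent managers collide: the lock and the next_pool_id counter live on each CephManager instance, so the per-instance lock cannot serialize pool creation across two managers.

import threading

class FakeManager(object):              # stand-in for teuthology's CephManager
    def __init__(self):
        self.lock = threading.Lock()    # only protects callers sharing this instance
        self.next_pool_id = 0

    def create_pool_with_unique_name(self):
        with self.lock:
            name = "unique_pool_%s" % self.next_pool_id
            self.next_pool_id += 1
        return name

m1 = FakeManager()                      # what happens if ctx.manager is created twice
m2 = FakeManager()
print(m1.create_pool_with_unique_name())  # unique_pool_0
print(m2.create_pool_with_unique_name())  # unique_pool_0 again -> the second
                                          # 'osd pool create ... erasure' gets EINVAL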
Updated by Loïc Dachary over 9 years ago
- Project changed from Ceph to teuthology
- Status changed from 12 to Fix Under Review
- % Done changed from 0 to 80
Updated by Loïc Dachary over 9 years ago
There needs to be a real lock protecting the part of the code that modifies ctx. A lock is created in ctx and used by rados.py to ensure that only one ctx.manager instance is created.
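A minimal sketch of that approach, assuming the lock is created in ctx once before the parallel tasks start; the names ctx.manager_lock and create_manager are illustrative, not the actual teuthology identifiers.

import threading

def setup(ctx):
    # done once, before the parallel workloads run
    ctx.manager_lock = threading.Lock()

def get_manager(ctx, create_manager):
    # what each rados task would do instead of unconditionally building its own manager
    with ctx.manager_lock:
        if not hasattr(ctx, 'manager'):
            ctx.manager = create_manager()
    return ctx.manager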
Updated by Loïc Dachary over 9 years ago
Alternative solution: initialize ctx.manager in ceph.py.
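A sketch of what that alternative means, with stand-in names (FakeCephManager and Ctx are illustrative): the ceph task always runs before the parallel workloads, so creating the manager there guarantees a single shared instance and rados.py never has to create one.

class FakeCephManager(object):        # stand-in for teuthology's CephManager
    pass

class Ctx(object):                    # stand-in for the teuthology run context
    pass

def ceph_task(ctx):
    ctx.manager = FakeCephManager()   # the one shared instance

def rados_task(ctx):
    return ctx.manager                # reuse it, never create a second one

ctx = Ctx()
ceph_task(ctx)
assert rados_task(ctx) is rados_task(ctx)   # every later task sees the same manager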
Updated by Loïc Dachary over 9 years ago
- Status changed from Fix Under Review to Resolved
- % Done changed from 80 to 100