Bug #11297
Status: Closed
"AssertionError: fewer data and journal disks than required" in ceph-deploy-hammer-distro-basic-multi run
Added by Yuri Weinstein about 9 years ago. Updated almost 9 years ago.
Description
Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-31_15:00:38-ceph-deploy-hammer-distro-basic-multi/
Jobs - many, 830669 for example
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-31_15:00:38-ceph-deploy-hammer-distro-basic-multi/830669/
2015-03-31T16:44:11.468 INFO:teuthology.orchestra.run.plana92:Running: 'rmdir -- /home/ubuntu/cephtest'
2015-03-31T16:44:11.471 INFO:teuthology.orchestra.run.plana95:Running: 'rmdir -- /home/ubuntu/cephtest'
2015-03-31T16:44:11.477 DEBUG:teuthology.run_tasks:Unwinding manager internal.lock_machines
2015-03-31T16:44:11.477 DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: AssertionError: fewer data and journal disks than required plana17
2015-03-31T16:44:11.478 INFO:teuthology.nuke:Checking targets against current locks
2015-03-31T16:44:11.616 DEBUG:teuthology.nuke:shortname: plana95
Updated by Zack Cerza about 9 years ago
- Assignee set to Travis Rhoden
Tests that need >4 disks need to be scheduled on mira or burnupi
Updated by Travis Rhoden about 9 years ago
- Assignee changed from Travis Rhoden to Yuri Weinstein
Since this test suite has roles with 3 OSDs per node, and the matrix includes an override that runs tests with separate block devices for OSD data and journal, this test suite needs nodes with at least 6 block devices.
So, as Zack pointed out, these tests need to be run on mira or burnupi.
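For reference, the failing assertion seen in the logs (`get_dev_for_osd` in `tasks/ceph_deploy.py`) amounts to a device count: with separate journal disks, each OSD consumes two block devices. A minimal sketch of that check — the helper name and device lists here are illustrative, only the assertion message matches the actual code:

```python
def check_devs(num_osds, devs, separate_journal, shortname):
    """Illustrative sketch of the device-count check in get_dev_for_osd.

    With separate journal disks each OSD needs two block devices
    (one data, one journal); otherwise one device per OSD.
    """
    num_devs_reqd = 2 * num_osds if separate_journal else num_osds
    assert num_devs_reqd <= len(devs), \
        'fewer data and journal disks than required ' + shortname
    return num_devs_reqd

# 3 OSDs with separate journals need 6 devices; a node exposing only
# 4 usable disks trips the assertion, as in this run:
try:
    check_devs(3, ['/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde'],
               separate_journal=True, shortname='plana17')
except AssertionError as e:
    print(e)  # fewer data and journal disks than required plana17
```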
Updated by Yuri Weinstein about 9 years ago
see corresponding PR https://github.com/ceph/ceph-qa-suite/pull/389
Updated by Yuri Weinstein about 9 years ago
- Assignee changed from Yuri Weinstein to Travis Rhoden
See on VPS run
Run: http://pulpito.ceph.com/teuthology-2015-04-07_18:15:26-ceph-deploy:rados-hammer-distro-basic-vps/
Jobs: many, for example 840644
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-04-07_18:15:26-ceph-deploy:rados-hammer-distro-basic-vps/840644/
2015-04-07T19:01:13.445 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][DEBUG ] detect machine type
2015-04-07T19:01:13.449 INFO:teuthology.orchestra.run.vpm165.stderr:[ceph_deploy.install][INFO ] Distro info: Ubuntu 12.04 precise
2015-04-07T19:01:13.449 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][INFO ] purging data on vpm047
2015-04-07T19:01:13.451 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][INFO ] Running command: sudo rm -rf --one-file-system -- /var/lib/ceph
2015-04-07T19:01:13.463 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][INFO ] Running command: sudo rm -rf --one-file-system -- /etc/ceph/
2015-04-07T19:01:13.492 INFO:teuthology.orchestra.run.vpm165.stderr:Unhandled exception in thread started by
2015-04-07T19:01:13.495 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 251, in build_ceph_cluster
    node_dev_list = get_dev_for_osd(ctx, config)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 119, in get_dev_for_osd
    assert num_devs_reqd <= len(devs), 'fewer data and journal disks than required ' + shortname
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.497 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
2015-04-07T19:01:13.497 INFO:teuthology.orchestra.run.vpm165:Running: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2015-04-07T19:01:13.584 INFO:teuthology.task.install:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits...
2015-04-07T19:01:13.584 INFO:teuthology.orchestra.run.vpm047:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2015-04-07T19:01:13.588 INFO:teuthology.orchestra.run.vpm165:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2015-04-07T19:01:13.634 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 55, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 477, in task
    lambda: build_ceph_cluster(ctx=ctx, config=dict(
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 251, in build_ceph_cluster
    node_dev_list = get_dev_for_osd(ctx, config)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 119, in get_dev_for_osd
    assert num_devs_reqd <= len(devs), 'fewer data and journal disks than required ' + shortname
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.663 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/search?q=73589e3458924aefb870c61fe0086a93
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.663 DEBUG:teuthology.run_tasks:Unwinding manager ssh_keys
2015-04-07T19:01:13.663 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/ssh_keys.py", line 179, in task
    yield
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.663 INFO:teuthology.task.ssh_keys:Cleaning up SSH keys
2015-04-07T19:01:13.663 INFO:teuthology.task.ssh_keys:cleaning up keys added for testing
Updated by Travis Rhoden about 9 years ago
Let's discuss this during the infra stand up today. The piece that I keep missing is "what" we are trying to test. What I mean by this is the following:
Does the test require a certain number of nodes to be useful?
A certain number of disks?
Do we need more than one OSD per node?
Do we require 3 copies, or does 2 work?
The Hammer branch was running tests that used 3 nodes with 3 OSDs each. Because the test suite pulled in enable_diff_journal_disk.yaml, there were certain variations of the tests that required 6 block devices. That means the tests can't run on VPS or mira nodes.
The most recent run that Yuri did was run off of wip_ceph_deploy, which reduced the node count to 2, with 2 OSDs each. However, when scheduled against VPS nodes, this will still fail:
2015-04-07T18:56:35.993 INFO:teuthology.orchestra.run.vpm046:Running: 'ls /dev/[sv]d?'
2015-04-07T18:56:36.101 WARNING:teuthology.misc:Removing root device: /dev/vda from device list
2015-04-07T18:56:37.350 DEBUG:teuthology.misc:devs=['/dev/vdb', '/dev/vdc', '/dev/vdd']
You can see that a VPS node has three disks available -- /dev/vd[bcd]. That means that if we are going to keep enable_diff_journal_disk.yaml enabled, we can have at most 1 OSD per node.
If the tests don't care how many OSDs there are (this isn't a performance test), it seems to me like the best solution would be to change the roles to the following:
roles:
- - mon.a
  - mds.0
  - osd.0
- - osd.1
  - mon.b
- - mon.c
  - osd.2
  - client.0
This most closely resembles the original test, but merely reduces the per-node OSD count from 3 to 1. It should then be able to run on mira, plana, burnupi, and vps nodes. We also don't have to change the osd pool default size.
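A quick back-of-the-envelope check of the proposed layout (an illustrative script, not part of teuthology; the usable-disk count comes from the devs= line in the log above):

```python
# Count OSDs per node in the proposed roles and the block devices each
# node would need with enable_diff_journal_disk.yaml (data + journal).
proposed_roles = [
    ['mon.a', 'mds.0', 'osd.0'],
    ['osd.1', 'mon.b'],
    ['mon.c', 'osd.2', 'client.0'],
]

VPS_USABLE_DISKS = 3  # /dev/vd[bcd], after the root device is removed

for node in proposed_roles:
    num_osds = sum(1 for role in node if role.startswith('osd.'))
    devs_needed = 2 * num_osds  # one data disk + one journal disk per OSD
    assert devs_needed <= VPS_USABLE_DISKS
    print(node, '->', devs_needed, 'devices needed')
```

Every node carries exactly one OSD, so each needs only 2 devices and fits on a 3-disk VPS node.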
Thoughts?
Updated by Yuri Weinstein about 9 years ago
Run off wip_ceph_deploy
- all passed but two
with a 3-node, one-OSD-per-node configuration:
- - mon.a
  - mds.0
  - osd.0
- - osd.1
  - mon.b
- - mon.c
  - osd.2
  - client.0
http://pulpito.ceph.com/teuthology-2015-04-09_09:12:48-ceph-deploy:rados-hammer-distro-basic-vps/
two failed due to a VPS clock problem (unrelated)
Also see https://github.com/ceph/ceph-qa-suite/pull/400
Run on burnupi,mira
http://pulpito.ceph.com/teuthology-2015-04-09_14:04:38-ceph-deploy:rados-hammer-distro-basic-multi/
Updated by Dan Mick almost 9 years ago
- Assignee changed from Travis Rhoden to Yuri Weinstein
- Priority changed from Urgent to Normal
Updated by Yuri Weinstein almost 9 years ago
- Status changed from New to 7
Note to self: Is it still happening?
Updated by Zack Cerza almost 9 years ago
Yuri Weinstein wrote:
Note to self: Is it still happening?
From paddles:
>>> Job.query.filter(Job.failure_reason.contains("fewer data and journal disks than required")).order_by(Job.updated.desc())[0]
<Job u'teuthology-2015-04-28_10:09:12-ceph-deploy:rados-hammer-distro-basic-vps' u'866886'>
Doesn't look like it.
Updated by Zack Cerza almost 9 years ago
- Status changed from 7 to Need More Info
Updated by Dan Mick almost 9 years ago
- Status changed from Need More Info to Closed
Reopen if it occurs again.