Bug #11297
Status: Closed
"AssertionError: fewer data and journal disks than required" in ceph-deploy-hammer-distro-basic-multi run
Added by Yuri Weinstein about 9 years ago. Updated almost 9 years ago.
Description
Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-31_15:00:38-ceph-deploy-hammer-distro-basic-multi/
Jobs - many, 830669 for example
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-31_15:00:38-ceph-deploy-hammer-distro-basic-multi/830669/
2015-03-31T16:44:11.468 INFO:teuthology.orchestra.run.plana92:Running: 'rmdir -- /home/ubuntu/cephtest'
2015-03-31T16:44:11.471 INFO:teuthology.orchestra.run.plana95:Running: 'rmdir -- /home/ubuntu/cephtest'
2015-03-31T16:44:11.477 DEBUG:teuthology.run_tasks:Unwinding manager internal.lock_machines
2015-03-31T16:44:11.477 DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: AssertionError: fewer data and journal disks than required plana17
2015-03-31T16:44:11.478 INFO:teuthology.nuke:Checking targets against current locks
2015-03-31T16:44:11.616 DEBUG:teuthology.nuke:shortname: plana95
Updated by Zack Cerza about 9 years ago
- Assignee set to Travis Rhoden
Tests that need >4 disks need to be scheduled on mira or burnupi
Updated by Travis Rhoden about 9 years ago
- Assignee changed from Travis Rhoden to Yuri Weinstein
Since this test suite has roles with 3 OSDs per node, and the matrix includes an override that runs tests with separate block devices for OSD data and journal, this test suite needs nodes with at least 6 block devices.
So, as Zack pointed out, these tests need to be run on mira or burnupi.
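For reference, the failing assertion seen in the logs (`get_dev_for_osd` in `tasks/ceph_deploy.py`) amounts to a device count: with separate journal disks, each OSD consumes two block devices. A minimal sketch of that check — the helper name and device lists here are illustrative, only the assertion message matches the actual code:

```python
def check_devs(num_osds, devs, separate_journal, shortname):
    """Illustrative sketch of the device-count check in get_dev_for_osd.

    With separate journal disks each OSD needs two block devices
    (one data, one journal); otherwise one device per OSD.
    """
    num_devs_reqd = 2 * num_osds if separate_journal else num_osds
    assert num_devs_reqd <= len(devs), \
        'fewer data and journal disks than required ' + shortname
    return num_devs_reqd

# 3 OSDs with separate journals need 6 devices; a node exposing only
# 4 usable disks trips the assertion, as in this run:
try:
    check_devs(3, ['/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde'],
               separate_journal=True, shortname='plana17')
except AssertionError as e:
    print(e)  # fewer data and journal disks than required plana17
```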
Updated by Yuri Weinstein about 9 years ago
see corresponding PR https://github.com/ceph/ceph-qa-suite/pull/389
Updated by Yuri Weinstein about 9 years ago
- Assignee changed from Yuri Weinstein to Travis Rhoden
See on VPS run
Run: http://pulpito.ceph.com/teuthology-2015-04-07_18:15:26-ceph-deploy:rados-hammer-distro-basic-vps/
Jobs: many, for example 840644
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-04-07_18:15:26-ceph-deploy:rados-hammer-distro-basic-vps/840644/
2015-04-07T19:01:13.445 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][DEBUG ] detect machine type
2015-04-07T19:01:13.449 INFO:teuthology.orchestra.run.vpm165.stderr:[ceph_deploy.install][INFO ] Distro info: Ubuntu 12.04 precise
2015-04-07T19:01:13.449 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][INFO ] purging data on vpm047
2015-04-07T19:01:13.451 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][INFO ] Running command: sudo rm -rf --one-file-system -- /var/lib/ceph
2015-04-07T19:01:13.463 INFO:teuthology.orchestra.run.vpm165.stderr:[vpm047][INFO ] Running command: sudo rm -rf --one-file-system -- /etc/ceph/
2015-04-07T19:01:13.492 INFO:teuthology.orchestra.run.vpm165.stderr:Unhandled exception in thread started by
2015-04-07T19:01:13.495 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 251, in build_ceph_cluster
    node_dev_list = get_dev_for_osd(ctx, config)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 119, in get_dev_for_osd
    assert num_devs_reqd <= len(devs), 'fewer data and journal disks than required ' + shortname
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.497 INFO:tasks.ceph_deploy:Removing ceph-deploy ...
2015-04-07T19:01:13.497 INFO:teuthology.orchestra.run.vpm165:Running: 'rm -rf /home/ubuntu/cephtest/ceph-deploy'
2015-04-07T19:01:13.584 INFO:teuthology.task.install:Removing shipped files: /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits...
2015-04-07T19:01:13.584 INFO:teuthology.orchestra.run.vpm047:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2015-04-07T19:01:13.588 INFO:teuthology.orchestra.run.vpm165:Running: 'sudo rm -f -- /home/ubuntu/cephtest/valgrind.supp /usr/bin/daemon-helper /usr/bin/adjust-ulimits'
2015-04-07T19:01:13.634 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 55, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 477, in task
    lambda: build_ceph_cluster(ctx=ctx, config=dict(
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 28, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 251, in build_ceph_cluster
    node_dev_list = get_dev_for_osd(ctx, config)
  File "/var/lib/teuthworker/src/ceph-qa-suite_wip_ceph_deploy/tasks/ceph_deploy.py", line 119, in get_dev_for_osd
    assert num_devs_reqd <= len(devs), 'fewer data and journal disks than required ' + shortname
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.663 ERROR:teuthology.run_tasks: Sentry event: http://sentry.ceph.com/sepia/teuthology/search?q=73589e3458924aefb870c61fe0086a93
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.663 DEBUG:teuthology.run_tasks:Unwinding manager ssh_keys
2015-04-07T19:01:13.663 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 30, in nested
    yield vars
  File "/home/teuthworker/src/teuthology_master/teuthology/task/ssh_keys.py", line 179, in task
    yield
AssertionError: fewer data and journal disks than required vpm047
2015-04-07T19:01:13.663 INFO:teuthology.task.ssh_keys:Cleaning up SSH keys
2015-04-07T19:01:13.663 INFO:teuthology.task.ssh_keys:cleaning up keys added for testing
Updated by Travis Rhoden about 9 years ago
Let's discuss this during the infra stand up today. The piece that I keep missing is "what" we are trying to test. What I mean by this is the following:
Does the test require a certain number of nodes to be useful?
A certain number of disks?
Do we need more than one OSD per node?
Do we require 3 copies, or does 2 work?
The Hammer branch was running tests that used 3 nodes with 3 OSDs each. Because the test suite pulled in enable_diff_journal_disk.yaml, there were certain variations of the tests that required 6 block devices. That means the tests can't run on VPS or mira nodes.
The most recent run that Yuri did was run off of wip_ceph_deploy, which reduced the node count to 2, with 2 OSDs each. However, when scheduled against VPS nodes, this will still fail:
2015-04-07T18:56:35.993 INFO:teuthology.orchestra.run.vpm046:Running: 'ls /dev/[sv]d?'
2015-04-07T18:56:36.101 WARNING:teuthology.misc:Removing root device: /dev/vda from device list
2015-04-07T18:56:37.350 DEBUG:teuthology.misc:devs=['/dev/vdb', '/dev/vdc', '/dev/vdd']
You can see that a VPS node has three disks available -- /dev/vd[bcd]. That means that if we are going to keep enable_diff_journal_disk.yaml enabled, we can have at most 1 OSD per node.
If the tests don't care how many OSDs there are (this isn't a performance test), it seems to me like the best solution would be to change the roles to the following:
roles:
- - mon.a
  - mds.0
  - osd.0
- - osd.1
  - mon.b
- - mon.c
  - osd.2
  - client.0
This most closely resembles the original test, but merely reduces the per-node OSD count from 3 to 1. It should then be able to run on mira, plana, burnupi, and vps nodes. We also don't have to change the osd pool default size.
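A quick back-of-the-envelope check of the proposed layout (an illustrative script, not part of teuthology; the usable-disk count comes from the devs= line in the log above):

```python
# Count OSDs per node in the proposed roles and the block devices each
# node would need with enable_diff_journal_disk.yaml (data + journal).
proposed_roles = [
    ['mon.a', 'mds.0', 'osd.0'],
    ['osd.1', 'mon.b'],
    ['mon.c', 'osd.2', 'client.0'],
]

VPS_USABLE_DISKS = 3  # /dev/vd[bcd], after the root device is removed

for node in proposed_roles:
    num_osds = sum(1 for role in node if role.startswith('osd.'))
    devs_needed = 2 * num_osds  # one data disk + one journal disk per OSD
    assert devs_needed <= VPS_USABLE_DISKS
    print(node, '->', devs_needed, 'devices needed')
```

Every node carries exactly one OSD, so each needs only 2 devices and fits on a 3-disk VPS node.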
Thoughts?
Updated by Yuri Weinstein about 9 years ago
Run off wip_ceph_deploy
- all passed but two
with a 3-node, one-OSD-per-node configuration:
- - mon.a
  - mds.0
  - osd.0
- - osd.1
  - mon.b
- - mon.c
  - osd.2
  - client.0
http://pulpito.ceph.com/teuthology-2015-04-09_09:12:48-ceph-deploy:rados-hammer-distro-basic-vps/
two failed due to a VPS clock problem (unrelated)
Also see https://github.com/ceph/ceph-qa-suite/pull/400
Run on burnupi,mira
http://pulpito.ceph.com/teuthology-2015-04-09_14:04:38-ceph-deploy:rados-hammer-distro-basic-multi/
Updated by Dan Mick almost 9 years ago
- Assignee changed from Travis Rhoden to Yuri Weinstein
- Priority changed from Urgent to Normal
Updated by Yuri Weinstein almost 9 years ago
- Status changed from New to 7
Note to self: Is it still happening?
Updated by Zack Cerza almost 9 years ago
Yuri Weinstein wrote:
Note to self: Is it still happening?
From paddles:
>>> Job.query.filter(Job.failure_reason.contains("fewer data and journal disks than required")).order_by(Job.updated.desc())[0]
<Job u'teuthology-2015-04-28_10:09:12-ceph-deploy:rados-hammer-distro-basic-vps' u'866886'>
Doesn't look like it.
Updated by Zack Cerza almost 9 years ago
- Status changed from 7 to Need More Info
Updated by Dan Mick almost 9 years ago
- Status changed from Need More Info to Closed
Reopen if it occurs again.