Bug #5559 (closed)

ARM rbd command CommandFailedError in teuthology

Added by Anonymous almost 11 years ago. Updated about 10 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description

rbd cannot connect to the cluster from a teuthology test.

This problem can be reproduced on tala002 and tala004 doing the following:

Step 1: Power cycle tala002 and tala004.

Step 2: Make sure that things are clean on these machines so that teuthology does not complain:
cd
sudo dpkg --configure -a
sudo rm -fr /var/lib/ceph
sudo rm -fr cephtest/*
sudo rm -fr /var/log/ceph
sudo mkdir /var/log/ceph

Step 3: Run teuthology with the following yaml file (an example invocation is sketched after the file):

machine_type: tala
interactive-on-error: true
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3,]
- [client.0]
tasks:
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all:
        - kernel_untar_build.sh
targets:
  ubuntu@tala002.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD5mFt7raxufuhfx3dxQDJg5mzJ4N+n94rHC/pEqCFvXSp5Fly9cZZxdmn6N5vNUerXIt7/ui2AlVii/bSNjBJrXGYwi+IK+tRPpHb1e5OaS1FdNeHHIeIofeTmUVC7wzsit7sWCcN0I+FjlVqWjXs4qsjI56MbAMC+YVAepbhOUT/j8tFFLXgMN4xFKx10G4TqGWJqsMA1+WD4DLHWI8GrqccGTdokzaotSFHH3uMJIzXfTpCLts1n6yX2iogmK2ayFyD7TmMPRI9ZQ2E5yvkMsYrAOyyPp7h3RVGRRYWR47mmdrENfjuVKQcK30tBSO3tl13BXxWNl1+rfMOk9Cqz
  ubuntu@tala004.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4XQUem3ze9TfBfsJ3pL8kPm+Y98TEJDQ76rOcdjMR4Rs8mte1Q1B93hT0CI8uRjFcv9uiKaOlweiqKXSx6N/20dsPQ2LN54FtXLB346vsxDmZH0RRzg7KfHja/AilEW3pN3nlLlYkCN/9yWuId3g1sN1L6Shylyc96OL2b++O5fZhZnzbbaHSvyngU73GY/sfRWWA6bB6suXRe/QMbHA/ge/+EvcjJ74nZynenujAchjcVmY6xzpXsXYtSSpYcdgkVh+7P1H0KkfWJwH8aRvsni7TE/6Zp8AtaROelCW1v5vMaLAUjjFtz2nVy2KSViktX3jIpwHDXoFd3eJumXxT
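
Assuming the yaml above is saved as arm_rbd.yaml (a filename chosen here only for illustration), the run can be started from a bootstrapped teuthology checkout roughly as follows; exact flags vary between teuthology versions, so treat this as a sketch:

# from the root of the teuthology checkout; arm_rbd.yaml is the job file above
./virtualenv/bin/teuthology arm_rbd.yaml

Because the yaml sets interactive-on-error: true, a failing task drops the run into an interactive shell (as seen in the log below) instead of tearing everything down.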

This produces the following output (extraneous lines removed):

2013-07-09T12:17:05.821 DEBUG:teuthology.misc:Ceph health: HEALTH_OK
2013-07-09T12:17:05.822 INFO:teuthology.run_tasks:Running task rbd...
2013-07-09T12:17:05.823 DEBUG:teuthology.task.rbd:rbd config is: {'client.0': None}
2013-07-09T12:17:05.823 DEBUG:teuthology.misc:basedir: /home/ubuntu/cephtest
2013-07-09T12:17:05.823 INFO:teuthology.task.rbd:Creating image testimage.client.0 with size 10240
2013-07-09T12:17:05.824 DEBUG:teuthology.orchestra.run:Running [10.214.143.4]: '/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/wu1307091216/archive/coverage rbd -p rbd create --size 10240 testimage.client.0'
2013-07-09T12:17:06.244 INFO:teuthology.orchestra.run.err:rbd: couldn't connect to the cluster!
2013-07-09T12:17:06.245 INFO:teuthology.orchestra.run.err:2013-07-09 12:17:06.066417 b6f6a2a0  0 monclient(hunting): authenticate timed out after 1.49351e-154
2013-07-09T12:17:06.245 INFO:teuthology.orchestra.run.err:2013-07-09 12:17:06.066567 b6f6a2a0  0 librados: client.admin authentication error (110) Connection timed out
2013-07-09T12:17:07.645 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/src/teuthology/teuthology/contextutil.py", line 25, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/task/rbd.py", line 70, in create_image
    remote.run(args=args)
  File "/home/wusui/src/teuthology/teuthology/orchestra/remote.py", line 43, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 266, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 262, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.143.4 with status 1: '/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/wu1307091216/archive/coverage rbd -p rbd create --size 10240 testimage.client.0'
2013-07-09T12:17:07.648 ERROR:teuthology.run_tasks:Saw exception from tasks
Traceback (most recent call last):
  File "/home/wusui/src/teuthology/teuthology/run_tasks.py", line 27, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/task/rbd.py", line 605, in task
    lambda: mount(ctx=ctx, config=role_images),
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/contextutil.py", line 25, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/task/rbd.py", line 70, in create_image
    remote.run(args=args)
  File "/home/wusui/src/teuthology/teuthology/orchestra/remote.py", line 43, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 266, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 262, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.143.4 with status 1: '/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/wu1307091216/archive/coverage rbd -p rbd create --size 10240 testimage.client.0'
2013-07-09T12:17:07.652 WARNING:teuthology.run_tasks:Saw failure, going into interactive mode...
2013-07-09T12:27:18.494 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x298d250>
2013-07-09T12:27:18.494 ERROR:teuthology.contextutil:Saw exception from nested tasks

Note that if the rbd command is run manually, it seems to work.
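
For reference, the manual re-run is simply the same command from the log, executed in an ssh session on the client node (10.214.143.4 above); this sketch assumes the wu1307091216 run directory from this particular run is still in place:

# run on the client node (10.214.143.4 in the log above)
/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage \
    /home/ubuntu/cephtest/wu1307091216/archive/coverage \
    rbd -p rbd create --size 10240 testimage.client.0

Run this way the command appears to succeed, while the identical invocation fails under teuthology.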

Actions #1

Updated by Anonymous almost 11 years ago

  • Assignee set to Josh Durgin

Note that this could be related to other problems with the ARM kernel. The
following was output on the console while this test was run.

[  168.139917] huh, entered softirq 4 BLOCK c0262c3c preempt_count 00000100, exited with 00000000?
[  445.430452] huh, entered softirq 4 BLOCK c0262c3c preempt_count 00000100, exited with 00000000?
Actions #2

Updated by Anonymous almost 11 years ago

This could very well be one of the kernel problems that we have already detected. After a fresh reinstallation, the same rbd call triggers the "BUG: scheduling while atomic" error.

Actions #3

Updated by Anonymous almost 11 years ago

  • Assignee changed from Josh Durgin to Anonymous

The problems shown here were greatly exacerbated by mistakes in my yaml files while running individual tests. I have corrected those issues and can now execute many of the rbd kernel tests.

The problems still occur, but far less frequently than before. The kernels we build are based on 3.10 versions on gitbuilder. Rossen said they recommend using the 3.5.0-1000-highbank kernel, which appears to be several releases earlier than ours. If we are to build on that version, we will need a new gitbuilder directory for it.
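
If we do pin the test nodes to that kernel, the job yaml would also need to point teuthology's kernel task at the new build. A rough sketch, assuming a 3.5-based gitbuilder branch exists and that the kernel task accepts a branch override (both are assumptions, not verified against our teuthology branch):

# append a kernel stanza to the job yaml; the branch name below is hypothetical
cat >> arm_rbd.yaml <<'EOF'
kernel:
  branch: highbank-3.5
EOF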

Actions #4

Updated by Anonymous almost 11 years ago

  • Status changed from New to In Progress

Behavior is inconsistent between builds.

Actions #5

Updated by Anonymous about 10 years ago

  • Status changed from In Progress to Won't Fix