Bug #5559
closed
ARM rbd command CommandFailedError in teuthology
0%
Description
rbd cannot connect to the cluster from a teuthology test.
This problem can be reproduced on tala002 and tala004 doing the following:
Step 1: Power cycle tala002 and tala004.
Step 2: Make sure that things are clean on these machines so that teuthology does not complain:
cd
sudo dpkg --configure -a
sudo rm -fr /var/lib/ceph
sudo rm -fr cephtest/*
sudo rm -fr /var/log/ceph
sudo mkdir /var/log/ceph
Step 3: Run teuthology, using the following yaml file:
machine_type: tala
interactive-on-error: true
roles:
- [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3,]
- [client.0]
tasks:
- ceph:
- rbd:
    all:
- workunit:
    clients:
      all:
      - kernel_untar_build.sh
targets:
  ubuntu@tala002.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD5mFt7raxufuhfx3dxQDJg5mzJ4N+n94rHC/pEqCFvXSp5Fly9cZZxdmn6N5vNUerXIt7/ui2AlVii/bSNjBJrXGYwi+IK+tRPpHb1e5OaS1FdNeHHIeIofeTmUVC7wzsit7sWCcN0I+FjlVqWjXs4qsjI56MbAMC+YVAepbhOUT/j8tFFLXgMN4xFKx10G4TqGWJqsMA1+WD4DLHWI8GrqccGTdokzaotSFHH3uMJIzXfTpCLts1n6yX2iogmK2ayFyD7TmMPRI9ZQ2E5yvkMsYrAOyyPp7h3RVGRRYWR47mmdrENfjuVKQcK30tBSO3tl13BXxWNl1+rfMOk9Cqz
  ubuntu@tala004.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC4XQUem3ze9TfBfsJ3pL8kPm+Y98TEJDQ76rOcdjMR4Rs8mte1Q1B93hT0CI8uRjFcv9uiKaOlweiqKXSx6N/20dsPQ2LN54FtXLB346vsxDmZH0RRzg7KfHja/AilEW3pN3nlLlYkCN/9yWuId3g1sN1L6Shylyc96OL2b++O5fZhZnzbbaHSvyngU73GY/sfRWWA6bB6suXRe/QMbHA/ge/+EvcjJ74nZynenujAchjcVmY6xzpXsXYtSSpYcdgkVh+7P1H0KkfWJwH8aRvsni7TE/6Zp8AtaROelCW1v5vMaLAUjjFtz2nVy2KSViktX3jIpwHDXoFd3eJumXxT
This produces the following output (extraneous output removed):
2013-07-09T12:17:05.821 DEBUG:teuthology.misc:Ceph health: HEALTH_OK
2013-07-09T12:17:05.822 INFO:teuthology.run_tasks:Running task rbd...
2013-07-09T12:17:05.823 DEBUG:teuthology.task.rbd:rbd config is: {'client.0': None}
2013-07-09T12:17:05.823 DEBUG:teuthology.misc:basedir: /home/ubuntu/cephtest
2013-07-09T12:17:05.823 INFO:teuthology.task.rbd:Creating image testimage.client.0 with size 10240
2013-07-09T12:17:05.824 DEBUG:teuthology.orchestra.run:Running [10.214.143.4]: '/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/wu1307091216/archive/coverage rbd -p rbd create --size 10240 testimage.client.0'
2013-07-09T12:17:06.244 INFO:teuthology.orchestra.run.err:rbd: couldn't connect to the cluster!
2013-07-09T12:17:06.245 INFO:teuthology.orchestra.run.err:2013-07-09 12:17:06.066417 b6f6a2a0 0 monclient(hunting): authenticate timed out after 1.49351e-154
2013-07-09T12:17:06.245 INFO:teuthology.orchestra.run.err:2013-07-09 12:17:06.066567 b6f6a2a0 0 librados: client.admin authentication error (110) Connection timed out
2013-07-09T12:17:07.645 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/wusui/src/teuthology/teuthology/contextutil.py", line 25, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/task/rbd.py", line 70, in create_image
    remote.run(args=args)
  File "/home/wusui/src/teuthology/teuthology/orchestra/remote.py", line 43, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 266, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 262, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.143.4 with status 1: '/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/wu1307091216/archive/coverage rbd -p rbd create --size 10240 testimage.client.0'
2013-07-09T12:17:07.648 ERROR:teuthology.run_tasks:Saw exception from tasks
Traceback (most recent call last):
  File "/home/wusui/src/teuthology/teuthology/run_tasks.py", line 27, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/task/rbd.py", line 605, in task
    lambda: mount(ctx=ctx, config=role_images),
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/contextutil.py", line 25, in nested
    vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/wusui/src/teuthology/teuthology/task/rbd.py", line 70, in create_image
    remote.run(args=args)
  File "/home/wusui/src/teuthology/teuthology/orchestra/remote.py", line 43, in run
    r = self._runner(client=self.ssh, **kwargs)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 266, in run
    r.exitstatus = _check_status(r.exitstatus)
  File "/home/wusui/src/teuthology/teuthology/orchestra/run.py", line 262, in _check_status
    raise CommandFailedError(command=r.command, exitstatus=status, node=host)
CommandFailedError: Command failed on 10.214.143.4 with status 1: '/home/ubuntu/cephtest/wu1307091216/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/wu1307091216/archive/coverage rbd -p rbd create --size 10240 testimage.client.0'
2013-07-09T12:17:07.652 WARNING:teuthology.run_tasks:Saw failure, going into interactive mode...
2013-07-09T12:27:18.494 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x298d250>
2013-07-09T12:27:18.494 ERROR:teuthology.contextutil:Saw exception from nested tasks
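When a full teuthology log is long, it can help to pull out only what the remote command wrote to stderr. The following is a minimal sketch in Python 3, assuming every log line has the shape "timestamp LEVEL:logger:message" seen in the output above. The names LINE_RE and remote_stderr are hypothetical helpers written for this illustration and are not part of teuthology.

```python
import re

# Assumed line shape, taken from the output above:
#   2013-07-09T12:17:06.244 INFO:teuthology.orchestra.run.err:message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+):(?P<logger>[\w.]+):(?P<msg>.*)$"
)


def remote_stderr(lines):
    """Return (timestamp, message) pairs logged by teuthology.orchestra.run.err."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Traceback lines and other loggers do not match and are skipped.
        if match and match.group("logger") == "teuthology.orchestra.run.err":
            found.append((match.group("ts"), match.group("msg")))
    return found


# Three lines copied from the output above.
sample = [
    "2013-07-09T12:17:05.821 DEBUG:teuthology.misc:Ceph health: HEALTH_OK",
    "2013-07-09T12:17:06.244 INFO:teuthology.orchestra.run.err:rbd: couldn't connect to the cluster!",
    "2013-07-09T12:17:06.245 INFO:teuthology.orchestra.run.err:2013-07-09 12:17:06.066567 b6f6a2a0 0 librados: client.admin authentication error (110) Connection timed out",
]

errors = remote_stderr(sample)
for timestamp, message in errors:
    print(timestamp, message)
```

Filtering on the logger name rather than on the log level keeps the remote stderr lines together even though teuthology records them at INFO level.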
Note that if the rbd command is run manually, it seems to work.
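Because a single manual run can succeed while the failure is intermittent, repeating the command and counting failures gives a rough failure rate. This is a minimal sketch in Python 3, not something taken from teuthology. The function count_failures and its parameters are hypothetical, and the rbd invocation in the comment is only an example of a command one might pass in.

```python
import subprocess


def count_failures(cmd, attempts=10, timeout=60):
    """Run cmd `attempts` times; count non-zero exits and timeouts as failures."""
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # A hung command is treated the same as a failed one.
            failures += 1
    return failures


# Example call (assumes rbd is installed and a cluster is reachable):
#   count_failures(["rbd", "-p", "rbd", "ls"], attempts=20)
```

A count of zero over a few attempts does not rule out the problem, since the failure may depend on timing right after cluster startup.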
Updated by Anonymous almost 11 years ago
- Assignee set to Josh Durgin
Note that this could be related to other problems with the ARM kernel. The
following was output on the console while this test was run.
[ 168.139917] huh, entered softirq 4 BLOCK c0262c3c preempt_count 00000100, exited with 00000000?
[ 445.430452] huh, entered softirq 4 BLOCK c0262c3c preempt_count 00000100, exited with 00000000?
Updated by Anonymous almost 11 years ago
This could very well be one of the kernel problems that we already detected. Fresh reinstallation attempts cause the "BUG: scheduling while atomic" error
to occur on the same rbd call.
Updated by Anonymous almost 11 years ago
- Assignee changed from Josh Durgin to Anonymous
The problems shown here were greatly exacerbated by some problems in my yaml files while running individual tests. I have corrected those issues and can now execute many of the rbd kernel tests.
The problems still occur, but a lot less frequently than before. The kernels that we build are based on 3.10 versions on gitbuilder. Rossen said that they recommend using the 3.5.0-1000-highbank kernel, which appears to be several releases earlier than ours. If we are to build on this version, we need to have a new gitbuilder directory for this file.
Updated by Anonymous almost 11 years ago
- Status changed from New to In Progress
Behavior is inconsistent between builds.
Updated by Anonymous about 10 years ago
- Status changed from In Progress to Won't Fix