Bug #13130 (closed)

qa: failed pool create in mkpool_layout_vxattrs.sh

Added by Greg Farnum over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/teuthology-2015-09-12_23:04:01-fs-infernalis---basic-multi/1054419

This looks a lot like #12970, but it's running commit cac601d63103f67d842559c5b76202c4dec1e6f6, which contains the CrushTool fix commit 0f82f461b33d93d868e185912a2c7e4074d06900.

2015-09-14T19:13:33.870 INFO:teuthology.orchestra.run.burnupi54:Running (workunit test fs/misc/mkpool_layout_vxattrs.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=cac601d63103f67d842559c5b76202c4dec1e6f6 TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/workunit.client.0/fs/misc/mkpool_layout_vxattrs.sh'
2015-09-14T19:13:34.333 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.334232 7f5554ab7840  1 -- :/0 messenger.start
2015-09-14T19:13:34.333 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.334990 7f5554ab7840  1 -- :/1054610 --> 10.214.135.104:6790/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f5555ec4e80 con 0x7f5555ec7a40
2015-09-14T19:13:34.387 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.388736 7f5554ab3700  1 -- 10.214.136.22:0/1054610 learned my addr 10.214.136.22:0/1054610
2015-09-14T19:13:34.769 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.770324 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 1 ==== mon_map magic: 0 v1 ==== 473+0+0 (2554730853 0 0) 0x7f5534000c20 con 0x7f5555ec7a40
2015-09-14T19:13:34.885 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.885990 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (1727625110 0 0) 0x7f5534000fe0 con 0x7f5555ec7a40
2015-09-14T19:13:34.885 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.886158 7f5548f92700  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f55280017a0 con 0x7f5555ec7a40
2015-09-14T19:13:34.885 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.886944 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (2133621294 0 0) 0x7f5534000b10 con 0x7f5555ec7a40
2015-09-14T19:13:34.886 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.887065 7f5548f92700  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7f55280069f0 con 0x7f5555ec7a40
2015-09-14T19:13:34.887 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.887805 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (2016495843 0 0) 0x7f5534001090 con 0x7f5555ec7a40
2015-09-14T19:13:34.887 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.887896 7f5548f92700  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f5555ef89f0 con 0x7f5555ec7a40
2015-09-14T19:13:34.888 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.887973 7f5554ab7840  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7f5555ebdc30 con 0x7f5555ec7a40
2015-09-14T19:13:34.888 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.888044 7f5554ab7840  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7f5555ec4e80 con 0x7f5555ec7a40
2015-09-14T19:13:34.888 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.888576 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 5 ==== mon_map magic: 0 v1 ==== 473+0+0 (2554730853 0 0) 0x7f5534001280 con 0x7f5555ec7a40
2015-09-14T19:13:34.888 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.888624 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3267281642 0 0) 0x7f5534001680 con 0x7f5555ec7a40
2015-09-14T19:13:34.982 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.983353 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 7 ==== osd_map(7..7 src has 1..7) v3 ==== 3995+0+0 (1799981947 0 0) 0x7f5534002040 con 0x7f5555ec7a40
2015-09-14T19:13:34.983 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.983505 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3267281642 0 0) 0x7f5534002470 con 0x7f5555ec7a40
2015-09-14T19:13:34.983 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.983707 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 9 ==== osd_map(7..7 src has 1..7) v3 ==== 3995+0+0 (1799981947 0 0) 0x7f5534002040 con 0x7f5555ec7a40
2015-09-14T19:13:34.983 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.983739 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 10 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3267281642 0 0) 0x7f55340024f0 con 0x7f5555ec7a40
2015-09-14T19:13:34.992 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:34.984941 7f5554ab7840  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- pool_op(create pool 0 auid 0 tid 1 name foo.54608 v0) v4 -- ?+0 0x7f5555ec55a0 con 0x7f5555ec7a40
2015-09-14T19:13:40.689 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.690396 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 11 ==== pool_op_reply(tid 1 (22) Invalid argument v8) v1 ==== 43+0+0 (515685353 0 0) 0x7f5534001090 con 0x7f5555ec7a40
2015-09-14T19:13:40.689 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.690448 7f5548f92700  1 -- 10.214.136.22:0/1054610 --> 10.214.135.104:6790/0 -- mon_subscribe({monmap=2+,osdmap=8}) v2 -- ?+0 0x7f552800abd0 con 0x7f5555ec7a40
2015-09-14T19:13:40.689 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.691166 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 12 ==== osd_map(8..8 src has 1..8) v3 ==== 214+0+0 (3078272234 0 0) 0x7f55340013c0 con 0x7f5555ec7a40
2015-09-14T19:13:40.692 INFO:tasks.workunit.client.0.burnupi54.stderr:error creating pool foo.54608: (22) Invalid argument
2015-09-14T19:13:40.692 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.693243 7f5548f92700  1 -- 10.214.136.22:0/1054610 <== mon.2 10.214.135.104:6790/0 13 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (3267281642 0 0) 0x7f55340024f0 con 0x7f5555ec7a40
2015-09-14T19:13:40.692 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.693614 7f5554ab7840  1 -- 10.214.136.22:0/1054610 mark_down 0x7f5555ec7a40 -- 0x7f5555efd590
2015-09-14T19:13:40.692 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.693642 7f5554ab7840  1 -- 10.214.136.22:0/1054610 mark_down_all
2015-09-14T19:13:40.693 INFO:tasks.workunit.client.0.burnupi54.stderr:2015-09-14 19:13:40.693927 7f5554ab7840  1 -- 10.214.136.22:0/1054610 shutdown complete.
2015-09-14T19:13:40.702 INFO:tasks.workunit:Stopping ['fs/misc'] on client.0...
2015-09-14T19:13:40.702 INFO:teuthology.orchestra.run.burnupi54:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/workunit.client.0'
2015-09-14T19:13:40.754 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_infernalis/tasks/workunit.py", line 361, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 156, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 378, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 114, in wait
    label=self.label)
CommandFailedError: Command failed (workunit test fs/misc/mkpool_layout_vxattrs.sh) on burnupi54 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=cac601d63103f67d842559c5b76202c4dec1e6f6 TESTDIR="/home/ubuntu/cephtest" CEPH_ID="0" PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/workunit.client.0/fs/misc/mkpool_layout_vxattrs.sh'
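
For context, the pool_op(create ...) request and the "error creating pool foo.54608: (22) Invalid argument" line above come from the librados pool-create path, so the failing step is roughly the sketch below (an illustration only; the exact invocation lives in fs/misc/mkpool_layout_vxattrs.sh and may differ):

    # Rough equivalent of the failing step; the pool name embeds the shell PID
    # (foo.54608 in the log above). The exact arguments in the workunit may differ.
    poolname="foo.$$"
    rados mkpool "$poolname"   # sends the pool_op(create ...) that the monitor rejects with (22) EINVAL
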
#1

Updated by Zheng Yan over 8 years ago

  • Status changed from New to 12

I reproduced this bug locally. The failure is because the monitors do not know the correct path to crushtool.
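
If that is the cause, the path the monitor execs can be checked and overridden via the crushtool config option (a sketch, assuming that option name in this release; mon.a is a placeholder daemon name):

    # Check which binary the monitor will exec when validating a new pool's crush mapping.
    ceph daemon mon.a config get crushtool
    # Point it at the correct binary without restarting the monitors.
    ceph tell mon.\* injectargs '--crushtool /usr/bin/crushtool'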

#2

Updated by Greg Farnum over 8 years ago

  • Assignee set to Sage Weil
#3

Updated by Sage Weil over 8 years ago

  • Project changed from CephFS to Ceph
  • Priority changed from Normal to Urgent
#4

Updated by Sage Weil over 8 years ago

  • Status changed from 12 to Resolved
#5

Updated by Greg Farnum over 8 years ago

  • Status changed from Resolved to 12

We hit this again in http://qa-proxy.ceph.com/teuthology/teuthology-2015-10-06_23:04:02-fs-infernalis---basic-multi/1093441/

It is running commit 7e8432a25950ba119825592d00e6298fc4e5433a, which includes 524b0bdcc45c2f4b95f2239c988e93250f337f3d.

2015-10-09 05:52:10.965746 7efcd7145700 10 mon.b@0(leader).paxosservice(osdmap 1..6) dispatch 0x7efce5458b40 pool_op(create pool 0 auid 0 tid 1 name foo.26926 v0) v4 from client.4132 10.214.134.22:0/2278156571 con 0x7efce63d3700
2015-10-09 05:52:10.965752 7efcd7145700  5 mon.b@0(leader).paxos(paxos active c 1..607) is_readable = 1 - now=2015-10-09 05:52:10.965753 lease_expire=2015-10-09 05:52:15.862392 has v0 lc 607
2015-10-09 05:52:11.062911 7efcd7145700 10 mon.b@0(leader).osd e6 preprocess_query pool_op(create pool 0 auid 0 tid 1 name foo.26926 v0) v4 from client.4132 10.214.134.22:0/2278156571
2015-10-09 05:52:11.092512 7efcd7145700 20 is_capable service=osd command= write on cap allow *
2015-10-09 05:52:11.092515 7efcd7145700 20  allow so far , doing grant allow *
2015-10-09 05:52:11.092516 7efcd7145700 20  allow all
2015-10-09 05:52:11.092526 7efcd7145700  7 mon.b@0(leader).osd e6 prepare_update pool_op(create pool 0 auid 0 tid 1 name foo.26926 v0) v4 from client.4132 10.214.134.22:0/2278156571
2015-10-09 05:52:11.095409 7efcd7145700 10 mon.b@0(leader).osd e6 prepare_pool_op pool_op(create pool 0 auid 0 tid 1 name foo.26926 v0) v4
2015-10-09 05:52:11.095424 7efcd7145700 10 mon.b@0(leader).osd e6 prepare_new_pool from 0x7efce63d3700
2015-10-09 05:52:16.101399 7efcd7145700 10 mon.b@0(leader).osd e6  tester.test_with_crushtool returns -22
...
2015-10-09 05:52:17.087841 7efcd8948700 15 mon.b@0(leader) e1 send_reply routing reply to 10.214.134.22:0/2278156571 via 10.214.134.22:6789/0 for request pool_op(create pool 0 auid 0 tid 1 name foo.26926 v0) v4
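
The "tester.test_with_crushtool returns -22" line shows the EINVAL comes out of the monitor's external crushtool validation run, not out of argument parsing. A hand-run approximation against the live map looks like the sketch below (the --num-rep value is an assumption, not taken from this log):

    # Dump the compiled crush map and feed it through crushtool's tester,
    # roughly what the monitor does before accepting the new pool.
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -i /tmp/crushmap --test --num-rep 3
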
#7

Updated by Sage Weil over 8 years ago

Stumped. https://github.com/ceph/ceph/pull/6230 will help debug.

#8

Updated by Sage Weil over 8 years ago

  • Status changed from 12 to Can't reproduce