Bug #14849
Status: Closed
(infernalis) [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack
Description
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-02-23_02:00:02-rados-infernalis-distro-basic-openstack/
Job: 3568
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-02-23_02:00:02-rados-infernalis-distro-basic-openstack/3568/teuthology.log
2016-02-23T03:56:22.058 INFO:teuthology.orchestra.run.target091202:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config set filestore_inject_stall 3'
2016-02-23T03:56:25.518 INFO:tasks.ceph.mon.c.target091202.stderr:2016-02-23 03:56:25.472230 7fa7c6fe0700 -1 mon.c@2(peon).paxos(paxos updating c 1..708) lease_expire from mon.0 158.69.91.202:6789/0 is 2.512609 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-02-23T03:56:26.735 INFO:tasks.workunit.client.0.target091202.stdout:test/librados/aio.cc:2528: Failure
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:Value of: test_data.init()
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:  Actual: "create_one_ec_pool(test-rados-api-target091202.ovh.sepia.ceph.com-13446-58) failed: error mon_command osd pool create pool:test-rados-api-target091202.ovh.sepia.ceph.com-13446-58 pool_type:erasure failed with error -22"
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:Expected: ""
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:[ FAILED ] LibRadosAioEC.ReturnValuePP (10841 ms)
Updated by Sage Weil about 8 years ago
crushtool test timed out: it took 7s, while the timeout is 5s.
Updated by Sage Weil about 8 years ago
- Subject changed from [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack to (infernalis) [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack
- Status changed from New to Fix Under Review
The timeout comes from mon_lease.
Updated by Sage Weil about 8 years ago
- Has duplicate Bug #14935: "[ FAILED ] ClsHello.Filter" in upgrade:infernalis-x-jewel-distro-basic-vps added
Updated by Sage Weil about 8 years ago
- Has duplicate Bug #14927: "[ FAILED ] ClsLock.Test*" (4 tests) failed in upgrade:hammer-x-jewel-distro-basic-openstack added
Updated by Dan Mick about 8 years ago
Additional logging added in prepare_new_pool shows:
prepare_new_pool failed with -22: error running crushmap through crushtool: (125) Operation canceled
which is consistent with the analysis above.
Updated by Dan Mick about 8 years ago
Sage, what does that PR have to do with this bug? Is the spare rule somehow making crushtool time out, or something? I'm in massive need of some dot-connection here.
Updated by Loïc Dachary about 8 years ago
This happens because validating the crushmap blocks the mon. If the validation takes longer than mon_lease (5s in this run), the mon will appear unresponsive and bad things start happening.
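This can be made concrete with the numbers already in this ticket (a 7s crushtool run against a 5s mon_lease); the arithmetic below is only an illustration of the timing, not code taken from the mon:

```python
mon_lease = 5.0        # seconds a paxos lease stays valid (the 5s timeout above)
validation_time = 7.0  # observed crushtool runtime from the earlier comment

# While the leader is blocked validating the crushmap it cannot extend the
# lease, so by the time a peon next hears from it, the lease the peon holds
# has already expired by roughly the overrun:
stale_by = validation_time - mon_lease
print(stale_by)  # 2.0
```

This is the same shape as the log line above, where the peon reports "lease_expire from mon.0 ... is 2.512609 seconds in the past; mons are probably laggy".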
Updated by Loïc Dachary about 8 years ago
http://tracker.ceph.com/issues/15040 is the backport of http://tracker.ceph.com/issues/13878 which explains the issue
Updated by Loïc Dachary about 8 years ago
- Status changed from Fix Under Review to Duplicate
Updated by Loïc Dachary about 8 years ago
- Is duplicate of Backport #15040: infernalis: test/librados/tier.cc doesn't completely clean up EC pools added
Updated by Dan Mick about 8 years ago
mon calls crushtool with a timeout of 5s (mon_lease), and kills it if it takes longer. Because of accumulated EC-pool crush rules due to #13878, crushtool takes longer than it normally would to run; long enough that it violates the 5s timeout, and so the mon kills it; ultimately this shows up as the pool create failure.
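The failure chain can be sketched as follows. This is a loose Python illustration, not the mon's actual C++ pool-create path; the helper name and the scaled-down command are hypothetical:

```python
import errno
import subprocess

def run_crushtool_with_timeout(cmd, timeout_s):
    """Sketch of the mon bounding its crushtool run by mon_lease.

    Hypothetical helper: if the external validator overruns the deadline
    it is killed and the result is reported as -ECANCELED (125,
    "Operation canceled"), which the pool-create path then surfaces to
    the client as the -22 (EINVAL) seen in the test output.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=True)
        return 0
    except subprocess.TimeoutExpired:
        # The tool ran past the mon_lease-derived deadline and was killed.
        return -errno.ECANCELED
    except subprocess.CalledProcessError:
        return -errno.EINVAL

# A crushtool run that outlives its budget (the real case was 7s vs. 5s;
# scaled down here to keep the example fast) gets canceled:
rc = run_crushtool_with_timeout(["sleep", "2"], 0.5)
print(rc)  # -125 on Linux ("Operation canceled")
```

A validator that finishes within the budget returns 0, which is why the failure only appears once the accumulated EC-pool rules from #13878 push crushtool's runtime past the 5s mark.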