Bug #14849

closed

(infernalis) [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack

Added by Yuri Weinstein about 8 years ago. Updated about 8 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-02-23_02:00:02-rados-infernalis-distro-basic-openstack/
Job: 3568
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-02-23_02:00:02-rados-infernalis-distro-basic-openstack/3568/teuthology.log

2016-02-23T03:56:22.058 INFO:teuthology.orchestra.run.target091202:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config set filestore_inject_stall 3'
2016-02-23T03:56:25.518 INFO:tasks.ceph.mon.c.target091202.stderr:2016-02-23 03:56:25.472230 7fa7c6fe0700 -1 mon.c@2(peon).paxos(paxos updating c 1..708) lease_expire from mon.0 158.69.91.202:6789/0 is 2.512609 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-02-23T03:56:26.735 INFO:tasks.workunit.client.0.target091202.stdout:test/librados/aio.cc:2528: Failure
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:Value of: test_data.init()
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:  Actual: "create_one_ec_pool(test-rados-api-target091202.ovh.sepia.ceph.com-13446-58) failed: error mon_command osd pool create pool:test-rados-api-target091202.ovh.sepia.ceph.com-13446-58 pool_type:erasure failed with error -22" 
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:Expected: "" 
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:[  FAILED  ] LibRadosAioEC.ReturnValuePP (10841 ms)

Related issues: 3 (0 open, 3 closed)

Has duplicate Ceph - Bug #14935: "[ FAILED ] ClsHello.Filter" in upgrade:infernalis-x-jewel-distro-basic-vps (Duplicate, 03/01/2016)
Has duplicate Ceph - Bug #14927: "[ FAILED ] ClsLock.Test*" (4 tests) failed in upgrade:hammer-x-jewel-distro-basic-openstack (Resolved, 02/29/2016)
Is duplicate of Ceph - Backport #15040: infernalis: test/librados/tier.cc doesn't completely clean up EC pools (Resolved, Sage Weil)

#1

Updated by Samuel Just about 8 years ago

  • Priority changed from Normal to Urgent

#2

Updated by Sage Weil about 8 years ago

crushtool test timed out: it took 7s, and the timeout is 5s.

#3

Updated by Sage Weil about 8 years ago

  • Subject changed from [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack to (infernalis) [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack
  • Status changed from New to Fix Under Review

The timeout comes from mon_lease.

https://github.com/ceph/ceph/pull/7987

#4

Updated by Sage Weil about 8 years ago

  • Has duplicate Bug #14935: "[ FAILED ] ClsHello.Filter" in upgrade:infernalis-x-jewel-distro-basic-vps added

#5

Updated by Sage Weil about 8 years ago

  • Has duplicate Bug #14927: "[ FAILED ] ClsLock.Test*" (4 tests) failed in upgrade:hammer-x-jewel-distro-basic-openstack added

#6

Updated by Dan Mick about 8 years ago

Additional logging added in prepare_new_pool shows:

prepare_new_pool failed with -22: error running crushmap through crushtool: (125) Operation canceled

which is consistent with the analysis above.

#7

Updated by Dan Mick about 8 years ago

Sage, what does that PR have to do with this bug? Is the spare rule somehow making crushtool time out, or something? In massive need of dot-connection here.

#8

Updated by Loïc Dachary about 8 years ago

That happens because validating the crushmap blocks the mon. If validation takes longer than mon_lease, the mon will appear unresponsive and bad things start happening.

#9

Updated by Loïc Dachary about 8 years ago

#10

Updated by Loïc Dachary about 8 years ago

  • Status changed from Fix Under Review to Duplicate

#11

Updated by Loïc Dachary about 8 years ago

  • Is duplicate of Backport #15040: infernalis: test/librados/tier.cc doesn't completely clean up EC pools added

#12

Updated by Dan Mick about 8 years ago

The mon calls crushtool with a timeout of 5s (mon_lease) and kills it if it takes longer. Because EC-pool crush rules accumulate due to #13878, crushtool takes longer to run than it normally would, long enough to exceed the 5s timeout, so the mon kills it. Ultimately this shows up as the pool create failure (error -22).
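The failure chain above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the monitor's actual C++ code: `validate_with_timeout` and `prepare_new_pool_sketch` are hypothetical stand-ins, an arbitrary external command plays the role of crushtool, and `timeout_s` plays the role of the 5s mon_lease. It shows how a validator that is killed for running too long yields ECANCELED (125 on Linux), and how a pool-create path that folds every validation error into EINVAL reports only -22 to the client.

```python
import errno
import subprocess
import sys


def validate_with_timeout(cmd, timeout_s):
    """Hypothetical stand-in for the "run crushmap through crushtool" step.

    Returns 0 if the validator exits cleanly, -ECANCELED if it ran past
    timeout_s and was killed, or -EINVAL if it exited with an error.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before raising.
        return -errno.ECANCELED
    return 0 if result.returncode == 0 else -errno.EINVAL


def prepare_new_pool_sketch(cmd, timeout_s):
    """Hypothetical pool-create path: any validation failure, including a
    timeout, is reported to the client as -EINVAL (-22)."""
    r = validate_with_timeout(cmd, timeout_s)
    if r < 0:
        print(f"pool create failed with {-errno.EINVAL}: validator returned {r}")
        return -errno.EINVAL
    return 0


if __name__ == "__main__":
    # A validator that finishes quickly.
    fast = [sys.executable, "-c", "pass"]
    # A validator that stands in for crushtool slowed by accumulated crush rules.
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(prepare_new_pool_sketch(fast, timeout_s=5))    # 0
    print(prepare_new_pool_sketch(slow, timeout_s=0.5))  # failure line, then -22
```

Because the timeout is folded into a generic -EINVAL in this sketch, a caller sees only "failed with error -22", much like the test log above; surfacing the underlying "(125) Operation canceled" took the extra logging in prepare_new_pool described in #6.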
