Bug #14849
Status: Closed
(infernalis) [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack
Description
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-02-23_02:00:02-rados-infernalis-distro-basic-openstack/
Job: 3568
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-02-23_02:00:02-rados-infernalis-distro-basic-openstack/3568/teuthology.log
2016-02-23T03:56:22.058 INFO:teuthology.orchestra.run.target091202:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config set filestore_inject_stall 3'
2016-02-23T03:56:25.518 INFO:tasks.ceph.mon.c.target091202.stderr:2016-02-23 03:56:25.472230 7fa7c6fe0700 -1 mon.c@2(peon).paxos(paxos updating c 1..708) lease_expire from mon.0 158.69.91.202:6789/0 is 2.512609 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)
2016-02-23T03:56:26.735 INFO:tasks.workunit.client.0.target091202.stdout:test/librados/aio.cc:2528: Failure
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:Value of: test_data.init()
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:  Actual: "create_one_ec_pool(test-rados-api-target091202.ovh.sepia.ceph.com-13446-58) failed: error mon_command osd pool create pool:test-rados-api-target091202.ovh.sepia.ceph.com-13446-58 pool_type:erasure failed with error -22"
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:Expected: ""
2016-02-23T03:56:26.736 INFO:tasks.workunit.client.0.target091202.stdout:[ FAILED ] LibRadosAioEC.ReturnValuePP (10841 ms)
Updated by Sage Weil about 8 years ago
crushtool test timed out: it took 7s, while the timeout is 5s.
Updated by Sage Weil about 8 years ago
- Subject changed from [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack to (infernalis) [ FAILED ] LibRadosAioEC.ReturnValuePP in rados-infernalis-distro-basic-openstack
- Status changed from New to Fix Under Review
The timeout comes from mon_lease.
Updated by Sage Weil about 8 years ago
- Has duplicate Bug #14935: "[ FAILED ] ClsHello.Filter" in upgrade:infernalis-x-jewel-distro-basic-vps added
Updated by Sage Weil about 8 years ago
- Has duplicate Bug #14927: "[ FAILED ] ClsLock.Test*" (4 tests) failed in upgrade:hammer-x-jewel-distro-basic-openstack added
Updated by Dan Mick about 8 years ago
Additional logging added in prepare_new_pool shows:
prepare_new_pool failed with -22: error running crushmap through crushtool: (125) Operation canceled
which is consistent with the analysis above.
Updated by Dan Mick about 8 years ago
Sage, what does that PR have to do with this bug? Is the spare rule somehow making crushtool time out, or something? I'm in massive need of some dot-connection here.
Updated by Loïc Dachary about 8 years ago
This happens because validating the crushmap blocks the mon. If the validation takes longer than mon_lease (5s in this run), the mon will appear unresponsive and bad things start happening.
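This can be made concrete with the numbers already in this ticket (a 7s crushtool run against a 5s mon_lease); the arithmetic below is only an illustration of the timing, not code taken from the mon:

```python
mon_lease = 5.0        # seconds a paxos lease stays valid (the 5s timeout above)
validation_time = 7.0  # observed crushtool runtime from the earlier comment

# While the leader is blocked validating the crushmap it cannot extend the
# lease, so by the time a peon next hears from it, the lease the peon holds
# has already expired by roughly the overrun:
stale_by = validation_time - mon_lease
print(stale_by)  # 2.0
```

This is the same shape as the log line above, where the peon reports "lease_expire from mon.0 ... is 2.512609 seconds in the past; mons are probably laggy".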
Updated by Loïc Dachary about 8 years ago
http://tracker.ceph.com/issues/15040 is the backport of http://tracker.ceph.com/issues/13878 which explains the issue
Updated by Loïc Dachary about 8 years ago
- Status changed from Fix Under Review to Duplicate
Updated by Loïc Dachary about 8 years ago
- Is duplicate of Backport #15040: infernalis: test/librados/tier.cc doesn't completely clean up EC pools added
Updated by Dan Mick about 8 years ago
mon calls crushtool with a timeout of 5s (mon_lease), and kills it if it takes longer. Because of accumulated EC-pool crush rules due to #13878, crushtool takes longer than it normally would to run; long enough that it violates the 5s timeout, and so the mon kills it; ultimately this shows up as the pool create failure.
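The failure chain can be sketched as follows. This is a loose Python illustration, not the mon's actual C++ pool-create path; the helper name and the scaled-down command are hypothetical:

```python
import errno
import subprocess

def run_crushtool_with_timeout(cmd, timeout_s):
    """Sketch of the mon bounding its crushtool run by mon_lease.

    Hypothetical helper: if the external validator overruns the deadline
    it is killed and the result is reported as -ECANCELED (125,
    "Operation canceled"), which the pool-create path then surfaces to
    the client as the -22 (EINVAL) seen in the test output.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=True)
        return 0
    except subprocess.TimeoutExpired:
        # The tool ran past the mon_lease-derived deadline and was killed.
        return -errno.ECANCELED
    except subprocess.CalledProcessError:
        return -errno.EINVAL

# A crushtool run that outlives its budget (the real case was 7s vs. 5s;
# scaled down here to keep the example fast) gets canceled:
rc = run_crushtool_with_timeout(["sleep", "2"], 0.5)
print(rc)  # -125 on Linux ("Operation canceled")
```

A validator that finishes within the budget returns 0, which is why the failure only appears once the accumulated EC-pool rules from #13878 push crushtool's runtime past the 5s mark.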