Bug #13878

test/librados/tier.cc doesn't completely clean up EC pools

Added by Dan Mick about 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer,infernalis
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Creating an EC pool also creates a custom crush rule for it. tier.cc creates and destroys the pool but leaves the crush rule behind. With repeated runs, this can overflow the crush rule table (256 entries) fairly quickly and cause a confusing failure, including a backtrace on a Lock of a null mutex (probably a second bug).

Make destroy_one_ec_pool* also remove the crush rule.
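
The proposed cleanup could be sketched roughly as follows. This is an illustration only, not the code from the fix: the helper names crush_rule_rm_command and destroy_ec_pool_and_rule are hypothetical, and the two callbacks stand in for the librados calls the real test would presumably use (for example rados_pool_delete and rados_mon_command). It assumes the crush rule is named after the pool, as the commit message states, and that the pool name needs no JSON escaping.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the JSON monitor command that removes the
// crush rule created alongside an EC pool. Assumes the rule carries the
// pool's name and that the name contains no characters needing escaping.
std::string crush_rule_rm_command(const std::string& pool_name)
{
  return "{\"prefix\": \"osd crush rule rm\", \"name\": \"" + pool_name + "\"}";
}

// Hypothetical cleanup flow: delete the pool first, then its crush rule.
// delete_pool and send_mon_command are stand-ins for the cluster calls;
// each returns 0 on success or a negative error code.
template <typename DeletePool, typename SendMonCommand>
int destroy_ec_pool_and_rule(const std::string& pool_name,
                             DeletePool delete_pool,
                             SendMonCommand send_mon_command)
{
  int ret = delete_pool(pool_name);
  if (ret != 0)
    return ret;  // pool still exists, so leave its rule in place
  return send_mon_command(crush_rule_rm_command(pool_name));
}
```

Removing the rule only after the pool has been deleted successfully avoids removing a rule that a live pool still refers to.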

Another side effect of having too many rulesets is that injecting the crushmap into the mon will time out. The mon validates the crushmap by calling crushtool. If crushtool does not return within mon_lease seconds, it is interrupted. If it were not interrupted, the other mons would perceive the mon as unresponsive and trigger an election, which is not the desired outcome.
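
The "run an external tool, interrupt it after a deadline" pattern described above can be illustrated with a small POSIX sketch. This is not the mon's actual implementation: run_with_timeout is a hypothetical helper, the return-code convention is invented for the example, and the polling interval is arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <csignal>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not Ceph code: run argv as a child process and kill
// it if it has not exited within `timeout`.
// Returns the child's exit status, -1 on error or abnormal exit, and -2 if
// the deadline passed and the child was killed.
int run_with_timeout(const std::vector<std::string>& argv,
                     std::chrono::milliseconds timeout)
{
  if (argv.empty())
    return -1;

  // execvp wants a null-terminated array of char*.
  std::vector<char*> args;
  for (const auto& a : argv)
    args.push_back(const_cast<char*>(a.c_str()));
  args.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    execvp(args[0], args.data());
    _exit(127);  // only reached if exec failed
  }

  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int status = 0;
  for (;;) {
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid)
      return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    if (r < 0)
      return -1;
    if (std::chrono::steady_clock::now() >= deadline) {
      kill(pid, SIGKILL);        // interrupt the slow child
      waitpid(pid, &status, 0);  // reap it so no zombie is left behind
      return -2;
    }
    usleep(10 * 1000);  // poll every 10 ms
  }
}
```

With a deadline like this, a validation run that slows down as crush rules accumulate eventually crosses the limit and is reported as a failure, even though the tool itself would have finished given more time.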


Related issues

Copied to Ceph - Backport #15040: infernalis: test/librados/tier.cc doesn't completely clean up EC pools Resolved
Copied to Ceph - Backport #15051: hammer: test/librados/tier.cc doesn't completely clean up EC pools Resolved

Associated revisions

Revision 04b4795f (diff)
Added by Dan Mick almost 7 years ago

test/librados/test.cc: clean up EC pools' crush rules too

SetUp was adding an erasure-coded pool, which automatically adds
a new crush rule named after the pool, but only removing the
pool. Remove the crush rule as well.

http://tracker.ceph.com/issues/13878 Fixes: #13878

Signed-off-by: Dan Mick <>
Signed-off-by: Loic Dachary <>

Revision 15a419be (diff)
Added by Dan Mick over 6 years ago

test/librados/test.cc: clean up EC pools' crush rules too

SetUp was adding an erasure-coded pool, which automatically adds
a new crush rule named after the pool, but only removing the
pool. Remove the crush rule as well.

http://tracker.ceph.com/issues/13878 Fixes: #13878

Signed-off-by: Dan Mick <>
Signed-off-by: Loic Dachary <>
(cherry picked from commit 04b4795f81c15bfcb62ba5807745470ce0e5e949)

Revision 57fd7f85 (diff)
Added by Dan Mick over 6 years ago

test/librados/test.cc: clean up EC pools' crush rules too

SetUp was adding an erasure-coded pool, which automatically adds
a new crush rule named after the pool, but only removing the
pool. Remove the crush rule as well.

http://tracker.ceph.com/issues/13878 Fixes: #13878

Signed-off-by: Dan Mick <>
Signed-off-by: Loic Dachary <>
(cherry picked from commit 04b4795f81c15bfcb62ba5807745470ce0e5e949)

History

#1 Updated by Dan Mick about 7 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Dan Mick

#2 Updated by Loïc Dachary almost 7 years ago

  • Assignee changed from Dan Mick to Loïc Dachary

#4 Updated by Loïc Dachary almost 7 years ago

  • Status changed from Fix Under Review to Resolved

#5 Updated by Loïc Dachary over 6 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to infernalis

#6 Updated by Loïc Dachary over 6 years ago

  • Copied to Backport #15040: infernalis: test/librados/tier.cc doesn't completely clean up EC pools added

#7 Updated by Loïc Dachary over 6 years ago

  • Description updated (diff)

#8 Updated by Loïc Dachary over 6 years ago

from dmick: mon calls crushtool with a timeout of 5s (mon_lease), and kills it if it takes longer. Because of accumulated EC-pool crush rules due to #13878, crushtool takes longer than it normally would to run; long enough that it violates the 5s timeout, and so the mon kills it; ultimately this shows up as the pool create failure.

#9 Updated by Nathan Cutler over 6 years ago

  • Backport changed from infernalis to hammer,infernalis

Added hammer backport based on IRC discussion with Yuri and Dan:

(10:53:33 PM) smithfarm: yuriw: loicd set the backport to infernalis (only) here: http://tracker.ceph.com/issues/13878#note-5
(10:54:46 PM) yuriw: smithfarm - i saw and wondering if hammer needs to be set too
(11:00:13 PM) yuriw: smithfarm - i thought that dupes should be fixed by it:
(11:00:13 PM) yuriw: http://tracker.ceph.com/issues/14927
(11:00:13 PM) yuriw: http://tracker.ceph.com/issues/14935
(11:05:44 PM) smithfarm: yuri: I don't know. dmick, do you think https://github.com/dachary/ceph/commit/04b4795f81c15bfcb62ba5807745470ce0e5e949 needs to be backported to hammer as well as infernalis? The issue is triggered by the hammer-x-jewel as well as the infernalis-x-jewel upgrade suites.
(11:06:05 PM) smithfarm: dmick: loicd staged the infernalis backport, but not hammer
(11:06:21 PM) dmick: seems to me it ought to go everywhere we're running tests, yes

#10 Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #15051: hammer: test/librados/tier.cc doesn't completely clean up EC pools added

#11 Updated by Loïc Dachary over 6 years ago

  • Status changed from Pending Backport to Resolved
