Bug #282

osd: heartbeat can't keep up with large cluster changes

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

In wido's case, a new crushmap makes the OSDs flap.

History

#1 Updated by Greg Farnum over 9 years ago

Do we still think this is an issue after 856999eda434fa9b7d93b152427cf7c82240f220 ("osd: clear failure_queue when marked down"), or were there other issues with OSD crushmap changes that started the chain and the delayed failure_queue just kept it going?

#2 Updated by Sage Weil over 9 years ago

Greg Farnum wrote:

Do we still think this is an issue after 856999eda434fa9b7d93b152427cf7c82240f220 ("osd: clear failure_queue when marked down"), or were there other issues with OSD crushmap changes that started the chain and the delayed failure_queue just kept it going?

There might be multiple issues, not sure. I want to make sure it's working well under pretty heavy osd repeering load before closing this out.

#3 Updated by Sage Weil over 9 years ago

  • Target version changed from v0.21 to v0.21.1

#4 Updated by Sage Weil over 9 years ago

Fixed what I think is the last issue here in 9bfb8da9f925642bca46528a999124cd8b28ba2a.

#5 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved
