Bug #282

closed

osd: heartbeat can't keep up with large cluster changes

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
% Done: 0%

Description

In Wido's case, a new crushmap makes the OSDs flap.

Actions #1

Updated by Greg Farnum over 13 years ago

Do we still think this is an issue after 856999eda434fa9b7d93b152427cf7c82240f220 ("osd: clear failure_queue when marked down"), or were there other issues with OSD crushmap changes that started the chain and the delayed failure_queue just kept it going?

Actions #2

Updated by Sage Weil over 13 years ago

Greg Farnum wrote:

Do we still think this is an issue after 856999eda434fa9b7d93b152427cf7c82240f220 ("osd: clear failure_queue when marked down"), or were there other issues with OSD crushmap changes that started the chain and the delayed failure_queue just kept it going?

There might be multiple issues; I'm not sure. I want to make sure it's working well under a fairly heavy OSD repeering load before closing this out.

Actions #3

Updated by Sage Weil over 13 years ago

  • Target version changed from v0.21 to v0.21.1

Actions #4

Updated by Sage Weil over 13 years ago

Fixed what I think is the last issue here in 9bfb8da9f925642bca46528a999124cd8b28ba2a.

Actions #5

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved