Bug #282: osd: heartbeat can't keep up with large cluster changes - Ceph

closed

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Category: OSD
Target version: v0.21.1

Description

In wido's case, a new crushmap makes the OSDs flap.

Updated by Greg Farnum over 13 years ago

Do we still think this is an issue after 856999eda434fa9b7d93b152427cf7c82240f220 ("osd: clear failure_queue when marked down"), or were there other issues with OSD crushmap changes that started the chain and the delayed failure_queue just kept it going?

Updated by Sage Weil over 13 years ago

Greg Farnum wrote:

> Do we still think this is an issue after 856999eda434fa9b7d93b152427cf7c82240f220 ("osd: clear failure_queue when marked down"), or were there other issues with OSD crushmap changes that started the chain and the delayed failure_queue just kept it going?

There might be multiple issues; I'm not sure. I want to make sure it's working well under pretty heavy OSD repeering load before closing this out.
