Feature #2944

closed

mon: dynamically adjust heartbeat grace

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags: CY2012
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Basically:
1) When an OSD boots, record whether it reports itself as fresh or as
wrongly marked down. From that history, maintain the probability that
the OSD was laggy rather than actually down, using an exponential
decay (more recent reports matter more), and maintain the length of
time the OSD was laggy for in those cases.
2) When a sufficient number of failure reports come in to mark an OSD
down, additionally compute the laggy probability and laggy interval
for the reporters in aggregate.
3) Adjust the "heartbeat grace" locally on the monitor according to
the following formula:
adjusted_heartbeat_grace = heartbeat_grace
    + laggy_interval * (1 / laggy_probability)
    + group_laggy_interval * (1 / group_laggy_probability)
4) If we reach the end of that adjusted heartbeat grace and we have
not received failure cancellations, then mark the OSD down.
(Cancellations already exist: when an OSD gets a heartbeat from a peer
that it has reported down but that isn't marked down, it sends a
cancellation.)
5) When running the out check, adjust the "down to out interval" by
the same ratio we've adjusted the heartbeat grace by.
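
The five steps above can be sketched roughly as follows. This is only an illustration of the proposal as written, not the monitor's actual code: the LaggyStats class, the smoothing weight, the plain mean used to aggregate reporters, and the choice to skip a term when its probability is zero (to avoid dividing by zero) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LaggyStats:
    """Decayed lagginess history for one OSD (hypothetical structure)."""
    laggy_probability: float = 0.0  # decayed share of boots reported as wrongly marked down
    laggy_interval: float = 0.0     # decayed average of seconds spent laggy

    def record_boot(self, wrongly_marked_down: bool, laggy_seconds: float = 0.0,
                    weight: float = 0.3) -> None:
        """Step 1: fold one boot report into the estimates.

        `weight` is an assumed smoothing factor; newer reports count more.
        """
        sample = 1.0 if wrongly_marked_down else 0.0
        self.laggy_probability += weight * (sample - self.laggy_probability)
        if wrongly_marked_down:
            # Only laggy episodes contribute to the interval estimate.
            self.laggy_interval += weight * (laggy_seconds - self.laggy_interval)


def aggregate(reporters: Sequence[LaggyStats]) -> LaggyStats:
    """Step 2: combine the reporters' stats (a plain mean is an assumption)."""
    if not reporters:
        return LaggyStats()
    n = len(reporters)
    return LaggyStats(
        laggy_probability=sum(r.laggy_probability for r in reporters) / n,
        laggy_interval=sum(r.laggy_interval for r in reporters) / n,
    )


def _term(stats: LaggyStats) -> float:
    # The formula divides by the probability; contributing nothing when the
    # probability is zero is an assumption, the proposal does not cover it.
    if stats.laggy_probability <= 0.0:
        return 0.0
    return stats.laggy_interval * (1.0 / stats.laggy_probability)


def adjusted_heartbeat_grace(heartbeat_grace: float, target: LaggyStats,
                             group: LaggyStats) -> float:
    """Step 3: the formula from the description, term by term."""
    return heartbeat_grace + _term(target) + _term(group)


def should_mark_down(failed_for: float, adjusted_grace: float,
                     cancelled: bool) -> bool:
    """Step 4: mark down once the adjusted grace expires without a cancellation."""
    return failed_for >= adjusted_grace and not cancelled


def adjusted_down_out_interval(down_out_interval: float, heartbeat_grace: float,
                               adjusted_grace: float) -> float:
    """Step 5: scale the down-to-out interval by the same ratio as the grace."""
    return down_out_interval * adjusted_grace / heartbeat_grace
```

For example, with a base grace of 20 seconds, a target OSD at laggy_probability 0.5 and laggy_interval 4, and reporters averaging 0.5 and 2, the two terms add 8 and 4 seconds, and a 300 second down-to-out interval would be stretched by the same 32/20 ratio.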


Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Feature #2320: mon: detect and throttle osd flapping (Duplicate)

#1

Updated by Sage Weil over 11 years ago

  • Position deleted (1)
  • Position set to 7
#2

Updated by Sage Weil over 11 years ago

  • Story points set to 21
  • Position deleted (10)
  • Position set to 8
#3

Updated by Sage Weil over 11 years ago

  • Position deleted (36)
  • Position set to 3
#4

Updated by Sage Weil over 11 years ago

  • Priority changed from Normal to High
#5

Updated by Sage Weil over 11 years ago

  • Tags set to CY2012
#6

Updated by Sage Weil over 11 years ago

  • Description updated (diff)
#7

Updated by Sage Weil over 11 years ago

  • Position deleted (15)
  • Position set to 1
#8

Updated by Sage Weil over 11 years ago

  • Position deleted (4)
  • Position set to 1
#9

Updated by Sage Weil over 11 years ago

  • Story points changed from 21 to 0
  • Position deleted (1)
  • Position set to 1
#10

Updated by Sage Weil over 11 years ago

  • Status changed from New to Duplicate