Bug #22045

open

OSDMonitor: osd down by monitor is delayed

Added by Tang Jin over 6 years ago. Updated over 6 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The cluster has three hosts, each running a monitor, a mgr, and several OSDs. All options are at their defaults except mon_osd_min_down_reporters, which is set to 3, so the 'osd check failure' path cannot mark OSDs down.

Now unplug the public network cable of one of the hosts and watch "ceph -s" to see when the OSDs on that host are marked down.

The result is that some OSDs are marked down after the first mon_osd_report_timeout interval, and the rest are marked down only after a second mon_osd_report_timeout interval. They are not marked down at the same time.
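To make the timing of each down transition easier to capture than by watching "ceph -s" by hand, a small polling script can help. The following is only a sketch: it assumes that `ceph osd dump --format json` returns an `osds` array whose entries carry `osd` and `up` fields, and the helper names (`down_osds`, `record_new_downs`, `poll`) are hypothetical, not part of Ceph.

```python
import json
import subprocess
import time


def down_osds(osd_dump):
    """Return the sorted ids of OSDs that the dump reports as not up."""
    return sorted(o["osd"] for o in osd_dump.get("osds", []) if not o.get("up"))


def record_new_downs(seen, osd_dump, now):
    """Store a first-seen-down timestamp in `seen`; return the newly down ids."""
    new = [osd for osd in down_osds(osd_dump) if osd not in seen]
    for osd in new:
        seen[osd] = now
    return new


def poll(interval=5.0):
    """Poll the cluster forever and print each OSD the first time it is seen down."""
    seen = {}
    while True:
        # Assumption: this command and its JSON layout match the running release.
        out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
        for osd in record_new_downs(seen, json.loads(out), time.time()):
            print(f"osd.{osd} first seen down at {time.strftime('%H:%M:%S')}")
        time.sleep(interval)
```

Calling `poll()` on one of the hosts that still has public network connectivity, just before unplugging the cable, should print two clusters of timestamps if the behavior described above reproduces.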

Actions #1

Updated by Tang Jin over 6 years ago

After the new election, the leader monitor does not change. The leader receives many OSDBeacons for some of the down OSDs, resent by the other peon monitor through resend_routed_requests, whereas the OSDBeacons of the remaining OSDs are never received again. This splits the OSDs into two groups that are not marked down at the same time.
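The effect described above can be illustrated with a toy model. This is not the OSDMonitor code: `ToyLeader`, `simulate`, and all time values are hypothetical, and the model only assumes that the leader marks an OSD down once no beacon has been recorded for it for longer than the report timeout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeader:
    """Toy stand-in for the leader's beacon bookkeeping (not Ceph code)."""
    report_timeout: float
    last_beacon: dict = field(default_factory=dict)  # osd id -> last beacon time

    def on_beacon(self, osd, now):
        self.last_beacon[osd] = now

    def timed_out(self, now):
        """Return the OSDs whose last beacon is older than the timeout."""
        return sorted(osd for osd, t in self.last_beacon.items()
                      if now - t > self.report_timeout)


def simulate(report_timeout=900.0, cut_time=0.0, replay_time=600.0):
    leader = ToyLeader(report_timeout)
    # All four OSDs on the host send their last real beacon when the cable is cut.
    for osd in (0, 1, 2, 3):
        leader.on_beacon(osd, cut_time)
    # Assumption: after the election, stale beacons of osd.0 and osd.1 are
    # resent by the peon and refresh their timestamps on the leader.
    for osd in (0, 1):
        leader.on_beacon(osd, replay_time)
    first = leader.timed_out(cut_time + report_timeout + 1)
    second = leader.timed_out(replay_time + report_timeout + 1)
    return first, second
```

With these illustrative values, `simulate()` reports only osd.2 and osd.3 as timed out at the first check, and all four OSDs at the second check, which mirrors the two groups observed in the report.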

Actions #3

Updated by Greg Farnum over 6 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Component(RADOS) Monitor added