Bug #57852: osd: unhealthy osd cannot be marked down in time - RADOS - Ceph

osd: unhealthy osd cannot be marked down in time

Added by wencong wan over 1 year ago. Updated about 1 year ago.

Status: Need More Info
Priority: Normal
Assignee: Prashant D
Severity: 3 - minor
Component(RADOS): OSD

Description

Before an unhealthy OSD is marked down by the mon, other OSDs may choose it as
a heartbeat peer and then report an incorrect failure time (first_tx) to the mon.

Steps to reproduce:
Shut down the cluster_network and public_network of an OSD node several times.

Files

p1.png (63.1 KB): ifdown net at 13:10 (wencong wan, 10/12/2022 02:13 AM)
p2.png (246 KB): after 10 minutes, the unhealthy OSD still keeps the up status (wencong wan, 10/12/2022 02:15 AM)

Updated by Radoslaw Zarzynski over 1 year ago

Status changed from New to Need More Info

Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV of judging whether an OSD is down or not) messages that just update the markdown timestamp?

Updated by wencong wan over 1 year ago

Radoslaw Zarzynski wrote:

Could you please clarify a bit? Do you mean there are some extra, unnecessary (from the POV of judging whether an OSD is down or not) messages that just update the markdown timestamp?

Whether an OSD is down or not is determined by the mon. The mon marks an OSD down if either of the following two conditions is met:
1. The mon has not received an osd_beacon message from the OSD for more than 900s (mon_osd_report_timeout).
2. The mon has received failure_report messages from 2 (mon_osd_min_down_reporters) OSDs on different hosts (mon_osd_reporter_subtree_level), and the failure has lasted longer than the grace period (now - fi.get_failed_since() > grace).

get_failed_since returns the latest (maximum) failure time among all reporters. If some OSDs choose the unhealthy OSD as a heartbeat peer, they will never receive a heartbeat reply from it, so they report the time they first sent a heartbeat (first_tx) as the failure time of the unhealthy OSD. Each such report moves the maximum forward, so the condition "now - fi.get_failed_since() > grace" cannot be met.
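
To make the effect concrete, here is a minimal Python model of the two conditions described above. It is only a sketch of the behaviour as explained in this comment, not the actual OSDMonitor code: the class and function names (FailureInfo, should_mark_down) are hypothetical, the 20s grace value is an assumption, and the mon's real grace adjustments are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class FailureInfo:
    """Hypothetical stand-in for the mon's per-OSD failure record."""
    # reporter OSD id -> (reporter host, failed_since timestamp in seconds)
    reports: dict = field(default_factory=dict)

    def report(self, reporter, host, failed_since):
        self.reports[reporter] = (host, failed_since)

    def get_failed_since(self):
        # As described above: the maximum failure time over all reporters.
        return max(t for _, t in self.reports.values())

    def reporter_hosts(self):
        return {host for host, _ in self.reports.values()}


def should_mark_down(fi, now, last_beacon,
                     grace=20.0,              # assumed grace period
                     min_down_reporters=2,    # mon_osd_min_down_reporters
                     report_timeout=900.0):   # mon_osd_report_timeout
    # Condition 1: no osd_beacon for longer than the report timeout.
    if now - last_beacon > report_timeout:
        return True
    # Condition 2: enough reporters on different hosts, and the failure
    # has lasted longer than the grace period.
    if len(fi.reporter_hosts()) < min_down_reporters:
        return False
    return now - fi.get_failed_since() > grace


fi = FailureInfo()
fi.report(1, "hostA", failed_since=0.0)
fi.report(2, "hostB", failed_since=0.0)
print(should_mark_down(fi, now=30.0, last_beacon=0.0))

# A third OSD picks the unhealthy OSD as a new heartbeat peer at t=25 and
# reports its first_tx as the failure time, which moves the maximum forward.
fi.report(3, "hostC", failed_since=25.0)
print(should_mark_down(fi, now=30.0, last_beacon=0.0))
```

In this model the first check succeeds, while the second fails because the late reporter resets the measured failure duration to 5s. If new peers keep reporting like this, only the 900s beacon timeout eventually marks the OSD down, which matches the delay shown in p2.png.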

Updated by Radoslaw Zarzynski over 1 year ago

Status changed from Need More Info to New

Thanks for the detailed explanation!

Updated by Radoslaw Zarzynski over 1 year ago

Assignee set to Prashant D

Not something we introduced recently, but still worth taking a look if nothing urgent is on the plate.

Updated by Prashant D over 1 year ago

Sure Radek. Let me have a look at this.

Updated by Radoslaw Zarzynski over 1 year ago

Status changed from New to In Progress

Updated by Prashant D about 1 year ago

Status changed from In Progress to Need More Info

I am working on a probable fix for this issue, but I have not been able to reproduce it on a vstart cluster by blocking traffic on the ports of a specific OSD. I am creating a test environment to reproduce it. Meanwhile, would it be possible to provide logs with debug_mon=20 for the mon and debug_osd=25 for 2-3 of the OSDs that are reporting the unhealthy OSD as failed?
