Bug #48274

mon_osd_adjust_heartbeat_grace blocks OSDs from being marked as down

Added by Wido den Hollander about 1 year ago. Updated 7 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I encountered a situation which I also posted on the users list: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/HZXM2DQTNKE55CS35IJMJKYVEO7237VQ/#SBW4X46UHMUNHC7HCDTPBST3LMSIONSM

The cluster (15.2.2) has a fairly simple CRUSH map:

root default
  rack 206
    host A
  rack 207
    host B
  rack 208
    host C

We simulated the failure of an entire rack.

The MON in that rack was marked as down, but the OSDs in that rack were only marked down after roughly 900 seconds.

This morning (2020-11-18) we tried this test again and I noticed this:

2020-11-18T02:33:21.498-0500 7f849ae8c700  0 log_channel(cluster) log [DBG] : osd.58 reported failed by osd.105
2020-11-18T02:33:21.498-0500 7f849ae8c700 10 mon.CEPH2-MON1-206-U39@0(leader).osd e108196  osd.58 has 1 reporters, 82.847365 grace (20.000000 + 62.8474 + 5.67647e-167), max_failed_since 2020-11-18T02:32:59.498216-0500
2020-11-18T02:33:21.498-0500 7f849ae8c700 10 mon.CEPH2-MON1-206-U39@0(leader).log v9218943  logging 2020-11-18T02:33:21.499338-0500 mon.CEPH2-MON1-206-U39 (mon.0) 165 : cluster [DBG] osd.58 reported failed by osd.105
2020-11-18T02:33:21.814-0500 7f8499689700  7 mon.CEPH2-MON1-206-U39@0(leader).log v9218944 update_from_paxos applying incremental log 9218944 2020-11-18T02:33:21.499338-0500 mon.CEPH2-MON1-206-U39 (mon.0) 165 : cluster [DBG] osd.58 reported failed by osd.105
2020-11-18T02:33:22.158-0500 7f849ae8c700 10 mon.CEPH2-MON1-206-U39@0(leader).osd e108196 preprocess_query osd_failure(failed timeout osd.58 [v2:10.101.13.137:6811/2762,v1:10.101.13.137:6819/2762] for 22sec e108196 v108196) v4 from osd.90 v2:10.101.13.12:6830/2708
2020-11-18T02:33:22.158-0500 7f849ae8c700 10 mon.CEPH2-MON1-206-U39@0(leader).osd e108196 preprocess_failure new: osd.58 [v2:10.101.13.137:6811/2762,v1:10.101.13.137:6819/2762], from osd.90
2020-11-18T02:33:22.158-0500 7f849ae8c700  7 mon.CEPH2-MON1-206-U39@0(leader).osd e108196 prepare_update osd_failure(failed timeout osd.58 [v2:10.101.13.137:6811/2762,v1:10.101.13.137:6819/2762] for 22sec e108196 v108196) v4 from osd.90 v2:10.101.13.12:6830/2708
2020-11-18T02:33:22.158-0500 7f849ae8c700  1 mon.CEPH2-MON1-206-U39@0(leader).osd e108196 prepare_failure osd.58 [v2:10.101.13.137:6811/2762,v1:10.101.13.137:6819/2762] from osd.90 is reporting failure:1
2020-11-18T02:33:22.158-0500 7f849ae8c700 10 mon.CEPH2-MON1-206-U39@0(leader) e1 no_reply to osd.90 v2:10.101.13.12:6830/2708 osd_failure(failed timeout osd.58 [v2:10.101.13.137:6811/2762,v1:10.101.13.137:6819/2762] for 22sec e108196 v108196) v4

After reading the log thoroughly, this is the line that stood out:

osd.58 has 1 reporters, 82.847365 grace (20.000000 + 62.8474 + 5.67647e-167)

Somehow the MONs were adding time to the grace period.

After looking through the source code (mon/OSDMonitor.cc: OSDMonitor::check_failure) I saw that the config setting mon_osd_adjust_heartbeat_grace was influencing the grace period.
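For readers who want the shape of that adjustment: as far as I can tell from check_failure, when mon_osd_adjust_heartbeat_grace is true the monitor takes the OSD's recorded laggy probability and laggy interval, decays them with a halflife (mon_osd_laggy_halflife), and adds the result on top of the base grace. A rough sketch of that structure (parameter names are mine, this is not the exact Ceph code):

```python
import math

def adjusted_grace(base_grace, failed_for, laggy_probability, laggy_interval,
                   halflife=60 * 60.0):
    """Sketch of the grace adjustment in OSDMonitor::check_failure.

    base_grace        -- the static osd_heartbeat_grace (e.g. 20s)
    failed_for        -- seconds since the OSD was last seen healthy
    laggy_probability -- recorded probability that the OSD is just laggy
    laggy_interval    -- recorded average laggy interval, in seconds
    halflife          -- mon_osd_laggy_halflife (default 1 hour)
    """
    decay_k = math.log(0.5) / halflife      # per-second decay constant
    decay = math.exp(failed_for * decay_k)  # 1.0 right away, 0.5 after one halflife
    extra = decay * laggy_interval * laggy_probability
    return base_grace + extra
```

An OSD with a long recorded laggy history can therefore inflate the grace well past the base value, which matches the "20.000000 + 62.8474 + 5.67647e-167" breakdown in the log line above.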

I changed mon_osd_adjust_heartbeat_grace to false and the OSDs were marked as down right away.
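For anyone reproducing this, the setting can be toggled at runtime through the central config store; a plausible sequence (exact invocation may vary by release):

```shell
# Disable the monitor-side dynamic heartbeat grace adjustment
ceph config set mon mon_osd_adjust_heartbeat_grace false

# Verify the active value on the monitors
ceph config get mon mon_osd_adjust_heartbeat_grace
```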

I played with setting mon_osd_reporter_subtree_level to rack, but this didn't change anything.

This setting has been in the code for a while, but I have never seen this happen before.

History

#1 Updated by Sage Weil 7 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
