Bug #59318

closed

mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice

Added by Patrick Donnelly about 1 year ago. Updated 5 months ago.

Status: Resolved
Priority: High
Category: -
Target version:
% Done: 100%
Source: Development
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the leader monitor handles two up:boot beacons from a new MDS and both beacon updates are batched into the same Paxos transaction, it may fail the new MDS. For example:

2023-04-05T00:58:13.435+0000 7fc56d485700  7 mon.a@0(leader).mds e286 prepare_update mdsbeacon(105603/d up:boot seq=2 v286) v8
2023-04-05T00:58:13.435+0000 7fc56d485700 12 mon.a@0(leader).mds e286 prepare_beacon mdsbeacon(105603/d up:boot seq=2 v286) v8 from mds.? [v2:127.0.0.1:6832/3118878188,v1:127.0.0.1:6833/3118878188]                                                                                                                                                                                         
2023-04-05T00:58:13.435+0000 7fc56d485700 15 mon.a@0(leader).mds e286 prepare_beacon got health from gid 105603 with 0 metrics.
2023-04-05T00:58:13.435+0000 7fc56d485700  0 log_channel(cluster) log [INF] : daemon mds.d restarted
2023-04-05T00:58:13.435+0000 7fc56d485700  1 -- [v2:127.0.0.1:40314/0,v1:127.0.0.1:40315/0] --> [v2:127.0.0.1:40314/0,v1:127.0.0.1:40315/0] -- log(1 entries from seq 734 at 2023-04-05T00:58:13.436351+0000) v1 -- 0x55f4fd3cd880 con 0x55f4f1849000                                                                                                                                         
2023-04-05T00:58:13.435+0000 7fc56d485700  1 mon.a@0(leader).mds e286 fail_mds_gid 105603 mds.d role -1
2023-04-05T00:58:13.435+0000 7fc56d485700 10 mon.a@0(leader).paxosservice(osdmap 1..608) propose_pending
2023-04-05T00:58:13.435+0000 7fc56d485700 10 mon.a@0(leader).osd e608 encode_pending e 609
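In the log above, the beacon comes from gid 105603, the leader logs "daemon mds.d restarted", and it then calls fail_mds_gid on the same gid 105603. A minimal sketch of how a duplicate boot beacon could produce that sequence is given below. It is a toy model, not the MDSMonitor code: the class `ToyLeader`, its methods, and both rules marked as assumptions are hypothetical and exist only to illustrate the batching effect.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeader:
    """Toy model of a leader batching beacon updates; not Ceph code."""
    committed: dict = field(default_factory=dict)  # gid -> daemon name
    pending: dict = field(default_factory=dict)    # working copy being built
    failed: list = field(default_factory=list)     # gids failed by the leader

    def handle_boot(self, gid, name):
        # Assumption: a boot beacon from a gid that is already in the
        # committed map is a harmless duplicate and is ignored.
        if gid in self.committed:
            return "ignored"
        # Assumption: any daemon in the pending map with the same name is
        # treated as an old instance that restarted, and is failed.
        stale = [g for g, n in self.pending.items() if n == name]
        for g in stale:
            del self.pending[g]
            self.failed.append(g)
        self.pending[gid] = name
        return "failed-existing" if stale else "added"

    def commit(self):
        # Stand-in for the pending map being committed as one transaction.
        self.committed = dict(self.pending)


# Two boot beacons handled before the first one is committed.
same_batch = ToyLeader()
same_batch.handle_boot(105603, "d")
same_batch.handle_boot(105603, "d")
print(same_batch.failed)   # [105603]

# The same two beacons with a commit in between.
separate = ToyLeader()
separate.handle_boot(105603, "d")
separate.commit()
separate.handle_boot(105603, "d")
print(separate.failed)     # []
```

Under these assumptions, the second beacon finds the daemon's own uncommitted entry, mistakes it for an older instance with the same name, and fails it. With a commit between the two beacons the duplicate is recognized and nothing is failed.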

Related issues 4 (0 open, 4 closed)

Related to CephFS - Bug #24403: mon failed to return metadata for mds (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61424: reef: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61425: quincy: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #61426: pacific: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice (Resolved, Patrick Donnelly)

#1

Updated by Patrick Donnelly about 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 50875
#2

Updated by Patrick Donnelly about 1 year ago

  • Related to Bug #24403: mon failed to return metadata for mds added
#3

Updated by Patrick Donnelly 11 months ago

  • Status changed from Fix Under Review to Pending Backport
#4

Updated by Backport Bot 11 months ago

  • Copied to Backport #61424: reef: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice added
#5

Updated by Backport Bot 11 months ago

  • Copied to Backport #61425: quincy: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice added
#6

Updated by Backport Bot 11 months ago

  • Copied to Backport #61426: pacific: mon/MDSMonitor: daemon booting may get failed if mon handles up:boot beacon twice added
#7

Updated by Backport Bot 11 months ago

  • Tags set to backport_processed
#8

Updated by Konstantin Shalygin 5 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100