Bug #52724

closed

octopus: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'

Added by Deepika Upadhyay over 2 years ago. Updated almost 2 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

description: rados/singleton/{all/osd-recovery msgr-failures/none msgr/async-v2only
  objectstore/bluestore-comp-snappy rados supported-random-distro$/{centos_8}}
failure_reason: '"2021-09-21T14:14:40.675086+0000 mon.a (mon.0) 24 : cluster [WRN]
  Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)" in cluster log'
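When triaging many runs with this failure_reason, it can help to pull the fields out of the cluster-log warning programmatically. The following is a minimal sketch, not part of teuthology or Ceph: the regular expression is inferred only from the single line quoted above, and the names `HEALTH_RE` and `parse_mon_down` are hypothetical.

```python
import re

# Layout inferred from the one warning line quoted in this ticket (assumption):
# <timestamp> <mon name> (mon.<rank>) <seq> : cluster [<level>] Health check failed: ...
HEALTH_RE = re.compile(
    r"(?P<ts>\S+) (?P<src>mon\.\w+) \(mon\.(?P<rank>\d+)\) (?P<seq>\d+) : "
    r"cluster \[(?P<level>\w+)\] Health check failed: "
    r"(?P<down>\d+)/(?P<total>\d+) mons down, quorum (?P<quorum>[\w,]+) "
    r"\((?P<code>[A-Z_]+)\)"
)


def parse_mon_down(line):
    """Return the warning's fields as a dict, or None if the line does not match."""
    match = HEALTH_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Split the quorum list and convert the numeric fields.
    info["quorum"] = info["quorum"].split(",")
    for key in ("rank", "seq", "down", "total"):
        info[key] = int(info[key])
    return info


# The warning from failure_reason, with the YAML line fold joined by a space.
line = (
    "2021-09-21T14:14:40.675086+0000 mon.a (mon.0) 24 : cluster [WRN] "
    "Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)"
)
info = parse_mon_down(line)
print(info["code"], info["down"], info["total"], info["quorum"])
```

For the line in this ticket, the parsed quorum is `a` and `c`, which suggests that the mon reported down is the one missing from that list.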

The leader needed a lease ack from a peon:

2021-09-21T14:19:03.751+0000 7ff7e724a700 10 mon.a@0(leader).paxos(paxos active c 1..413) handle_lease_ack from mon.1 -- still need 1 more

2021-09-21T14:19:03.751+0000 7ff7e724a700 10 mon.a@0(leader).paxos(paxos active c 1..413) handle_lease_ack from mon.2 -- got everyone

2021-09-21T14:19:03.972+0000 7ff7e724a700 10 mon.a@0(leader) e1 reset/close on session mgr.4106 172.21.15.64:0/28717
2021-09-21T14:19:03.972+0000 7ff7e724a700 10 mon.a@0(leader) e1 remove_session 0x55f6f7f8fd40 mgr.4106 172.21.15.64:0/28717 features 0x3f01cfb8ffedffff
2021-09-21T14:19:03.972+0000 7ff7eba53700  1 -- 172.21.15.64:0/28534 >> v2:172.21.15.64:6812/28717 conn(0x55f6f8889800 msgr2=0x55f6f7fbb700 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:172.21.15.64:6812/28717
2021-09-21T14:19:03.972+0000 7ff7eba53700  1 --2- 172.21.15.64:0/28534 >> v2:172.21.15.64:6812/28717 conn(0x55f6f8889800 0x55f6f7fbb700 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._fault waiting 0.200000
2021-09-21T14:19:03.972+0000 7ff7e3a43700  1 -- 172.21.15.64:0/28534 --> v2:172.21.15.64:6812/28717 -- mgropen(unknown.a) v3 -- 0x55f6f7f84240 con 0x55f6f8889800
2021-09-21T14:19:03.972+0000 7ff7e3a43700 10 mon.a@0(leader) e1 ms_handle_refused 0x55f6f8889800 v2:172.21.15.64:6812/28717

These are the last lines logged. Nothing here looks suspicious; do we just need to find out what is slowing things down, or should we increase the timeout? (The other 2 mons went to standby.)

/ceph/teuthology-archive/yuriw-2021-09-21_13:09:46-rados-wip-yuri2-testing-2021-09-21-0432-octopus-distro-basic-smithi/6401575/
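To see where lease acks stall in a leader's mon log, one option is to extract the `handle_lease_ack` lines and check which of them complete the round. The sketch below assumes the debug-level-10 paxos line layout seen in the two lines quoted in this ticket; it is not derived from the Ceph source, and `ACK_RE` and `lease_acks` are hypothetical names.

```python
import re

# Layout inferred from the two handle_lease_ack lines quoted in this ticket (assumption):
# <timestamp> <thread id> <debug level> <mon>@<rank>(leader).paxos(...) handle_lease_ack from <peer> -- <state>
ACK_RE = re.compile(
    r"^(?P<ts>\S+) \S+\s+\d+ (?P<who>mon\.\w+)@\d+\(leader\)\.paxos\(.*?\) "
    r"handle_lease_ack from (?P<peer>mon\.\d+) -- (?P<state>.+)$"
)


def lease_acks(lines):
    """Return (timestamp, peer, complete) for each lease-ack line found.

    complete is True when the leader logged "got everyone" for that ack.
    """
    acks = []
    for line in lines:
        match = ACK_RE.match(line.strip())
        if match:
            complete = match.group("state") == "got everyone"
            acks.append((match.group("ts"), match.group("peer"), complete))
    return acks


sample = [
    "2021-09-21T14:19:03.751+0000 7ff7e724a700 10 mon.a@0(leader).paxos(paxos active c 1..413) handle_lease_ack from mon.1 -- still need 1 more",
    "2021-09-21T14:19:03.751+0000 7ff7e724a700 10 mon.a@0(leader).paxos(paxos active c 1..413) handle_lease_ack from mon.2 -- got everyone",
]
acks = lease_acks(sample)
for ts, peer, complete in acks:
    print(ts, peer, "complete" if complete else "waiting")
```

A long gap between a "still need" line and the matching "got everyone" line, or a round that never completes, would point at the slow peon; in the excerpt above both acks carry the same timestamp.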


Related issues 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #43584: MON_DOWN during mon_join process (Resolved, Sage Weil)

Actions #1

Updated by Deepika Upadhyay over 2 years ago

  • Description updated (diff)
Actions #2

Updated by Neha Ojha over 2 years ago

Seems to be the same issue as https://tracker.ceph.com/issues/43584; marked the original ticket for backport.

Actions #3

Updated by Laura Flores almost 2 years ago

  • Status changed from New to Duplicate
Actions #4

Updated by Laura Flores almost 2 years ago

  • Is duplicate of Bug #43584: MON_DOWN during mon_join process added