Bug #42519

open

During deployment of Ceph, if the main node starts more slowly than the other nodes, an assert may fail and generate a core dump.

Added by he huang over 4 years ago. Updated 4 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version:
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During Ceph deployment, the main MON node starts slowly, so the other two nodes start first and complete the election. At this point, the main mon's entry in the monmap still has the name noname-a, although its IP address is present. When the main mon starts, it begins probing. After receiving a probe_reply, it finds that the other nodes' monmap is newer than its own, so it adopts their monmap directly. That monmap, however, does not contain the main node's name. The main node then chooses itself as the main mon and enters the timecheck process. Because its name is missing from the monmap, the main node concludes that all three mons in the quorum need to be checked, so it sends a message to all three and waits for their responses. In fact it should send to only two mons. As a result, one response never arrives during the check, and an assert fails and generates a core dump.
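
The counting mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Ceph's actual monitor code: the function names (find_rank, timecheck_round), the real monitor name "a", and the way a rank is looked up by name are all hypothetical, chosen to mirror the description. It also assumes, as the description implies, that a message the leader addresses to its own rank never produces a reply.

```python
from typing import List, Set, Tuple


def find_rank(monmap_names: List[str], my_name: str) -> int:
    """Return this monitor's rank (its index in the monmap), or -1 if its name is absent.

    Hypothetical helper: models a leader locating itself by name.
    """
    return monmap_names.index(my_name) if my_name in monmap_names else -1


def timecheck_round(
    monmap_names: List[str],
    quorum: Set[int],
    my_name: str,
    responders: Set[int],
) -> Tuple[int, int]:
    """Return (messages sent, replies received) for one toy timecheck round.

    The leader skips its own rank only if it can find itself in the monmap.
    """
    my_rank = find_rank(monmap_names, my_name)
    targets = [rank for rank in sorted(quorum) if rank != my_rank]
    replies = [rank for rank in targets if rank in responders]
    return len(targets), len(replies)


QUORUM = {0, 1, 2}          # all three mons are in quorum
PEERS_THAT_REPLY = {1, 2}   # only the two other mons answer the leader

# Monmap that already carries the leader's real name (assumed to be "a").
healthy = timecheck_round(["a", "b", "c"], QUORUM, "a", PEERS_THAT_REPLY)

# Monmap adopted from the peers, still holding the placeholder name.
stale = timecheck_round(["noname-a", "b", "c"], QUORUM, "a", PEERS_THAT_REPLY)
```

In this model the sent and received counts agree when the leader's name is in the monmap. With the placeholder name the leader cannot exclude itself, so it expects one more reply than it can ever receive, which is the kind of mismatch the description blames for the failed assert.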

Actions #1

Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to RADOS
  • Category changed from Monitor to Correctness/Safety
  • Component(RADOS) Monitor added
Actions #2

Updated by Joao Eduardo Luis over 4 years ago

  • Assignee set to Joao Eduardo Luis
Actions #3

Updated by Joao Eduardo Luis 4 days ago

  • Assignee deleted (Joao Eduardo Luis)

No idea if this is still applicable. Unassigning from me because it hasn't been touched for 4 years now, and I likely won't be working on it.
