Bug #42519

During Ceph deployment, when the main node starts more slowly than the other nodes, an assert may fail and generate a core dump.

Added by he huang 5 months ago. Updated 5 months ago.

Status:
New
Priority:
Normal
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-deploy
Component(RADOS):
Monitor
Pull request ID:
Crash signature:

Description

During Ceph deployment, the main MON node starts slowly, so the other two nodes start first and complete the election. At this point, the main mon's entry in the monmap still has the name noname-a, although its IP address is present. When the main mon starts, it launches a probe. After receiving the probe_reply, it finds that the other nodes' monmap is newer than its own, so it directly adopts their monmap, but the adopted monmap does not contain the main node's name. After that, the main node chooses itself as the main mon and enters the timecheck process. Since the main node's name is not in the monmap, the main node thinks that all three mons in the quorum need to be checked. So the main node sends a message to three mons and waits for their responses. In fact, the main node should only send to two mons. As a result, one response message is missing during the check, and the assert fails and generates a core dump.
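
For illustration, a minimal Python sketch of the counting mismatch described above. This is a toy model, not Ceph's monitor code: the function timecheck_targets, the dict-based monmap, the rank numbers, and the mon names a, b and c are all assumptions made up for the example. Only the name noname-a comes from this report.

```python
def timecheck_targets(monmap, quorum, my_name):
    """Return the ranks the main mon would send a timecheck to.

    Hypothetical helper: the main mon looks up its own rank by name
    and skips it. If its name is missing from the monmap, the lookup
    yields None and no rank is skipped.
    """
    my_rank = monmap.get(my_name)  # None when the name is absent
    return [rank for rank in sorted(quorum) if rank != my_rank]


# Monmap adopted from the peers: the main mon is still listed as noname-a.
stale_monmap = {"noname-a": 0, "b": 1, "c": 2}
# Monmap as it would look with the main mon's real name recorded (assumed "a").
fixed_monmap = {"a": 0, "b": 1, "c": 2}
quorum = {0, 1, 2}

stale_targets = timecheck_targets(stale_monmap, quorum, "a")
fixed_targets = timecheck_targets(fixed_monmap, quorum, "a")

print("stale monmap targets:", stale_targets)
print("fixed monmap targets:", fixed_targets)
```

In this sketch the stale monmap yields three targets, including the main mon's own rank, so the main mon would wait for a reply that never arrives. With its name present, only the two peers are targeted.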

History

#1 Updated by Greg Farnum 5 months ago

  • Project changed from Ceph to RADOS
  • Category changed from Monitor to Correctness/Safety
  • Component(RADOS) Monitor added

#2 Updated by Joao Eduardo Luis 5 months ago

  • Assignee set to Joao Eduardo Luis
