Support #23719

Three nodes shut down, only two booted again, many PGs down (host failure domain, EC 2+1)

Added by junwei liao over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

The PG interval mechanism causes a problem when a cluster is restarted. My cluster has 3 nodes (host failure domain, EC 2+1). After shutting down all 3 nodes, I booted only 2 of them, and many PGs went down. In theory, with a host failure domain and EC 2+1, having 2 of the 3 nodes online should be enough to keep the PGs active. The cause is the PG interval mechanism.

Suppose PG 1.1 has acting set [1,2,3], and I power down osd.3 first, then osd.2. If I boot osd.3 first and osd.2 second, the PG can become active; if I boot osd.2 first and osd.3 second, the PG can go down. When all 3 nodes in the cluster are shut down at the same time, the OSDs power down in an arbitrary order. If I then boot only 2 of the nodes, many PGs end up down. This greatly affects the reliability of the cluster.

History

#1 Updated by junwei liao over 3 years ago

Fix to the description: suppose PG 1.1 has acting set [1,2,3], and I power down osd.3 first, then osd.2. If I then boot osd.2, the PG can become active; if I boot osd.3 instead, the PG can go down.
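
The scenario above can be sketched with a toy model. This is not Ceph code, only a simplified illustration of the idea that peering needs enough surviving OSDs from every past interval in which the PG may have accepted writes. The names `Interval`, `build_intervals` and `pg_can_peer` are hypothetical, and the sketch assumes the pool's min_size equals k = 2 (so a two-OSD interval can go read-write) and that osd.1 is the last OSD to go down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One past interval of a PG (hypothetical, simplified model)."""
    acting: frozenset      # OSDs in the acting set during this interval
    maybe_went_rw: bool    # True if the PG could have accepted writes


def build_intervals(initial_acting, shutdown_order, min_size):
    """Create one interval per acting-set change as OSDs go down in order."""
    acting = set(initial_acting)
    intervals = [Interval(frozenset(acting), len(acting) >= min_size)]
    for osd in shutdown_order:
        acting.discard(osd)
        if acting:  # an empty acting set records no usable interval here
            intervals.append(
                Interval(frozenset(acting), len(acting) >= min_size)
            )
    return intervals


def pg_can_peer(intervals, up_osds, k):
    """Assumed rule: every interval that may have gone read-write must
    still have at least k of its OSDs up, otherwise the PG is down."""
    up = set(up_osds)
    for interval in intervals:
        if interval.maybe_went_rw and len(interval.acting & up) < k:
            return False
    return True


K = 2          # data chunks in EC 2+1
MIN_SIZE = 2   # assumption: min_size == k for this pool

# osd.3 goes down first, then osd.2, then osd.1 (assumed to be last).
intervals = build_intervals([1, 2, 3], shutdown_order=[3, 2, 1],
                            min_size=MIN_SIZE)

print(pg_can_peer(intervals, {1, 2}, K))  # True: interval [1,2] fully covered
print(pg_can_peer(intervals, {1, 3}, K))  # False: only osd.1 left from [1,2]
```

Under these assumptions the interval with acting set [1,2] may have taken writes that osd.3 never saw, so booting osd.1 and osd.3 leaves only one shard holder from that interval, which is fewer than the k = 2 needed to reconstruct the data. Only one of the three possible two-node combinations lets the PG peer in this model.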

#2 Updated by Greg Farnum over 3 years ago

  • Tracker changed from Bug to Support
  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
