Many OSDs on one node are down. After the PGs are remapped, a PG member may fail to be selected.
The environment is a three-node Ceph cluster with a 2+1 redundancy configuration. Because of our own OSD cache feature, unplugging the SSD disk takes down half of the OSDs on that node, so the PGs are remapped.
In fact, this has nothing to do with the OSD cache; that was only the trigger. To put it plainly: a large number of OSDs on this node went down (half of them), and after the PGs were remapped, one PG could not select a complete set of members.
In short, the number of hosts equals the number of redundant copies, meaning each PG must contain one OSD from each host. On that basis, when half of the OSDs on one host are down, this can happen: a PG cannot select a full set of OSDs.
It does not always happen; there is only a certain probability. Because the input parameters to the CRUSH algorithm differ, the results differ. For example, with pool ID 1 the failure may appear, while with pool ID 2 it may not.
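The probabilistic failure described above can be illustrated with a toy model. This is not actual CRUSH code: real CRUSH uses straw2 bucket selection and several retry tunables, but the essential behavior is the same in sketch form, and the hash function, retry count, and cluster layout below are all invented for illustration. Each host must supply one OSD, selection is a deterministic hash of the pool ID, PG number, and attempt number, and there is a bounded number of retries. When half of a host's OSDs are down, every retry can land on a down OSD, and whether that happens for a given PG depends on the hash inputs, which is why one pool ID can show the problem while another does not.

```python
import hashlib

def draw(*args):
    """Deterministic pseudo-random draw, standing in for CRUSH's hash."""
    data = ":".join(map(str, args)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def select_osds(pool_id, pg_id, hosts, down, tries=5):
    """Pick one up OSD per host (replica count == host count).

    Returns None if some host cannot supply an up OSD within `tries`
    attempts, mimicking CRUSH's bounded-retry failure mode. The retry
    limit of 5 is an arbitrary stand-in for CRUSH's tunables."""
    chosen = []
    for host_idx, osds in enumerate(hosts):
        picked = None
        for attempt in range(tries):
            cand = osds[draw(pool_id, pg_id, host_idx, attempt) % len(osds)]
            if cand not in down:
                picked = cand
                break
        if picked is None:
            return None  # incomplete PG: a member could not be selected
        chosen.append(picked)
    return chosen

# Three hosts with four OSDs each; half the OSDs on host 0 are down.
hosts = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
down = {0, 1}

for pool_id in (1, 2):
    failed = sum(select_osds(pool_id, pg, hosts, down) is None
                 for pg in range(128))
    print(f"pool {pool_id}: {failed}/128 PGs incomplete")
```

With 2 of 4 OSDs down on one host and 5 retries, each PG independently fails with probability (1/2)^5, so a small but nonzero fraction of PGs comes out incomplete, and the affected set differs per pool ID.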
#1 Updated by Greg Farnum 3 months ago
- Status changed from New to Closed
Yes, sometimes CRUSH selection fails when you have a very small number of choices compared to the number of required selections. Nuking half the OSDs in a host will make this an essentially impossible scenario to handle.
Even if CRUSH could select a node, this in general won't work: if your cluster is anywhere near full, losing half a node means you don't have the storage space available to rebalance correctly, since every host has to be able to store a full copy of all the data.
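The capacity point can be made concrete with a back-of-the-envelope check. The numbers here are invented for illustration: three equal hosts, and a fill ratio chosen to show the squeeze. Since every host must hold a full copy of the data, a host that loses half its OSDs must fit the entire data set into half its raw capacity, which fails as soon as the cluster is more than half full.

```python
# Illustrative numbers only: three hosts of 10 TB raw capacity each,
# cluster 60% full, one full copy of all data required on every host.
host_capacity_tb = 10.0
fill_ratio = 0.60

data_set_tb = host_capacity_tb * fill_ratio  # each host currently holds 6 TB
degraded_capacity_tb = host_capacity_tb / 2  # host keeps 5 TB after losing half its OSDs

print(f"data to place per host: {data_set_tb:.1f} TB")
print(f"capacity of degraded host: {degraded_capacity_tb:.1f} TB")
print("rebalance possible:", data_set_tb <= degraded_capacity_tb)
```

At any fill ratio above 50%, the degraded host cannot absorb its required copy, so rebalancing is impossible regardless of what CRUSH selects.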