I have a cluster of 5 nodes.
The system pool is replicated, and the data pools use erasure coding 4+2 (min_size 5).
I had to turn off one server for maintenance.
I turned it off, and within a minute the whole cluster hung.
haproxy reports that the servers have not responded for more than 5 seconds.
I tried checking from the console with "rados ls -p .rgw.root",
and that request froze as well.
I understand that turning off one node left some of the PGs without enough chunks (incomplete).
But the system pool is replicated, so reads from it should be available in any case.
How can I find out the reason?
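The arithmetic behind the erasure-coded PGs going unavailable can be checked with a short sketch. It uses only the numbers from the post (k=4, m=2, min_size 5, 5 hosts) plus the standard Ceph rule that a PG with fewer than min_size chunks available stops serving I/O; the function names and the example placement are hypothetical. Because each PG has 6 chunks and there are only 5 hosts, some host must hold two chunks of every PG, so shutting that host down leaves 4 chunks, one below min_size.

```python
from itertools import product

# Parameters taken from the post: EC profile k=4, m=2, pool min_size 5, 5 hosts.
K, M = 4, 2
MIN_SIZE = 5
HOSTS = 5


def surviving_chunks(placement, failed_host):
    """Number of a PG's chunks still up after failed_host is shut down.

    placement is a hypothetical list with one host id per chunk (K + M entries).
    """
    return sum(1 for host in placement if host != failed_host)


def pg_state(placement, failed_host):
    """Classify one PG after a single-host outage."""
    alive = surviving_chunks(placement, failed_host)
    if alive >= MIN_SIZE:
        return "serves I/O"
    if alive >= K:
        return "blocked: below min_size, data still recoverable"
    return "blocked: fewer than k chunks, data unavailable"


def every_placement_has_blocking_host():
    """Brute-force check: for every way to put K + M chunks on HOSTS hosts,
    some single-host outage drops the PG below MIN_SIZE."""
    for placement in product(range(HOSTS), repeat=K + M):
        if all(surviving_chunks(placement, h) >= MIN_SIZE for h in range(HOSTS)):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical placement: host 0 holds two of the six chunks.
    placement = [0, 1, 2, 3, 4, 0]
    for host in range(HOSTS):
        print(f"host {host} off -> {pg_state(placement, host)}")
    print("every placement has a blocking host:", every_placement_has_blocking_host())
```

In this example only the host holding the doubled-up chunks blocks the PG. With CRUSH spreading many PGs across the cluster, each host is likely to be the doubled-up host for some of them, so turning off any single host would block part of an EC 4+2 pool with min_size 5.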
#2 Updated by Andrey Groshev over 2 years ago
Casey Bodley wrote:
radosgw depends on having a healthy ceph cluster, so this is expected behavior
What do you mean by "expected"? I have many other pools in good condition. Does one pool make the whole cluster hang? Other users reach their perfectly healthy pools through this same RGW.