Unable to recover from ENOSPC in BlueFS
Under heavy load and with a full DB volume, BlueStore might fall into a state where BlueFS lacks additional space even though free space is still available on the main block device.
This is caused by the "lazy" behavior of free-space rebalancing: it happens periodically in the background rather than on demand.
On the first allocation failure, the OSD asserts. It is then unable to restart, because log replay during BlueFS open also needs space, but the rebalance has still not been executed.
The assertion fires again, leaving the OSD in an effectively unrecoverable deadlock.
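The failure sequence described above can be illustrated with a toy model. This is a hypothetical sketch, not Ceph code: the `ToyBlueStore` class, its fields, the space sizes, and the `on_demand` switch are assumptions made purely for illustration, and the actual fix in the linked pull request may work differently. It shows that when BlueFS only receives extra space from a periodic rebalance, both the initial allocation and the replay-time allocation fail even though the block device still has plenty of free space, whereas rebalancing on demand lets the same allocation succeed.

```python
from dataclasses import dataclass


@dataclass
class ToyBlueStore:
    """Hypothetical toy model (not Ceph code): BlueFS owns some space,
    and the main block device holds free space that can be handed over."""
    bluefs_free: int
    block_free: int
    on_demand: bool = False

    def rebalance(self, amount: int) -> None:
        # Move up to `amount` units of free space from the block device to BlueFS.
        gift = min(amount, self.block_free)
        self.block_free -= gift
        self.bluefs_free += gift

    def bluefs_allocate(self, size: int) -> bool:
        # Assumed on-demand variant: rebalance immediately when BlueFS runs short.
        if size > self.bluefs_free and self.on_demand:
            self.rebalance(size - self.bluefs_free)
        if size > self.bluefs_free:
            return False  # in the real OSD this is where the assert fires
        self.bluefs_free -= size
        return True


def run(on_demand: bool) -> list[str]:
    # BlueFS owns little space; the block device still has lots of free space.
    store = ToyBlueStore(bluefs_free=10, block_free=1000, on_demand=on_demand)
    events = []
    # A heavy write burst needs more than BlueFS currently owns.
    if not store.bluefs_allocate(50):
        events.append("write: ENOSPC -> assert")
        # On restart, log replay during BlueFS open needs space as well,
        # but the periodic background rebalance never got a chance to run.
        if not store.bluefs_allocate(20):
            events.append("replay: ENOSPC -> assert")
            return events
    events.append("ok")
    return events


print(run(on_demand=False))
print(run(on_demand=True))
```

In this model, `run(on_demand=False)` reports both the write failure and the replay failure while 1000 units are still free on the block device, and `run(on_demand=True)` returns `['ok']`.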
Sage Weil wrote:
Alternative fix for mimic and luminous: https://github.com/ceph/ceph/pull/26735
Hello Sage Weil, I have hit the same issue before on Luminous and merged the patch you mentioned, but it does not help: restarting the OSD hits the same assert. Is there any other way to recover the OSD, such as cleaning up BlueFS space or expanding the BlueFS size?