Fix #6262
open
toofull osd prevents backfilling of other pg replicas
Description
Say a pg is to be 4-way replicated across osds [0,1,2,3].
AFAICT, if any of osds 0, 1 or 2 hits the toofull threshold before its own backfill completes, the pg remains stuck in backfill_toofull. It would be better to set the full osd aside and start backfilling the remaining osds, which would enable further progress and protect data integrity should any of the osds holding the pg fail.
AFAICT, the only way to avoid waiting for the toofull osd to free up space and complete its own backfill, before the subsequent osds in the replication set get backfilled, is to bring the toofull osd down. That prevents it from participating in recovery of other pgs, and even from freeing up space as it becomes available. Plus, if it remains down long enough to be marked out, it will trigger additional recovery, which will slow things down and fill osds up further.
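To make the difference between the reported and the requested behavior concrete, here is a minimal toy model in Python. It is only a sketch of the description above, not Ceph code: the `Osd` class, the `TOOFULL_RATIO` value and both functions are hypothetical names made up for illustration, and it assumes backfill targets are processed one after another, as the description suggests.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    osd_id: int
    used_ratio: float  # fraction of capacity in use, 0.0 to 1.0
    backfilled: bool = False


# Hypothetical threshold for this sketch only; not read from any Ceph config.
TOOFULL_RATIO = 0.85


def backfill_blocking(targets):
    """Behavior as reported: stop at the first toofull target."""
    done = []
    for osd in targets:
        if osd.used_ratio >= TOOFULL_RATIO:
            # Every later osd in the set is left waiting behind this one.
            return done, "backfill_toofull"
        osd.backfilled = True
        done.append(osd.osd_id)
    return done, "clean"


def backfill_skipping(targets):
    """Behavior requested: set toofull targets aside, continue with the rest."""
    done, deferred = [], []
    for osd in targets:
        if osd.used_ratio >= TOOFULL_RATIO:
            deferred.append(osd.osd_id)  # retry later, once space frees up
            continue
        osd.backfilled = True
        done.append(osd.osd_id)
    state = "backfill_toofull" if deferred else "clean"
    return done, deferred, state
```

With osd 1 over the threshold in the set [0,1,2,3], `backfill_blocking` stops after osd 0, while `backfill_skipping` still backfills osds 2 and 3 and only defers osd 1, so the pg ends up with more complete replicas while it waits.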
Updated by Patrick Donnelly over 5 years ago
- Project changed from Ceph to RADOS
- Component(RADOS) OSD added