Bug #39555

closed

backfill_toofull while OSDs are not full (Unnecessary HEALTH_ERR)

Added by Rene Diepstraten about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: David Zafman
Category: Backfill/Recovery
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous, mimic, nautilus
Regression: No
Severity: 2 - major
Reviewed:
ceph-qa-suite: rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This week I ran into an issue where Ceph reports HEALTH_ERR because PGs are in the backfill_toofull state, even though none of the OSDs is above the nearfull_ratio threshold.

As far as I know, the problem was first reported on the ceph-users mailing list in January:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032534.html
More reports can be found, for example:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033569.html

Currently, the cluster I'm working on is busy with a massive rebalance and the fullest osd is at 79% full.
There are still more pgs remapped to that osd but there are also pgs remapped away from this osd.

root@mon01:~# ceph osd df tree | sort -nk11 | tail -1 | awk '{print $11}'
79.33
root@mon01:~# ceph daemon mon.mon01 config get mon_osd_backfillfull_ratio
{
    "mon_osd_backfillfull_ratio": "0.900000" 
}
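The column-based `sort`/`awk` pipeline above depends on the column layout of `ceph osd df tree`. As a rough alternative, the fullest OSD can be picked out of the JSON form of the report. This is only a sketch: it assumes the JSON output contains a `nodes` list whose entries carry `name`, `type` and `utilization` fields, the helper name `fullest_osd` is hypothetical, and the sample data is made up apart from the 79.33 figure taken from the output above.

```python
import json


def fullest_osd(report):
    """Return (name, utilization_percent) of the fullest OSD in the report.

    `report` is assumed to be the parsed JSON of an OSD usage report with a
    "nodes" list; entries that are not OSDs (hosts, roots) are skipped.
    """
    osds = [n for n in report["nodes"] if n.get("type", "osd") == "osd"]
    top = max(osds, key=lambda n: n["utilization"])
    return top["name"], top["utilization"]


# Hypothetical sample shaped like the assumed JSON output.
sample = json.loads("""
{"nodes": [
  {"id": 0, "name": "osd.0", "type": "osd", "utilization": 61.20},
  {"id": 1, "name": "osd.1", "type": "osd", "utilization": 79.33},
  {"id": 2, "name": "osd.2", "type": "osd", "utilization": 70.05}
]}
""")

print(fullest_osd(sample))
```

On a live cluster the `report` argument would come from parsing the output of `ceph osd df -f json`, if the field names match the assumption stated above.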

It appears to me that the monitors count all PGs that are remapped to an OSD and, if adding them to the OSD's current usage would exceed backfillfull_ratio, report the backfill_toofull state. If this is indeed the case, the calculation should in my opinion also take into account the PGs that are remapped away from the OSD.
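The suspected accounting can be illustrated with a small arithmetic sketch. This is not Ceph's actual code: the function names, the OSD capacity and the PG sizes are hypothetical, and only the 0.90 ratio and the 79.33% usage come from the output above. It compares a projection that only adds incoming PGs with one that also subtracts the PGs moving away.

```python
BACKFILLFULL_RATIO = 0.90  # mon_osd_backfillfull_ratio from the output above


def projected_ratio_incoming_only(used, capacity, incoming):
    """Projected fullness if only PGs remapped TO the OSD are counted."""
    return (used + sum(incoming)) / capacity


def projected_ratio_net(used, capacity, incoming, outgoing):
    """Projected fullness if PGs remapped AWAY from the OSD are subtracted too."""
    return (used + sum(incoming) - sum(outgoing)) / capacity


# Hypothetical OSD: 10000 GB capacity, 79.33% used (all sizes in GB).
capacity = 10_000
used = 7_933
incoming = [600, 600]  # hypothetical PGs waiting to be backfilled to this OSD
outgoing = [700, 700]  # hypothetical PGs that will move away from this OSD

incoming_only = projected_ratio_incoming_only(used, capacity, incoming)
net = projected_ratio_net(used, capacity, incoming, outgoing)

print(f"incoming only: {incoming_only:.4f} toofull={incoming_only > BACKFILLFULL_RATIO}")
print(f"net:           {net:.4f} toofull={net > BACKFILLFULL_RATIO}")
```

With these made-up numbers the incoming-only projection crosses the 0.90 threshold while the net projection stays well below it, which is the kind of discrepancy described above.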


Files

ceph-osd-df.txt (4 KB), ceph osd df, Alex Cucu, 05/10/2019 12:05 PM
ceph-df.txt (629 Bytes), ceph df, Alex Cucu, 05/10/2019 12:05 PM
crushmap.txt (4.3 KB), Alex Cucu, 05/10/2019 12:10 PM
ceph-osd-tree.txt (3.56 KB), Alex Cucu, 05/10/2019 12:11 PM
crushmap.txt (7.44 KB), Rene Diepstraten, 05/21/2019 09:33 PM
ceph-osd-df-tree.txt (8.77 KB), Rene Diepstraten, 05/21/2019 09:33 PM

Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #41255: backfill_toofull seen on cluster where the most full OSD is at 1% (Resolved, David Zafman)
Copied to RADOS - Backport #41499: mimic: backfill_toofull while OSDs are not full (Unnecessary HEALTH_ERR) (Rejected, David Zafman)
Copied to RADOS - Backport #41500: luminous: backfill_toofull while OSDs are not full (Unnecessary HEALTH_ERR) (Rejected)
Copied to RADOS - Backport #41501: nautilus: backfill_toofull while OSDs are not full (Unnecessary HEALTH_ERR) (Resolved, Nathan Cutler)
