Bug #16878

closed

filestore: utilization ratio calculation does not take journal size into account

Added by Nathan Cutler almost 8 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It is easy to fill up a Ceph cluster (FileStore) by running "rados bench write".

Assuming the full and nearfull failsafe ratios have not been changed from their defaults, the expected behavior of such a test is that the OSDs fill to roughly 96-98% and no further.

On one cluster, however, it is possible to fill OSDs to 100%, with disastrous consequences. This cluster has 24 OSDs, all on 1TB spinners with external journals on SSDs. The journal partitions are abnormally large (87 GiB).

There is a configuration parameter called osd_failsafe_nearfull_ratio which defaults to 0.90. When the filestore disk usage ratio reaches this point, the OSD state is changed to "near full". The conditional used to determine whether osd_failsafe_nearfull_ratio has been exceeded does not take the journal size into account.

So, here is what might be happening:

1. The journal is periodically flushed to the underlying filestore.
2. The OSD stats (including "cur_state", which can be "FULL", "NEAR", or "NONE") are updated only before and after the journal flush operation, not during it.
3. When cur_state is "NEAR" or "FULL", the journal flush operation is careful not to fill up the disk, but when it is "NONE", the flush writes blindly for maximum performance.
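
To make the suspected sequence concrete, here is a rough sketch of the arithmetic in Python. It assumes that "1TB" means 10^12 bytes, that the journal is completely full when the flush starts, and that every journaled byte ends up on the filestore disk. The variable names are illustrative only and are not taken from the Ceph source.

```python
GIB = 2**30

disk_bytes = 10**12          # assumption: "1TB" means 10^12 bytes
journal_bytes = 87 * GIB     # journal partition size from this report
nearfull_ratio = 0.90        # osd_failsafe_nearfull_ratio default

# Stats are sampled just below the nearfull threshold, so cur_state is "NONE".
used_before = int(disk_bytes * nearfull_ratio) - 1
ratio_before = used_before / disk_bytes

# Worst case: the flush then writes a full journal's worth of data unchecked.
used_after = used_before + journal_bytes
ratio_after = used_after / disk_bytes

journal_share = journal_bytes / disk_bytes
print(f"journal is {journal_share:.1%} of the disk")
print(f"usage before flush: {ratio_before:.1%}, after flush: {ratio_after:.1%}")
```

Under these assumptions the journal alone is about 9% of the disk, so a single flush could take an OSD from just under the nearfull threshold to roughly 99% without cur_state ever leaving "NONE".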

Hence Kefu's suggested fix (see comments below), which is to assume the worst case (full journal) when checking whether the nearfull failsafe ratio has been reached, as part of updating the OSD stats.
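
The idea behind that fix could be sketched as follows. This is a minimal Python illustration, not the actual Ceph code: the function name, its parameters, the strict comparison, and the 0.97 value used for the full ratio are all assumptions made for the example. Only the nearfull check adds the journal size, matching the suggestion as described here.

```python
GIB = 2**30

def cur_state(used_bytes, total_bytes, journal_bytes=0,
              nearfull_ratio=0.90, full_ratio=0.97):
    """Classify a hypothetical OSD as "FULL", "NEAR", or "NONE".

    journal_bytes is the journal size; passing 0 reproduces a check
    that ignores the journal. full_ratio=0.97 is an assumed value.
    """
    ratio = used_bytes / total_bytes
    # Worst case: the whole journal is about to be flushed to this disk.
    worst_case_ratio = (used_bytes + journal_bytes) / total_bytes
    if ratio > full_ratio:
        return "FULL"
    if worst_case_ratio > nearfull_ratio:
        return "NEAR"
    return "NONE"

disk = 10**12                # assumption: "1TB" means 10^12 bytes
used = 850 * 10**9           # 85% used, below the nearfull threshold

print(cur_state(used, disk))                          # journal ignored
print(cur_state(used, disk, journal_bytes=87 * GIB))  # worst case assumed
```

With the journal ignored, the OSD at 85% is still classified as "NONE". With the 87 GiB journal counted, the same OSD is reported as "NEAR", so the flush would take the careful path before the disk can fill.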


Files

ceph-12.0-2-client.log (75.2 KB): log of rados bench. Nathan Cutler, 03/03/2017 12:32 PM
ceph-12.0-2.log (26.1 KB): log of ceph -w and the output of ceph -s and ceph osd tree following execution of rados bench. Nathan Cutler, 03/03/2017 12:32 PM
ceph-osd.1.log.bz2 (197 KB): log of the OSD that crashed. Nathan Cutler, 03/03/2017 12:32 PM

Related issues 2 (0 open, 2 closed)

Related to Ceph - Bug #18153: No space cause osd crash (bluestore). Resolved. 12/06/2016
Related to Ceph - Bug #15912: An OSD was seen getting ENOSPC even with osd_failsafe_full_ratio passed. Resolved. David Zafman. 05/17/2016
