Bug #15912

closed

An OSD was seen getting ENOSPC even with osd_failsafe_full_ratio passed

Added by David Zafman almost 8 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Backport: kraken, jewel
Regression: No
Severity: 3 - minor

Description

By default, osd_failsafe_full_ratio only starts restricting new client ops once an OSD is 97% full. Could an OSD with a large journal have enough pending filestore data updates that the remaining 3% isn't enough to absorb them?
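To make the question concrete, here is a minimal sketch of the arithmetic. It assumes, as the description speculates, that in the worst case a full journal's worth of pending data still has to be applied to the filestore after the failsafe ratio is reached. The helper names `failsafe_headroom_bytes` and `headroom_covers_journal` are hypothetical, not Ceph functions; 0.97 is the default ratio mentioned above.

```python
GIB = 1024 ** 3  # bytes per GiB


def failsafe_headroom_bytes(capacity_bytes: int, failsafe_full_ratio: float = 0.97) -> int:
    """Free bytes left on the device at the moment usage reaches the failsafe ratio."""
    return int(capacity_bytes * (1.0 - failsafe_full_ratio))


def headroom_covers_journal(capacity_bytes: int, journal_bytes: int,
                            failsafe_full_ratio: float = 0.97) -> bool:
    """Hypothetical check: can the remaining space absorb a whole journal of
    pending filestore updates once new client ops are blocked?"""
    return failsafe_headroom_bytes(capacity_bytes, failsafe_full_ratio) >= journal_bytes


if __name__ == "__main__":
    # 1 TiB OSD with a 5 GiB journal: about 30.7 GiB of headroom, enough.
    print(headroom_covers_journal(1024 * GIB, 5 * GIB))  # True
    # 100 GiB OSD with a 10 GiB journal: only about 3 GiB of headroom, not enough.
    print(headroom_covers_journal(100 * GIB, 10 * GIB))  # False
```

Because the headroom scales with device size while the journal size does not, a fixed percentage can be too small on smaller OSDs that have large journals.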

Should new client operations be restricted based on journal size? We could turn osd_failsafe_full_ratio into an override that defaults to 0, meaning "use the computed value".
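The proposed override can be sketched the same way. In this hypothetical helper, `override` plays the role proposed for osd_failsafe_full_ratio: any positive value is used as-is, while the proposed default of 0 means "compute the ratio from the journal size" so that at least one journal's worth of space is still free when the threshold is reached. The function name and the exact formula are assumptions for illustration, not Ceph's implementation.

```python
GIB = 1024 ** 3  # bytes per GiB


def effective_failsafe_ratio(capacity_bytes: int, journal_bytes: int,
                             override: float = 0.0) -> float:
    """Hypothetical: pick the full ratio at which new client ops are blocked.

    override > 0  -> use it unchanged (operator-configured value)
    override == 0 -> reserve room for a full journal of pending updates
    """
    if override > 0:
        return override
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    # Leave at least journal_bytes free when the threshold is reached;
    # a journal larger than the device yields 0.0 (block everything).
    return max(0.0, 1.0 - journal_bytes / capacity_bytes)


if __name__ == "__main__":
    print(effective_failsafe_ratio(100 * GIB, 10 * GIB))        # 0.9 (computed)
    print(effective_failsafe_ratio(100 * GIB, 10 * GIB, 0.97))  # 0.97 (override)
```

With a small journal the computed ratio can come out above the current 0.97 default, so a real policy might also clamp it to some ceiling; the description does not address that, and it is left out here.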


Related issues: 7 (0 open, 7 closed)

Related to RADOS - Bug #18687: bluestore: ENOSPC writing to XFS block file on smithi (Resolved, 01/26/2017)
Related to Ceph - Bug #16878: filestore: utilization ratio calculation does not take journal size into account (Resolved, David Zafman, 08/01/2016)
Related to Ceph - Feature #15910: Increase the default value of mon_osd_min_in_ratio (Resolved, David Zafman, 05/17/2016)
Related to Ceph - Bug #19682: Additional full fixes (Resolved, David Zafman, 04/18/2017)
Related to Ceph - Bug #19733: clean up min/max span warning (Resolved, David Zafman, 04/20/2017)
Copied to Ceph - Backport #19265: jewel: An OSD was seen getting ENOSPC even with osd_failsafe_full_ratio passed (Resolved, Alexey Sheplyakov)
Copied to Ceph - Backport #19340: kraken: An OSD was seen getting ENOSPC even with osd_failsafe_full_ratio passed (Resolved, Nathan Cutler)

#1

Updated by David Zafman almost 8 years ago

  • Assignee set to David Zafman
#2

Updated by David Zafman almost 8 years ago

Backfill or recovery may have used up the remaining space.

#3

Updated by Ian Colle about 7 years ago

  • Priority changed from Normal to Urgent
#4

Updated by David Zafman about 7 years ago

https://github.com/ceph/ceph/pull/13425

There are multiple issues to address. This pull request addresses some of them.

#5

Updated by Nathan Cutler about 7 years ago

  • Related to Bug #16878: filestore: utilization ratio calculation does not take journal size into account added
#6

Updated by David Zafman about 7 years ago

  • Related to Bug #18687: bluestore: ENOSPC writing to XFS block file on smithi added
#7

Updated by David Zafman about 7 years ago

  • Related to deleted (Bug #16878: filestore: utilization ratio calculation does not take journal size into account)
#8

Updated by David Zafman about 7 years ago

  • Related to Bug #16878: filestore: utilization ratio calculation does not take journal size into account added
#9

Updated by David Zafman about 7 years ago

  • Status changed from New to Resolved
#10

Updated by David Zafman about 7 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to kraken, jewel
#11

Updated by David Zafman about 7 years ago

  • Related to Feature #15910: Increase the default value of mon_osd_min_in_ratio added
#12

Updated by Alexey Sheplyakov about 7 years ago

  • Copied to Backport #19265: jewel: An OSD was seen getting ENOSPC even with osd_failsafe_full_ratio passed added
#13

Updated by Nathan Cutler about 7 years ago

  • Copied to Backport #19340: kraken: An OSD was seen getting ENOSPC even with osd_failsafe_full_ratio passed added
#14

Updated by David Zafman almost 7 years ago

  • Related to Bug #19682: Additional full fixes added
#15

Updated by David Zafman almost 7 years ago

  • Related to Bug #19698: cephtool/test.sh error on full tests added
#16

Updated by David Zafman almost 7 years ago

We should backport the 2 trackers/pulls in this order:

bug #15912 Pull https://github.com/ceph/ceph/pull/13425

bug #19733 Pull https://github.com/ceph/ceph/pull/14611

#17

Updated by David Zafman almost 7 years ago

  • Related to deleted (Bug #19698: cephtool/test.sh error on full tests)
#18

Updated by David Zafman almost 7 years ago

  • Related to Bug #19733: clean up min/max span warning added
#19

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved