Backport #13335

closed

hammer: OSD crashed when reached pool's max_bytes quota

Added by Loïc Dachary over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Target version:
Release: hammer
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Related issues 2 (0 open, 2 closed)

Related to Ceph - Bug #15019: hammer: fs test fails with log [ERR] : OSD full dropping all updates 100% full (Duplicate, Wei-Chung Cheng)

Copied from Ceph - Bug #13098: OSD crashed when reached pool's max_bytes quota (Resolved, 09/15/2015)

Actions #1

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to Loïc Dachary
Actions #3

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
  • Status changed from In Progress to New
  • Assignee deleted (Loïc Dachary)

This needs to be adapted to hammer because a few things are different: there is no flag in messages, and CEPH_OSD_FLAG_FULL_FORCE is not implemented.

Actions #4

Updated by Alexey Sheplyakov over 8 years ago

Basically, I've moved the check for a full pool to the right place (before the cached ObjectContext is updated)
without changing the check itself (well, almost).
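
To illustrate why the position of the check matters, here is a minimal sketch, not the actual hammer patch. `PoolQuota`, `CachedObjectState`, and `apply_write` are hypothetical stand-ins rather than Ceph types or functions, and returning `-ENOSPC` is an assumption made for the example. The point is only the ordering: the full-pool test runs before any cached state is modified, so a rejected write cannot leave the cache out of sync.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical stand-in for a pool's quota accounting (not a Ceph type).
struct PoolQuota {
    std::uint64_t max_bytes;   // 0 means "no quota"
    std::uint64_t used_bytes;
    bool full() const { return max_bytes != 0 && used_bytes >= max_bytes; }
};

// Hypothetical stand-in for the cached per-object state (the role played
// by the cached ObjectContext in the comment above).
struct CachedObjectState {
    std::uint64_t size = 0;
    std::uint64_t version = 0;
};

// Returns 0 on success, or -ENOSPC (an assumed error code) if the pool is full.
// The check comes first, so a rejected write leaves `obc` untouched.
int apply_write(PoolQuota& pool, CachedObjectState& obc, std::uint64_t len) {
    if (pool.full()) {
        return -ENOSPC;        // reject before touching any cached state
    }
    obc.size += len;           // only now update the cached state
    obc.version += 1;
    pool.used_bytes += len;
    return 0;
}

// Usage example: a write against a full pool must not change the cache.
inline void demo_rejected_write() {
    PoolQuota pool{100, 100};  // quota already reached
    CachedObjectState obc;
    int rc = apply_write(pool, obc, 10);
    assert(rc == -ENOSPC);
    assert(obc.size == 0 && obc.version == 0);
}
```

If the check ran after the `obc` updates instead, a rejected write would return an error while the cached size and version had already advanced, which is the kind of inconsistency the reordering is meant to avoid.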

Actions #5

Updated by Loïc Dachary about 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Loïc Dachary
Actions #6

Updated by Loïc Dachary about 8 years ago

  • Description updated (diff)
Actions #7

Updated by Loïc Dachary about 8 years ago

  • Status changed from In Progress to Resolved
  • Target version set to v0.94.6
Actions #8

Updated by Loïc Dachary about 8 years ago

  • Status changed from Resolved to New
  • Assignee deleted (Loïc Dachary)
  • Target version deleted (v0.94.6)
Actions #9

Updated by Loïc Dachary about 8 years ago

The commit introduces a regression and is reverted by http://tracker.ceph.com/issues/15019

Actions #10

Updated by Loïc Dachary about 8 years ago

  • Related to Bug #15019: hammer: fs test fails with log [ERR] : OSD full dropping all updates 100% full added
Actions #11

Updated by Alexey Sheplyakov about 8 years ago

> The commit introduces a regression

The commit exposes a bug in the test, which assumes it's possible to write more data than the storage can hold.
I believe the OSD should reject such writes to prevent further damage (ENOSPC handling in filesystem code is not 100% foolproof), and it does so in Infernalis and Jewel.

> and is reverted by http://tracker.ceph.com/issues/15019

I don't think reverting it is a good idea; the test case itself should be fixed instead.
Even if we want to pretend that it's possible to write 144 MB of data to a 100 MB drive,
the check should be slightly modified (that is, https://github.com/ceph/ceph/blob/hammer/src/osd/ReplicatedPG.cc#L5693 should be removed)
instead of reintroducing the obc corruption. However, I think checking for a full OSD is actually correct.

Actions #12

Updated by Loïc Dachary about 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Alexey Sheplyakov
Actions #13

Updated by Loïc Dachary about 8 years ago

  • Subject changed from OSD crashed when reached pool's max_bytes quota to hammer: OSD crashed when reached pool's max_bytes quota
Actions #14

Updated by Loïc Dachary about 8 years ago

  • Status changed from In Progress to Resolved
Actions #15

Updated by Loïc Dachary about 8 years ago

  • Target version set to v0.94.6