Bug #18643

SnapTrimmer: inconsistencies may lead to snaptrimmer hang

Added by Josh Durgin over 7 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: David Zafman
Category: Snapshots
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Component(RADOS): OSD

Description

In PrimaryLogPG::trim_object(), several kinds of inconsistency between the clone state and the SnapMapper cause the OSD to log an error to the cluster log and then remain in the AwaitAsyncWork state without scheduling more work, so snap trimming for that PG hangs.
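To make the failure mode concrete, here is a toy model. It is not Ceph code: ToyTrimmer, its event queue, and the per-object inconsistency flags are hypothetical, and only the state names AwaitAsyncWork and NotTrimming come from this report. The reaction to DoSnapWork logs an inconsistency and returns without either transitioning or queuing another DoSnapWork. Once the event queue drains, nothing can wake the trimmer again.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model, not Ceph code. Only the state names come from the report.
enum class State { AwaitAsyncWork, NotTrimming };

struct ToyTrimmer {
  State state = State::AwaitAsyncWork;
  std::deque<std::string> events{"DoSnapWork"};  // pending events
  std::vector<std::string> cluster_log;          // stand-in for the cluster log
  std::vector<bool> objects;  // true = clone state disagrees with the snap mapper
  std::size_t next = 0;       // index of the next object to trim

  // Buggy reaction to DoSnapWork, mirroring the behavior in the report.
  void react_do_snap_work() {
    assert(state == State::AwaitAsyncWork);  // events only arrive while waiting
    if (next >= objects.size()) {            // nothing left to trim
      state = State::NotTrimming;
      return;
    }
    if (objects[next]) {
      cluster_log.push_back("snap trim: inconsistent clone state");
      return;  // BUG: no transition and no follow-up DoSnapWork queued
    }
    ++next;                          // object trimmed successfully
    events.push_back("DoSnapWork");  // schedule more work
  }

  // Processes events until none remain. Returns true if the trimmer is
  // left in AwaitAsyncWork with an empty queue, i.e. it is hung.
  bool run() {
    while (!events.empty()) {
      events.pop_front();
      react_do_snap_work();
    }
    return state == State::AwaitAsyncWork;
  }
};
```

For example, with objects = {false, true, false}, run() returns true after a single cluster-log entry: the first object is trimmed, the second is logged as inconsistent, and the third is never reached.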

#1

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Snapshots
  • Component(RADOS) OSD added

#2

Updated by David Zafman almost 5 years ago

  • Assignee set to David Zafman

Do we still need to fix something here? https://github.com/ceph/ceph/pull/15635 at least puts the PG into the snaptrim_error state to indicate that an error has blocked snap trimming.

Is this claiming that snaptrim is hung for PGs other than the one with errors?

#3

Updated by Josh Durgin over 4 years ago

  • Priority changed from High to Normal

#4

Updated by Greg Farnum over 4 years ago

  • Status changed from New to Closed

This no longer seems to be the case. If trim_object() returns an error to its sole caller, PrimaryLogPG::AwaitAsyncWork::react(const DoSnapWork&), that handler returns a transit to one of WaitRepops, WaitRWLock, or NotTrimming, so the snap trimmer no longer stays stuck in AwaitAsyncWork.
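A simplified sketch of that error path follows. It assumes, although this report does not say so, that the next state depends on two things: whether repops for earlier objects are still in flight, and whether trim_object() failed with -ENOLCK because it could not take the object's write lock. It also assumes that only non-lock errors flag the PG as snaptrim_error. TrimErrorOutcome and on_trim_error are hypothetical names. The point it illustrates is that every error return leads to a transition, so the trimmer cannot stay parked in AwaitAsyncWork.

```cpp
#include <cassert>
#include <cerrno>

// The three states named in the report; everything else here is hypothetical.
enum class Next { WaitRepops, WaitRWLock, NotTrimming };

struct TrimErrorOutcome {
  Next next;            // state to transition to out of AwaitAsyncWork
  bool snaptrim_error;  // whether to flag the PG as snaptrim_error
};

// Assumed decision logic after trim_object() returns a negative errno.
TrimErrorOutcome on_trim_error(int error, bool repops_in_flight) {
  assert(error < 0);  // only called on failure
  // A lock conflict is treated as transient; any other error flags the PG.
  const bool lock_conflict = (error == -ENOLCK);
  TrimErrorOutcome out{Next::NotTrimming, !lock_conflict};
  if (repops_in_flight) {
    out.next = Next::WaitRepops;  // let already-started trims finish first
  } else if (lock_conflict) {
    out.next = Next::WaitRWLock;  // retry once the write lock is released
  }
  // Otherwise keep NotTrimming: stop trimming this PG.
  return out;
}
```

Whichever branch is taken, the caller leaves AwaitAsyncWork, which is the property the report's closing note relies on.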
