Bug #2476

osd: watch timeout depends on operations to an object

Added by Josh Durgin over 11 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The watch timeout is in-memory state that is local to the primary OSD. If the primary changes, the timer that ends the watch is not started until the object context for the watched object is loaded, which normally happens only when an operation is performed on that object. The delay before the watch times out is therefore unbounded, and errors like this occur even when clients haven't accessed the image for far longer than 30s:

# rbd rm postgresql -p winnie-test
Removing image: 99% complete...failed.
delete error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
2012-05-24 15:25:44.647532 7f35b8849760 -1 librbd: error removing header: (16) Device or resource busy
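
The behaviour described above can be illustrated with a small toy model. This is only a sketch, not Ceph code: the class `ToyPrimary`, its methods, and the `"EBUSY"`/`"ok"` return strings are hypothetical names invented for illustration, and it assumes that the watcher record itself survives a primary change while the timeout deadline does not. It shows why the first removal attempt after a failover fails no matter how long the client has been gone, and why a retry more than 30s later succeeds.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

WATCH_TIMEOUT = 30.0  # seconds; the 30s figure comes from the error message above


@dataclass
class ToyPrimary:
    """Hypothetical stand-in for a newly elected primary (illustration only)."""
    # Assumption: the watcher record is persistent and survives the primary change.
    persisted_watchers: Dict[str, str]
    # In-memory only: the deadlines start out empty on a new primary.
    deadlines: Dict[str, float] = field(default_factory=dict)
    loaded: Set[str] = field(default_factory=set)

    def load_object_context(self, obj: str, now: float) -> None:
        # The timeout timer is armed only when the context is first loaded.
        if obj not in self.loaded:
            self.loaded.add(obj)
            if obj in self.persisted_watchers:
                self.deadlines[obj] = now + WATCH_TIMEOUT

    def expire(self, now: float) -> None:
        # Drop watchers whose deadline has passed; watchers without a
        # deadline (context never loaded) are never touched.
        for obj, deadline in list(self.deadlines.items()):
            if now >= deadline:
                del self.deadlines[obj]
                self.persisted_watchers.pop(obj, None)

    def remove(self, obj: str, now: float) -> str:
        self.load_object_context(obj, now)  # any operation loads the context
        self.expire(now)
        return "EBUSY" if obj in self.persisted_watchers else "ok"


new_primary = ToyPrimary(persisted_watchers={"header": "dead-client"})
new_primary.expire(now=3600.0)                     # an hour passes; no timer was ever armed
first = new_primary.remove("header", now=3600.0)   # timer starts only now
second = new_primary.remove("header", now=3631.0)  # more than 30s after the first attempt
print(first, second)
```

In this model the hour of idle time before the first `remove` call does not count toward the timeout, which mirrors the unbounded delay reported in this bug.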

Associated revisions

Revision 5668e5b5
Added by Samuel Just over 10 years ago

Merge remote-tracking branch 'upstream/wip_2476' into next

Fixes: #2476
Reviewed-by: Greg Farnum

History

#1 Updated by Sage Weil about 11 years ago

fix qa/workunits/rbd/copy.sh when this is fixed !!!

#2 Updated by Sage Weil about 11 years ago

  • Target version deleted (v0.48)

#3 Updated by Maciej Galkiewicz about 11 years ago

Have you made any progress with this issue? It is very annoying and breaks my CI. Is there any way to avoid it or work around it? What is more, my clients are not crashing: I unmount and unmap the volumes before shutting down the machine. The clients are using kernel rbd (3.2.23).

#4 Updated by Samuel Just over 10 years ago

  • Status changed from New to Fix Under Review

wip_2476

#5 Updated by Ian Colle over 10 years ago

  • Assignee set to Greg Farnum

Greg, can you please review this wip branch?

#6 Updated by Greg Farnum over 10 years ago

  • Status changed from Fix Under Review to 7
  • Assignee changed from Greg Farnum to Samuel Just

This looks okay to me, but Sam doesn't remember it and has gotten nervous, so looking at it is now in his queue for later today.

#7 Updated by Samuel Just over 10 years ago

  • Priority changed from Normal to Urgent

#8 Updated by Samuel Just over 10 years ago

  • Status changed from 7 to Pending Backport

#9 Updated by Samuel Just over 10 years ago

  • Status changed from Pending Backport to 7

#10 Updated by Samuel Just over 10 years ago

  • Status changed from 7 to Resolved
