Bug #10784

librbd: image has watchers - not removing

Added by Jason Dillaman about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/teuthology-2015-02-04_23:00:02-rbd-master-testing-basic-multi/740396/teuthology.log

2015-02-05T12:45:10.552 INFO:tasks.workunit.client.0.plana31.stderr:blacklisting 10.214.131.9:0/1008981 until 2015-02-05 13:45:09.563070 (3600 sec)
2015-02-05T12:45:10.566 INFO:tasks.workunit.client.0.plana31.stderr:+ wait 8981
2015-02-05T12:45:10.566 INFO:tasks.workunit.client.0.plana31.stderr:+ rbdrw_exitcode=108
2015-02-05T12:45:10.566 INFO:tasks.workunit.client.0.plana31.stderr:+ '[' 108 '!=' 108 ']'
2015-02-05T12:45:10.566 INFO:tasks.workunit.client.0.plana31.stderr:+ echo 'rbdrw stopped with ESHUTDOWN'
2015-02-05T12:45:10.567 INFO:tasks.workunit.client.0.plana31.stderr:+ set -e
2015-02-05T12:45:10.567 INFO:tasks.workunit.client.0.plana31.stderr:+ ceph osd blacklist rm 10.214.131.9:0/1008981
2015-02-05T12:45:10.567 INFO:tasks.workunit.client.0.plana31.stdout:rbdrw stopped with ESHUTDOWN
2015-02-05T12:45:12.342 INFO:tasks.workunit.client.0.plana31.stderr:10.214.131.9:0/1008981 isn't blacklisted
2015-02-05T12:45:12.355 INFO:tasks.workunit.client.0.plana31.stderr:+ rbd lock remove rbdrw-image rbdrw client.4120
2015-02-05T12:45:12.503 INFO:tasks.workunit.client.0.plana31.stderr:+ sleep 30
2015-02-05T12:45:42.505 INFO:tasks.workunit.client.0.plana31.stderr:+ rbd rm rbdrw-image
2015-02-05T12:45:42.645 INFO:tasks.workunit.client.0.plana31.stderr:2015-02-05 12:45:42.655600 7fd3e3acc840 -1 librbd: image has watchers - not removing
2015-02-05T12:45:42.674 INFO:tasks.workunit.client.0.plana31.stderr:Removing image: 0% complete...failed.
2015-02-05T12:45:42.674 INFO:tasks.workunit.client.0.plana31.stderr:rbd: error: image still has watchers
2015-02-05T12:45:42.674 INFO:tasks.workunit.client.0.plana31.stderr:This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

Associated revisions

Revision 418ca0c3 (diff)
Added by David Zafman about 9 years ago

osd: Update object state after removing watch from object info

Fixes: #10784

Signed-off-by: David Zafman <>

History

#1 Updated by Jason Dillaman about 9 years ago

  • Project changed from rbd to Ceph

After a client crash (or blacklist), the OSD never appears to remove the dead client from the object's list of watchers. This is easy to reproduce: run "rbd watch <image>", kill the client, then run "rbd status <image>" to list the current watchers. The dead client is still listed.

#2 Updated by Samuel Just about 9 years ago

  • Priority changed from Normal to Urgent

#3 Updated by Sage Weil about 9 years ago

  • Project changed from Ceph to rbd

#4 Updated by Sage Weil about 9 years ago

  • Project changed from rbd to Ceph

#5 Updated by Sage Weil about 9 years ago

  • Assignee set to David Zafman

#6 Updated by David Zafman about 9 years ago

  • Status changed from New to Fix Under Review

#7 Updated by David Zafman about 9 years ago

Caused by Sage's change 1c6944f79: osd/ReplicatedPG: do watch effects only when change commits

Before that change, handle_watch_timeout() modified obs.oi directly:

obc->obs.oi.watchers.erase(make_pair(watch->get_cookie(), watch->get_entity()));

It now modifies new_obs.oi instead:

object_info_t& oi = ctx->new_obs.oi;
oi.watchers.erase(make_pair(watch->get_cookie(),
                            watch->get_entity()));
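
The effect described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Ceph code: ObjectInfo, ObjectState, ObjectContext, OpContext and watchers_after_timeout() are hypothetical, heavily simplified stand-ins for the OSD types, and the "publish" step is one plausible reading of the fix title ("Update object state after removing watch from object info"). It shows that erasing a watcher only from the working copy (new_obs) leaves the stale entry in the state that later readers consult (obs) unless the working copy is copied back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real OSD structures carry far more state.
using Watcher = std::pair<std::uint64_t, std::string>;  // (cookie, entity)

struct ObjectInfo {
  std::set<Watcher> watchers;
};

struct ObjectState {
  ObjectInfo oi;
};

struct ObjectContext {
  ObjectState obs;  // state consulted by later readers (e.g. a watcher listing)
};

struct OpContext {
  ObjectContext* obc;
  ObjectState new_obs;  // working copy that the operation mutates
  explicit OpContext(ObjectContext* o) : obc(o) {
    assert(o != nullptr);
    new_obs = o->obs;  // start from the current object state
  }
};

// Models a watch timeout for a single dead client and returns how many
// watchers remain visible in obc.obs afterwards.
// publish_new_state == false: only new_obs is changed (the behaviour reported here).
// publish_new_state == true:  new_obs is copied back to obs (assumed shape of a fix).
std::size_t watchers_after_timeout(bool publish_new_state) {
  ObjectContext obc;
  obc.obs.oi.watchers.emplace(1, "client.4120");

  OpContext ctx(&obc);
  ObjectInfo& oi = ctx.new_obs.oi;
  oi.watchers.erase(std::make_pair(std::uint64_t{1}, std::string("client.4120")));

  if (publish_new_state) {
    obc.obs = ctx.new_obs;  // make the removal visible to later readers
  }
  return obc.obs.oi.watchers.size();
}
```

In this model, watchers_after_timeout(false) leaves one watcher behind, which corresponds to the "image has watchers - not removing" symptom, while watchers_after_timeout(true) leaves none.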

#8 Updated by David Zafman about 9 years ago

  • Status changed from Fix Under Review to 7

#9 Updated by David Zafman about 9 years ago

  • Status changed from 7 to Resolved

#10 Updated by Loïc Dachary about 9 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to firefly
  • Severity changed from 3 - minor to 1 - critical

#11 Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (firefly)
  • Severity changed from 1 - critical to 3 - minor

Wrong interpretation on my part; reverting to the previous state.
