Bug #58673

closed

When bucket index ops are cancelled it can leave behind zombie index entries

Added by Cory Snyder about 1 year ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 100%
Source:
Tags: cls_rgw backport_processed
Backport: quincy,pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We discovered a significant number of extra bucket index entries for some of our buckets and found that these entries all pointed to objects which no longer existed. In our case, we traced this back to a scenario where a particular client commonly issues multiple simultaneous delete requests for the same object keys. The first racing delete request succeeds, but the second one results in an ECANCELED error due to a failed cmpxattr check [1] set by a prepare_atomic_modification call [2]. The ECANCELED error causes the index op to be canceled [3], but the OSD cls logic for index op cancellation doesn't remove the index entry, so the zombie index entry is never cleaned up. It looks like this could manifest itself in other scenarios as well: whenever an index op is canceled for an index entry that otherwise shouldn't exist and has no other pending modifications.

[1] https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_rados.cc#L5833
[2] https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_rados.cc#L5254
[3] https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_rados.cc#L5293
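
To make the sequence above easier to follow, here is a minimal toy model in Python. It is not the cls_rgw code: `IndexEntry`, `ToyBucketIndex`, the `cleanup_on_cancel` flag, and the op tags are all hypothetical names invented for illustration, and the cleanup rule simply mirrors the condition stated in the description (the entry does not exist and has no other pending modifications).

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Toy stand-in for a bucket index entry (hypothetical, not the cls_rgw type)."""
    exists: bool = False                       # does the entry describe a live object?
    pending: set = field(default_factory=set)  # tags of in-flight index ops


class ToyBucketIndex:
    """Simplified prepare/complete/cancel model of a bucket index shard."""

    def __init__(self, cleanup_on_cancel):
        self.entries = {}
        # Assumption: toggles the cleanup rule suggested by the description.
        self.cleanup_on_cancel = cleanup_on_cancel

    def put(self, key):
        # Record a live object in the index.
        self.entries[key] = IndexEntry(exists=True)

    def prepare(self, key, tag):
        # Preparing an op creates the entry if missing and marks the op pending.
        self.entries.setdefault(key, IndexEntry()).pending.add(tag)

    def complete_delete(self, key, tag):
        entry = self.entries[key]
        entry.pending.discard(tag)
        if entry.pending:
            entry.exists = False   # other ops still in flight: keep the entry
        else:
            del self.entries[key]

    def cancel(self, key, tag):
        entry = self.entries[key]
        entry.pending.discard(tag)
        # Without the cleanup rule, a non-existent entry with no pending ops
        # is left behind: the "zombie" entry described in this report.
        if self.cleanup_on_cancel and not entry.exists and not entry.pending:
            del self.entries[key]


def racing_deletes(index, key):
    """Two simultaneous deletes of one key; returns True if an entry is left over."""
    index.put(key)
    index.prepare(key, "delete-1")
    index.prepare(key, "delete-2")
    index.complete_delete(key, "delete-1")  # first delete succeeds
    index.cancel(key, "delete-2")           # second delete hit ECANCELED
    return key in index.entries
```

In this sketch, `racing_deletes(ToyBucketIndex(cleanup_on_cancel=False), "obj")` returns `True` (a zombie entry remains), while the same call with `cleanup_on_cancel=True` returns `False`. Canceling an op on an entry that still describes a live object leaves the entry in place in both modes.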


Related issues 3 (0 open, 3 closed)

Related to rgw - Bug #59164: LC rules cause latency spikes (Can't reproduce)
Copied to rgw - Backport #58767: pacific: When bucket index ops are cancelled it can leave behind zombie index entries (Resolved, Casey Bodley)
Copied to rgw - Backport #58768: quincy: When bucket index ops are cancelled it can leave behind zombie index entries (Resolved, Casey Bodley)
#1

Updated by Casey Bodley about 1 year ago

  • Status changed from New to Fix Under Review
  • Tags set to cls_rgw
#2

Updated by J. Eric Ivancich about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
#3

Updated by Backport Bot about 1 year ago

  • Copied to Backport #58767: pacific: When bucket index ops are cancelled it can leave behind zombie index entries added
#4

Updated by Backport Bot about 1 year ago

  • Copied to Backport #58768: quincy: When bucket index ops are cancelled it can leave behind zombie index entries added
#5

Updated by Backport Bot about 1 year ago

  • Tags changed from cls_rgw to cls_rgw backport_processed
#6

Updated by Cory Snyder 11 months ago

  • Related to Bug #59164: LC rules cause latency spikes added
#7

Updated by Konstantin Shalygin 9 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100