Bug #61359

closed

Consistency bugs with OLH objects

Added by Cory Snyder 12 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
% Done: 100%
Source: Community (dev)
Tags: backport_processed
Backport: pacific quincy reef
Regression: No
Severity: 1 - critical
Reviewed:

Description

When a PUT request is waiting on a reshard, it does not properly update its bucket reference after the reshard completes. The request then fails after storing the object instance but before linking it into the bucket index. As a result, the object is stored on disk and accounted for in the bucket stats, but is not visible in bucket listings.

Additionally, the request initializes the OLH RADOS object but never adds the user.rgw.olh.info xattr (which informs the is_olh() predicate). Future GET requests for that key therefore return a 200 with an empty object, because the OLH is treated as a plain unversioned object. This can wreak havoc on clients that use well-known keys to store formatted data and then fail to parse an unexpectedly empty object.
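To make the GET symptom concrete, here is a minimal toy model in Python. It is not RGW code: the dict layout and the `toy_get` helper are hypothetical, and the only detail taken from this report is that the presence of the user.rgw.olh.info xattr is what makes the head object count as an OLH.

```python
OLH_INFO_XATTR = "user.rgw.olh.info"  # xattr name as given in the description


def is_olh(xattrs):
    """Toy stand-in for the is_olh() predicate: true only if the OLH info xattr exists."""
    return OLH_INFO_XATTR in xattrs


def toy_get(head, instances):
    """Hypothetical, simplified GET path over in-memory dicts (an assumption, not RGW logic)."""
    if is_olh(head["xattrs"]):
        # Healthy case: follow the OLH to the current object instance.
        target = head["xattrs"][OLH_INFO_XATTR]
        return 200, instances[target]
    # No OLH xattr: the head is served as a plain unversioned object.
    return 200, head["data"]


instances = {"v1": b'{"ok": true}'}
healthy_head = {"xattrs": {OLH_INFO_XATTR: "v1"}, "data": b""}
broken_head = {"xattrs": {}, "data": b""}  # initialized, but the xattr was never written

healthy_result = toy_get(healthy_head, instances)  # (200, b'{"ok": true}')
broken_result = toy_get(broken_head, instances)    # (200, b"")
```

In this sketch the broken head yields a successful response with an empty body, which matches the behavior described above for clients reading well-known keys.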

This was fixed on master and in Reef as part of the multi-site changes [1], but we could use a test case to ensure there are no future regressions on those branches. We need backports of [1] for Quincy and Pacific.

There is also a need for index cleanup tooling since buckets affected by this issue have inconsistent stats, inconsistent OLH RADOS objects, and dark data instance objects.
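As a rough illustration of what such tooling would have to detect, the following Python sketch classifies the three kinds of inconsistency over an in-memory snapshot. Everything here is hypothetical: `BucketSnapshot`, its fields, and `audit` are illustrative names, the stats comparison is a deliberate simplification, and none of this reflects how radosgw-admin reads the index or the data pool.

```python
from dataclasses import dataclass, field

OLH_INFO_XATTR = "user.rgw.olh.info"  # xattr name as given in the description


@dataclass
class BucketSnapshot:
    """Hypothetical in-memory stand-in for data a real tool would read from RADOS."""
    index_instances: set = field(default_factory=set)   # (key, version_id) linked in the bucket index
    stored_instances: set = field(default_factory=set)  # (key, version_id) found in the data pool
    head_xattrs: dict = field(default_factory=dict)     # key -> set of xattr names on the head object
    stats_num_objects: int = 0                          # object count reported by bucket stats


def audit(snap):
    """Report dark instances, heads missing the OLH xattr, and stats drift (simplified)."""
    # Instance objects stored on disk but never linked into the index.
    dark = sorted(snap.stored_instances - snap.index_instances)
    # Keys that have instance objects but whose head lacks the OLH info xattr.
    versioned_keys = {key for key, _ in snap.stored_instances}
    broken_olh = sorted(
        key for key in versioned_keys
        if OLH_INFO_XATTR not in snap.head_xattrs.get(key, set())
    )
    # Assumption: stats should equal the number of linked index entries.
    drift = snap.stats_num_objects - len(snap.index_instances)
    return {"dark_instances": dark, "broken_olh_heads": broken_olh, "stats_drift": drift}


snap = BucketSnapshot(
    index_instances={("a", "v1")},
    stored_instances={("a", "v1"), ("b", "v2")},
    head_xattrs={"a": {OLH_INFO_XATTR}, "b": set()},
    stats_num_objects=2,
)
report = audit(snap)
```

For this sample snapshot, key "b" shows all three symptoms at once: its instance is unlinked, its head lacks the xattr, and the stats count exceeds the index by one.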

[1] https://github.com/ceph/ceph/commit/f57973725feeaa84321884c8eebc048989446572


Related issues 8 (1 open, 7 closed)

Related to rgw - Bug #50552: rgw: set_olh return -2 when resharding (Triaged, Mark Kogan)
Related to rgw - Bug #59663: rgw: expired delete markers created by deleting non-existant object multiple times are not being removed from data pool after deletion from bucket (Resolved, Cory Snyder)
Related to rgw - Bug #59164: LC rules cause latency spikes (Can't reproduce)
Related to rgw - Bug #61710: quincy/pacific: PUT requests during reshard of versioned bucket fail with 404 and leave behind dark data (Won't Fix, Cory Snyder)
Related to rgw - Bug #62075: New radosgw-admin commands to cleanup leftover OLH index entries and unlinked instance objects (Resolved, Cory Snyder)
Copied to rgw - Backport #62064: pacific: Consistency bugs with OLH objects (Resolved, Cory Snyder)
Copied to rgw - Backport #62065: reef: Consistency bugs with OLH objects (Resolved, Cory Snyder)
Copied to rgw - Backport #62066: quincy: Consistency bugs with OLH objects (Resolved, Cory Snyder)
