Bug #61359
closed
Consistency bugs with OLH objects
Added by Cory Snyder 12 months ago.
Updated about 1 month ago.
Backport:
pacific quincy reef
Affected Versions:
Ceph - v16.0.0,
Ceph - v16.0.1,
Ceph - v16.1.0,
Ceph - v16.1.1,
Ceph - v16.2.0,
Ceph - v16.2.1,
Ceph - v16.2.10,
Ceph - v16.2.11,
Ceph - v16.2.12,
Ceph - v16.2.13,
Ceph - v16.2.2,
Ceph - v16.2.3,
Ceph - v16.2.4,
Ceph - v16.2.5,
Ceph - v16.2.6,
Ceph - v16.2.7,
Ceph - v16.2.8,
Ceph - v16.2.9,
Ceph - v17.0.0,
Ceph - v17.2.1,
Ceph - v17.2.2,
Ceph - v17.2.3,
Ceph - v17.2.4,
Ceph - v17.2.5,
Ceph - v17.2.6
Description
When a PUT request to a versioned bucket has to wait on a reshard, it does not properly update its bucket reference after the reshard completes, so it fails after storing the object instance but before linking it into the bucket index. The object is therefore stored on disk and counted in the bucket stats, but is not visible in bucket listings. The request also initializes the OLH RADOS object without ever adding the user.rgw.olh.info xattr (which informs the is_olh() predicate). As a result, future GET requests for that key return a 200 with an empty object, because the OLH is treated as a plain unversioned object. This can wreak havoc on clients that store formatted data under well-known keys and fail to parse an unexpectedly empty object.
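To make the failure mode easier to follow, here is a toy Python model of the sequence described above. It is not RGW code: the RadosObject class, the pool dictionary, and the put/get helpers are hypothetical, and storing a plain instance name in the xattr is a simplification of whatever the real xattr encodes. Only the xattr name and the role of is_olh() come from this report.

```python
# Toy model only: none of these classes or helpers exist in Ceph.
OLH_INFO_XATTR = "user.rgw.olh.info"  # xattr name taken from this report


class RadosObject:
    """Hypothetical stand-in for a RADOS object: a payload plus xattrs."""

    def __init__(self, data=b"", xattrs=None):
        self.data = data
        self.xattrs = dict(xattrs or {})


def is_olh(obj):
    # Assumption: the predicate only checks for the presence of the xattr.
    return OLH_INFO_XATTR in obj.xattrs


def completed_put(pool, key, instance_name, body):
    """Healthy path: store the instance, then mark the head as an OLH."""
    pool[instance_name] = RadosObject(body)
    pool[key] = RadosObject(xattrs={OLH_INFO_XATTR: instance_name})


def interrupted_put(pool, key, instance_name, body):
    """Buggy path: the request fails after these two steps."""
    pool[instance_name] = RadosObject(body)  # instance stored (dark data)
    pool[key] = RadosObject()                # OLH initialized, xattr missing


def get(pool, key):
    """Resolve a key the way the report describes a GET behaving."""
    head = pool[key]
    if is_olh(head):
        # Follow the (simplified) pointer to the current instance.
        return 200, pool[head.xattrs[OLH_INFO_XATTR]].data
    # Not recognized as an OLH: served as a plain, here empty, object.
    return 200, head.data


if __name__ == "__main__":
    pool = {}
    interrupted_put(pool, "config.json", "config.json.v1", b'{"a": 1}')
    print(get(pool, "config.json"))
```

In this model the interrupted PUT leaves an empty head object that get() serves as-is, while the instance data remains in the pool with nothing pointing to it.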
This was fixed on master and in Reef as part of the multi-site changes [1], but we could use a test case to ensure there are no future regressions on those branches. We need backports of [1] for Quincy and Pacific.
There is also a need for index cleanup tooling since buckets affected by this issue have inconsistent stats, inconsistent OLH RADOS objects, and dark data instance objects.
[1] https://github.com/ceph/ceph/commit/f57973725feeaa84321884c8eebc048989446572
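As a rough illustration of what the cleanup tooling mentioned above has to detect, the following Python sketch compares a bucket's index entries against the objects actually present in the data pool. It is a conceptual sketch under assumptions, not the radosgw-admin commands tracked elsewhere: both functions, their inputs, and the object names are hypothetical, and a real tool would read this information from the bucket index and from RADOS.

```python
# Conceptual sketch: the inputs are plain Python collections, not Ceph APIs.
OLH_INFO_XATTR = "user.rgw.olh.info"  # xattr name taken from this report


def find_unlinked_instances(index_entries, instance_objects):
    """Return instance objects in the pool that no index entry refers to."""
    return sorted(set(instance_objects) - set(index_entries))


def find_uninitialized_olh(head_xattrs):
    """Return head objects that were created without the OLH info xattr.

    head_xattrs maps a head object name to the set of xattr names on it.
    """
    return sorted(
        name for name, xattrs in head_xattrs.items()
        if OLH_INFO_XATTR not in xattrs
    )


if __name__ == "__main__":
    index = ["a.v1", "b.v1"]
    pool_instances = ["a.v1", "b.v1", "c.v1"]  # c.v1 was never linked
    heads = {
        "a": {OLH_INFO_XATTR},
        "b": {OLH_INFO_XATTR},
        "c": set(),                            # OLH without the xattr
    }
    print(find_unlinked_instances(index, pool_instances))
    print(find_uninitialized_olh(heads))
```

The two checks correspond to the dark data instance objects and the inconsistent OLH RADOS objects named in the description. Recalculating bucket stats is a separate step that this sketch does not cover.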
- Affected Versions v16.0.0, v16.0.1, v16.1.0, v16.1.1, v16.2.0, v16.2.1, v16.2.10, v16.2.11, v16.2.12, v16.2.13, v16.2.2, v16.2.3, v16.2.4, v16.2.5, v16.2.6, v16.2.7, v16.2.8, v16.2.9, v17.0.0, v17.2.1, v17.2.2, v17.2.3, v17.2.4, v17.2.5 added
- Pull request ID set to 51700
- Related to Bug #50552: rgw: set_olh return -2 when resharding added
- Related to Bug #59663: rgw: expired delete markers created by deleting non-existant object multiple times are not being removed from data pool after deletion from bucket added
- Related to Bug #59164: LC rules cause latency spikes added
- Status changed from New to Fix Under Review
- Backport changed from pacific quincy to pacific quincy reef
tagged for reef since we'll at least want the recovery command there
- Subject changed from PUT requests during reshard of versioned bucket fail with 404 and leave behind dark data to Consistency bugs with OLH objects
- Related to Bug #61710: quincy/pacific: PUT requests during reshard of versioned bucket fail with 404 and leave behind dark data added
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #62064: pacific: Consistency bugs with OLH objects added
- Tags set to backport_processed
- Related to Bug #62075: New radosgw-admin commands to cleanup leftover OLH index entries and unlinked instance objects added
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100