
Bug #63355

open

Bug #62449: test/cls_2pc_queue: TestCls2PCQueue.MultiProducer and TestCls2PCQueue.AsyncConsumer failure

test/cls_2pc_queue: fails during migration tests

Added by Yuval Lifshitz 7 months ago. Updated 6 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: test-failure backport_processed
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As part of the fix in https://github.com/ceph/ceph/pull/52439/, the structure describing the entry deletion operation was changed, and a field holding the number of deleted entries was added to it.
This field is set by the client and is used to update the queue stats.
However, when an old ("reef" or earlier) client sends this operation to a new OSD, the OSD fails to parse the op structure.
The OSD expects: cls_2pc_queue_remove_op
but the old client sends: cls_queue_remove_op

As a solution, the best option would be to try to decode the op as "cls_2pc_queue_remove_op" and, if this fails, to decode it as "cls_queue_remove_op".
In the fallback case, use the entry listing operation to count how many entries are being deleted, and update the stats accordingly.
This has a performance cost, but only during the period when the cluster runs a mix of OSD and RGW versions.
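The decode-with-fallback idea above can be sketched as the following self-contained C++ example. It deliberately does not use Ceph's real `bufferlist` or `encode()`/`decode()` machinery: `OldRemoveOp`, `NewRemoveOp`, `ToyQueue`, `get_u64()`, `get_string()`, `count_entries_up_to()` and `handle_remove()` are hypothetical stand-ins, and the wire layout (a length-prefixed end marker, followed in the new op by a 64-bit entry count) is an assumption chosen only to show why the new decoder rejects an old payload. The real `cls_2pc_queue_remove_op` and `cls_queue_remove_op` definitions may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// --- Hypothetical wire helpers (NOT Ceph's encode()/decode()) ---
void put_u64(std::vector<uint8_t>& out, uint64_t v) {
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

void put_string(std::vector<uint8_t>& out, const std::string& s) {
  put_u64(out, s.size());  // length prefix
  out.insert(out.end(), s.begin(), s.end());
}

// Reads a little-endian u64 at pos; throws if the payload is too short,
// standing in for a decode error on the OSD side.
uint64_t get_u64(const std::vector<uint8_t>& in, std::size_t& pos) {
  if (in.size() - pos < 8) throw std::runtime_error("end of buffer");
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v |= static_cast<uint64_t>(in[pos++]) << (8 * i);
  return v;
}

std::string get_string(const std::vector<uint8_t>& in, std::size_t& pos) {
  const uint64_t len = get_u64(in, pos);
  if (in.size() - pos < len) throw std::runtime_error("end of buffer");
  std::string s(in.begin() + pos, in.begin() + pos + len);
  pos += len;
  return s;
}

// Old ("reef" or earlier) op: assumed to carry only the end marker.
struct OldRemoveOp {
  std::string end_marker;
  std::vector<uint8_t> encode() const {
    std::vector<uint8_t> out;
    put_string(out, end_marker);
    return out;
  }
  static OldRemoveOp decode(const std::vector<uint8_t>& in) {
    std::size_t pos = 0;
    return OldRemoveOp{get_string(in, pos)};
  }
};

// New op: end marker plus the client-supplied number of deleted entries.
struct NewRemoveOp {
  std::string end_marker;
  uint64_t entries_to_remove = 0;
  std::vector<uint8_t> encode() const {
    std::vector<uint8_t> out;
    put_string(out, end_marker);
    put_u64(out, entries_to_remove);
    return out;
  }
  static NewRemoveOp decode(const std::vector<uint8_t>& in) {
    std::size_t pos = 0;
    NewRemoveOp op;
    op.end_marker = get_string(in, pos);
    op.entries_to_remove = get_u64(in, pos);  // throws on an old payload
    return op;
  }
};

// Toy queue: markers are assumed to compare in queue order.
struct ToyQueue {
  std::vector<std::string> markers;
  uint64_t entries_stat = 0;  // hypothetical queue stats
};

// Stand-in for the entry listing op: count entries up to end_marker.
uint64_t count_entries_up_to(const ToyQueue& q, const std::string& end_marker) {
  uint64_t n = 0;
  for (const auto& m : q.markers) if (m <= end_marker) ++n;
  return n;
}

void handle_remove(ToyQueue& q, const std::vector<uint8_t>& payload) {
  std::string end_marker;
  uint64_t removed = 0;
  try {
    const NewRemoveOp op = NewRemoveOp::decode(payload);
    end_marker = op.end_marker;
    removed = op.entries_to_remove;  // trust the client's count
  } catch (const std::runtime_error&) {
    // Old client: decode the old layout and count the entries ourselves.
    const OldRemoveOp op = OldRemoveOp::decode(payload);
    end_marker = op.end_marker;
    removed = count_entries_up_to(q, end_marker);  // the extra listing cost
  }
  std::vector<std::string> kept;
  for (const auto& m : q.markers) if (m > end_marker) kept.push_back(m);
  q.markers = std::move(kept);
  q.entries_stat -= std::min(removed, q.entries_stat);
}
```

With an old payload, `NewRemoveOp::decode()` runs out of bytes while reading `entries_to_remove`, so `handle_remove()` falls back to `OldRemoveOp` and pays for the listing; payloads from new clients never take that path, which is what confines the cost to mixed-version clusters.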

Another option is to perform the entry deletion without updating the stats, and to add a new "radosgw-admin" command that recalculates the stats by going through all of the entries in the queue.
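The recalculation in the second option amounts to rebuilding the stats from scratch. A minimal sketch, assuming the stats consist of an entry count and a byte total and that the queue can be listed page by page starting after a marker; `Entry`, `QueueStats`, `list_page()` and `recalculate_stats()` are hypothetical names, not Ceph APIs, and the sketch says nothing about how an actual "radosgw-admin" command would be wired up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical queue entry and stats (not Ceph's real types).
struct Entry {
  std::string marker;
  std::string data;
};

struct QueueStats {
  uint64_t entries = 0;
  uint64_t bytes = 0;
};

// Stand-in for the entry listing op: returns up to max entries whose marker
// is greater than start_after (an empty start_after means "from the head"),
// and sets truncated if more entries remain.
std::vector<Entry> list_page(const std::vector<Entry>& queue,
                             const std::string& start_after, std::size_t max,
                             bool& truncated) {
  std::vector<Entry> page;
  truncated = false;
  for (const auto& e : queue) {
    if (!start_after.empty() && e.marker <= start_after) continue;
    if (page.size() == max) {
      truncated = true;
      break;
    }
    page.push_back(e);
  }
  return page;
}

// Rebuild the stats by walking every entry in the queue, page by page.
QueueStats recalculate_stats(const std::vector<Entry>& queue, std::size_t page_size) {
  QueueStats stats;
  std::string marker;  // empty: start from the head of the queue
  bool truncated = true;
  while (truncated) {
    const std::vector<Entry> page = list_page(queue, marker, page_size, truncated);
    if (page.empty()) break;
    for (const auto& e : page) {
      ++stats.entries;
      stats.bytes += e.data.size();
    }
    marker = page.back().marker;  // continue after the last listed entry
  }
  return stats;
}
```

Paging keeps each listing call bounded, at the price of a full pass over the queue every time the stats are recalculated.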
