Project

General

Profile

Actions

Bug #53927

open

rgwlc: version of cls_rgw_lc_get_entry_ret is not updated

Added by Jeegn Chen over 2 years ago. Updated over 1 year ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
lifecycle backport_processed
Backport:
quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In commit https://github.com/ceph/ceph/commit/394750597656d4f3ab7b8220af7046753117d39b, the type of entry of cls_rgw_lc_get_entry_ret is changed from rgw_lc_entry_t (which is a pair) to cls_rgw_lc_entry, but the struct version numbers of cls_rgw_lc_get_entry_ret in encode() and decode() methods are not updated and backward compatibility seems not addressed either.


Related issues 1 (1 open0 closed)

Copied to rgw - Backport #54151: quincy: rgwlc: version of cls_rgw_lc_get_entry_ret is not updatedNewActions
Actions #1

Updated by Casey Bodley over 2 years ago

  • Assignee set to Matt Benjamin
  • Tags set to lifecycle
Actions #2

Updated by Matt Benjamin over 2 years ago

Hmm. There was pretty elaborate attention to backward compatbility in the whole process, and it's been validated, but maybe I missed something.

Actions #3

Updated by Matt Benjamin over 2 years ago

Jeegn Chen,

Do you have a proposed change to address this?

Matt

Actions #4

Updated by Jeegn Chen over 2 years ago

Matt Benjamin wrote:

Hmm. There was pretty elaborate attention to backward compatbility in the whole process, and it's been validated, but maybe I missed something.

The issue was not a problem when https://github.com/ceph/ceph/commit/394750597656d4f3ab7b8220af7046753117d39b was introduced for the first time maybe because "radosgw-admin lc process --bucket"(https://github.com/ceph/ceph/pull/44139), which involves cls_rgw_lc_get_entry_ret, had not been implemented at that time.

For now, to make "radosgw-admin lc process --bucket" work after upgrade, "radosgw-admin lc process" needs to be executed to reset all LC entries first.

Actions #5

Updated by Matt Benjamin over 2 years ago

ah, ok, now I get it, Jeegn Chen. Thanks :)

Matt

Actions #6

Updated by Matt Benjamin over 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 44743
Actions #7

Updated by Jeegn Chen over 2 years ago

Jeegn Chen wrote:

Matt Benjamin wrote:

Hmm. There was pretty elaborate attention to backward compatbility in the whole process, and it's been validated, but maybe I missed something.

The issue was not a problem when https://github.com/ceph/ceph/commit/394750597656d4f3ab7b8220af7046753117d39b was introduced for the first time maybe because "radosgw-admin lc process --bucket"(https://github.com/ceph/ceph/pull/44139), which involves cls_rgw_lc_get_entry_ret, had not been implemented at that time.

For now, to make "radosgw-admin lc process --bucket" work after upgrade, "radosgw-admin lc process" needs to be executed to reset all LC entries first.

Hi, Matt. Very sorry for misleading.

After more tests, we just noticed that the root cause is not cls_rgw_lc_get_entry_ret although it should a good convention to update struct version for each format change.
Instead, it is rgw_cls_lc_get_entry. If rgw_cls_lc_get_entry failed to decode when assuming the type as cls_rgw_lc_entry, rgw_cls_lc_get_entry may need to retry with pair<std::string, int> for backward compatibility.

Actions #8

Updated by Casey Bodley about 2 years ago

  • Backport set to quincy
Actions #9

Updated by Matt Benjamin about 2 years ago

Hi Jeegn-Chen,

Ok, I'll add that as well in the same PR.

Matt

Actions #10

Updated by Matt Benjamin about 2 years ago

Hi Jeegn-Chen,

Ok, on review, let's take a look at rgw_cls_lc_get_entry(...):

int cls_rgw_lc_get_entry(IoCtx& io_ctx, const string& oid,
             const std::string& marker, cls_rgw_lc_entry& entry)
{
  bufferlist in, out;
  cls_rgw_lc_get_entry_op call{marker};;
  encode(call, in);
  int r = io_ctx.exec(oid, RGW_CLASS, RGW_LC_GET_ENTRY, in, out);

  if (r < 0) {
    return r;
  }

  cls_rgw_lc_get_entry_ret ret;
  try {
    auto iter = out.cbegin();
    decode(ret, iter);
  } catch (ceph::buffer::error& err) {
    return -EIO;
  }

  entry = std::move(ret.entry);
  return r;
}

What this does, of course, is:
1. dispatch an op, whose struct type is cls_rgw_lc_get_entry_op
2. whose result is an unserialized cls_rgw_lc_get_entry_ret
3. which, because it uses the standard Ceph encoding machinery, essentially "decodes itself" according to the logic in the decode(...) member function of cls_rgw_lc_get_entry_ret
Now, PR 44743 attempts to fix the decode method of cls_rgw_lc_get_entry_ret, in such a way that if either a new or old-style serialization is returned at version 1, we would be able to decode it an return it as a (potentially lobotomized) instance of the current struct type.
I think that's the correct behavior, actually. I didn't test it, so maybe you tried the PR and it doesn't work as expected? I'm going to write a test case that constructs the old serialized form and proves the new decoder can decode it.

Matt

Actions #11

Updated by Casey Bodley about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #12

Updated by Backport Bot about 2 years ago

  • Copied to Backport #54151: quincy: rgwlc: version of cls_rgw_lc_get_entry_ret is not updated added
Actions #13

Updated by Backport Bot over 1 year ago

  • Tags changed from lifecycle to lifecycle backport_processed
Actions

Also available in: Atom PDF