Project

General

Profile

Actions

Bug #49446

open

rgw: RGWLC::process() may starve some bucket LCs at the end of a lc.N

Added by Jeegn Chen about 3 years ago. Updated over 1 year ago.

Status:
Pending Backport
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
lifecycle backport_processed
Backport:
quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Current implementation of RGWLC::process() implicitly assumes that all
the buckets in one lc.N can be processed within one day.
So the following snippet will always reset the marker to the beginning
of a lc.N when the lc.N is processed for the first time in a day.

    if(!if_already_run_today(head.start_date)) {
      head.start_date = now;
      head.marker.clear();
      ret = bucket_lc_prepare(index);
      if (ret < 0) {
      ldpp_dout(this, 0) << "RGWLC::process() failed to update lc object " 
          << obj_names[index] << ", ret=" << ret << dendl;
      goto exit;
      }
    }

However, this assumption may not be true, when the number of bucket
lifecycle configurations or the number of objects within buckets became
large enough. In these scenarios, the logic above may reset the marker
to the beginning when it was just pointing to some key in the middle of
lc.N. As a result, the bucket lifecycles after that key will get
no chance to be processed.

Maybe a more reasonable logic is that we should check whether the marker
is pointing to the end of a lc.N first. If yes, it is OK to reset.
But if it is not, the marker should be reserved.


Related issues 2 (1 open1 closed)

Copied to rgw - Backport #55246: pacific: rgw: RGWLC::process() may starve some bucket LCs at the end of a lc.NRejectedMatt BenjaminActions
Copied to rgw - Backport #58329: quincy: rgw: RGWLC::process() may starve some bucket LCs at the end of a lc.NIn ProgressMatt BenjaminActions
Actions #1

Updated by Jeegn Chen about 3 years ago

It seems that lastest implmentation use this snippet to address the issue, which assumes that a bucket should take less than 2 days to complete a lifecycle processing.

    if (! (cct->_conf->rgw_lc_lock_max_time == 9969)) {
      ret = sal_lc->get_entry(obj_names[index], head.marker, entry);
      if (ret >= 0) {
    if (entry.status == lc_processing) {
      if (expired_session(entry.start_time)) {
        dout(5) << "RGWLC::process(): STALE lc session found for: " << entry
            << " index: " << index << " worker ix: " << worker->ix
            << " (clearing)" 
            << dendl;
      } else {
        dout(5) << "RGWLC::process(): ACTIVE entry: " << entry
            << " index: " << index << " worker ix: " << worker->ix
          << dendl;
        goto exit;
      }
    }
      }
    }

    if(!if_already_run_today(head.start_date) ||
       once) {
      head.start_date = now;
      head.marker.clear();
      ret = bucket_lc_prepare(index, worker);
      if (ret < 0) {
      ldpp_dout(this, 0) << "RGWLC::process() failed to update lc object " 
             << obj_names[index]
             << ", ret=" << ret
             << dendl;
      goto exit;
      }
    }
bool RGWLC::expired_session(time_t started)
{
  time_t interval = (cct->_conf->rgw_lc_debug_interval > 0)
    ? cct->_conf->rgw_lc_debug_interval
    : 24*60*60;

  auto now = time(nullptr);

  dout(16) << "RGWLC::expired_session" 
       << " started: " << started
       << " interval: " << interval << "(*2==" << 2*interval << ")" 
       << " now: " << now
       << dendl;

  return (started + 2*interval < now);
}
Actions #2

Updated by Matt Benjamin about 3 years ago

Hi,

I agree with your analysis overall. Wrt your comment #1, the current 2*interval logic is arbitrary and fine to change--but we need to preserve the invariant that only one LC.n process will be running at any time.

regards,

Matt

Actions #3

Updated by Casey Bodley about 3 years ago

  • Status changed from New to Triaged
  • Tags set to lifecycle
Actions #5

Updated by Casey Bodley over 2 years ago

  • Status changed from Triaged to Fix Under Review
  • Assignee set to Matt Benjamin
Actions #6

Updated by Matt Benjamin about 2 years ago

  • Backport set to pacific
Actions #7

Updated by Matt Benjamin about 2 years ago

  • Priority changed from Normal to High
Actions #8

Updated by Matt Benjamin about 2 years ago

  • Pull request ID set to 40703
Actions #9

Updated by Matt Benjamin about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #10

Updated by Backport Bot about 2 years ago

  • Copied to Backport #55246: pacific: rgw: RGWLC::process() may starve some bucket LCs at the end of a lc.N added
Actions #11

Updated by Backport Bot over 1 year ago

  • Tags changed from lifecycle to lifecycle backport_processed
Actions #12

Updated by Matt Benjamin over 1 year ago

  • Tags changed from lifecycle backport_processed to lifecycle
  • Backport changed from pacific to quincy, pacific
Actions #13

Updated by Matt Benjamin over 1 year ago

  • Copied to Backport #58329: quincy: rgw: RGWLC::process() may starve some bucket LCs at the end of a lc.N added
Actions #14

Updated by Matt Benjamin over 1 year ago

  • Tags changed from lifecycle to lifecycle backport_processed
Actions

Also available in: Atom PDF