Bug #63613

[rgw][lc] using custom lc schedule (work time) may cause lc processing to stall

Added by Oguzhan Ozmen 6 months ago. Updated 5 months ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
rgw lifecycle backport_processed
Backport:
quincy reef pacific
Regression:
No
Severity:
4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We use different LC processing time windows on our different clusters, configured through the rgw_lifecycle_work_time option ([[https://bbgithub.dev.bloomberg.com/ceph/ceph/blob/e877333f07d0eeb574572674cbcdefc9f07e231a/src/common/options/rgw.yaml.in#L358]]).

We noticed this issue on one of our clusters when we tried to start LC processing at 2 PM local time and allow it to run for 24 hours, which translates into a work time of "14:00-13:59". After we applied this setting, LC processing stalled completely for several days.

After analyzing the extended logs, we realized that the function that decides whether LC should run at the current time (i.e., the "should_work" function at https://bbgithub.dev.bloomberg.com/ceph/ceph/blob/e877333f07d0eeb574572674cbcdefc9f07e231a/src/rgw/rgw_lc.cc#L2431) doesn't take the "next date" notion into account: it does not handle a window whose end falls on the following day. As a result, any custom work time "XY:TW-AB:CD" breaks LC processing when the end time AB:CD is earlier than the start time XY:TW (for example, when AB < XY).
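
To illustrate the failure mode described above, here is a minimal Python sketch. It is a simplified model, not the code in rgw_lc.cc: the function names (parse_work_time, in_window_same_day, in_window_wraparound) are hypothetical, and the same-day check is only an assumption about how a comparison that ignores the next day would behave. The wraparound variant shows one possible way to treat a window whose end is earlier than its start; it is not necessarily the fix that was merged.

```python
def parse_work_time(spec):
    """Parse a work-time string "HH:MM-HH:MM" into (start, end) minutes since midnight."""
    start, end = spec.split("-")

    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    return to_minutes(start), to_minutes(end)


def in_window_same_day(now_min, start, end):
    # Assumed model of a check with no "next date" notion:
    # the window is treated as lying entirely within one day.
    return start <= now_min <= end


def in_window_wraparound(now_min, start, end):
    # Hypothetical alternative: if the end is earlier than the start,
    # the window is taken to span midnight.
    if start <= end:
        return start <= now_min <= end
    return now_min >= start or now_min <= end


start, end = parse_work_time("14:00-13:59")
# With the same-day check, no minute of the day falls inside the window.
never_runs = not any(in_window_same_day(m, start, end) for m in range(24 * 60))
# With the wraparound check, every minute of the day falls inside the window.
always_runs = all(in_window_wraparound(m, start, end) for m in range(24 * 60))
```

Under this model, "14:00-13:59" yields start = 840 and end = 839 minutes, so the condition start <= now <= end can never hold, which matches the observed stall.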


Related issues 3 (2 open, 1 closed)

Copied to rgw - Backport #63776: quincy: [rgw][lc] using custom lc schedule (work time) may cause lc processing to stall - In Progress - Mykola Golub
Copied to rgw - Backport #63777: reef: [rgw][lc] using custom lc schedule (work time) may cause lc processing to stall - In Progress - Mykola Golub
Copied to rgw - Backport #63787: pacific: [rgw][lc] using custom lc schedule (work time) may cause lc processing to stall - Rejected - Mykola Golub