Bug #58190

closed

Large RGW GC queue might prevent OSD from starting

Added by Igor Fedotov over 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It looks like rgw_gc_queue_list_entries can cause an HDD-based OSD to spend more than half an hour loading the queue.
As a result, the OSD is marked down shortly after restart.
The root cause is that the queue (or rather some entry in it) takes approx. 60 MB, and rgw_gc_queue_list_entries reads it through the rather inefficient CLS queue_list_entries using 1K (sic!) read chunks. This in turn permits 7-8 reads from a spinning drive.
The relevant OSD log is attached.
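
To put the chunk size in perspective, here is a rough back-of-the-envelope sketch of how many read calls it takes to walk a payload of about 60 MB. The 60 MB and 1K figures come from the description above; the larger chunk sizes and the helper name reads_needed are hypothetical and only there for comparison. This is not the CLS queue code, just the arithmetic.

```python
def reads_needed(total_bytes: int, chunk_bytes: int) -> int:
    """Number of fixed-size reads needed to cover total_bytes (ceiling division)."""
    return -(-total_bytes // chunk_bytes)


# Approximate size from the report; the real entry size may differ.
QUEUE_BYTES = 60 * 1024 * 1024

# 1 KiB is the chunk size named in the report.
# 64 KiB and 4 MiB are hypothetical alternatives, shown only for comparison.
for chunk in (1024, 64 * 1024, 4 * 1024 * 1024):
    print(f"chunk={chunk:>8} bytes -> {reads_needed(QUEUE_BYTES, chunk)} reads")
```

With 1K chunks the count is in the tens of thousands, and on a spinning drive each of those reads can be expensive, which is consistent with the very long load time seen in the attached log.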


Files

TextFile5.txt (29.6 KB): very long-running rgw_gc_queue_list_entries() (the beginning and a middle part). Igor Fedotov, 12/06/2022 04:25 PM

Related issues 3 (1 open, 2 closed)

Related to rgw - Bug #53585: RGW Garbage collector leads to slow ops and osd down when removing large object (New, Pritha Srivastava)
Copied to rgw - Backport #58579: pacific: Large RGW GC queue might prevent OSD from starting (Resolved)
Copied to rgw - Backport #58580: quincy: Large RGW GC queue might prevent OSD from starting (Resolved)
