Bug #37734: Librgw doesn't GC deleted object correctly - rgw - Ceph

Actions

Copy link

Bug #37734

closed

Librgw doesn't GC deleted object correctly

Added by Tao CHEN over 5 years ago. Updated over 3 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Target version:

% Done:

Source:

Tags:

librgw

Backport:

nautilus, mimic, luminous

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

28108

Crash signature (v1):

Crash signature (v2):

Description

Hi, recently I work on NFS. I found a bug with Librgw GC process, here is the way to reproduce:

1.set rgw_num_rados_handles = 10 (set this param as small as possible, then we can easily see the problem). set rgw_gc param to small value,too, as we can check gc list soon.
2.create a bucket and expose it with NFS-Ganesha
3.mount this export to local
4.copy 6 local file(1GB) to local path, then the file should be uploaded to bucket
5.once all 1GB file have been successfully written, remove them
6.check cluster usage with 'rados df', you will find out, some data still exists in pool(default.rgw.buckets.data) , though the file are all deleted and gc list = [].

Here is some track info:

rados df before uploading
[root@node1 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
.rgw.root 7216 21 0 63 0 0 0 664914 432M 107 62464
default.rgw.buckets.data 21028M 6304 0 18912 0 0 0 31198 160M 66520 100803M
default.rgw.buckets.index 0 3584 0 10752 0 0 0 5742379 5608M 48338 0
default.rgw.buckets.non-ec 0 6 0 18 0 0 0 133 86016 137 0
default.rgw.control 0 512 0 1536 0 0 0 0 0 0 0
default.rgw.log 71273 1737 0 5211 0 0 0 220572996 210G 147050225 1306k
default.rgw.meta 6477 38 0 114 0 0 0 504132 411M 4873 1278k
fs_data 4157M 532 0 1596 0 0 0 18119 72473k 1244384 653G
fs_metadata 29189k 29 0 87 0 0 0 47 60416 906 29804k

total_objects 12763
total_used 83599M
total_avail 6438G
total_space 6519G

rados df after uploading
[root@node1 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
.rgw.root 7216 21 0 63 0 0 0 665100 433M 107 62464
default.rgw.buckets.data 27172M 7841 0 23523 0 0 0 31282 160M 68253 104G
default.rgw.buckets.index 0 3584 0 10752 0 0 0 5791282 5656M 48450 0
default.rgw.buckets.non-ec 0 6 0 18 0 0 0 133 86016 137 0
default.rgw.control 0 512 0 1536 0 0 0 0 0 0 0
default.rgw.log 71273 1737 0 5211 0 0 0 220602291 210G 147069518 1306k
default.rgw.meta 6477 38 0 114 0 0 0 506865 413M 4885 1278k
fs_data 4157M 532 0 1596 0 0 0 18119 72473k 1244384 653G
fs_metadata 29189k 29 0 87 0 0 0 47 60416 906 29804k

total_objects 14300
total_used 102027M
total_avail 6420G
total_space 6519G

rados df after deleting
[root@node1 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
.rgw.root 7216 21 0 63 0 0 0 665367 433M 107 62464
default.rgw.buckets.data 22048M 6560 0 19680 0 0 0 32649 160M 69534 104G
default.rgw.buckets.index 0 3584 0 10752 0 0 0 5792813 5658M 48474 0
default.rgw.buckets.non-ec 0 6 0 18 0 0 0 133 86016 137 0
default.rgw.control 0 512 0 1536 0 0 0 0 0 0 0
default.rgw.log 71273 1737 0 5211 0 0 0 220614002 210G 147074013 1306k
default.rgw.meta 6477 38 0 114 0 0 0 507762 414M 4885 1278k
fs_data 4157M 532 0 1596 0 0 0 18119 72473k 1244384 653G
fs_metadata 29189k 29 0 87 0 0 0 47 60416 906 29804k

total_objects 13019
total_used 86703M
total_avail 6435G
total_space 6519G

As you can see, only 5GB data are deleted, 1GB still remains.

I also print the obj unique tag:

obj1
RGW GC chain size: 255, with tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502217.0
RGW GC adding chain

obj2
RGW GC chain size: 255, with tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502208.0
RGW GC adding chain

obj3
RGW GC chain size: 255, with tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502219.0
RGW GC adding chain

obj4
RGW GC chain size: 255, with tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502220.0
RGW GC adding chain

obj5
RGW GC chain size: 255, with tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502214.0
RGW GC adding chain

obj6
RGW GC chain size: 255, with tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502214.0
RGW GC adding chain

obj5 and obj6 have same tail tag: b2b41852-866c-4bab-9160-3e1a1b5d7f81.6502214.0

this tag can be divided into 3 part: zone_param_id.rgw_rados_handle_id.RGW_Request_id

obj5 and obj6 seems share the same rgw_rados_handle, but the rgw request id are always 0. I think this is the main reason that confuse RGW GC thread

Related issues 3 (0 open — 3 closed)

Actions

Copy link

Updated by Brad Hubbard over 5 years ago

Project changed from Ceph to rgw

Actions

Copy link

Updated by Tao CHEN over 5 years ago

Here is the patch:
https://github.com/ceph/ceph/pull/25664

Actions

Copy link

Updated by Matt Benjamin almost 5 years ago

ok, I think I get this; that said--the use of >1 rados handle is not at all recommended; that said, the fix ~~looks~~ acceptble

Actions

Copy link

Updated by Casey Bodley almost 5 years ago

Status changed from New to Fix Under Review
Pull request ID set to 28108

Actions

Copy link

Updated by Matt Benjamin almost 5 years ago

Backport set to nautilus, mimic, luminous

Actions

Copy link

Updated by Matt Benjamin almost 5 years ago

Status changed from Fix Under Review to Pending Backport

Actions

Copy link

Updated by Nathan Cutler almost 5 years ago

Copied to Backport #40106: mimic: Librgw doesn't GC deleted object correctly added

Actions

Copy link

Updated by Nathan Cutler almost 5 years ago

Copied to Backport #40107: nautilus: Librgw doesn't GC deleted object correctly added

Actions

Copy link

Updated by Nathan Cutler almost 5 years ago

Copied to Backport #40108: luminous: Librgw doesn't GC deleted object correctly added

Actions

Copy link

#10

Updated by Nathan Cutler over 3 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » rgw

Custom queries

Bug #37734

Librgw doesn't GC deleted object correctly

Updated by Brad Hubbard over 5 years ago

Updated by Tao CHEN over 5 years ago

Updated by Matt Benjamin almost 5 years ago

Updated by Casey Bodley almost 5 years ago

Updated by Matt Benjamin almost 5 years ago

Updated by Matt Benjamin almost 5 years ago

Updated by Nathan Cutler almost 5 years ago

Updated by Nathan Cutler almost 5 years ago

Updated by Nathan Cutler almost 5 years ago

Updated by Nathan Cutler over 3 years ago