Bug #52964

garbage collection doesn't remove gc list entries if the object's pool doesn't exist

Added by Semyon Poklad about 1 month ago. Updated about 1 month ago.

Status: Triaged
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags: gc
Backport: octopus pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

How it was reproduced:

Added a file to the bucket from S3 Browser:


    radosgw-admin --bucket=support-files bucket radoslist | wc -l
    96

    ceph df
    --- RAW STORAGE ---
    CLASS    SIZE   AVAIL     USED  RAW USED  %RAW USED
    hdd    44 TiB  44 TiB  4.7 GiB   4.7 GiB       0.01
    TOTAL  44 TiB  44 TiB  4.7 GiB   4.7 GiB       0.01

    --- POOLS ---
    POOL                        ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
    device_health_metrics       21    1  242 KiB        6  727 KiB      0     14 TiB
    .rgw.root                   22   32  3.2 KiB        6   72 KiB      0     14 TiB
    default.rgw.log             23   32   61 KiB      209  600 KiB      0     14 TiB
    default.rgw.control         24   32      0 B        8      0 B      0     14 TiB
    default.rgw.meta            25    8  1.9 KiB        9   96 KiB      0     14 TiB
    default.rgw.buckets.index   26    8   47 KiB       22  142 KiB      0     14 TiB
    default.rgw.buckets.data    27   32  377 MiB       96  1.1 GiB      0     14 TiB
    default.rgw.buckets.non-ec  28   32  110 KiB        0  330 KiB      0     14 TiB
    test.bucket.data            30   32      0 B        0      0 B      0     14 TiB

Removed the file from the bucket from S3 Browser:


    radosgw-admin --bucket=support-files bucket radoslist | wc -l
    0

    ceph df
    --- RAW STORAGE ---
    CLASS    SIZE   AVAIL     USED  RAW USED  %RAW USED
    hdd    44 TiB  44 TiB  4.7 GiB   4.7 GiB       0.01
    TOTAL  44 TiB  44 TiB  4.7 GiB   4.7 GiB       0.01

    --- POOLS ---
    POOL                        ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
    device_health_metrics       21    1  242 KiB        6  727 KiB      0     14 TiB
    .rgw.root                   22   32  3.2 KiB        6   72 KiB      0     14 TiB
    default.rgw.log             23   32   89 KiB      209  696 KiB      0     14 TiB
    default.rgw.control         24   32      0 B        8      0 B      0     14 TiB
    default.rgw.meta            25    8  1.9 KiB        9   96 KiB      0     14 TiB
    default.rgw.buckets.index   26    8   47 KiB       22  142 KiB      0     14 TiB
    default.rgw.buckets.data    27   32  377 MiB       95  1.1 GiB      0     14 TiB
    default.rgw.buckets.non-ec  28   32  110 KiB        0  330 KiB      0     14 TiB
    test.bucket.data            30   32      0 B        0      0 B      0     14 TiB

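As you can see, radoslist now reports 0 objects for the bucket. However, default.rgw.buckets.data still holds 95 objects and 377 MiB, so the pool was not actually cleaned up.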
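The before/after comparison can be made explicit with a small Python sketch. It reads the OBJECTS column for each pool from two `ceph df` POOLS rows copied from the tables above. The sketch assumes the plain-text layout shown in this report, where every size column is printed as a number followed by a unit. `ceph df --format json` would probably be a more robust input in practice.

```python
# Minimal sketch: read the OBJECTS column from `ceph df` POOLS rows as quoted
# in this report. Assumes each size column is printed as "<number> <unit>".

BEFORE = """\
default.rgw.log             23   32   61 KiB      209  600 KiB      0     14 TiB
default.rgw.buckets.data    27   32  377 MiB       96  1.1 GiB      0     14 TiB
"""

AFTER = """\
default.rgw.log             23   32   89 KiB      209  696 KiB      0     14 TiB
default.rgw.buckets.data    27   32  377 MiB       95  1.1 GiB      0     14 TiB
"""


def pool_objects(table: str) -> dict[str, int]:
    """Map pool name -> OBJECTS count for each POOLS row in the text."""
    counts = {}
    for line in table.splitlines():
        fields = line.split()
        # NAME ID PGS STORED(num unit) OBJECTS USED(num unit) %USED MAX_AVAIL(num unit)
        if len(fields) != 11:
            continue  # header lines or anything else that is not a pool row
        counts[fields[0]] = int(fields[5])
    return counts


before, after = pool_objects(BEFORE), pool_objects(AFTER)
for pool in sorted(before):
    print(pool, before[pool], "->", after[pool])
```

For default.rgw.buckets.data this prints a drop from 96 to 95 objects. Over the same interval, radoslist for the bucket went from 96 to 0.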

radosgw-admin --bucket=support-files bucket check, radosgw-admin --bucket=support-files gc process and radosgw-admin --bucket=support-files gc list all come back clean.

    rados -p default.rgw.buckets.data ls | wc -l
    95

    rgw-orphan-list default.rgw.buckets.data
    cat ./rados-20211018094020.intermediate | wc -l
    95
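Roughly speaking, rgw-orphan-list lists every RADOS object in the data pool and removes the ones some bucket still references (the `bucket radoslist` output). Whatever remains is reported as a potential orphan. The Python sketch below shows only that set-difference idea. The object names are hypothetical placeholders, and the real script covers all buckets and writes its own intermediate files.

```python
# Sketch of the idea behind an orphan scan: objects present in the pool
# (`rados ls`) but not referenced by any bucket (`bucket radoslist`).
# All object names below are hypothetical placeholders.


def potential_orphans(rados_ls: list[str], radoslist: list[str]) -> list[str]:
    """Return pool objects that no bucket listing references, sorted."""
    return sorted(set(rados_ls) - set(radoslist))


pool_listing = [
    "BUCKET-ID__shadow_FILE~tail.1",     # hypothetical tail objects
    "BUCKET-ID__multipart_FILE~part.1",
    "BUCKET-ID_still-referenced-object",
]
referenced = ["BUCKET-ID_still-referenced-object"]

for oid in potential_orphans(pool_listing, referenced):
    print(oid)
```

In this report radoslist returned nothing for the bucket, so all 95 pool objects fall into the difference. That matches the 95 lines in the intermediate file.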

Object names in ./rados-20211018094020.intermediate follow one of two masks:

    <bucket-id>__shadow_<file_name>~*
    <bucket-id>__multipart_<file_name>~*
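The two masks can be checked mechanically. The Python sketch below splits a tail object name into the bucket ID prefix, the kind (shadow or multipart) and the remainder. As input it uses the multipart OID quoted later in this ticket. The regular expression is an assumption derived from the masks above, not RGW's own naming code.

```python
import re

# Matches "<bucket-id>__shadow_<rest>" or "<bucket-id>__multipart_<rest>",
# the two masks seen in the rgw-orphan-list output in this report.
# The non-greedy bucket-id group stops at the first "__shadow_"/"__multipart_".
TAIL_RE = re.compile(r"^(?P<bucket_id>.+?)__(?P<kind>shadow|multipart)_(?P<rest>.+)$")


def classify_tail_object(oid: str):
    """Return (bucket_id, kind, rest) for a tail object name, or None."""
    m = TAIL_RE.match(oid)
    if m is None:
        return None
    return m.group("bucket_id"), m.group("kind"), m.group("rest")


oid = ("0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1"
       "__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48")
print(classify_tail_object(oid))
```
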

There were no errors in the radosgw log during the deletion:

    2021-10-18T13:09:06.536+0300 7fcf0564a700 1 ====== starting new request req=0x7fd0e446c820 =====
    2021-10-18T13:09:06.608+0300 7fcf6ef1d700 1 ====== req done req=0x7fd0e446c820 op status=0 http_status=204 latency=0.072000369s ======
    2021-10-18T13:09:06.608+0300 7fcf6ef1d700 1 beast: 0x7fd0e446c820: 10.77.1.185 - nextcloud [18/Oct/2021:13:09:06.536 +0300] "DELETE /support-files/debian-11.0.0-amd64-netinst.iso HTTP/1.1" 204 0 - "S3 Browser/10.0.9 (https://s3browser.com)" - latency=0.072000369s
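HTTP 204 (No Content) is the normal success status for an S3 DELETE. The log therefore shows that the delete request itself succeeded. The tail objects are left for garbage collection, as explained in the first reply below. The Python sketch below pulls the method, path and status out of a beast access-log line. The pattern is an assumption based only on the single sample line above.

```python
import re

# Extracts method, path and status from a beast access-log line like the one
# above. The pattern is an assumption based on that single sample line.
ACCESS_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

LINE = ('2021-10-18T13:09:06.608+0300 7fcf6ef1d700 1 beast: 0x7fd0e446c820: '
        '10.77.1.185 - nextcloud [18/Oct/2021:13:09:06.536 +0300] '
        '"DELETE /support-files/debian-11.0.0-amd64-netinst.iso HTTP/1.1" 204 0 - '
        '"S3 Browser/10.0.9 (https://s3browser.com)" - latency=0.072000369s')


def parse_request(line: str):
    """Return (method, path, status), or None if the line does not match."""
    m = ACCESS_RE.search(line)
    if m is None:
        return None
    return m.group("method"), m.group("path"), int(m.group("status"))


method, path, status = parse_request(LINE)
# A 2xx status means the request succeeded; tail cleanup happens later via gc.
print(method, path, status, "ok" if 200 <= status < 300 else "error")
```
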

History

#1 Updated by Casey Bodley about 1 month ago

  • Status changed from New to Need More Info
  • Target version deleted (v16.2.6)
  • Tags set to gc
  • Affected Versions v16.2.6 added

How big was this object?

Are you familiar with the garbage collection system that cleans up these deleted tail objects? See https://docs.ceph.com/en/latest/radosgw/config-ref/#garbage-collection-settings

You can use `radosgw-admin gc list --include-all` to show all objects that are scheduled for garbage collection. By default, garbage collection waits 2 hours (rgw_gc_obj_min_wait) before it cleans up these objects, because clients may have started reading them just before deletion.

Note that there is a gc bug when objects get very large (e.g. over 1 TB) that we're tracking at https://tracker.ceph.com/issues/49823
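The two-hour delay can be written as a simple eligibility check. The following is a Python model, not RGW's implementation. It assumes the default rgw_gc_obj_min_wait of 7200 seconds (2 hours, as stated above). It also assumes that `--include-all` makes gc process ignore the delay, which is consistent with what the reporter saw in the next comment.

```python
from datetime import datetime, timedelta

RGW_GC_OBJ_MIN_WAIT = timedelta(seconds=7200)  # 2 hours, the default cited above


def gc_eligible(deleted_at: datetime, now: datetime,
                min_wait: timedelta = RGW_GC_OBJ_MIN_WAIT,
                include_all: bool = False) -> bool:
    """Model: a tail object may be collected once min_wait has elapsed
    since deletion, or immediately when include_all is requested."""
    return include_all or now - deleted_at >= min_wait


deleted = datetime(2021, 10, 18, 13, 9, 6)  # time of the DELETE in the log above
print(gc_eligible(deleted, deleted + timedelta(minutes=30)))                    # False
print(gc_eligible(deleted, deleted + timedelta(minutes=30), include_all=True))  # True
print(gc_eligible(deleted, deleted + timedelta(hours=2)))                       # True
```
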

#2 Updated by Semyon Poklad about 1 month ago

Thanks, @Casey Bodley! The file was about 300 MB. "radosgw-admin --bucket=support-files gc process --include-all" did clear the pool, so this part is not a bug.

I found another problem. I had created a test pool and bucket, which I have since deleted along with their objects. The bucket and pool were created from the dashboard, and the objects were deleted with rados -p rm. The bucket was then deleted from the dashboard and the pool from the PVE GUI.

However, "radosgw-admin gc list --include-all" still shows this entry:

    {
        "pool": "test.bucket.data",
        "oid": "0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48",
        "key": "",
        "instance": ""
    }

"radosgw-admin gc process --include-all --debug-ms=1" does not output any errors, but rerunning gc list shows no change.
The collector keeps seeing objects that no longer exist, and "radosgw-admin bucket check --fix" did not change the result.
Since objects in the default pool had already been deleted manually many times, I recreated the test.bucket.data pool and then ran gc process. After that, gc list was clean!
It is unexpected that the collector keeps trying to process objects in pools that no longer exist without reporting any error.
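Until this is fixed, stale entries like the one above can at least be spotted by comparing the pools named in gc list output with the pools that actually exist. The Python sketch below does that comparison under two assumptions. First, `radosgw-admin gc list --include-all` prints a JSON array of entries, each with an "objs" array of {pool, oid, key, instance} records. Only the inner record appears in this ticket, so the outer "tag"/"time" values are made up. Second, the list of existing pools would come from something like `ceph osd pool ls`.

```python
import json

# Hypothetical sample shaped like `radosgw-admin gc list --include-all` output.
# Only the inner {"pool", "oid", "key", "instance"} record comes from this
# ticket; the outer "tag"/"time"/"objs" layout and values are assumptions.
GC_LIST = """
[
  {
    "tag": "example-tag",
    "time": "2021-10-18T15:09:06.000000+0300",
    "objs": [
      {
        "pool": "test.bucket.data",
        "oid": "0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48",
        "key": "",
        "instance": ""
      }
    ]
  }
]
"""


def entries_in_missing_pools(gc_list_json: str, existing_pools: set[str]):
    """Yield (pool, oid) for gc objects whose pool is not in existing_pools."""
    for entry in json.loads(gc_list_json):
        for obj in entry.get("objs", []):
            if obj["pool"] not in existing_pools:
                yield obj["pool"], obj["oid"]


# Some pools that still exist after test.bucket.data was removed (from `ceph df`).
pools = {"default.rgw.buckets.data", "default.rgw.log", "default.rgw.meta"}
for pool, oid in entries_in_missing_pools(GC_LIST, pools):
    print(f"stale gc entry: pool={pool} oid={oid}")
```
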

#3 Updated by Casey Bodley about 1 month ago

  • Subject changed from "pacific_rgw: when deleting large files, objects *multipart* and *shadows* are not deleted but become orphans" to "garbage collection doesn't remove gc list entries if the object's pool doesn't exist"

#4 Updated by Casey Bodley about 1 month ago

  • Assignee set to Pritha Srivastava
  • Backport set to octopus pacific

#5 Updated by Casey Bodley about 1 month ago

  • Status changed from Need More Info to New

#6 Updated by Casey Bodley about 1 month ago

  • Status changed from New to Triaged

#7 Updated by Pritha Srivastava about 1 month ago

  • Status changed from Triaged to Need More Info

Can you give me the exact steps you followed for the second bug you are reporting?

As I understand it, you had a test pool, created objects in it and then deleted them. When you run gc list, it shows some output. When you run gc process, it doesn't delete anything, and gc list still shows the same output afterwards. These are multipart objects. Is there anything else you would like to add?

#8 Updated by Semyon Poklad about 1 month ago

1. Created pool test.bucket.data.
2. Created a bucket in this pool.
3. Added objects to this bucket.
4. Deleted the objects (first through the browser, then with rados -p rm).
5. Removed the bucket.
6. Removed pool test.bucket.data.
7. "radosgw-admin gc list --include-all" shows this:

    {
        "pool": "test.bucket.data",
        "oid": "0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48",
        "key": "",
        "instance": ""
    }

8. Ran "radosgw-admin gc process --include-all" and "radosgw-admin bucket check --fix": no error messages.
9. Ran "radosgw-admin gc list --include-all" again: it still shows the same entry as in step 7.
10. "radosgw-admin gc process --include-all --debug-ms=1" does not output any errors.
11. "radosgw-admin gc list --include-all" still shows the same entry.
12. Recreated pool test.bucket.data.
13. Ran "radosgw-admin gc process --include-all".
14. "radosgw-admin gc list --include-all" output is now clean.
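Steps 7 to 14 are consistent with a gc pass that keeps an entry whenever its pool cannot be found and does not report that failure. The toy Python model below reproduces that observable behaviour. It also shows one possible alternative, dropping entries whose pool no longer exists. Everything here is hypothetical, including `GcEntry`, `gc_process` and the `drop_missing_pools` switch. It is not RGW's gc code and not the fix adopted for this ticket.

```python
from dataclasses import dataclass


@dataclass
class GcEntry:
    pool: str
    oid: str


def gc_process(queue: list[GcEntry], pools: dict[str, set[str]],
               drop_missing_pools: bool = False) -> list[GcEntry]:
    """Toy model of one gc pass; returns the entries still queued afterwards.

    `pools` maps each existing pool name to the set of object names it holds.
    """
    remaining = []
    for entry in queue:
        if entry.pool not in pools:
            # Pool is gone: the reported behaviour silently keeps the entry.
            if not drop_missing_pools:
                remaining.append(entry)
            continue
        # Pool exists: remove the object if present; either way the entry is done.
        pools[entry.pool].discard(entry.oid)
    return remaining


queue = [GcEntry("test.bucket.data", "BUCKET-ID__multipart_FILE.part.48")]  # hypothetical oid
cluster: dict[str, set[str]] = {"default.rgw.buckets.data": set()}

queue = gc_process(queue, cluster)  # steps 8-11: the entry stays queued
print(len(queue))                   # 1
cluster["test.bucket.data"] = set() # step 12: recreate the pool
queue = gc_process(queue, cluster)  # step 13
print(len(queue))                   # 0, matching step 14
```

With `drop_missing_pools=True`, the first pass would already empty the queue, and no pool would need to be recreated.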

#9 Updated by Casey Bodley about 1 month ago

  • Status changed from Need More Info to New

#10 Updated by Casey Bodley about 1 month ago

  • Status changed from New to Triaged
