Bug #52964
garbage collection doesn't remove gc list entries if the object's pool doesn't exist
Description
How it is reproduced
Added a file to the bucket from S3 Browser:
radosgw-admin --bucket=support-files bucket radoslist | wc -l
96
ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 44 TiB 44 TiB 4.7 GiB 4.7 GiB 0.01
TOTAL 44 TiB 44 TiB 4.7 GiB 4.7 GiB 0.01
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 21 1 242 KiB 6 727 KiB 0 14 TiB
.rgw.root 22 32 3.2 KiB 6 72 KiB 0 14 TiB
default.rgw.log 23 32 61 KiB 209 600 KiB 0 14 TiB
default.rgw.control 24 32 0 B 8 0 B 0 14 TiB
default.rgw.meta 25 8 1.9 KiB 9 96 KiB 0 14 TiB
default.rgw.buckets.index 26 8 47 KiB 22 142 KiB 0 14 TiB
default.rgw.buckets.data 27 32 377 MiB 96 1.1 GiB 0 14 TiB
default.rgw.buckets.non-ec 28 32 110 KiB 0 330 KiB 0 14 TiB
test.bucket.data 30 32 0 B 0 0 B 0 14 TiB
Removed the file from the bucket from S3 Browser:
radosgw-admin --bucket=support-files bucket radoslist | wc -l
0
ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 44 TiB 44 TiB 4.7 GiB 4.7 GiB 0.01
TOTAL 44 TiB 44 TiB 4.7 GiB 4.7 GiB 0.01
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 21 1 242 KiB 6 727 KiB 0 14 TiB
.rgw.root 22 32 3.2 KiB 6 72 KiB 0 14 TiB
default.rgw.log 23 32 89 KiB 209 696 KiB 0 14 TiB
default.rgw.control 24 32 0 B 8 0 B 0 14 TiB
default.rgw.meta 25 8 1.9 KiB 9 96 KiB 0 14 TiB
default.rgw.buckets.index 26 8 47 KiB 22 142 KiB 0 14 TiB
default.rgw.buckets.data 27 32 377 MiB 95 1.1 GiB 0 14 TiB
default.rgw.buckets.non-ec 28 32 110 KiB 0 330 KiB 0 14 TiB
test.bucket.data 30 32 0 B 0 0 B 0 14 TiB
As you can see, the pool was not actually cleaned up.
radosgw-admin --bucket=support-files bucket check, radosgw-admin --bucket=support-files gc process, and radosgw-admin --bucket=support-files gc list all come back clean.
rados -p default.rgw.buckets.data ls | wc -l
95
rgw-orphan-list default.rgw.buckets.data
cat ./rados-20211018094020.intermediate | wc -l
95
Object name masks in ./rados-20211018094020.intermediate:
<bucket-id>__shadow_<file_name>~* or
<bucket-id>__multipart_<file_name>~*
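Names matching the masks above can be picked out mechanically; a minimal sketch (the sample names used for testing are hypothetical, not taken from the cluster):

```shell
# Succeeds when an object name matches one of the orphan tail-object masks
# from the intermediate file:
#   <bucket-id>__shadow_<file_name>~*
#   <bucket-id>__multipart_<file_name>~*
is_tail_object() {
  case "$1" in
    *__shadow_*~*|*__multipart_*~*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a live pool (not run here):
#   rados -p default.rgw.buckets.data ls | while read -r o; do
#     is_tail_object "$o" && printf '%s\n' "$o"
#   done
```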
There were no errors in the radosgw log during the deletion:
2021-10-18T13:09:06.536+0300 7fcf0564a700 1 ====== starting new request req=0x7fd0e446c820 =====
2021-10-18T13:09:06.608+0300 7fcf6ef1d700 1 ====== req done req=0x7fd0e446c820 op status=0 http_status=204 latency=0.072000369s ======
2021-10-18T13:09:06.608+0300 7fcf6ef1d700 1 beast: 0x7fd0e446c820: 10.77.1.185 - nextcloud [18/Oct/2021:13:09:06.536 +0300] "DELETE /support-files/debian-11.0.0-amd64-netinst.iso HTTP/1.1" 204 0 - "S3 Browser/10.0.9 (https://s3browser.com)" - latency=0.072000369s
Updated by Casey Bodley over 2 years ago
- Status changed from New to Need More Info
- Target version deleted (v16.2.6)
- Tags set to gc
- Affected Versions v16.2.6 added
how big was this object?
are you familiar with the garbage collection system that cleans up these deleted tail objects? see https://docs.ceph.com/en/latest/radosgw/config-ref/#garbage-collection-settings
you can use `radosgw-admin gc list --include-all` to show all files that are scheduled for garbage collection. by default, garbage collection waits 2 hours (rgw_gc_obj_min_wait) before it cleans up these objects, because clients may have started reading them just before deletion
note that there is a gc bug when objects get too large (like over 1TB) that we're tracking at https://tracker.ceph.com/issues/49823
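As a concrete illustration of the min-wait behavior described above, a small sketch (the timestamp is made up; 7200 s is the documented default for rgw_gc_obj_min_wait):

```shell
# Inspect everything scheduled for gc, including entries not yet expired:
#   radosgw-admin gc list --include-all
# Force processing regardless of expiration:
#   radosgw-admin gc process --include-all

# When does a deleted tail object become eligible for normal gc?
rgw_gc_obj_min_wait=7200          # RGW default: 2 hours, in seconds
deleted_at=1634544546             # hypothetical epoch of the DELETE request
eligible_at=$((deleted_at + rgw_gc_obj_min_wait))
echo "eligible for gc at epoch $eligible_at"
```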
Updated by Semyon Poklad over 2 years ago
Casey Bodley wrote:
how big was this object?
are you familiar with the garbage collection system that cleans up these deleted tail objects? see https://docs.ceph.com/en/latest/radosgw/config-ref/#garbage-collection-settings
you can use `radosgw-admin gc list --include-all` to show all files that are scheduled for garbage collection. by default, garbage collection waits 2 hours (rgw_gc_obj_min_wait) before it cleans up these objects, because clients may have started reading them just before deletion
note that there is a gc bug when objects get too large (like over 1TB) that we're tracking at https://tracker.ceph.com/issues/49823
Thanks, @Casey Bodley! The file is ~300 MB. "radosgw-admin --bucket=support-files gc process --include-all" actually cleared the pool. So this is not a bug.
Found another problem. I had created a test pool and bucket, which I have already deleted along with the objects: the bucket and pool were created from the dashboard, the objects were deleted with rados -p rm, the bucket was deleted from the dashboard, and the pool from the PVE GUI.
But
"radosgw-admin gc list --include-all" shows this
{
"pool": "test.bucket.data",
"oid": "0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48",
"key": "",
"instance": ""
}
radosgw-admin gc process --include-all --debug-ms=1 does not output errors, but rerunning gc list shows no changes.
The collector sees non-existent objects. radosgw-admin bucket check --fix did not affect the result.
Since the objects in the default pool had been deleted manually many times, I created the test.bucket.data pool again and then ran gc process; after that, gc list was clean!
It is unexpected behavior that the collector tries to work with non-existent objects and pools and does not give an error message.
Updated by Casey Bodley over 2 years ago
- Subject changed from pacific_rgw: when deleting large files, objects *multipart* and *shadows* are not deleted but become orphans to garbage collection doesn't remove gc list entries if the object's pool doesn't exist
Updated by Casey Bodley over 2 years ago
- Assignee set to Pritha Srivastava
- Backport set to octopus pacific
Updated by Casey Bodley over 2 years ago
- Status changed from Need More Info to New
Updated by Pritha Srivastava over 2 years ago
- Status changed from Triaged to Need More Info
Can you give me the exact steps that you tried for the second bug that you are reporting?
I understand that you have a test pool in which you created objects. Then you deleted them; when you run gc list it gives some output, but when you run gc process, it doesn't delete anything, and you still see the same output from the gc list command? And I understand that these are multipart objects. Is there anything else that you would like to add to this?
Updated by Semyon Poklad over 2 years ago
Pritha Srivastava wrote:
Can you give me the exact steps that you tried for the second bug that you are reporting?
I understand that you have a test pool in which you created objects. Then you deleted them; when you run gc list it gives some output, but when you run gc process, it doesn't delete anything, and you still see the same output from the gc list command? And I understand that these are multipart objects. Is there anything else that you would like to add to this?
1. Created pool test.bucket.data
2. Created a bucket in this pool
3. Added objects to this bucket
4. Deleted the objects (first through the browser and then with rados -p rm)
5. Removed the bucket
6. Removed pool test.bucket.data
7. "radosgw-admin gc list --include-all" shows this
{
"pool": "test.bucket.data",
"oid": "0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48",
"key": "",
"instance": ""
}
8. "radosgw-admin gc process --include-all" and "radosgw-admin bucket check --fix" - no error messages
9. Running "radosgw-admin gc list --include-all" again, I see this again:
{
"pool": "test.bucket.data",
"oid": "0c4323f1-afd6-44b7-b2e3-e566ed2ba18f.4472377.1__multipart_debian-11.0.0-amd64-netinst.iso.2~nNqExIsOUW-HTmP5uxFqEllTb5OCD0c.48",
"key": "",
"instance": ""
}
10. radosgw-admin gc process --include-all --debug-ms=1 does not output errors
11. radosgw-admin gc list --include-all still shows the same entry
12. Created pool test.bucket.data again
13. radosgw-admin gc process --include-all
14. radosgw-admin gc list --include-all output is now clean.
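The state described in the steps above (gc entries pointing at a pool that no longer exists) can be spotted by cross-checking the pool names referenced in the gc list against the pools the cluster actually has. A minimal sketch with hardcoded sample inputs; on a live cluster they would come from radosgw-admin gc list --include-all and ceph osd pool ls:

```shell
# Sample inputs (hypothetical): pools referenced by pending gc entries,
# and pools that currently exist, one name per line.
gc_pools="test.bucket.data
default.rgw.buckets.data"
existing_pools="default.rgw.buckets.data
default.rgw.buckets.index"

# Print every pool referenced by gc that no longer exists.
missing_gc_pools() {
  printf '%s\n' "$gc_pools" | sort -u | while read -r p; do
    printf '%s\n' "$existing_pools" | grep -qx -- "$p" || printf '%s\n' "$p"
  done
}

missing_gc_pools
```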
Updated by Casey Bodley over 2 years ago
- Status changed from Need More Info to New