Bug #51740
Random files are lost from rados but are still visible in bucket index
Status: Closed
Description
Hello everyone,
we are currently facing an issue with missing files.
We have files in the bucket index that are no longer available at the rados level.
We've created a list from the bucket index of all files that should be available in the bucket with
radosgw-admin bi list --bucket BUCKET | grep -F '"idx":' > bucketindex
and also created a list of all files rados should have for this bucket:
radosgw-admin bucket radoslist --bucket BUCKET > radoslist
Going through the radoslist, I saw around 38k files that looked like this:
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.LRSp5qOg4cDn2ImWxeXtJlRvfLNZ-8R_1
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_1
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_2
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_3
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_4
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_5
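Entries like these are tail ("shadow") objects, and several of them usually belong to one logical S3 object. A rough way to see how many logical objects the 38k lines represent is to group them by prefix. This is a minimal sketch that assumes the naming pattern visible above, `<bucket marker>__shadow_.<prefix>_<stripe>`; other layouts (e.g. multipart uploads) are not handled, and `group_shadow_objects` is a hypothetical helper, not part of any Ceph tool.

```python
import re
from collections import defaultdict

# Assumed tail-object naming, taken from the listing above:
#   <bucket marker>__shadow_.<prefix>_<stripe number>
SHADOW_RE = re.compile(r"^(?P<marker>.+?)__shadow_\.(?P<prefix>.+)_(?P<stripe>\d+)$")


def group_shadow_objects(names):
    """Group shadow object names by (marker, prefix) -> sorted stripe numbers.

    Names that do not match the assumed pattern (e.g. head objects) are skipped.
    """
    groups = defaultdict(list)
    for name in names:
        m = SHADOW_RE.match(name.strip())
        if m:
            groups[(m["marker"], m["prefix"])].append(int(m["stripe"]))
    return {key: sorted(stripes) for key, stripes in groups.items()}
```

Applied to the six names above, this gives two groups: one prefix with stripe 1 only, and one with stripes 1 to 5.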
Diffing the radoslist against the bucketindex, we found around 34k files that are no longer available.
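The two lists cannot be diffed line by line directly, because `bucketindex` holds `"idx": "<key>",` lines while `radoslist` holds rados object names. A minimal sketch of the comparison, assuming a non-versioned bucket (only plain index entries) and that each key's head object is named `<bucket marker>_<key>`; keys starting with `_` are escaped differently and are not handled here. The helper names and the `marker` parameter are hypothetical.

```python
import json


def keys_from_grep(lines):
    """Extract object keys from `grep -F '"idx":'` output of `bi list`."""
    keys = set()
    for line in lines:
        line = line.strip().rstrip(",")
        if line.startswith('"idx":'):
            # Wrap the `"idx": "..."` fragment in braces so json handles escapes.
            keys.add(json.loads("{" + line + "}")["idx"])
    return keys


def keys_without_head(keys, radoslist_lines, marker):
    """Keys whose assumed head object `<marker>_<key>` is not in the radoslist."""
    listed = {line.strip() for line in radoslist_lines}
    return sorted(k for k in keys if f"{marker}_{k}" not in listed)
```

For example, `keys_without_head(keys_from_grep(open("bucketindex")), open("radoslist"), "<marker>")` would list index entries with no matching head object, where `<marker>` is the bucket marker.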
At first we thought it might have been an orphan objects cleanup, but that would have used the radoslist as the positive (keep) list.
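The reasoning here is that an orphan cleanup only removes pool objects that are absent from the keep list, so nothing on the radoslist can become a deletion candidate. A minimal sketch of that selection step, assuming the pool listing comes from `rados ls` and is optionally narrowed to one bucket by a hypothetical `marker` prefix; this illustrates the set logic only and is not the implementation of any Ceph orphan tool.

```python
def orphan_candidates(rados_ls, keep_list, marker=None):
    """Pool objects that are not on the keep list (candidates for removal).

    If `marker` is given, only objects whose names start with `<marker>_`
    (assumed to belong to that bucket) are considered.
    """
    keep = set(keep_list)
    prefix = f"{marker}_" if marker else ""
    return sorted(n for n in rados_ls if n.startswith(prefix) and n not in keep)
```

By construction the result never intersects the keep list, which is why a cleanup driven by radoslist should not remove objects that the index still references.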
The bucket itself does not use versioning or lifecycle policies.
The bucket holds 2.1 million files in 42 shards.
The customer said that he didn't change anything and almost never deletes files.
We are currently trying to understand how this happened but are out of ideas.
Dan said I should open a bug ticket. Here is the link to the mailing list (ML) thread: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZVACOYKLR5J4DNGMY3YQOPG4BJ55UFUW/
Updated by Dan van der Ster almost 3 years ago
Are there non-ASCII characters in the object path/name?
Do you have an example object: how it looks in s3cmd ls, what its entry looks like in the bi list, and whether those objects are in the rados pool?
Updated by Boris B almost 3 years ago
s3cmd ls:
$ s3cmd ls s3://BUCKET/a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original
2020-11-24 18:39     96222   s3://BUCKET/a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original
bucket index:
{
  "type": "plain",
  "idx": "a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original",
  "entry": {
    "name": "a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original",
    "instance": "",
    "ver": {
      "pool": 10,
      "epoch": 441941
    },
    "locator": "",
    "exists": "true",
    "meta": {
      "category": 1,
      "size": 96222,
      "mtime": "2020-11-24 18:39:04.896507Z",
      "etag": "30bc12c92ff499350ce572d331a149a3",
      "storage_class": "",
      "owner": "ead7468e-e980-49fe-87fc-44e8187dd176",
      "owner_display_name": "ead7468e-e980-49fe-87fc-44e8187dd176",
      "content_type": "application/pdf",
      "accounted_size": 96222,
      "user_data": "",
      "appendable": "false"
    },
    "tag": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821596.17051896",
    "flags": 0,
    "pending_map": [],
    "versioned_epoch": 0
  }
},
This example is neither in the listing of "radosgw-admin bucket radoslist", nor is it in the listing of "rados -p POOL ls".
Updated by Boris B almost 3 years ago
Good morning everybody,
we've dug further into it but still don't know how this could have happened. What we have ruled out so far:
- Orphan objects cleanup process.
- There is only one bucket with missing data (I checked all other buckets yesterday)
- The "keep these files" list is generated by radosgw-admin bucket radoslist. I doubt it would have left out files that are accessible via radosgw
- The deleted files are somewhat random, but always go together with their counterparts (per folder there are 2-3 files that belong together)
- The customer removed his data, but radosgw didn't clean up the bucket index
- There are no delete requests in the bucket's usage log
- The customer told us that they do not have a delete job for this bucket
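Checking the usage log for deletes can be scripted against the JSON printed by `radosgw-admin usage show`. A minimal sketch, assuming the output has `entries` → `buckets` → `categories` with per-category `ops` counts, and that deletes are logged under the category names `delete_obj` and `multi_object_delete`; both the schema and the category names are assumptions to verify against your release. `delete_ops` is a hypothetical helper.

```python
import json

# Assumed usage-log category names for single and bulk deletes.
DELETE_CATEGORIES = {"delete_obj", "multi_object_delete"}


def delete_ops(usage_json, bucket):
    """Sum delete operations recorded for `bucket` in usage-show JSON."""
    data = json.loads(usage_json)
    total = 0
    for user_entry in data.get("entries", []):
        for b in user_entry.get("buckets", []):
            if b.get("bucket") != bucket:
                continue
            for cat in b.get("categories", []):
                if cat.get("category") in DELETE_CATEGORIES:
                    total += cat.get("ops", 0)
    return total
```

Note that this only sees requests that went through radosgw; removals done directly at the rados level never reach the usage log.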
So I am out of ideas for what else to check, and I hope that you might be able to suggest further ideas.
Updated by Boris B almost 3 years ago
Hi everybody,
we found the issue.
It was a cleanup script that didn't work correctly.
Basically, it removed files directly via rados, so the bucket index was never updated.
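Why this leaves stale entries can be shown with a toy model in which the bucket index and the rados pool are two independent sets: a delete through the S3 gateway updates both, while a direct rados removal touches only the pool. This is a deliberately simplified illustration, not how radosgw is implemented; `ToyCluster` and its methods are hypothetical.

```python
class ToyCluster:
    """Toy model: bucket index and rados pool tracked as separate sets."""

    def __init__(self):
        self.index = set()  # keys listed in the bucket index
        self.pool = set()   # objects that actually exist in rados

    def s3_put(self, key):
        # An upload through the gateway writes the data and the index entry.
        self.pool.add(key)
        self.index.add(key)

    def s3_delete(self, key):
        # A delete through the gateway removes both the data and the index entry.
        self.pool.discard(key)
        self.index.discard(key)

    def rados_remove(self, key):
        # Removing the object directly in the pool bypasses the gateway,
        # so the bucket index still lists the key.
        self.pool.discard(key)

    def stale_index_entries(self):
        """Keys that the index lists but that no longer exist in the pool."""
        return self.index - self.pool
```

After `s3_put("a")`, `s3_put("b")`, `rados_remove("a")` and `s3_delete("b")`, only `"a"` remains as a stale index entry, which matches the symptom in this ticket: visible in the bucket index, gone from rados.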
Cheers
Boris