Bug #51740

closed

Random files are lost from rados but are still visible in bucket index

Added by Boris B almost 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%

Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello everyone,
we are currently facing an issue with missing files.

We have files in the bucket index that are no longer available on the rados level.

We created a list from the bucket index of all files that should be available in the bucket with

radosgw-admin bi list --bucket BUCKET | grep -F '"idx":' > bucketindex

and also created a list of all files rados should have for this bucket:
radosgw-admin bucket radoslist --bucket BUCKET > radoslist

Going through the radoslist, I saw around 38k entries that looked like this:

ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.LRSp5qOg4cDn2ImWxeXtJlRvfLNZ-8R_1
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_1
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_2
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_3
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_4
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_5

Diffing the radoslist against the bucket index, we found around 34k files that are no longer available.
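
A minimal Python sketch of how such a diff could be done, assuming the two files were produced by the commands above. It is not the script we actually used. The function names and the `marker` parameter are made up for illustration. It assumes that a head object in rados is named `<bucket marker>_<object key>`, that tail objects live in namespaces such as `_shadow_` (as in the listing above), and that the marker is the value shown by `radosgw-admin bucket stats`. Keys that themselves start with an underscore are stored in escaped form and are not handled here.

```python
import json


def index_keys(bi_lines):
    """Collect object keys from lines like '"idx": "some/key",' (the grep output)."""
    keys = set()
    for line in bi_lines:
        line = line.strip().rstrip(",")
        if not line.startswith('"idx"'):
            continue
        # Wrap the fragment in braces so the JSON parser handles escaping for us.
        keys.add(json.loads("{" + line + "}")["idx"])
    return keys


def head_keys(rados_lines, marker):
    """Collect keys that have a head object '<marker>_<key>' in the radoslist.

    Assumption: anything after the marker that starts with another '_' is a
    namespaced tail object (e.g. _shadow_, _multipart_) and is skipped.
    """
    prefix = marker + "_"
    keys = set()
    for line in rados_lines:
        name = line.rstrip("\n")
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if rest.startswith("_"):
            continue
        keys.add(rest)
    return keys


def missing_heads(bi_lines, rados_lines, marker):
    """Keys present in the bucket index but without a head object in rados."""
    return sorted(index_keys(bi_lines) - head_keys(rados_lines, marker))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 diff.py BUCKET_MARKER
    with open("bucketindex") as bi, open("radoslist") as rl:
        for key in missing_heads(bi, rl, sys.argv[1]):
            print(key)
```

Working with sets of keys rather than a textual diff avoids false positives from ordering and from the shadow objects, which have no entry of their own in the bucket index.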

First we thought it might be an orphan objects cleanup, but that would have used the radoslist as the positive list.
The bucket itself does not use versioning or lifecycle policies.
The bucket holds 2.1 million files in 42 shards.

The customer said that he didn't change anything and almost never deletes files.

We are currently trying to understand how this happened but are out of ideas.

Dan said I should open a bug ticket. And here is the link to the ML: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZVACOYKLR5J4DNGMY3YQOPG4BJ55UFUW/

Actions #1

Updated by Dan van der Ster almost 3 years ago

Are there non-ASCII chars in the object path/name?

Do you have an example object: how it looks in s3cmd ls, how its entry looks in the bi list, and whether the object is in the rados pool?

Actions #2

Updated by Boris B almost 3 years ago

s3cmd ls

$ s3cmd ls s3://BUCKET/a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original
2020-11-24 18:39        96222  s3://BUCKET/a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original

bucket index:

    {
        "type": "plain",
        "idx": "a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original",
        "entry": {
            "name": "a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original",
            "instance": "",
            "ver": {
                "pool": 10,
                "epoch": 441941
            },
            "locator": "",
            "exists": "true",
            "meta": {
                "category": 1,
                "size": 96222,
                "mtime": "2020-11-24 18:39:04.896507Z",
                "etag": "30bc12c92ff499350ce572d331a149a3",
                "storage_class": "",
                "owner": "ead7468e-e980-49fe-87fc-44e8187dd176",
                "owner_display_name": "ead7468e-e980-49fe-87fc-44e8187dd176",
                "content_type": "application/pdf",
                "accounted_size": 96222,
                "user_data": "",
                "appendable": "false" 
            },
            "tag": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.83821596.17051896",
            "flags": 0,
            "pending_map": [],
            "versioned_epoch": 0
        }
    },

This example is neither in the listing of "radosgw-admin bucket radoslist", nor is it in the listing of "rados -p POOL ls".
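
For spot checks of single objects, something like the following Python sketch could be used instead of grepping the full pool listing. It is only an illustration under assumptions: the helper names are hypothetical, the head object is assumed to be named `<bucket marker>_<object key>`, and `rados -p POOL stat OBJECT` is assumed to exit with a non-zero status when the object does not exist.

```python
import subprocess


def head_object_name(marker, key):
    """Assumed rados name of the head object for an S3 key in a bucket."""
    return f"{marker}_{key}"


def stat_command(pool, marker, key):
    """Build the 'rados stat' command line for the head object."""
    return ["rados", "-p", pool, "stat", head_object_name(marker, key)]


def head_exists(pool, marker, key, run=subprocess.run):
    """Return True if 'rados stat' succeeds for the head object.

    'run' is injectable so the function can be exercised without a cluster.
    """
    result = run(stat_command(pool, marker, key), capture_output=True, text=True)
    return result.returncode == 0
```

Calling `head_exists("POOL", "BUCKET_MARKER", "a0cc56ca-4c88-47ba-a02c-a38e030dc05a/c9c3d069-8c38-40f2-9680-ea9aa8057fa9/original")` would then be expected to return `False` for the example above, with `POOL` and `BUCKET_MARKER` replaced by the real values.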

Actions #3

Updated by Boris B almost 3 years ago

Good morning everybody,

we've dug further into it but still don't know how this could happen.
What we ruled out for now:
  • Orphan objects cleanup process.
    • There is only one bucket with missing data (I checked all other buckets yesterday)
    • The "keep these files" list is generated by radosgw-admin bucket radoslist. I doubt that files accessible via radosgw would have been missing from that list
    • The deleted files are somewhat random, but always with their corresponding counterparts (per folder there are 2-3 files that belong together)
  • Customer removed their data, but radosgw didn't clean up the bucket index
    • There are no delete requests in the bucket's usage log.
    • The customer told us that they do not have a delete job for this bucket

So I am out of ideas for what to check, and hope that you might be able to suggest further ones.

Actions #4

Updated by Boris B almost 3 years ago

Hi everybody,

we found the issue.
It was a cleanup script that didn't work correctly.
Basically, it removed files directly via rados, so the bucket index was not updated.

Cheers
Boris

Actions #5

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Closed