Bug #63642

open

rgw: rados objects wrongly deleted

Added by xiaobao wen 5 months ago. Updated 2 months ago.

Status: Pending Backport
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source:
Tags: multipart backport_processed
Backport: pacific quincy reef
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Related issues 4 (1 open, 3 closed)

Is duplicate of rgw - Bug #63597: rgw: multi-part upload will make head object metadata error during a breakpoint continuation by using aws java Signature Version 4 (Duplicate)
Copied to rgw - Backport #64425: quincy: rgw: rados objects wrongly deleted (In Progress, Casey Bodley)
Copied to rgw - Backport #64426: reef: rgw: rados objects wrongly deleted (Resolved, Casey Bodley)
Copied to rgw - Backport #64427: pacific: rgw: rados objects wrongly deleted (Resolved, Casey Bodley)
Actions #1

Updated by xiaobao wen 5 months ago

We encountered data loss when using multipart upload. We found that some rados objects were lost.

Logs from the production environment

  1. s3cmd get failed with 404
    xiaobaowen@pc:~$ s3cmd get s3://prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1 /tmp/
    download: 's3://prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' -> '/tmp/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1'  [1 of 1]
      318767104 of 1260015979    25% in    3s    95.06 MB/s  failed
    WARNING: Retrying failed request: /62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1 (EOF from S3!)
    WARNING: Waiting 3 sec...
    download: 's3://prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' -> '/tmp/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1'  [1 of 1]
    ERROR: Download of '/tmp/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' failed (Reason: 404 (NoSuchKey))
    ERROR: S3 error: 404 (NoSuchKey)
    
  2. RGW logs from when the S3 object was uploaded
    2023-11-14T00:47:35.719+0000 7fcbabab1700  1 beast: 0x7fcb8bfb0620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:47:28.937 +0000] "PUT /prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1?partNumber=11&uploadId=2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL HTTP/1.1" 404 20972012 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=6.782052517s
    2023-11-14T00:47:35.999+0000 7fcc483ea700  1 beast: 0x7fcb8ae8e620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:47:34.934 +0000] "PUT /prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1?partNumber=11&uploadId=2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.065008283s
    
  3. Missing rados objects
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_1
    bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_1 mtime 2023-11-14T08:47:35.000000+0800, size 4194304
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_2
     error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_2: (2) No such file or directory
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_3
     error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_3: (2) No such file or directory
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_4
     error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_4: (2) No such file or directory
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_5
    bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_5 mtime 2023-11-14T08:47:35.000000+0800, size 4194304
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_6
    bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_6 mtime 2023-11-14T08:47:35.000000+0800, size 4194304
    [root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_7
    bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_7 mtime 2023-11-14T08:47:35.000000+0800, size 2097152
    
Actions #2

Updated by xiaobao wen 5 months ago

We checked the S3 user's logs. The multipart upload retries were issued automatically by the s3-transfer SDK.

We suspected that the UploadPart retries caused the data loss and tried to reproduce it.
We were able to reproduce it.

Reproduction steps

  1. Call UploadPart concurrently from several goroutines with the same PartNumber, like this. Only one goroutine has the correct ContentLength.
        // Only the single successful upload calls wg.Done(), so add exactly one.
        wg.Add(1)
        f := func(body io.ReadSeeker) {
            // Capture the start of the stack trace to identify this goroutine in the output.
            buf := make([]byte, 64)
            stk := buf[:runtime.Stack(buf, false)]
            fmt.Println("start UploadPart PartNumber 2, goroutine id " + string(stk))
            // second part
            uploadResult2, err := svc.UploadPart(&s3.UploadPartInput{
                Body:          body,
                Bucket:        &bucket,
                Key:           &key,
                PartNumber:    aws.Int64(2),
                UploadId:      resp.UploadId,
                ContentLength: aws.Int64(100 * 1024 * 1024),
            })
            if err != nil {
                fmt.Println("failed to UploadPart PartNumber 2, goroutine id " + string(stk) + err.Error())
                return
            }
            fmt.Println("success to UploadPart PartNumber 2, now append, goroutine id " + string(stk))
            // Record the part before signalling, so the waiter never sees completedParts without part 2.
            completedParts = append(completedParts, &s3.CompletedPart{
                ETag:       uploadResult2.ETag,
                PartNumber: aws.Int64(2),
            })
            wg.Done()
        }
        go f(f1)
        go f(f2)
        go f(bytes.NewReader(fileBytes2))
    
  2. Read the S3 object
    s3cmd get s3://test-bucketname/test-key /tmp/ --force && s3cmd rm s3://test-bucketname/test-key
    
  3. Repeat the steps above
    Example code: https://github.com/thenamehasbeentake/s3_multipart_example

Logs when the bug reproduces

2023-11-24T12:48:55.492+0000 7fd31b717700  1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:55.480 +0000] "POST /test-bucke-1/test-mupload2?uploads= HTTP/1.1" 200 256 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.011999832s
2023-11-24T12:48:56.172+0000 7fd36b7b7700  1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:55.581 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=1&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 200 8388608 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.590991735s
2023-11-24T12:48:57.790+0000 7fd392004700  1 beast: 0x7fd2e5527620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.298 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 29360651 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.491979122s
2023-11-24T12:48:57.792+0000 7fd33d75b700  1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.255 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 20972043 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.535978436s
2023-11-24T12:48:58.552+0000 7fd3af03e700  1 beast: 0x7fd2e4f1b620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:57.227 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 29360651 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.324981451s
2023-11-24T12:48:58.699+0000 7fd307ef0700  1 beast: 0x7fd2e51a0620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.799 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 20972043 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.899973392s
2023-11-24T12:48:59.499+0000 7fd394008700  1 beast: 0x7fd2e5527620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:58.103 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 29360651 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.395980358s
2023-11-24T12:48:59.501+0000 7fd399813700  1 beast: 0x7fd2e54a6620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:58.361 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 20972043 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.138984084s
2023-11-24T12:48:59.545+0000 7fd3897f3700  1 beast: 0x7fd2e501d620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.800 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 200 104857600 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=2.744961500s
2023-11-24T12:48:59.609+0000 7fd348f72700  1 beast: 0x7fd2e501d620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.546 +0000] "POST /test-bucke-1/test-mupload2?uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 200 315 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.062999122s
2023-11-24T12:48:59.672+0000 7fd324729700  1 beast: 0x7fd2e54a6620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.669 +0000] "HEAD /test-bucke-1/test-mupload2 HTTP/1.1" 200 0 - - - latency=0.002999958s
2023-11-24T12:48:59.746+0000 7fd36d7bb700  1 beast: 0x7fd2e54a6620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.685 +0000] "GET /test-bucke-1/test-mupload2 HTTP/1.1" 404 280 - - - latency=0.059999160s
2023-11-24T12:48:59.894+0000 7fd369fb4700  1 beast: 0x7fd2e51a0620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.222 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 8389131 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.671990573s
2023-11-24T12:48:59.973+0000 7fd2f5ecc700  1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.195 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 12583435 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.777989149s
Actions #3

Updated by xiaobao wen 5 months ago

rados object listing. The shadow objects shadow_xxxxx.2_3 and shadow_xxxxx.2_4 are missing.

[root@node01 log]# rados -p os-7mhsvrneiumg9g9l.rgw.buckets.data  ls | grep "test-mupload2" | sort
6139219a-070d-4d99-a379-74b96964adef.202979761.4__multipart_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.1
6139219a-070d-4d99-a379-74b96964adef.202979761.4__multipart_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.1_1
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_1
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_10
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_11
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_12
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_13
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_14
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_15
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_16
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_17
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_18
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_19
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_2
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_20
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_21
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_22
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_23
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_24
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_5
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_6
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_7
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_8
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_9
6139219a-070d-4d99-a379-74b96964adef.202979761.4_test-mupload2

Actions #4

Updated by Casey Bodley 5 months ago

  • Priority changed from Normal to High
  • Tags set to multipart
  • Backport set to pacific quincy reef
Actions #5

Updated by Casey Bodley 5 months ago

  • Status changed from New to Need More Info

we did some work on multipart reuploads in https://tracker.ceph.com/issues/44660, but that resolved leaks of data that we forgot to delete, not data loss like this. that fix wasn't backported to pacific, but i wonder if it changes how this bug reproduces

would you be willing to test this against the reef release (which has those changes) to see if it still reproduces?

Actions #6

Updated by Liang Zheng 5 months ago

We have hit this as well. From the logs, it looks like some of the shadow objects are lost when the same upload ID is used to upload a part again, rather than objects being left behind that should have been deleted.

[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_7
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_7 mtime 2023-11-14T08:47:27.000000+0800, size 2097152
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_6
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_6 mtime 2023-11-14T08:47:27.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_5
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_5 mtime 2023-11-14T08:47:27.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_4
 error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_4: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_3
 error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_3: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_2
 error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_2: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_1
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_1 mtime 2023-11-14T08:47:26.000000+0800, size 4194304

rgw log:

123821:2023-11-14T00:47:08.306+0000 7fcbd9b0d700  1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.305 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 404 0 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.000000000s
123949:2023-11-14T00:47:08.756+0000 7fcc343c2700  1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.740 +0000] "GET /prod-trip-1?uploads&prefix=9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 299 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.016000124s
123969:2023-11-14T00:47:08.771+0000 7fcbccaf3700  1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.768 +0000] "POST /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?uploads HTTP/1.1" 200 280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.003000023s
123977:2023-11-14T00:47:08.799+0000 7fcbd4302700  1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.797 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 493 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000016s
126498:2023-11-14T00:47:14.262+0000 7fcc3ebd7700  1 beast: 0x7fcb88b48620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:13.260 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=3&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.002007723s
127068:2023-11-14T00:47:15.569+0000 7fcc34bc3700  1 beast: 0x7fcb88ccb620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:13.832 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=2&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.736013412s
131569:2023-11-14T00:47:27.284+0000 7fcc2dbb5700  1 beast: 0x7fcb89ded620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:13.807 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=1&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 404 25166316 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=13.476103783s
131735:2023-11-14T00:47:27.648+0000 7fcba129c700  1 beast: 0x7fcb8be2d620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:26.520 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=1&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.127008796s
132082:2023-11-14T00:47:28.143+0000 7fcbd4b03700  1 beast: 0x7fcb893d9620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:26.650 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=4&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.493011594s
139504:2023-11-14T00:47:35.367+0000 7fcbfb350700  1 beast: 0x7fcb87dad620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:32.519 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=5&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 404 4194796 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=2.848021984s
171638:2023-11-14T00:52:20.586+0000 7fcbf033a700  1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:20.585 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 404 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
171662:2023-11-14T00:52:20.619+0000 7fcc22b9f700  1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:20.609 +0000] "GET /prod-trip-1?uploads&prefix=9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 824 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.010000078s
171682:2023-11-14T00:52:20.716+0000 7fcc3cbd3700  1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:20.714 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 1173 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
172146:2023-11-14T00:52:28.972+0000 7fcc75444700  1 beast: 0x7fcb8c943620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:25.563 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=9&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 27585063 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=3.408026457s
172160:2023-11-14T00:52:29.642+0000 7fcc05364700  1 beast: 0x7fcb8cf4f620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:24.625 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=5&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=5.016038895s
172162:2023-11-14T00:52:29.713+0000 7fcc22b9f700  1 beast: 0x7fcb8cece620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:25.039 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=7&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=4.674036026s
172174:2023-11-14T00:52:30.131+0000 7fcbeab2f700  1 beast: 0x7fcb8d8e2620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:23.869 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=6&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 404 29360620 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=6.262048721s
172294:2023-11-14T00:52:34.319+0000 7fcc47be9700  1 beast: 0x7fcb8c943620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:29.378 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=6&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=4.941038609s
172301:2023-11-14T00:52:34.331+0000 7fcbd3b01700  1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:34.329 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 1853 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000015s
172613:2023-11-14T00:52:41.917+0000 7fcbb92cc700  1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:41.916 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 404 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
172620:2023-11-14T00:52:41.943+0000 7fcbb32c0700  1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:41.931 +0000] "GET /prod-trip-1?uploads&prefix=9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 824 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.012000093s
172636:2023-11-14T00:52:42.009+0000 7fcbc9aed700  1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:42.006 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 1853 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000015s
172736:2023-11-14T00:52:44.323+0000 7fcc90c7b700  1 beast: 0x7fcb8d153620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:43.545 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=8&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=0.778006017s
172743:2023-11-14T00:52:44.345+0000 7fcc24ba3700  1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:44.343 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 2023 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000015s
172752:2023-11-14T00:52:44.604+0000 7fcc8946c700  1 beast: 0x7fcb8d153620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:44.482 +0000] "POST /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 362 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=0.122000948s
172777:2023-11-14T00:52:45.040+0000 7fcbe0b1b700  1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:45.039 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
172833:2023-11-14T00:52:46.063+0000 7fcc07368700  1 beast: 0x7fcb8cac6620: 10.3.9.26 - bd-dataocean-prod [14/Nov/2023:00:52:46.061 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
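
A quick way to check whether an access log like the one above contains failed part uploads (such as the `PUT ...?partNumber=11` that returned 404 earlier in this issue) is to filter the beast lines by method, query string, and status. The following is only a sketch: the regular expression is inferred from the log excerpts in this ticket, not from a documented format, and `failed_part_uploads` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Request portion of a beast access-log line, e.g.
#   "PUT /bucket/key?partNumber=8&uploadId=... HTTP/1.1" 200 31457280
# The pattern is an assumption based on the excerpts quoted in this issue.
REQUEST_RE = re.compile(
    r'"\s*(?P<method>[A-Z]+)\s+(?P<target>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)

def failed_part_uploads(lines):
    """Yield (path, upload_id, part_number, status) for part PUTs with status >= 400."""
    for line in lines:
        m = REQUEST_RE.search(line)
        if not m or m.group("method") != "PUT":
            continue
        url = urlsplit(m.group("target"))
        query = parse_qs(url.query)
        status = int(m.group("status"))
        # Only multipart part uploads carry a partNumber query parameter.
        if "partNumber" in query and status >= 400:
            upload_id = query.get("uploadId", [""])[0]
            yield url.path, upload_id, int(query["partNumber"][0]), status
```

Feeding it the lines of an RGW log (for example `failed_part_uploads(open(path, errors="replace"))`, where `path` is whatever log file is being inspected) lists the parts whose upload did not succeed, which are the ones a client is likely to retry.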

Actions #7

Updated by Casey Bodley 5 months ago

  • Status changed from Need More Info to New
Actions #8

Updated by J. Eric Ivancich 5 months ago

  • Subject changed from rgw: rados objects wronly deleted to rgw: rados objects wrongly deleted
Actions #9

Updated by J. Eric Ivancich 3 months ago

  • Pull request ID set to 55042
Actions #10

Updated by Mark Kogan 3 months ago

@xiaobao wen thank you for providing the repro code: https://github.com/thenamehasbeentake/s3_multipart_example
Running it on my system does not reproduce the issue. Does your environment have haproxy in front of RGW?
(It can induce parallelism when different parts are uploaded to different RGWs.)
If there is a proxy, could you please share the haproxy.cfg so that I can try to reproduce with it?

thanks

Actions #11

Updated by Mark Kogan 3 months ago

Update: the issue reproduces on current main (bab43e83ad7) with a single RGW.

A narrowed (234MB) log with debug_rgw=20 and debug_ms=1 is attached below.
It contains two reproducing objects:
test-key2051315 and test-key2136793

for example checking:

cat ./radosgw.8000.log | grep --text --color=always 'test-key2051315'
...
2024-02-14T08:15:54.405+0000 7fffe4fe2640  1 beast: 0x7fffbc9677c0: 127.0.0.1 - cosbench [14/Feb/2024:08:15:54.405 +0000] " HEAD  /test-bucketname/test-key2051315 HTTP/1.1" 200 0 - - - latency=0.000000000s
...
2024-02-14T08:15:54.417+0000 7fffe57e3640  1 -- 172.21.5.102:0/2702027229 --> [v2:172.21.5.102:6802/534638142,v1:172.21.5.102:6803/534638142] -- osd_op(unknown.0.0:46518 6.62 6:478c5b66:::c60f796e-1a94-4446-aa80-6ecd252e6a19.4234.82__shadow_test-key2051315.2~Jj_Q2X0AQnBRmVLV-Ar0k2g5OVSIrXr.2_4:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e28) v8 -- 0x5e9ce00 con 0x30c7c00
2024-02-14T08:15:54.417+0000 7fffecff2640  1 -- 172.21.5.102:0/2702027229 <== osd.0 v2:172.21.5.102:6802/534638142 46819 ==== osd_op_reply(46518 c60f796e-1a94-4446-aa80-6ecd252e6a19.4234.82__shadow_test-key2051315.2~Jj_Q2X0AQnBRmVLV-Ar0k2g5OVSIrXr.2_4 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 250+0+0 (crc 0 0 0) 0x5492d80 con 0x30c7c00
*** NOTE ^^^ the `ondisk = -2 ((2) No such file or directory)` in the reply above: the read of the shadow object fails with ENOENT
2024-02-14T08:15:54.658+0000 7fffe77e7640  1 beast: 0x7fffbc9677c0: 127.0.0.1 - cosbench [14/Feb/2024:08:15:54.416 +0000] " GET  /test-bucketname/test-key2051315 HTTP/1.1" 404 241 - - - latency=0.241998926s
...
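
For a log of this size, scanning for the failing replies by hand is tedious. The sketch below collects the RADOS object names from `osd_op_reply` lines that returned -2 (ENOENT) on a `__shadow_` object, as in the excerpt above. The regular expression is an assumption derived from that single excerpt at debug_ms=1, and `missing_shadow_reads` is a hypothetical helper, not part of Ceph.

```python
import re

# Object name and result code of an osd_op_reply line, e.g.
#   osd_op_reply(46518 <oid> [read 0~4194304] v0'0 uv0 ondisk = -2 (...)) v8
# Inferred from the excerpt quoted in this comment; other op types may differ.
REPLY_RE = re.compile(
    r"osd_op_reply\(\d+ (?P<oid>\S+) \[.*?\].*? = (?P<rc>-?\d+)"
)

ENOENT = -2  # "No such file or directory"

def missing_shadow_reads(lines):
    """Yield names of __shadow_ objects whose OSD reply returned ENOENT."""
    for line in lines:
        m = REPLY_RE.search(line)
        if not m:
            continue
        if int(m.group("rc")) == ENOENT and "__shadow_" in m.group("oid"):
            yield m.group("oid")
```

Each name it yields identifies a tail object that RGW expected to read but that no longer exists in the data pool, which is the symptom behind the 404 on the subsequent GET.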

Actions #12

Updated by Mark Kogan 3 months ago

In my testing, cherry-picking the commit from the fix PR (https://github.com/ceph/ceph/pull/55042) onto main prevents the issue from reproducing with the reproducer outlined in comment #2 (https://github.com/thenamehasbeentake/s3_multipart_example).

Actions #13

Updated by Casey Bodley 3 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Casey Bodley
  • Pull request ID changed from 55042 to 55582
Actions #14

Updated by Casey Bodley 3 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #15

Updated by Casey Bodley 3 months ago

  • Priority changed from High to Urgent
  • Target version deleted (v16.2.15)
Actions #16

Updated by Backport Bot 3 months ago

  • Copied to Backport #64425: quincy: rgw: rados objects wrongly deleted added
Actions #17

Updated by Backport Bot 3 months ago

  • Copied to Backport #64426: reef: rgw: rados objects wrongly deleted added
Actions #18

Updated by Backport Bot 3 months ago

  • Copied to Backport #64427: pacific: rgw: rados objects wrongly deleted added
Actions #19

Updated by Backport Bot 3 months ago

  • Tags changed from multipart to multipart backport_processed
Actions #20

Updated by Casey Bodley 3 months ago

  • Is duplicate of Bug #63597: rgw: multi-part upload will make head object metadata error during a breakpoint continuation by using aws java Signature Version 4 added
Actions #21

Updated by Casey Bodley 3 months ago

  • Status changed from Pending Backport to Duplicate
Actions #22

Updated by Casey Bodley 3 months ago

  • Status changed from Duplicate to Fix Under Review
Actions #23

Updated by Casey Bodley 2 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #24

Updated by Mark Kogan 2 months ago

For future reference, `rgw-gap-list` has been tested and works correctly with the golang reproducer to find the affected objects.
In the example below, the affected object is test-key1617127.

numactl -N 1 -m 1 -- bash ./retry.sh $$
...
goroutine 1 [running]:
main.RetryUpload(0xc0000140b8, {0x8dfffc, 0xf}, {0xc000034b70, 0xf})                   
        /mnt/nvme/src-git/ceph--up--main/s3_multipart_example/main.go:147 +0xdd8       
main.main()
        /mnt/nvme/src-git/ceph--up--main/s3_multipart_example/main.go:32 +0x1e5        
download: 's3://test-bucketname/test-key1617127' -> '/tmp/test-key1617127'  [1 of 1]   
                                ^^^^^^^^^^^^^^^
ERROR: Download of '/tmp/test-key1617127' failed (Reason: 404 (NoSuchKey))             
ERROR: S3 error: 404 (NoSuchKey)
failed

sudo ../src/rgw/rgw-gap-list -p default.rgw.buckets.data
2024-02-18 13:57:28 robsoni01 Pool is "default.rgw.buckets.data".
2024-02-18 13:57:28 robsoni01 Note: output files produced will be tagged with the current timestamp -- 202402181357.
2024-02-18 13:57:28 robsoni01 Starting 'rados ls' function.
2024-02-18 13:57:28 robsoni01 Running 'rados ls' on pool default.rgw.buckets.data.
2024-02-18 13:57:28 robsoni01 Completed 'rados ls' on pool default.rgw.buckets.data.
2024-02-18 13:57:28 robsoni01 Sorting 'rados ls' output(s).
2024-02-18 13:57:29 robsoni01 Moving sorted output(s).
2024-02-18 13:57:29 robsoni01 Sorting 'rados ls' output(s) complete.
2024-02-18 13:57:29 robsoni01 Running 'radosgw-admin bucket radoslist'.
2024-02-18 13:57:32 robsoni01 Completed 'radosgw-admin bucket radoslist'.
2024-02-18 13:57:32 robsoni01 Sorting 'radosgw-admin bucket radoslist' output.
2024-02-18 13:57:32 robsoni01 Completed sorting 'radosgw-admin bucket radoslist'.
2024-02-18 13:57:32 robsoni01 Moving 'radosgw-admin bucket radoslist' output.
2024-02-18 13:57:32 robsoni01 Completed moving 'radosgw-admin bucket radoslist' output.
2024-02-18 13:57:32 robsoni01 Creating awk script for comparing outputs: /tmp/ig-3401305.awk
2024-02-18 13:57:32 robsoni01 Begin identifying potentially impacted user object names.
2024-02-18 13:57:32 File 1 Line Count   File 2 Line Count       Potentially Impacted Objects
2024-02-18 13:57:32             83345               88959                  1
2024-02-18 13:57:32 robsoni01 Begin sorting results.
2024-02-18 13:57:32 robsoni01 Done.

Found 2 *possible* gaps.
The results can be found in "/mnt/nvme/src-git/ceph--up--main/build/gap-list-202402181357.gap".

Intermediate files: "/mnt/nvme/src-git/ceph--up--main/build/rados-202402181357.intermediate" and "/mnt/nvme/src-git/ceph--up--main/build/radosgw-admin-202402181357.intermediate".

***
*** WARNING: This is EXPERIMENTAL code and the results should be used
***          with CAUTION and VERIFIED. Not everything listed is an
***          actual gap. EXPECT false positives. Every result
***          produced should be verified.
***

cat gap-list-202402181357.gap
Bucket: "test-bucketname"  Object: "test-key1617127" 
                                    ^^^^^^^^^^^^^^^
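
Judging from the tool output above, the core of the check is a comparison between two listings: the RADOS objects that actually exist in the pool (`rados ls`) and the RADOS objects that RGW's bucket indexes say should exist (`radosgw-admin bucket radoslist`). A minimal sketch of that comparison follows. It assumes both listings are available as one object name per line; `find_gaps` is a hypothetical name, and the real script does more (sorting large files on disk, mapping results back to bucket and object names).

```python
def find_gaps(expected_lines, present_lines):
    """Return RADOS object names that are expected but not present, sorted.

    expected_lines: names RGW believes should exist (assumed: radoslist output).
    present_lines:  names that exist in the pool (assumed: rados ls output).
    """
    # Normalize both listings: strip whitespace and drop empty lines.
    expected = {line.strip() for line in expected_lines if line.strip()}
    present = {line.strip() for line in present_lines if line.strip()}
    # A "gap" is an object referenced by RGW that is missing from the pool.
    return sorted(expected - present)
```

As the tool's own warning says, anything such a comparison reports is only a candidate: objects written or removed between the two listings show up as false positives, so each result still needs to be verified.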