Bug #63642
rgw: rados objects wrongly deleted
Added by xiaobao wen 6 months ago. Updated 3 months ago.
Updated by xiaobao wen 6 months ago
We encountered data loss when using multipart upload: some RADOS objects backing the uploaded parts were lost.
Logs from the production environment
- s3cmd get fails with 404
xiaobaowen@pc:~$ s3cmd get s3://prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1 /tmp/
download: 's3://prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' -> '/tmp/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' [1 of 1]
318767104 of 1260015979 25% in 3s 95.06 MB/s failed
WARNING: Retrying failed request: /62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1 (EOF from S3!)
WARNING: Waiting 3 sec...
download: 's3://prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' -> '/tmp/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' [1 of 1]
ERROR: Download of '/tmp/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1' failed (Reason: 404 (NoSuchKey))
ERROR: S3 error: 404 (NoSuchKey)
- RGW logs from when the S3 object was uploaded
2023-11-14T00:47:35.719+0000 7fcbabab1700 1 beast: 0x7fcb8bfb0620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:47:28.937 +0000] "PUT /prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1?partNumber=11&uploadId=2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL HTTP/1.1" 404 20972012 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=6.782052517s
2023-11-14T00:47:35.999+0000 7fcc483ea700 1 beast: 0x7fcb8ae8e620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:47:34.934 +0000] "PUT /prod-trip-1/62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1?partNumber=11&uploadId=2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.065008283s
- Missing RADOS objects
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_1
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_1 mtime 2023-11-14T08:47:35.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_2
error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_2: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_3
error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_3: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_4
error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_4: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_5
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_5 mtime 2023-11-14T08:47:35.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_6
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_6 mtime 2023-11-14T08:47:35.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_7
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_62ee2a53-c18c-4dbe-91ec-5ce5e8c9691b-1.2~Bch36uJLYIoIZiewlZ6v1NJVup9JqnL.11_7 mtime 2023-11-14T08:47:35.000000+0800, size 2097152
Updated by xiaobao wen 6 months ago
We checked the S3 user's logs: the multipart part-upload retries were issued automatically by the s3-transfer SDK.
We suspected that retrying UploadPart caused the data loss and tried to reproduce it.
We were able to reproduce it.
Reproduction steps
- Call UploadPart concurrently from several goroutines with the same PartNumber, as follows. Only one of the calls sends a body that matches the declared ContentLength.
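// svc, bucket, key, resp, wg, completedParts, f1, f2 and fileBytes2 come from the surrounding program
wg.Add(1) // only the one successful upload calls wg.Done()
var f func(bytess io.ReadSeeker) = func(bytess io.ReadSeeker) {
	// capture the goroutine header ("goroutine N [running]: ...") to tell the callers apart
	var buf = make([]byte, 64)
	var stk = buf[:runtime.Stack(buf, false)]
	fmt.Println("start UploadPart PartNumber 2, goroutine id " + string(stk))
	// second part: every goroutine declares a 100 MiB ContentLength for the same PartNumber
	uploadResult2, err := svc.UploadPart(&s3.UploadPartInput{
		Body:          bytess,
		Bucket:        &bucket,
		Key:           &key,
		PartNumber:    aws.Int64(int64(2)),
		UploadId:      resp.UploadId,
		ContentLength: aws.Int64(int64(100 * 1024 * 1024)),
	})
	if err != nil {
		fmt.Println("failed to UploadPart PartNumber 2, goroutine id " + string(stk) + err.Error())
		return
	}
	fmt.Println("success to UploadPart PartNumber 2, now append, goroutine id " + string(stk))
	completedParts = append(completedParts, &s3.CompletedPart{
		ETag:       uploadResult2.ETag,
		PartNumber: aws.Int64(int64(2)),
	})
	// signal only after the append, so the waiter is guaranteed to see the completed part
	wg.Done()
}
// f1 and f2 do not match the declared ContentLength; only fileBytes2 is the full 100 MiB body
go f(f1)
go f(f2)
go f(bytes.NewReader(fileBytes2))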
- Read the S3 object back, then delete it
s3cmd get s3://test-bucketname/test-key /tmp/ --force && s3cmd rm s3://test-bucketname/test-key
- Repeat the above steps in a loop
Example code: https://github.com/thenamehasbeentake/s3_multipart_example
Logs when the bug reproduces
2023-11-24T12:48:55.492+0000 7fd31b717700 1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:55.480 +0000] "POST /test-bucke-1/test-mupload2?uploads= HTTP/1.1" 200 256 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.011999832s
2023-11-24T12:48:56.172+0000 7fd36b7b7700 1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:55.581 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=1&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 200 8388608 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.590991735s
2023-11-24T12:48:57.790+0000 7fd392004700 1 beast: 0x7fd2e5527620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.298 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 29360651 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.491979122s
2023-11-24T12:48:57.792+0000 7fd33d75b700 1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.255 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 20972043 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.535978436s
2023-11-24T12:48:58.552+0000 7fd3af03e700 1 beast: 0x7fd2e4f1b620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:57.227 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 29360651 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.324981451s
2023-11-24T12:48:58.699+0000 7fd307ef0700 1 beast: 0x7fd2e51a0620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.799 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 20972043 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.899973392s
2023-11-24T12:48:59.499+0000 7fd394008700 1 beast: 0x7fd2e5527620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:58.103 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 29360651 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.395980358s
2023-11-24T12:48:59.501+0000 7fd399813700 1 beast: 0x7fd2e54a6620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:58.361 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 20972043 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=1.138984084s
2023-11-24T12:48:59.545+0000 7fd3897f3700 1 beast: 0x7fd2e501d620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:56.800 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 200 104857600 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=2.744961500s
2023-11-24T12:48:59.609+0000 7fd348f72700 1 beast: 0x7fd2e501d620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.546 +0000] "POST /test-bucke-1/test-mupload2?uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 200 315 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.062999122s
2023-11-24T12:48:59.672+0000 7fd324729700 1 beast: 0x7fd2e54a6620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.669 +0000] "HEAD /test-bucke-1/test-mupload2 HTTP/1.1" 200 0 - - - latency=0.002999958s
2023-11-24T12:48:59.746+0000 7fd36d7bb700 1 beast: 0x7fd2e54a6620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.685 +0000] "GET /test-bucke-1/test-mupload2 HTTP/1.1" 404 280 - - - latency=0.059999160s
2023-11-24T12:48:59.894+0000 7fd369fb4700 1 beast: 0x7fd2e51a0620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.222 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 8389131 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.671990573s
2023-11-24T12:48:59.973+0000 7fd2f5ecc700 1 beast: 0x7fd2e5323620: 10.24.96.98 - os-user-2c197f8b-d0fd-4c81-b8f6-b35b8c32d691 [24/Nov/2023:12:48:59.195 +0000] "PUT /test-bucke-1/test-mupload2?partNumber=2&uploadId=2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO HTTP/1.1" 404 12583435 - "aws-sdk-go/1.44.240 (go1.19.3; linux; amd64)" - latency=0.777989149s
Updated by xiaobao wen 6 months ago
RADOS object listing: shadow_xxxxx.2_3 and 2_4 are lost.
[root@node01 log]# rados -p os-7mhsvrneiumg9g9l.rgw.buckets.data ls | grep "test-mupload2" | sort
6139219a-070d-4d99-a379-74b96964adef.202979761.4__multipart_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.1
6139219a-070d-4d99-a379-74b96964adef.202979761.4__multipart_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.1_1
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_1
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_10
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_11
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_12
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_13
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_14
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_15
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_16
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_17
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_18
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_19
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_2
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_20
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_21
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_22
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_23
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_24
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_5
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_6
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_7
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_8
6139219a-070d-4d99-a379-74b96964adef.202979761.4__shadow_test-mupload2.2~X5fKV4nsqdme1fTqGZVae3EP6SiI6lO.2_9
6139219a-070d-4d99-a379-74b96964adef.202979761.4_test-mupload2
Updated by Casey Bodley 6 months ago
- Priority changed from Normal to High
- Tags set to multipart
- Backport set to pacific quincy reef
Updated by Casey Bodley 6 months ago
- Status changed from New to Need More Info
We did some work on multipart re-uploads in https://tracker.ceph.com/issues/44660, but that resolved data leaks (objects we forgot to delete), not data loss like this. That fix wasn't backported to Pacific, but I wonder whether it changes how this bug reproduces.
Would you be willing to test this against the Reef release (which has those changes) to see if it still reproduces?
Updated by Liang Zheng 6 months ago
We have hit this as well. From the logs, it looks like some shadow objects are lost when parts are uploaded again under the same upload ID; this is data loss, not a leak of objects we forgot to delete.
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_7
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_7 mtime 2023-11-14T08:47:27.000000+0800, size 2097152
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_6
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_6 mtime 2023-11-14T08:47:27.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_5
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_5 mtime 2023-11-14T08:47:27.000000+0800, size 4194304
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_4
error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_4: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_3
error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_3: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_2
error stat-ing bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_2: (2) No such file or directory
[root@bd-hdd03-node01 ~]# rados stat -p bigdata-hdd03.rgw.buckets.data e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_1
bigdata-hdd03.rgw.buckets.data/e2a537ca-22bb-470c-9af1-81f3153d6f56.203769.1__shadow_9f5b98c8-d653-41ce-9e4a-52c53148a72b-1.2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap.1_1 mtime 2023-11-14T08:47:26.000000+0800, size 4194304
RGW log:
123821:2023-11-14T00:47:08.306+0000 7fcbd9b0d700 1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.305 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 404 0 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.000000000s
123949:2023-11-14T00:47:08.756+0000 7fcc343c2700 1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.740 +0000] "GET /prod-trip-1?uploads&prefix=9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 299 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.016000124s
123969:2023-11-14T00:47:08.771+0000 7fcbccaf3700 1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.768 +0000] "POST /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?uploads HTTP/1.1" 200 280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.003000023s
123977:2023-11-14T00:47:08.799+0000 7fcbd4302700 1 beast: 0x7fcb8b923620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:08.797 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 493 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000016s
126498:2023-11-14T00:47:14.262+0000 7fcc3ebd7700 1 beast: 0x7fcb88b48620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:13.260 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=3&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.002007723s
127068:2023-11-14T00:47:15.569+0000 7fcc34bc3700 1 beast: 0x7fcb88ccb620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:13.832 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=2&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.736013412s
131569:2023-11-14T00:47:27.284+0000 7fcc2dbb5700 1 beast: 0x7fcb89ded620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:13.807 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=1&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 404 25166316 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=13.476103783s
131735:2023-11-14T00:47:27.648+0000 7fcba129c700 1 beast: 0x7fcb8be2d620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:26.520 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=1&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.127008796s
132082:2023-11-14T00:47:28.143+0000 7fcbd4b03700 1 beast: 0x7fcb893d9620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:26.650 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=4&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=1.493011594s
139504:2023-11-14T00:47:35.367+0000 7fcbfb350700 1 beast: 0x7fcb87dad620: 10.3.9.15 - bd-dataocean-prod [14/Nov/2023:00:47:32.519 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=5&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 404 4194796 - "aws-sdk-java/2.20.85 Linux/5.13.0-52-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=2.848021984s
171638:2023-11-14T00:52:20.586+0000 7fcbf033a700 1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:20.585 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 404 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
171662:2023-11-14T00:52:20.619+0000 7fcc22b9f700 1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:20.609 +0000] "GET /prod-trip-1?uploads&prefix=9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 824 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.010000078s
171682:2023-11-14T00:52:20.716+0000 7fcc3cbd3700 1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:20.714 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 1173 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
172146:2023-11-14T00:52:28.972+0000 7fcc75444700 1 beast: 0x7fcb8c943620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:25.563 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=9&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 27585063 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=3.408026457s
172160:2023-11-14T00:52:29.642+0000 7fcc05364700 1 beast: 0x7fcb8cf4f620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:24.625 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=5&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=5.016038895s
172162:2023-11-14T00:52:29.713+0000 7fcc22b9f700 1 beast: 0x7fcb8cece620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:25.039 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=7&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=4.674036026s
172174:2023-11-14T00:52:30.131+0000 7fcbeab2f700 1 beast: 0x7fcb8d8e2620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:23.869 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=6&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 404 29360620 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=6.262048721s
172294:2023-11-14T00:52:34.319+0000 7fcc47be9700 1 beast: 0x7fcb8c943620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:29.378 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=6&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=4.941038609s
172301:2023-11-14T00:52:34.331+0000 7fcbd3b01700 1 beast: 0x7fcb8dcea620: 10.3.9.21 - bd-dataocean-prod [14/Nov/2023:00:52:34.329 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 1853 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000015s
172613:2023-11-14T00:52:41.917+0000 7fcbb92cc700 1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:41.916 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 404 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
172620:2023-11-14T00:52:41.943+0000 7fcbb32c0700 1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:41.931 +0000] "GET /prod-trip-1?uploads&prefix=9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 824 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.012000093s
172636:2023-11-14T00:52:42.009+0000 7fcbc9aed700 1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:42.006 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 1853 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000015s
172736:2023-11-14T00:52:44.323+0000 7fcc90c7b700 1 beast: 0x7fcb8d153620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:43.545 +0000] "PUT /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?partNumber=8&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 31457280 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=0.778006017s
172743:2023-11-14T00:52:44.345+0000 7fcc24ba3700 1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:44.343 +0000] "GET /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?max-parts=1000&uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 2023 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.002000015s
172752:2023-11-14T00:52:44.604+0000 7fcc8946c700 1 beast: 0x7fcb8d153620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:44.482 +0000] "POST /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1?uploadId=2~W8DDvF6XtLF5o7EJuD4EXAo78qfYVap HTTP/1.1" 200 362 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/async http/NettyNio cfg/retry-mode/legacy ft/s3-transfer" - latency=0.122000948s
172777:2023-11-14T00:52:45.040+0000 7fcbe0b1b700 1 beast: 0x7fcb8dbe8620: 10.3.9.22 - bd-dataocean-prod [14/Nov/2023:00:52:45.039 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 kotlin/1.3.61-release-180 (1.3.61) vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
172833:2023-11-14T00:52:46.063+0000 7fcc07368700 1 beast: 0x7fcb8cac6620: 10.3.9.26 - bd-dataocean-prod [14/Nov/2023:00:52:46.061 +0000] "HEAD /prod-trip-1/9f5b98c8-d653-41ce-9e4a-52c53148a72b-1 HTTP/1.1" 200 0 - "aws-sdk-java/2.20.85 Linux/5.15.0-78-generic Java_HotSpot_TM__64-Bit_Server_VM/25.212-b10 Java/1.8.0_212 vendor/Oracle_Corporation io/sync http/Apache cfg/retry-mode/legacy ft/s3-transfer" - latency=0.001000008s
Updated by J. Eric Ivancich 5 months ago
- Subject changed from rgw: rados objects wronly deleted to rgw: rados objects wrongly deleted
Updated by Mark Kogan 3 months ago
@xiaobao wen thank you for providing the repro code https://github.com/thenamehasbeentake/s3_multipart_example
On my system, running it does not reproduce the issue. Does your environment have HAProxy?
(It can induce parallelism when different parts are uploaded to different RGWs.)
If there is a proxy, would you mind sharing the haproxy.cfg so I can try to reproduce with it?
thanks
Updated by Mark Kogan 3 months ago
Update: this reproduces on current main (bab43e83ad7) with a single RGW.
A narrowed (234 MB) log with debug_rgw=20 and debug_ms=1 is attached below.
It contains two reproducing objects:
test-key2051315 and test-key2136793
for example checking:
cat ./radosgw.8000.log | grep --text --color=always 'test-key2051315'
...
2024-02-14T08:15:54.405+0000 7fffe4fe2640 1 beast: 0x7fffbc9677c0: 127.0.0.1 - cosbench [14/Feb/2024:08:15:54.405 +0000] " HEAD /test-bucketname/test-key2051315 HTTP/1.1" 200 0 - - - latency=0.000000000s
...
2024-02-14T08:15:54.417+0000 7fffe57e3640 1 -- 172.21.5.102:0/2702027229 --> [v2:172.21.5.102:6802/534638142,v1:172.21.5.102:6803/534638142] -- osd_op(unknown.0.0:46518 6.62 6:478c5b66:::c60f796e-1a94-4446-aa80-6ecd252e6a19.4234.82__shadow_test-key2051315.2~Jj_Q2X0AQnBRmVLV-Ar0k2g5OVSIrXr.2_4:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e28) v8 -- 0x5e9ce00 con 0x30c7c00
2024-02-14T08:15:54.417+0000 7fffecff2640 1 -- 172.21.5.102:0/2702027229 <== osd.0 v2:172.21.5.102:6802/534638142 46819 ==== osd_op_reply(46518 c60f796e-1a94-4446-aa80-6ecd252e6a19.4234.82__shadow_test-key2051315.2~Jj_Q2X0AQnBRmVLV-Ar0k2g5OVSIrXr.2_4 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 250+0+0 (crc 0 0 0) 0x5492d80 con 0x30c7c00
*** NOTE ^^^ the `ondisk = -2 ((2) No such file or directory`
2024-02-14T08:15:54.658+0000 7fffe77e7640 1 beast: 0x7fffbc9677c0: 127.0.0.1 - cosbench [14/Feb/2024:08:15:54.416 +0000] " GET /test-bucketname/test-key2051315 HTTP/1.1" 404 241 - - - latency=0.241998926s
...
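To go from a failing read like the one above to the S3 key and part it belongs to, one can pick the ENOENT (-2) replies out of a debug_ms=1 log and split the object name. Below is a minimal Go sketch. The `<marker>__shadow_<key>.<prefix>.<part>_<stripe>` layout is only inferred from the single object name in this log. The `2~...` segment has the same shape as the uploadId values in the access logs, but this ticket does not say they are the same. Reading `2_4` as "part 2, stripe 4" is also an assumption, and `missingOIDs`/`parseShadowOID` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// shadowOID is one tail ("__shadow_") rados object of a multipart part.
type shadowOID struct {
	Marker string // bucket marker, before "__shadow_"
	Key    string // S3 object key
	Prefix string // segment after the key (here shaped like an uploadId, "2~...")
	Part   int    // assumed part number
	Stripe int    // assumed stripe index within the part
}

// replyRe captures the object name and return code of an osd_op_reply line.
var replyRe = regexp.MustCompile(`osd_op_reply\(\d+ (\S+) \[[^\]]*\].* ondisk = (-?\d+)`)

// missingOIDs returns the objects whose osd_op_reply carried -2 (ENOENT).
func missingOIDs(lines []string) []string {
	var out []string
	for _, l := range lines {
		if m := replyRe.FindStringSubmatch(l); m != nil && m[2] == "-2" {
			out = append(out, m[1])
		}
	}
	return out
}

// parseShadowOID splits "<marker>__shadow_<key>.<prefix>.<part>_<stripe>".
// Assumption: pattern inferred from this ticket's log; the prefix has no '.'.
func parseShadowOID(oid string) (shadowOID, error) {
	marker, rest, ok := strings.Cut(oid, "__shadow_")
	if !ok {
		return shadowOID{}, fmt.Errorf("not a shadow object: %q", oid)
	}
	i := strings.LastIndex(rest, ".")
	if i < 0 {
		return shadowOID{}, fmt.Errorf("no part suffix: %q", oid)
	}
	partStr, stripeStr, ok := strings.Cut(rest[i+1:], "_")
	if !ok {
		return shadowOID{}, fmt.Errorf("no stripe number: %q", oid)
	}
	part, err := strconv.Atoi(partStr)
	if err != nil {
		return shadowOID{}, err
	}
	stripe, err := strconv.Atoi(stripeStr)
	if err != nil {
		return shadowOID{}, err
	}
	keyAndPrefix := rest[:i]
	j := strings.LastIndex(keyAndPrefix, ".")
	if j < 0 {
		return shadowOID{}, fmt.Errorf("no prefix: %q", oid)
	}
	return shadowOID{Marker: marker, Key: keyAndPrefix[:j], Prefix: keyAndPrefix[j+1:], Part: part, Stripe: stripe}, nil
}

func main() {
	line := `osd_op_reply(46518 c60f796e-1a94-4446-aa80-6ecd252e6a19.4234.82__shadow_test-key2051315.2~Jj_Q2X0AQnBRmVLV-Ar0k2g5OVSIrXr.2_4 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8`
	for _, oid := range missingOIDs([]string{line}) {
		s, err := parseShadowOID(oid)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// prints: key=test-key2051315 part=2 stripe=4 prefix=2~Jj_Q2X0AQnBRmVLV-Ar0k2g5OVSIrXr
		fmt.Printf("key=%s part=%d stripe=%d prefix=%s\n", s.Key, s.Part, s.Stripe, s.Prefix)
	}
}
```

The key is split off at the last '.' before the prefix, so keys containing dots still parse as long as the prefix itself has none.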
Updated by Mark Kogan 3 months ago
In my testing, cherry-picking the commit from the fix PR (https://github.com/ceph/ceph/pull/55042) onto main prevents reproduction of the issue with the reproducer from comment#2 --> https://github.com/thenamehasbeentake/s3_multipart_example
Updated by Casey Bodley 3 months ago
- Status changed from New to Fix Under Review
- Assignee set to Casey Bodley
- Pull request ID changed from 55042 to 55582
Updated by Casey Bodley 3 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Casey Bodley 3 months ago
- Priority changed from High to Urgent
- Target version deleted (v16.2.15)
Updated by Backport Bot 3 months ago
- Copied to Backport #64425: quincy: rgw: rados objects wrongly deleted added
Updated by Backport Bot 3 months ago
- Copied to Backport #64426: reef: rgw: rados objects wrongly deleted added
Updated by Backport Bot 3 months ago
- Copied to Backport #64427: pacific: rgw: rados objects wrongly deleted added
Updated by Backport Bot 3 months ago
- Tags changed from multipart to multipart backport_processed
Updated by Casey Bodley 3 months ago
- Is duplicate of Bug #63597: rgw: multi-part upload will make head object metadata error during a breakpoint continuation by using aws java Signature Version 4 added
Updated by Casey Bodley 3 months ago
- Status changed from Pending Backport to Duplicate
Updated by Casey Bodley 3 months ago
- Status changed from Duplicate to Fix Under Review
Updated by Casey Bodley 3 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Mark Kogan 3 months ago
For future reference: `rgw-gap-list` has been tested and works correctly with the golang reproducer to find the affected objects.
In the example below, the affected object is test-key1617127:
numactl -N 1 -m 1 -- bash ./retry.sh $$
...
goroutine 1 [running]:
main.RetryUpload(0xc0000140b8, {0x8dfffc, 0xf}, {0xc000034b70, 0xf})
	/mnt/nvme/src-git/ceph--up--main/s3_multipart_example/main.go:147 +0xdd8
main.main()
	/mnt/nvme/src-git/ceph--up--main/s3_multipart_example/main.go:32 +0x1e5
download: 's3://test-bucketname/test-key1617127' -> '/tmp/test-key1617127' [1 of 1]
                                ^^^^^^^^^^^^^^^
ERROR: Download of '/tmp/test-key1617127' failed (Reason: 404 (NoSuchKey))
ERROR: S3 error: 404 (NoSuchKey)
failed

sudo ../src/rgw/rgw-gap-list -p default.rgw.buckets.data
2024-02-18 13:57:28 robsoni01 Pool is "default.rgw.buckets.data".
2024-02-18 13:57:28 robsoni01 Note: output files produced will be tagged with the current timestamp -- 202402181357.
2024-02-18 13:57:28 robsoni01 Starting 'rados ls' function.
2024-02-18 13:57:28 robsoni01 Running 'rados ls' on pool default.rgw.buckets.data.
2024-02-18 13:57:28 robsoni01 Completed 'rados ls' on pool default.rgw.buckets.data.
2024-02-18 13:57:28 robsoni01 Sorting 'rados ls' output(s).
2024-02-18 13:57:29 robsoni01 Moving sorted output(s).
2024-02-18 13:57:29 robsoni01 Sorting 'rados ls' output(s) complete.
2024-02-18 13:57:29 robsoni01 Running 'radosgw-admin bucket radoslist'.
2024-02-18 13:57:32 robsoni01 Completed 'radosgw-admin bucket radoslist'.
2024-02-18 13:57:32 robsoni01 Sorting 'radosgw-admin bucket radoslist' output.
2024-02-18 13:57:32 robsoni01 Completed sorting 'radosgw-admin bucket radoslist'.
2024-02-18 13:57:32 robsoni01 Moving 'radosgw-admin bucket radoslist' output.
2024-02-18 13:57:32 robsoni01 Completed moving 'radosgw-admin bucket radoslist' output.
2024-02-18 13:57:32 robsoni01 Creating awk script for comparing outputs: /tmp/ig-3401305.awk
2024-02-18 13:57:32 robsoni01 Begin identifying potentially impacted user object names.
2024-02-18 13:57:32 File 1 Line Count    File 2 Line Count    Potentially Impacted Objects
2024-02-18 13:57:32 83345                88959                1
2024-02-18 13:57:32 robsoni01 Begin sorting results.
2024-02-18 13:57:32 robsoni01 Done.
Found 2 *possible* gaps.
The results can be found in "/mnt/nvme/src-git/ceph--up--main/build/gap-list-202402181357.gap".
Intermediate files: "/mnt/nvme/src-git/ceph--up--main/build/rados-202402181357.intermediate" and "/mnt/nvme/src-git/ceph--up--main/build/radosgw-admin-202402181357.intermediate".
***
*** WARNING: This is EXPERIMENTAL code and the results should be used
*** with CAUTION and VERIFIED. Not everything listed is an
*** actual gap. EXPECT false positives. Every result
*** produced should be verified.
***

cat gap-list-202402181357.gap
Bucket: "test-bucketname"
Object: "test-key1617127"
         ^^^^^^^^^^^^^^^
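Judging from its log output above, rgw-gap-list compares what `rados ls` finds in the data pool against what `radosgw-admin bucket radoslist` says should be there. It then reports the user objects whose rados objects are missing. The script sorts both lists and compares them with a generated awk script. The Go sketch below swaps in a hash set to compute the same kind of difference. The `expected` record (rados object name plus owning bucket/object) is a hypothetical shape, not the actual radoslist output format, and the object names in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// expected is one rados object that the bucket index says should exist,
// together with the S3 object it belongs to (hypothetical record shape).
type expected struct {
	OID, Bucket, Object string
}

// gapList returns sorted, de-duplicated "bucket/object" names that reference
// at least one rados object missing from the pool listing. As with
// rgw-gap-list, every hit is only a *possible* gap and needs verification.
func gapList(poolOIDs []string, want []expected) []string {
	present := make(map[string]struct{}, len(poolOIDs))
	for _, o := range poolOIDs {
		present[o] = struct{}{}
	}
	hits := map[string]struct{}{}
	for _, w := range want {
		if _, ok := present[w.OID]; !ok {
			hits[w.Bucket+"/"+w.Object] = struct{}{}
		}
	}
	out := make([]string, 0, len(hits))
	for h := range hits {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical object names; in practice these would come from
	// `rados ls` and `radosgw-admin bucket radoslist`.
	pool := []string{"oid-1", "oid-2"}
	want := []expected{
		{"oid-1", "test-bucketname", "test-key1"},
		{"oid-2", "test-bucketname", "test-key1617127"},
		{"oid-3", "test-bucketname", "test-key1617127"},
	}
	fmt.Println(gapList(pool, want)) // [test-bucketname/test-key1617127]
}
```

The hash set keeps the whole pool listing in memory. Sorting both lists and merging them, as the script does, can stream through very large pools instead.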