Bug #63935
Bucket index for object remaining after a DELETE
Description
We are running Ceph 16.2.14 (Pacific).
One of our users ran into an issue with our RGW S3 service:
He could list an object but not GET it; the GET returned 404 NoSuchKey.
After some investigation, we found that the bucket index entry for the object was present, but the underlying RADOS object did not seem to exist.
According to our HAProxy logs (our RGW logs are not very verbose and showed no error), the user did a PUT (200), two GETs (200), then a DELETE (204). Afterwards, multiple GETs returned 404 even though the object was still listed.
To work around the issue, the user re-PUT the object and deleted it again, which resolved it.
We believe the issue is a kind of race condition between two RGW instances. In our HAProxy logs we noticed this flow:
29/Dec/2023:17:13:28.961 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/127/127 200 228 - - ---- 132/132/70/67/0 0/0 "PUT /xxx/object HTTP/1.1"
29/Dec/2023:17:13:29.101 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/76/71/0 0/0 "GET /xxx/object HTTP/1.1"
29/Dec/2023:17:13:29.121 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/71/59/0 0/0 "GET /xxx/object HTTP/1.1"
29/Dec/2023:17:13:29.137 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/4/4 204 153 - - ---- 132/132/71/6/0 0/0 "DELETE /xxx/object HTTP/1.1"
29/Dec/2023:19:03:21.671 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/1/1 404 472 - - ---- 55/55/26/0/0 0/0 "GET /xxx/object HTTP/1.1"
As you can see, the DELETE was handled by a different server (server-mon-03-rgw0) than the PUT (server-mon-01-rgw0), less than 200 ms after the PUT according to the log timestamps.
We have not managed to reproduce it yet.
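To make the timing in the excerpt easy to check, or to scan a larger HAProxy log for the same pattern, here is a minimal Python sketch. It assumes the field layout of the lines quoted above (timestamp, frontend, backend/server, timers, status, ..., quoted request line) and is not a general HAProxy log parser. The names `cross_server_deletes` and `window_s` are hypothetical, chosen for illustration. It flags any DELETE that reached a different RGW server than the most recent successful PUT on the same path within the window, comparing only the timestamps at the start of each line (request durations are ignored).

```python
import re
from datetime import datetime

# HAProxy lines copied from this report.
LOG_LINES = [
    '29/Dec/2023:17:13:28.961 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/127/127 200 228 - - ---- 132/132/70/67/0 0/0 "PUT /xxx/object HTTP/1.1"',
    '29/Dec/2023:17:13:29.101 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/76/71/0 0/0 "GET /xxx/object HTTP/1.1"',
    '29/Dec/2023:17:13:29.121 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/71/59/0 0/0 "GET /xxx/object HTTP/1.1"',
    '29/Dec/2023:17:13:29.137 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/4/4 204 153 - - ---- 132/132/71/6/0 0/0 "DELETE /xxx/object HTTP/1.1"',
    '29/Dec/2023:19:03:21.671 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/1/1 404 472 - - ---- 55/55/26/0/0 0/0 "GET /xxx/object HTTP/1.1"',
]

# Assumed layout: timestamp, frontend, backend/server, timers, status, ..., "METHOD PATH PROTO"
LINE_RE = re.compile(
    r'^(?P<ts>\S+) \S+ [^/\s]+/(?P<server>\S+) \S+ (?P<status>\d{3}) '
    r'.*"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"$'
)


def parse(line):
    """Return (timestamp, server, status, method, path) for one log line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    # %f accepts the 3-digit milliseconds and pads them to microseconds.
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S.%f")
    return ts, m["server"], int(m["status"]), m["method"], m["path"]


def cross_server_deletes(lines, window_s=1.0):
    """Yield (delete_ts, put_server, delete_server, gap_s) for each DELETE that hit
    a different server than the last 200 PUT on the same path, within window_s."""
    last_put = {}  # path -> (timestamp, server) of the most recent successful PUT
    for line in lines:
        ts, server, status, method, path = parse(line)
        if method == "PUT" and status == 200:
            last_put[path] = (ts, server)
        elif method == "DELETE" and path in last_put:
            put_ts, put_server = last_put[path]
            gap = (ts - put_ts).total_seconds()
            if put_server != server and gap <= window_s:
                yield ts, put_server, server, gap


if __name__ == "__main__":
    for ts, put_srv, del_srv, gap in cross_server_deletes(LOG_LINES):
        print(f"{ts:%H:%M:%S.%f} DELETE on {del_srv}, {gap:.3f}s after PUT on {put_srv}")
```

On the five lines above it reports a single DELETE, on server-mon-03-rgw0, 0.176 s after the PUT on server-mon-01-rgw0.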
Thanks!
Regards,
Mathias