Bug #37668

AbortMultipartUpload causes data loss (NoSuchKey) when CompleteMultipartUpload request times out

Added by Princi Part 3 months ago. Updated 21 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
Start date:
12/17/2018
Due date:
% Done:

0%

Source:
Tags:
NoSuchKey data-loss cannot-download
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

We run several Ceph clusters using RGW as an S3/OSS service. Unfortunately, our users reported that they sometimes lost data.
The details:
At the beginning, they uploaded these files successfully and, of course, could download them correctly at that time;
After some time (maybe a few hours, a day, or even several days), they could still list these files but could NOT download them anymore, getting a "NoSuchKey" error.

So we checked the Ceph clusters and the behaviour of the client (awscli), analysed the RGW server source code and its logs, and finally found this bug: when the cluster is slow (causing the CompleteMultipartUpload request to time out), the awscli (or another) client sends a DELETE request (AbortMultipartUpload) to clean up the upload, which puts the real data (the multipart parts) into GC. However, the upload actually finishes when the CompleteMultipartUpload request returns, so the head_obj with the file name is retained in the cluster and can still be listed, but its data is gone.

We use the newest Luminous v12.2.10 for everything: mon/osd/rgw/mgr.

[bug reproduce]
1. Use awscli (or any other S3 client) to PUT a large (e.g. 100 MB) file to RGW, making sure multipart upload is used;
2. Wait for all parts to finish uploading;
3. When the awscli client has finished uploading the parts, it begins sending POST CompleteMultipartUpload requests to the RGW server; each request has a 60-second timeout and is retried 4 times (5 requests in total). We use Charles (or any other proxy software) to block these POST requests so that the client will later send a DELETE request;
4. After 5 POST CompleteMultipartUpload requests, the awscli client begins sending DELETE requests to the RGW server: from its point of view, the upload has timed out and the status of the uploaded file is unknown, so it cleans up the upload;
5. We also use Charles to block these DELETE requests so we have time to unblock a POST CompleteMultipartUpload;
6. For a 100% reproduction, we have a simple patch that makes RGW simulate a very slow cluster, sleeping in RGWCompleteMultipart::execute():

[root@ceph1 ceph]# git diff
diff --git a/src/rgw/rgw_op.cc b/src/rgw/rgw_op.cc
index dce4549..8948f1f 100644
--- a/src/rgw/rgw_op.cc
+++ b/src/rgw/rgw_op.cc
@@ -5610,6 +5610,9 @@ void RGWCompleteMultipart::execute()

   RGWObjectCtx& obj_ctx = *static_cast<RGWObjectCtx *>(s->obj_ctx);

+  std::cout << "sleeping before write head_obj 60s" << std::endl;
+  sleep(60);
+  std::cout << "sleeping end, write head_obj now" << std::endl;
   obj_ctx.obj.set_atomic(target_obj);

   RGWRados::Object op_target(store, s->bucket_info, *static_cast<RGWObjectCtx *>(s->obj_ctx), target_obj);
@@ -5637,6 +5640,7 @@ void RGWCompleteMultipart::execute()
   } else {
       ldout(store->ctx(), 0) << "WARNING: failed to remove object " 
                             << meta_obj << dendl;
+      std::cout << "WARNING: failed to remove meta object: " << meta_obj << std::endl;
   }
 }

7. After a DELETE request is sent and blocked, we unblock a POST CompleteMultipartUpload request; it will be processed and will write the head_obj of the uploaded file after the sleep;

8. Unblock a DELETE request: the response comes back immediately with HTTP status code 204 (Success), the file's data (the multipart parts) is put into GC, and the meta_obj of the upload is removed;

9. When the POST CompleteMultipartUpload request wakes up, the head_obj is written successfully and 200 (OK) is returned to the client;

10. You can still download the file (even though its data is already in GC) until its parts are actually processed by GC.

[Where is the bug?]
It's in RGWAbortMultipart::execute():
Nothing protects the meta_obj against this race condition (another thread processing a POST CompleteMultipartUpload request); the meta_obj and the data (the multipart parts) are deleted (put into GC) directly.
So after some time, you can see the object, but you cannot get it.

ls_but_no_download.jpeg View (78.9 KB) Princi Part, 12/17/2018 08:56 AM

sleep_in_complete.jpeg View (80.5 KB) Princi Part, 12/17/2018 08:56 AM

found_in_gc.jpeg View (276 KB) Princi Part, 12/17/2018 08:56 AM

History

#1 Updated by Adam Emerson 3 months ago

  • File deleted (rgw_ls_but_cannot_download_bug.pdf)

#2 Updated by Adam Emerson 3 months ago

Deleted private attachment at poster's request.

#3 Updated by Princi Part 21 days ago

OMG!!! Is there anybody to deal with this issue?!!
