Bug #20166

RGW: Bad error handling when tail object is missing

Added by Jens Harbott almost 7 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When doing an S3 GET on an object, it may happen that one of the tail objects does not exist. When radosgw notices this, it tries to send a NoSuchKey error response, but this cannot work: by that point it has already sent the full headers for the object and possibly also the first chunk of data, which may be contained in the head object. In addition, the new set of headers is merged with the original headers, so the new response contains e.g. both HTTP/1.1 200 OK and HTTP/1.1 404 Not Found, two Content-Length headers, etc.

After sending the broken header, radosgw just sits there and waits for the client to process the data. This can lead to two different failure scenarios, depending on whether the original object size was just a bit larger than the head object size (usually 512kB) or much larger:

1. The client happily pastes the new header after the first chunk of data it has already received and finishes the transaction. Depending on the client implementation, it will proceed with corrupted data or notice an MD5 mismatch and fail.

2. The connection stays open and the client keeps waiting for the remainder of the data it has been promised in the initial header. Hopefully it will time out eventually.
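To make the two scenarios concrete from the client's point of view, here is a minimal sketch of the kind of check a careful client could run on a download. It is not taken from any particular S3 client; `check_download` and its parameters are hypothetical names, and it assumes the client knows the promised Content-Length and, optionally, an expected MD5 of the full object.

```python
import hashlib
from typing import Iterable, Optional


def check_download(chunks: Iterable[bytes],
                   content_length: int,
                   expected_md5: Optional[str] = None) -> str:
    """Classify a received body against what the initial header promised.

    Hypothetical helper for illustration only; not part of any S3 client.
    """
    digest = hashlib.md5()
    received = 0
    for chunk in chunks:
        digest.update(chunk)
        received += len(chunk)

    # Fewer bytes than promised: the case where a client without a
    # timeout would keep waiting for the remainder (scenario 2).
    if received < content_length:
        return "truncated"
    if received > content_length:
        return "overlong"
    # Right length but wrong content: the case where the client would
    # otherwise proceed with corrupted data (scenario 1).
    if expected_md5 is not None and digest.hexdigest() != expected_md5:
        return "md5-mismatch"
    return "ok"
```

A client that skips both checks would silently accept the corrupted body in scenario 1; a client without a read timeout would hang in scenario 2.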

Also, the radosgw and civetweb log messages now contain the 404 error state, despite the fact that the original response to the client was a 200.

Now the question is what the correct behaviour should look like. I'm seeing two options:

a) close the connection
b) send a chunk of zeroes as long as the missing object and continue

Option b) would be knowingly delivering corrupted data, but at least it would allow the client to see the following data, in case only one of multiple tail objects is missing.
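The difference between the two options can be modelled with a small sketch. This is not radosgw code (which is C++), only a toy model: `stream_parts`, `MissingTailError`, and the `(size, data)` representation of the head and tail objects are assumptions made for illustration, with `None` standing in for a missing tail object.

```python
from typing import Iterator, List, Optional, Tuple

# (expected size in bytes, data or None if the object is missing)
Part = Tuple[int, Optional[bytes]]


class MissingTailError(Exception):
    """Signals that streaming must stop; models closing the connection."""


def stream_parts(parts: List[Part], zero_fill: bool) -> Iterator[bytes]:
    """Yield the body piece by piece, handling a missing part.

    zero_fill=False models option a): abort as soon as a part is missing.
    zero_fill=True  models option b): substitute zeroes and continue.
    """
    for index, (size, data) in enumerate(parts):
        if data is not None:
            yield data
        elif zero_fill:
            # Knowingly corrupted, but later parts remain reachable.
            yield bytes(size)
        else:
            raise MissingTailError(
                f"part {index} ({size} bytes) is missing")
```

With `zero_fill=True` the total length still matches the Content-Length already sent, so the client completes the transfer and can only detect the problem through a checksum; with `zero_fill=False` the client sees a short read.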

There also should be a more verbose log message explaining what happened than just falsely logging a 404.

This issue is related to http://tracker.ceph.com/issues/20107.

History

#1 Updated by Yehuda Sadeh almost 7 years ago

Do you have any proposed change?

#2 Updated by Jens Harbott almost 7 years ago

No, I tried to dig a bit deeper into how the error is handled, but no success yet.

#3 Updated by Matt Benjamin over 6 years ago

  • Assignee set to Yehuda Sadeh

#4 Updated by Casey Bodley over 6 years ago

  • Status changed from New to 12

#5 Updated by Dan Stoner about 6 years ago

#20107 is marked as resolved.

Is this issue #20166 also resolved?

I think I am seeing this behavior in Luminous 12.2.2.

#6 Updated by Jens Harbott about 6 years ago

Seems that this is indeed not resolved yet. I'm also failing to see a good way to handle this error situation. Would be nice if some RGW developer could take another look.

#7 Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
