Bug #1287

closed

Setting metadata with unreadable characters is not consistent with amazon S3

Added by Stephon Striplin almost 13 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If a metadata value contains an unprintable character, e.g. '\x04world', Amazon will encode it using MIME encoded-word syntax (RFC 2047). Currently, our S3 implementation either returns a 403 error, if the value begins or ends with an unprintable character, or stores the value as raw input instead of encoding it the way Amazon does.
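For illustration, here is a minimal Python sketch of what an encoded-word for that value could look like. It assumes the "Q" encoding and a UTF-8 charset label; the report does not say which RFC 2047 variant or charset Amazon actually picks, and `encode_word_q` is a hypothetical helper, not part of RGW or any S3 client.

```python
def encode_word_q(raw: bytes, charset: str = "UTF-8") -> str:
    """Wrap raw bytes in an RFC 2047 encoded-word using the Q encoding.

    Assumption: Q encoding with a UTF-8 label; the choice Amazon makes
    is not stated in this report.
    """
    parts = []
    for byte in raw:
        ch = chr(byte)
        if byte < 128 and ch.isalnum():
            # ASCII letters and digits pass through unchanged.
            parts.append(ch)
        elif ch == " ":
            # Q encoding represents a space as an underscore.
            parts.append("_")
        else:
            # Everything else becomes "=" followed by two hex digits.
            parts.append("=%02X" % byte)
    return "=?%s?Q?%s?=" % (charset, "".join(parts))


print(encode_word_q(b"\x04world"))  # =?UTF-8?Q?=04world?=
```

The conservative rule used here (hex-escape anything that is not an ASCII letter or digit) keeps the result valid wherever an encoded-word may appear, at the cost of escaping a few characters that RFC 2047 would also allow literally.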

Actions #1

Updated by Sage Weil almost 13 years ago

  • Target version set to v0.32
Actions #2

Updated by Sage Weil almost 13 years ago

  • Category set to 22
Actions #3

Updated by Colin McCabe almost 13 years ago

Amazon says (in the developers' guide):

> When uploading an object, you can assign metadata to the object. You 
> provide this optional information as a name, value pair when you send a 
> PUT or POST request to create the object. When uploading objects using the 
> REST API the optional user-defined metadata names must begin with 
> “x-amz-meta-” to distinguish them as HTTP headers. When you retrieve the 
> object using the REST API, this prefix is returned. When uploading objects
> using the SOAP API, the prefix is not required and when you retrieve the
> object using the SOAP API, the prefix is removed, regardless of which API
> you used to upload the object.
>
> When metadata is retrieved through the REST API, Amazon S3 combines 
> headers that have the same name (ignoring case) into a comma-delimited 
> list. If some metadata contains unprintable characters, it is not 
> returned. Instead, the "x-amz-missing-meta" header is returned with a
> value of the number of the unprintable metadata entries.
>
> Each name, value pair must conform to US-ASCII when using REST and UTF-8 
> when using SOAP or browser-based uploads via POST.

Basically: you must use UTF-8 for metadata or suffer x-amz-missing-meta. MIME is not mentioned anywhere.

So maybe this is another case where there is a conflict between the spec and reality. We will have to test it.

Actions #4

Updated by Colin McCabe almost 13 years ago

Confirmed through s3-tests. Amazon gives it back to you in MIME-encoded format rather than giving you x-amz-missing-meta.

Actions #5

Updated by Sage Weil almost 13 years ago

  • Assignee set to Colin McCabe
Actions #6

Updated by Colin McCabe almost 13 years ago

I wanted to be sure about this, so I verified using tcpdump that we were really sending the data over the wire not encoded. I confirmed that we are. So yes, the Amazon docs are wrong again, and RGW needs to learn how to do this encoding.

Actions #7

Updated by Colin McCabe almost 13 years ago

Also, Python 2.x DOES mangle your strings, but only if you prepend 'u', making it a "unicode string".

So u'\04a' -> "%04a", aka 0x25 0x30 0x34 0x61 0x00
but '\04a' -> 0x04 0x61 0x00
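
The mangled byte sequence above matches percent-encoding of the control character. The comment does not say which layer performs it, so the following is only a sketch of the same transformation in Python 3, using `urllib.parse.quote` as a stand-in (an assumption, not the code path that was observed with tcpdump). The trailing 0x00 in the dumps is not reproduced here.

```python
from urllib.parse import quote

value = "\x04a"  # "\04" in the comment is the octal escape for byte 0x04

# Percent-encoding turns the control character into the text "%04".
percent_encoded = quote(value)
print(percent_encoded)  # %04a
print([hex(b) for b in percent_encoded.encode("ascii")])  # ['0x25', '0x30', '0x34', '0x61']

# Sent as-is, the value is just the two raw bytes.
print([hex(b) for b in value.encode("latin-1")])  # ['0x4', '0x61']
```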

Actions #8

Updated by Colin McCabe almost 13 years ago

  • Status changed from New to Resolved

fixed by commit:5cb98c95c001b2a0658a219c717a717bc37e444d

Actions #9

Updated by Sage Weil almost 13 years ago

  • Story points set to 5
Actions #10

Updated by John Spray over 6 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)
  • Target version deleted (v0.32)

Bulk reassign of radosgw category to RGW project.
