Bug #1287
Setting metadata with unreadable characters is not consistent with Amazon S3 (Closed)
Description
If you have a string like '\x04world', Amazon will encode it using MIME encoded-word syntax. Currently, our S3 implementation either returns a 403 error if the string begins or ends with an unreadable character, or stores it as raw input rather than using Amazon's method.
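For reference, a MIME encoded-word (RFC 2047) has the form `=?charset?encoding?encoded-text?=`. Below is a minimal sketch of how a gateway could produce one for a metadata value. It assumes UTF-8 as the charset, the "Q" encoding, and that a value only needs encoding when it contains characters outside printable US-ASCII. The function names are hypothetical, and this is not the code from the actual fix.

```python
def needs_encoding(value: str) -> bool:
    """True if any character falls outside printable US-ASCII (0x20-0x7E)."""
    return any(not (0x20 <= ord(ch) <= 0x7E) for ch in value)


def encode_word_q(value: str, charset: str = "UTF-8") -> str:
    """Wrap value in an RFC 2047 'Q' encoded-word (assumed charset/encoding)."""
    out = []
    for b in value.encode(charset):
        if b == 0x20:
            out.append("_")  # Q encoding may write a space as an underscore
        elif 0x21 <= b <= 0x7E and chr(b) not in "=?_":
            out.append(chr(b))  # safe printable ASCII passes through unchanged
        else:
            out.append(f"={b:02X}")  # everything else becomes =XX (hex)
    return f"=?{charset}?Q?{''.join(out)}?="


def metadata_header_value(value: str) -> str:
    """Hypothetical helper: encode only when the raw value is unprintable."""
    return encode_word_q(value) if needs_encoding(value) else value


print(metadata_header_value("\x04world"))  # =?UTF-8?Q?=04world?=
print(metadata_header_value("hello"))      # hello
```

Leaving printable values untouched keeps ordinary metadata byte-for-byte identical on the wire. Only values that would otherwise be rejected or stored raw get wrapped.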
Updated by Colin McCabe almost 13 years ago
Amazon says (in the developers' guide):
> When uploading an object, you can assign metadata to the object. You provide this optional information as a name, value pair when you send a PUT or POST request to create the object. When uploading objects using the REST API the optional user-defined metadata names must begin with “x-amz-meta-” to distinguish them as HTTP headers. When you retrieve the object using the REST API, this prefix is returned. When uploading objects using the SOAP API, the prefix is not required and when you retrieve the object using the SOAP API, the prefix is removed, regardless of which API you used to upload the object.
>
> When metadata is retrieved through the REST API, Amazon S3 combines headers that have the same name (ignoring case) into a comma-delimited list. If some metadata contains unprintable characters, it is not returned. Instead, the "x-amz-missing-meta" header is returned with a value of the number of the unprintable metadata entries.
>
> Each name, value pair must conform to US-ASCII when using REST and UTF-8 when using SOAP or browser-based uploads via POST.
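The documented REST behavior (the one the later comments find Amazon does not actually follow) can be modeled as a small function. This is a sketch of the documentation only: `build_meta_response` and `is_printable_ascii` are hypothetical names, and joining with a bare "," is an assumption, since the quote says only "comma-delimited".

```python
def is_printable_ascii(value: str) -> bool:
    """True if every character is printable US-ASCII (0x20-0x7E)."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def build_meta_response(pairs):
    """Model the documented REST GET behavior for user metadata.

    pairs: list of (header_name, value) as stored on the object.
    Returns a dict of response headers.
    """
    combined = {}  # lower-cased name -> list of values, in first-seen order
    missing = 0
    for name, value in pairs:
        if not name.lower().startswith("x-amz-meta-"):
            continue  # not user-defined metadata
        if not is_printable_ascii(value):
            missing += 1  # documented: unprintable entries are withheld
            continue
        combined.setdefault(name.lower(), []).append(value)
    # Same-named headers (ignoring case) become one comma-delimited value.
    headers = {name: ",".join(values) for name, values in combined.items()}
    if missing:
        headers["x-amz-missing-meta"] = str(missing)
    return headers


print(build_meta_response([
    ("x-amz-meta-a", "1"),
    ("X-Amz-Meta-A", "2"),
    ("x-amz-meta-b", "\x04world"),
]))
# {'x-amz-meta-a': '1,2', 'x-amz-missing-meta': '1'}
```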
Basically: metadata must be US-ASCII over REST (UTF-8 over SOAP or POST uploads), or you get x-amz-missing-meta instead. MIME is not mentioned anywhere.
So maybe this is another case where there is a conflict between the spec and reality. We will have to test it.
Updated by Colin McCabe almost 13 years ago
Confirmed through s3-tests: Amazon returns the value in MIME-encoded form rather than sending x-amz-missing-meta.
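A check in the spirit of s3-tests could decode the returned header value and compare it with what was sent. The sketch below uses the standard library's `email.header.decode_header`. `decode_meta_value` is a hypothetical name, and the returned string in the example is an illustrative encoded-word, not a captured Amazon response.

```python
from email.header import decode_header


def decode_meta_value(raw: str) -> bytes:
    """Turn a returned metadata header value back into raw bytes.

    Encoded-words are decoded; plain text is returned as ASCII bytes.
    """
    out = b""
    for chunk, _charset in decode_header(raw):
        if isinstance(chunk, str):
            # decode_header returns str (not bytes) when no encoded-word is present
            chunk = chunk.encode("ascii")
        out += chunk
    return out


sent = "\x04world"
returned = "=?UTF-8?Q?=04world?="  # illustrative encoded-word value
assert decode_meta_value(returned) == sent.encode("utf-8")
assert decode_meta_value("hello") == b"hello"
```

Comparing bytes rather than strings avoids depending on which charset label the server chooses.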
Updated by Colin McCabe almost 13 years ago
I wanted to be sure about this, so I used tcpdump to verify that we really were sending the data over the wire unencoded, and we are. So yes, the Amazon docs are wrong again, and RGW needs to learn how to do this encoding.
Updated by Colin McCabe almost 13 years ago
Also, Python 2.x DOES mangle your strings, but only if you prefix them with 'u', making them unicode strings:
So u'\04a' -> "%04a", i.e. 0x25 0x30 0x34 0x61 0x00
but '\04a' -> 0x04 0x61 0x00
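The two byte sequences above can be reproduced in Python 3 with `urllib.parse.quote`. The ticket does not say that the Python 2 client was percent-encoding unicode strings this way, so treat that as an assumption. `show_bytes` is a hypothetical helper. The trailing 0x00 in the listing above is presumably a C string terminator and is not produced here.

```python
from urllib.parse import quote


def show_bytes(s: str) -> str:
    """Render a string's ASCII bytes as space-separated 0xNN values."""
    return " ".join(f"0x{b:02x}" for b in s.encode("ascii"))


raw = "\x04a"        # same value as '\04a' (octal escape)
mangled = quote(raw)  # percent-encoding, assumed to mimic the unicode path

print(mangled)              # %04a
print(show_bytes(mangled))  # 0x25 0x30 0x34 0x61
print(show_bytes(raw))      # 0x04 0x61
```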
Updated by Colin McCabe almost 13 years ago
- Status changed from New to Resolved
fixed by commit:5cb98c95c001b2a0658a219c717a717bc37e444d
Updated by Sage Weil almost 13 years ago
- Story points set to 5
Updated by John Spray over 6 years ago
- Project changed from Ceph to rgw
- Category deleted (22)
- Target version deleted (v0.32)
Bulk reassign of radosgw category to RGW project.