Bug #908

RGW allows users to create buckets and objects with invalid names

Added by Colin McCabe about 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category:
-
Target version:
% Done: 100%

Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From the "Object Key and Metadata" section of the "Amazon Simple Storage Service Developer Guide", API Version 2006-03-01:

"The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long." 

We need to make sure all key names and bucket names are valid UTF-8, with an encoding no longer than 1024 bytes. Two good libraries for this are Glib::ustring and ICU. Unfortunately, RGW currently has no knowledge of Unicode at all!

This is a serious problem for several reasons:
1. S3 clients will choke on getting a bucket-list that has invalid key names in it.

For example, try creating an object with a control character in the name. After one of these baddies has been created in an RGW bucket, it's impossible to list the objects in this bucket. You can get back a list of object names, but your S3 client will choke on it. It's also impossible to destroy such an object, because you cannot send a properly encoded XML message that refers to it. The only solution is to destroy the bucket and start over.

I haven't tried this with a bucket name, but I suspect that it would be even worse, since there's no "destroy all buckets" command to save you.

2. Echoing filenames containing control characters can cause major security holes
(see http://seclists.org/fulldisclosure/2003/Feb/att-341/Termulation.txt for an example)
None of these security issues can bite you if you just use UTF-8 like you're supposed to.
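
A minimal sketch of the key-name check described above, written by hand rather than with Glib::ustring or ICU so that it stays self-contained. The names `kMaxKeyBytes`, `is_valid_utf8`, `is_valid_key_name`, and `has_ascii_control_char` are hypothetical and are not taken from RGW. Note that control characters such as U+0001 are themselves well-formed UTF-8, so the sketch checks for them separately; whether RGW should reject them outright is a policy assumption, not something the S3 guide states.

```cpp
#include <cstddef>
#include <string>

// Limit from the S3 guide quoted above: at most 1024 bytes of UTF-8.
constexpr std::size_t kMaxKeyBytes = 1024;

// Returns true if `s` is well-formed UTF-8: no stray or missing
// continuation bytes, no overlong forms, no surrogates, nothing
// above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned long cp = 0;
    if (c < 0x80) { len = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return false;              // stray continuation or invalid lead byte
    if (i + len > n) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // outside Unicode
    i += len;
  }
  return true;
}

// Hypothetical policy check: length is measured in bytes, not characters.
bool is_valid_key_name(const std::string& key) {
  return key.size() <= kMaxKeyBytes && is_valid_utf8(key);
}

// Separate check for ASCII control characters (0x00..0x1F and 0x7F).
// C1 controls (U+0080..U+009F) are not covered by this sketch.
bool has_ascii_control_char(const std::string& s) {
  for (unsigned char c : s) {
    if (c < 0x20 || c == 0x7F) return true;
  }
  return false;
}
```

A request handler could reject the PUT when `is_valid_key_name` returns false, before the object is ever created, which avoids the unlistable-bucket situation described in point 1.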

Another thing.

Under certain conditions, RGW will need to sanitize the returned headers so that they are ASCII:

> When metadata is retrieved through the REST API, Amazon S3 combines 
> headers that have the same name (ignoring case) into a comma-delimited 
> list. If some metadata contains unprintable characters, it is not 
> returned. Instead, the "x-amz-missing-meta" header is returned with a 
> value of the number of the unprintable metadata entries.  Each name, value 
> pair must conform to US-ASCII when using REST and UTF-8 when using SOAP or 
> browser-based uploads via POST.  

That's US-ASCII, not UTF-8. Sorry.
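
The behaviour in the quoted passage could be sketched roughly as follows. This is only an illustration: `FilteredMeta`, `filter_meta`, and `is_printable_ascii` are hypothetical names, and treating "printable" as the byte range 0x20 to 0x7E is an assumption, since the quoted text does not define it.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Assumption: "printable" means every byte is in 0x20..0x7E.
bool is_printable_ascii(const std::string& s) {
  for (unsigned char c : s) {
    if (c < 0x20 || c > 0x7E) return false;
  }
  return true;
}

// Result of filtering: the headers that are safe to return, plus the
// count that would go into the "x-amz-missing-meta" header.
struct FilteredMeta {
  std::map<std::string, std::string> headers;
  std::size_t missing = 0;
};

// Keeps metadata entries whose name and value are printable US-ASCII
// and counts the rest instead of returning them.
FilteredMeta filter_meta(const std::map<std::string, std::string>& meta) {
  FilteredMeta out;
  for (const auto& kv : meta) {
    if (is_printable_ascii(kv.first) && is_printable_ascii(kv.second)) {
      out.headers.insert(kv);
    } else {
      ++out.missing;
    }
  }
  return out;
}
```

When `missing` is non-zero, the REST response would carry `x-amz-missing-meta` with that count as its value. Combining same-named headers into a comma-delimited list is not shown here.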

And yet another thing! We need to do XML escaping on characters such as "<" and ">" (and possibly others; the XML specification should list them.)
It seems like we have a lot of work to do here.
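
As a rough sketch of the XML escaping mentioned above: XML 1.0 predefines five entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), and a hypothetical helper `xml_escape` (not an existing RGW function) could apply them to key and bucket names before they are written into a response.

```cpp
#include <string>

// Replaces the five characters that have predefined XML entities.
// Everything else is copied through unchanged.
std::string xml_escape(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default:   out += c;        break;
    }
  }
  return out;
}
```

Escaping alone does not help with most control characters: XML 1.0 allows only tab, line feed, and carriage return below U+0020, so a name containing other control characters cannot appear in a well-formed XML 1.0 document even in escaped form. Such names would have to be rejected at creation time.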


Subtasks

Tasks #918: forbid bad bucket names (Resolved, Colin McCabe)

Tasks #919: properly escape XML (Resolved, Colin McCabe)

Tasks #920: validate that key names are 1024-byte long valid UTF-8 (Resolved, Colin McCabe)

Tasks #939: properly escape JSON in RGW (Resolved, Colin McCabe)

History

#1 Updated by Colin McCabe about 13 years ago

  • Assignee changed from Yehuda Sadeh to Colin McCabe

#2 Updated by Colin McCabe about 13 years ago

I'm not having any luck creating Unicode bucket names:

cmccabe@metropolis:~/src/ceph/src$ s3amazon create cr?zy

WARNING: Bucket name is not valid for virtual-host style URI access.
Bucket not created.  Use -f option to force the bucket to be created despite
this warning.

But then -f doesn't work either. It fails with:

ERROR: InvalidBucketNameCharacter

I think maybe we can constrain bucket names to good old ASCII? I need to find out what the real constraints are.

#3 Updated by Colin McCabe about 13 years ago

  • Status changed from New to Resolved

Implemented all subtasks.

#4 Updated by Sage Weil about 13 years ago

  • Target version set to v0.26

#5 Updated by Sage Weil about 13 years ago

  • Story points set to 2
  • Position set to 588

#6 Updated by Sage Weil about 13 years ago

  • Story points changed from 2 to 3
  • Position deleted (588)
  • Position set to 588
