Bug #908

RGW allows users to create buckets and objects with invalid names

Added by Colin McCabe over 9 years ago. Updated over 9 years ago.



3 - minor


From the "Amazon Simple Storage Service Developer Guide", API Version 2006-03-01: ("Object Key and Metadata")

"The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long." 

We need to make sure all key names and bucket names are valid UTF-8, with an encoding no longer than 1024 bytes. Two good libraries for this are Glib::ustring and libICU. Unfortunately, RGW currently has no knowledge of Unicode at all!
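As a rough illustration of the check described above (not RGW code, which is C++), the sketch below validates a raw key name in Python. The function name `is_valid_key_name` and the constant `MAX_KEY_BYTES` are made up for this example; the 1024-byte limit is the one quoted from the S3 guide.

```python
MAX_KEY_BYTES = 1024  # limit quoted from the S3 Developer Guide


def is_valid_key_name(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 and at most 1024 bytes long.

    Hypothetical helper for illustration only.
    """
    # The limit applies to the encoded length in bytes, not to characters.
    if len(raw) > MAX_KEY_BYTES:
        return False
    try:
        # Strict decoding rejects truncated sequences, overlong encodings,
        # and stray continuation bytes.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Note that a name made of 512 two-byte characters is exactly at the limit, while 1025 ASCII characters are over it.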

This is a serious problem for several reasons:
1. S3 clients will choke on getting a bucket-list that has invalid key names in it.

For example, try creating an object with a control character in the name. After one of these baddies has been created in an RGW bucket, it's effectively impossible to list the objects in that bucket: RGW sends back a list of object names, but your S3 client will choke on it. It's also impossible to destroy such an object because you cannot send a properly encoded XML message talking about it. The only solution is to destroy the bucket and start over.

I haven't tried this with a bucket name, but I suspect that it would be even worse, since there's no "destroy all buckets" command to save you.

2. Echoing filenames containing control characters can cause major security holes
(see for an example)
None of these security issues can bite you if you just use UTF-8 like you're supposed to.
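Both problems above come down to control characters in names. A minimal sketch of how such characters could be detected, assuming the name has already been decoded to text; the helper name `find_control_chars` is hypothetical, and treating every character in Unicode category "Cc" as forbidden is an assumption for this example, not a rule taken from the S3 guide:

```python
import unicodedata
from typing import List


def find_control_chars(name: str) -> List[int]:
    """Return the indexes of control characters (Unicode category Cc) in name.

    Hypothetical helper; the "reject all Cc" policy is an assumption.
    """
    return [i for i, ch in enumerate(name) if unicodedata.category(ch) == "Cc"]
```

A caller could refuse any name for which the returned list is non-empty, before the object is ever created.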

Another thing.

Under certain conditions, RGW will need to sanitize the returned headers so that they are ASCII:

> When metadata is retrieved through the REST API, Amazon S3 combines 
> headers that have the same name (ignoring case) into a comma-delimited 
> list. If some metadata contains unprintable characters, it is not 
> returned. Instead, the "x-amz-missing-meta" header is returned with a 
> value of the number of the unprintable metadata entries.  Each name, value 
> pair must conform to US-ASCII when using REST and UTF-8 when using SOAP or 
> browser-based uploads via POST.  

That's US-ASCII, not UTF-8. Sorry.
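Here is a rough sketch of the behavior the quoted passage describes: same-name headers are combined ignoring case, and entries with unprintable characters are dropped and counted in "x-amz-missing-meta". The function names are hypothetical. Treating "printable" as the ASCII range 0x20 to 0x7E and joining values with a bare comma are assumptions made for this example.

```python
def is_printable_ascii(text):
    """Assumed definition of printable: ASCII 0x20 (space) through 0x7E (~)."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


def build_meta_headers(pairs):
    """Combine (name, value) metadata pairs into response headers.

    Hypothetical helper for illustration only.
    """
    combined = {}
    missing = 0
    for name, value in pairs:
        if not (is_printable_ascii(name) and is_printable_ascii(value)):
            # Unprintable metadata is not returned, only counted.
            missing += 1
            continue
        # Header names are matched case-insensitively.
        combined.setdefault(name.lower(), []).append(value)
    headers = {name: ",".join(values) for name, values in combined.items()}
    if missing:
        headers["x-amz-missing-meta"] = str(missing)
    return headers
```

With this sketch, two "color" entries that differ only in case end up in one header, and each entry containing a non-ASCII or control character adds one to the missing count.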

And yet another thing! We need to do XML escaping on characters such as "<" and ">" (and possibly others, such as "&"; the full list should be in the XML specification, I guess.)
It seems like we have a lot of work to do here.
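To illustrate the kind of escaping meant here, the sketch below uses `escape` from Python's standard `xml.sax.saxutils` module. The wrapper name `xml_escape` is made up for this example, and also escaping both quote characters is a choice made here rather than something stated in this report.

```python
from xml.sax.saxutils import escape


def xml_escape(text: str) -> str:
    """Escape &, <, > and both quote characters for use in XML output.

    Hypothetical wrapper for illustration only.
    """
    # escape() handles &, < and > itself; the extra mapping adds the quotes.
    return escape(text, {'"': "&quot;", "'": "&apos;"})
```

A listing could then emit each name as `"<Key>" + xml_escape(key) + "</Key>"`. Escaping alone does not help with control characters, which still have to be rejected when the name is created.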


Tasks #918: forbid bad bucket names (Resolved, Colin McCabe)

Tasks #919: properly escape XML (Resolved, Colin McCabe)

Tasks #920: validate that key names are 1024-byte long valid UTF-8 (Resolved, Colin McCabe)

Tasks #939: properly escape JSON in RGW (Resolved, Colin McCabe)


#1 Updated by Colin McCabe over 9 years ago

  • Assignee changed from Yehuda Sadeh to Colin McCabe

#2 Updated by Colin McCabe over 9 years ago

I'm not having any luck creating Unicode bucket names:

cmccabe@metropolis:~/src/ceph/src$ s3amazon create cr?zy

WARNING: Bucket name is not valid for virtual-host style URI access.
Bucket not created.  Use -f option to force the bucket to be created despite
this warning.

but then -f just doesn't work either... with

ERROR: InvalidBucketNameCharacter

I think maybe we can constrain bucket names to good old ASCII? I need to find out what the real constraints are.
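If bucket names do end up restricted to ASCII, the check could look roughly like the sketch below. The allowed character set (letters, digits, ".", "_" and "-") is only a placeholder assumption, since the real constraints had not been worked out at this point; the function name `is_plausible_bucket_name` is hypothetical.

```python
import re

# Placeholder character set; an assumption, not the confirmed S3 rule.
_BUCKET_NAME_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if name is non-empty and uses only the assumed ASCII set."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None
```

Any non-ASCII letter, whitespace, or control character makes the name fail this check.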

#3 Updated by Colin McCabe over 9 years ago

  • Status changed from New to Resolved

Implemented all subtasks.

#4 Updated by Sage Weil over 9 years ago

  • Target version set to v0.26

#5 Updated by Sage Weil over 9 years ago

  • Story points set to 2
  • Position set to 588

#6 Updated by Sage Weil over 9 years ago

  • Story points changed from 2 to 3
  • Position deleted (588)
  • Position set to 588
