Project

General

Profile

Actions

Bug #8702

closed

RadosGW incorrectly converting + to space in URLs

Added by Brian Rak almost 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
firefly
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(see also ceph-users mailing list thread titled 'Problem with RadosGW and special characters')

RadosGW is incorrectly converting + symbols to spaces within filenames. This prevents it from working, unless the client explicitly encodes these first. Encoding plus symbols is only required when submitting a form, so decoding these to spaces is not the correct behavior for filenames. (see http://stackoverflow.com/questions/1005676/urls-and-plus-signs )

As an example:

---request begin---
HEAD /ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)

Will fail with a 404 error, but

---request begin---
HEAD /ubuntu/pool/main/a/adduser/adduser_3.113%2Bnmu3ubuntu3_all.deb
HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)

will work properly.

Debug mode on RadosGW shows this:

2014-06-25 17:30:37.383029 7f7ca7fff700 20 RGWWQ:
2014-06-25 17:30:37.383040 7f7ca7fff700 20 req: 0x7f7ca000b180
2014-06-25 17:30:37.383053 7f7ca7fff700 10 allocated request req=0x7f7ca0015ef0
2014-06-25 17:30:37.383064 7f7c6cfa9700 20 dequeued request req=0x7f7ca000b180
2014-06-25 17:30:37.383070 7f7c6cfa9700 20 RGWWQ: empty
2014-06-25 17:30:37.383121 7f7c6cfa9700 20 CONTENT_LENGTH=
2014-06-25 17:30:37.383123 7f7c6cfa9700 20 CONTENT_TYPE=
2014-06-25 17:30:37.383124 7f7c6cfa9700 20 DOCUMENT_ROOT=/etc/nginx/html
2014-06-25 17:30:37.383125 7f7c6cfa9700 20 DOCUMENT_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
2014-06-25 17:30:37.383126 7f7c6cfa9700 20 FCGI_ROLE=RESPONDER
2014-06-25 17:30:37.383127 7f7c6cfa9700 20 GATEWAY_INTERFACE=CGI/1.1
2014-06-25 17:30:37.383128 7f7c6cfa9700 20 HTTP_ACCEPT=*/*
2014-06-25 17:30:37.383129 7f7c6cfa9700 20 HTTP_CONNECTION=Keep-Alive
2014-06-25 17:30:37.383129 7f7c6cfa9700 20 HTTP_HOST=xxx
2014-06-25 17:30:37.383130 7f7c6cfa9700 20 HTTP_USER_AGENT=Wget/1.12 (linux-gnu)
2014-06-25 17:30:37.383131 7f7c6cfa9700 20 QUERY_STRING=
2014-06-25 17:30:37.383131 7f7c6cfa9700 20 REDIRECT_STATUS=200
2014-06-25 17:30:37.383132 7f7c6cfa9700 20 REMOTE_ADDR=yyy
2014-06-25 17:30:37.383133 7f7c6cfa9700 20 REMOTE_PORT=43855
2014-06-25 17:30:37.383134 7f7c6cfa9700 20 REQUEST_METHOD=HEAD
2014-06-25 17:30:37.383134 7f7c6cfa9700 20 REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
2014-06-25 17:30:37.383135 7f7c6cfa9700 20 SCRIPT_NAME=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
2014-06-25 17:30:37.383136 7f7c6cfa9700 20 SERVER_ADDR=yyy
2014-06-25 17:30:37.383136 7f7c6cfa9700 20 SERVER_NAME=xxx
2014-06-25 17:30:37.383137 7f7c6cfa9700 20 SERVER_PORT=80
2014-06-25 17:30:37.383138 7f7c6cfa9700 20 SERVER_PROTOCOL=HTTP/1.0
2014-06-25 17:30:37.383138 7f7c6cfa9700 20 SERVER_SOFTWARE=nginx/1.4.6
2014-06-25 17:30:37.383140 7f7c6cfa9700 1 ====== starting new request req=0x7f7ca000b180 =====
2014-06-25 17:30:37.383152 7f7c6cfa9700 2 req 1:0.000013::HEAD /ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb::initializing
2014-06-25 17:30:37.383158 7f7c6cfa9700 10 host=xxxx rgw_dns_name=xxxx
2014-06-25 17:30:37.383199 7f7c6cfa9700 10 s->object=ubuntu/pool/main/a/adduser/adduser_3.113 nmu3ubuntu3_all.deb s->bucket=ubuntu
2014-06-25 17:30:37.383207 7f7c6cfa9700 2 req 1:0.000068:s3:HEAD /ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb::getting op
2014-06-25 17:30:37.383211 7f7c6cfa9700 2 req 1:0.000072:s3:HEAD /ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb:get_obj:authorizing
2014-06-25 17:30:37.383218 7f7c6cfa9700 2 req 1:0.000079:s3:HEAD /ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb:get_obj:reading permissions
2014-06-25 17:30:37.383268 7f7c6cfa9700 20 get_obj_state: rctx=0x7f7c6cfa8640 obj=.rgw:ubuntu state=0x7f7c6800c0a8 s->prefetch_data=0
2014-06-25 17:30:37.383279 7f7c6cfa9700 10 cache get: name=.rgw+ubuntu : miss

of note are the following lines:

2014-06-25 17:30:37.383134 7f7c6cfa9700 20 REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
2014-06-25 17:30:37.383199 7f7c6cfa9700 10 s->object=ubuntu/pool/main/a/adduser/adduser_3.113 nmu3ubuntu3_all.deb s->bucket=ubuntu

The webserver passes in the correct REQUEST_URI, after which radosgw replaces the plus with a space. I've also reproduced this exactly with Apache, so it's not specific to Nginx config.

The radosgw behavior here does not match what other webservers are doing, and is causing compatibility issues (I cannot point a Ubuntu installer at a repository stored in Ceph, as it doesn't unnecessarily encode the plus signs).

This is a test just of nginx, without radosgw, which shows the correct behavior:

echo space > "test file"

Then, from a client:
$ wget --spider "http://example.com/test/test file"
Spider mode enabled. Check if remote file exists.
--2014-06-26 11:46:54-- http://example.com/test/test%20file
Connecting to example.com:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6 [application/octet-stream]
Remote file exists.

$ wget --spider "http://example.com/test/test+file"
Spider mode enabled. Check if remote file exists.
--2014-06-26 11:46:57-- http://example.com/test/test+file
Connecting to example.com:80... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

This was tested with ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)

Actions #1

Updated by Brian Rak almost 10 years ago

Looks like this is ultimately caused by line 1227 in rgw_rest.cc

int RGWREST::preprocess(struct req_state *s, RGWClientIO *cio)
...
url_decode(s->info.request_uri, s->decoded_uri);

Where url_decode just assumes the entire URL can be handled the same way:

bool url_decode(string& src_str, string& dest_str)
{
  const char *src = src_str.c_str();
  char dest[src_str.size() + 1];
  int pos = 0;
  char c;

  while (*src) {
    if (*src != '%') {
      if (*src != '+') {
    dest[pos++] = *src++;
      } else {
    dest[pos++] = ' ';
    ++src;
      }
    } else {
      src++;
      if (!*src)
        break;
      char c1 = hex_to_num(*src++);
      if (!*src)
        break;
      c = c1 << 4;
      if (c1 < 0)
        return false;
      c1 = hex_to_num(*src++);
      if (c1 < 0)
        return false;
      c |= c1;
      dest[pos++] = c;
    }
  }
  dest[pos] = 0;
  dest_str = dest;

  return true;
}

Definitely a bit beyond my abilities to fix, since this would need to start parsing the entire URL.

Actions #3

Updated by Yehuda Sadeh almost 10 years ago

The problem seem to be with the the web server itself here that doesn't send the REQUEST_URI url encoded as it should. It might be a configuration issue, or just a general issue with nginx. Change like the one you propose will break any other web server that we're using. A different change that you might want to provide would be to make it configurable whether to do a url_decode() at that point or not.

Actions #4

Updated by Brian Rak almost 10 years ago

We talked about this on the mailing list. I can reproduce this at will on apache as well.

Can you take a look at the mailing list thread? http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040936.html

Specifically, http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040955.html

The root of the issue is that:

REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb

is URL encoded. It is not necessary to replace + signs, unless you're in query arguments. Please see the stackoverflow link.

Actions #5

Updated by Yehuda Sadeh almost 10 years ago

  • Backport set to firefly

ok, I see it now.

Actions #6

Updated by Yehuda Sadeh almost 10 years ago

  • Status changed from New to Pending Backport
Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF