Bug #50977

open

s3select: empty file select failed

Added by Zhiwei Dai almost 3 years ago. Updated over 2 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
rgw, s3select
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

SQL: select * from stdin;

[root@node1 s3select]# python3 test.py
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
response.begin()
File "/usr/lib64/python3.6/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python3.6/http/client.py", line 289, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: Date: Wed, 26 May 2021 09:28:10 GMT

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/botocore/httpsession.py", line 320, in send
chunked=self._chunked(request.headers),
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 506, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
response.begin()
File "/usr/lib64/python3.6/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python3.6/http/client.py", line 289, in _read_status
raise BadStatusLine(line)
urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('Date: Wed, 26 May 2021 09:28:10 GMT\r\n',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 56, in <module>
test("select * from stdin;")
File "test.py", line 38, in test
OutputSerialization = {'CSV': {}},
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 663, in _make_api_call
operation_model, request_dict, request_context)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 137, in _send_request
success_response, exception):
File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 256, in _needs_retry
caught_exception=caught_exception, request_dict=request_dict)
File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/usr/local/lib/python3.6/site-packages/botocore/retryhandler.py", line 251, in __call__
caught_exception)
File "/usr/local/lib/python3.6/site-packages/botocore/retryhandler.py", line 277, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/usr/local/lib/python3.6/site-packages/botocore/retryhandler.py", line 317, in __call__
caught_exception)
File "/usr/local/lib/python3.6/site-packages/botocore/retryhandler.py", line 223, in __call__
attempt_number, caught_exception)
File "/usr/local/lib/python3.6/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/usr/local/lib/python3.6/site-packages/botocore/httpsession.py", line 351, in send
endpoint_url=request.url
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "http://node1:8000/bucket1/aaa.csv?select&select-type=2".
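
The innermost exception above shows the client reading a `Date:` header where it expected the HTTP status line. The following is a minimal sketch, using only the Python standard library, of how `http.client` reacts to such a reply. `FakeSocket` and `parse_status` are hypothetical helpers written for this illustration, and the byte strings are made-up stand-ins for a server reply, not captured RGW output.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket; serves canned bytes to http.client."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        # http.client only needs a readable binary file object.
        return self._file


def parse_status(raw: bytes):
    """Return the status code, or a short description if the status line is bad."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    try:
        response.begin()  # reads the status line, then the headers
    except http.client.BadStatusLine as exc:
        return "BadStatusLine: " + exc.line.strip()
    return response.status


# A well-formed reply starts with "HTTP/1.1 <code> <reason>".
good = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
# A reply that starts directly with a header, as in the traceback.
bad = b"Date: Wed, 26 May 2021 09:28:10 GMT\r\n\r\n"

print(parse_status(good))
print(parse_status(bad))
```

If the sketch matches what happened here, the client never saw a status line for the select request, which urllib3 and botocore then report as an aborted or closed connection.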


Files

client_boto3.py (1.34 KB), Zhiwei Dai, 06/17/2021 03:29 AM
Actions #1

Updated by Casey Bodley almost 3 years ago

  • Assignee set to Gal Salomon
Actions #2

Updated by Gal Salomon almost 3 years ago

It is not clear what the test is doing.

Please do not use stdin; use s3object instead.

Can you attach test.py?

You can also use the AWS CLI (start on the local machine); it is simpler for most purposes.

aws s3api select-object-content --endpoint-url http://localhost:8000 --bucket YOUR_BUCKET --key YOUR_CSV_FILE.csv --expression-type 'SQL' --input-serialization '{"CSV": {}, "CompressionType": "NONE"}' --output-serialization '{"CSV": {}}' --expression 'select * from s3object;' "output.csv" && cat output.csv

The bucket and file must be accessible and have the correct permissions.
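
For anyone reproducing this from Python rather than the CLI, here is a rough boto3-style equivalent of the command above. It is a sketch, not the reporter's test.py: `build_select_request` and `run_select` are hypothetical helper names, and the bucket, key and endpoint in the usage comment are placeholders. The keyword arguments follow boto3's `select_object_content` call.

```python
def build_select_request(bucket, key, expression="select * from s3object;"):
    """Mirror the CLI flags as keyword arguments for select_object_content."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": {}, "CompressionType": "NONE"},
        "OutputSerialization": {"CSV": {}},
    }


def run_select(client, bucket, key, expression="select * from s3object;"):
    """Run the query and concatenate the payload of every Records event."""
    request = build_select_request(bucket, key, expression)
    response = client.select_object_content(**request)
    data = b""
    for event in response["Payload"]:
        # The event stream also carries other event types; only Records hold rows.
        if "Records" in event:
            data += event["Records"]["Payload"]
    return data


# Usage (needs boto3 and a reachable endpoint; all names are placeholders):
#   import boto3
#   client = boto3.client("s3", endpoint_url="http://localhost:8000")
#   print(run_select(client, "YOUR_BUCKET", "YOUR_CSV_FILE.csv").decode())
```

Passing the client in as an argument keeps the helpers importable without boto3 installed, which makes them easy to exercise against a stub before pointing them at a real gateway.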

Actions #3

Updated by Casey Bodley almost 3 years ago

  • Status changed from New to Need More Info
Actions #4

Updated by Zhiwei Dai almost 3 years ago

The fault can be reproduced with my test case.

client_boto3.py  empty_s3_obj.csv

I suspect that the handling of a data length of 0 in RGWSelectObj_ObjStore_S3::send_response_data is inappropriate:
int RGWSelectObj_ObjStore_S3::send_response_data(bufferlist& bl, off_t ofs, off_t len)
{
  if (len == 0) {
    return 0;
  }

Actions #5

Updated by Zhiwei Dai almost 3 years ago

Sorry, the empty file empty_s3_obj.csv cannot be uploaded as an attachment. You can quickly create one like this:

touch empty_s3_obj.csv

Actions #6

Updated by Gal Salomon almost 3 years ago

The script is OK.
The empty-object use case is not handled correctly.

The code lines
if (len == 0) {
return 0;
...
handle a different use case (an empty chunk within a non-empty object).
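
To make the distinction above concrete, here is a toy Python model of the two situations. It is purely illustrative: it is not RGW code and not the fix from any pull request, and the `respond` function, the event names and the `object_size` parameter are assumptions invented for this sketch.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, bytes]


def respond(chunks: Iterable[bytes], object_size: int) -> List[Event]:
    """Toy model: turn incoming data chunks into a list of response events."""
    events: List[Event] = []
    if object_size == 0:
        # Empty object: there are no rows, but the reply must still be complete.
        events.append(("end", b""))
        return events
    for chunk in chunks:
        if len(chunk) == 0:
            # Empty chunk inside a non-empty object: skip it and keep going.
            continue
        events.append(("records", chunk))
    events.append(("end", b""))
    return events


# Empty object: a complete reply that carries no records.
print(respond([], 0))
# Non-empty object with an empty chunk in the middle: the chunk is ignored.
print(respond([b"a,b\n", b"", b"c,d\n"], 8))
```

The point of the model is that returning early on a zero-length chunk is reasonable only when more data is expected; for a zero-size object the early return would leave the client without any reply.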

Actions #7

Updated by Zhiwei Dai over 2 years ago

When selecting from a file, the client just sends a SQL statement to RGW; the user does not know whether the file is empty.
So I think RGW s3select should handle the empty file. Otherwise, the client will crash.

In the same way, a GET of an empty object, or of a non-empty object with an empty chunk, always succeeds:

int RGWGetObj_ObjStore_S3::send_response_data(bufferlist& bl, off_t bl_ofs,
                          off_t bl_len)
{
...
send_data:
  if (get_data && !op_ret) {
    int r = dump_body(s, bl.c_str() + bl_ofs, bl_len);

    if (r < 0)
      return r;
  }

  return 0;
}

Actions #8

Updated by Gal Salomon over 2 years ago

  • Pull request ID set to 40973

https://github.com/ceph/ceph/pull/40973 (on tests)
handles the empty-object use case.

Given an empty object, it responds correctly.

Actions #9

Updated by Gal Salomon over 2 years ago

  • Pull request ID changed from 40973 to 42416

The following PR
https://github.com/ceph/ceph/pull/42416

will handle the zero-size object case (combined with other features).

Actions #10

Updated by Matt Benjamin over 2 years ago

  • Status changed from Need More Info to Fix Under Review

Hi Zhiwei Dai,

We've merged Gal's change which, as above, is claimed to address this issue. Can you confirm?

thanks!

Matt
