Bug #63245

rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv

Added by J. Eric Ivancich 7 months ago. Updated about 2 months ago.

Status:
Pending Backport
Priority:
Urgent
Assignee:
Target version:
% Done:

0%

Source:
Tags:
s3select backport_processed
Backport:
quincy reef squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Crashes in the functional testing of s3select on main on 10/19/2023.

Teuthology results:
http://qa-proxy.ceph.com/teuthology/ivancich-2023-10-19_14:26:37-rgw-wip-eric-testing-1-distro-default-smithi/7432363/teuthology.log

Stack trace:

2023-10-19T15:17:33.828 INFO:tasks.rgw.client.0.smithi016.stdout:radosgw: ./src/s3select/include/s3select_csv_parser.h:315: char* CSVParser::next_line(): Assertion `data_begin < data_end' failed.
2023-10-19T15:17:33.828 INFO:tasks.rgw.client.0.smithi016.stdout:*** Caught signal (Aborted) **
2023-10-19T15:17:33.829 INFO:tasks.rgw.client.0.smithi016.stdout: in thread 7f1b72059640 thread_name:radosgw
2023-10-19T15:17:33.836 INFO:tasks.rgw.client.0.smithi016.stdout: ceph version 18.0.0-6773-g3760fae3 (3760fae306efe59523385b538dfa0e949242cb9c) reef (dev)
2023-10-19T15:17:33.836 INFO:tasks.rgw.client.0.smithi016.stdout: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f1be9b27520]
2023-10-19T15:17:33.836 INFO:tasks.rgw.client.0.smithi016.stdout: 2: pthread_kill()
2023-10-19T15:17:33.837 INFO:tasks.rgw.client.0.smithi016.stdout: 3: raise()
2023-10-19T15:17:33.837 INFO:tasks.rgw.client.0.smithi016.stdout: 4: abort()
2023-10-19T15:17:33.837 INFO:tasks.rgw.client.0.smithi016.stdout: 5: /lib/x86_64-linux-gnu/libc.so.6(+0x2871b) [0x7f1be9b0d71b]
2023-10-19T15:17:33.837 INFO:tasks.rgw.client.0.smithi016.stdout: 6: /lib/x86_64-linux-gnu/libc.so.6(+0x39e96) [0x7f1be9b1ee96]
2023-10-19T15:17:33.837 INFO:tasks.rgw.client.0.smithi016.stdout: 7: radosgw(+0x767301) [0x5634b0427301]
2023-10-19T15:17:33.838 INFO:tasks.rgw.client.0.smithi016.stdout: 8: radosgw(+0x769473) [0x5634b0429473]
2023-10-19T15:17:33.838 INFO:tasks.rgw.client.0.smithi016.stdout: 9: radosgw(+0x102e24c) [0x5634b0cee24c]
2023-10-19T15:17:33.838 INFO:tasks.rgw.client.0.smithi016.stdout: 10: radosgw(+0x76bc64) [0x5634b042bc64]
2023-10-19T15:17:33.838 INFO:tasks.rgw.client.0.smithi016.stdout: 11: (RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)+0x8b3) [0x5634b043c593]
2023-10-19T15:17:33.838 INFO:tasks.rgw.client.0.smithi016.stdout: 12: (RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)+0x242) [0x5634b04518b2]
2023-10-19T15:17:33.839 INFO:tasks.rgw.client.0.smithi016.stdout: 13: (RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)+0x267) [0x5634b0315737]
2023-10-19T15:17:33.839 INFO:tasks.rgw.client.0.smithi016.stdout: 14: (get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)+0x7b8) [0x5634b053bde8]
2023-10-19T15:17:33.839 INFO:tasks.rgw.client.0.smithi016.stdout: 15: (RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x2eb) [0x5634b053f85b]
2023-10-19T15:17:33.839 INFO:tasks.rgw.client.0.smithi016.stdout: 16: (RGWGetObj::execute(optional_yield)+0x1145) [0x5634b035da25]
2023-10-19T15:17:33.839 INFO:tasks.rgw.client.0.smithi016.stdout: 17: (RGWSelectObj_ObjStore_S3::execute(optional_yield)+0xc1) [0x5634b0450b71]
2023-10-19T15:17:33.840 INFO:tasks.rgw.client.0.smithi016.stdout: 18: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0x9dc) [0x5634b0210cac]
2023-10-19T15:17:33.840 INFO:tasks.rgw.client.0.smithi016.stdout: 19: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x201e) [0x5634b02247ae]


Related issues (3: 1 open, 2 closed)

Copied to rgw - Backport #64692: quincy: rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv (Rejected, Gal Salomon)
Copied to rgw - Backport #64693: reef: rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv (In Progress, Gal Salomon)
Copied to rgw - Backport #64694: squid: rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv (Resolved, Gal Salomon)
Actions #1

Updated by J. Eric Ivancich 7 months ago

  • Description updated (diff)
Actions #2

Updated by J. Eric Ivancich 7 months ago

  • Description updated (diff)
Actions #3

Updated by Gal Salomon 7 months ago

Investigating.

Currently, this failure is not reproducible.

Actions #4

Updated by Casey Bodley 5 months ago

  • Status changed from New to Can't reproduce
Actions #5

Updated by Casey Bodley 5 months ago

  • Status changed from Can't reproduce to New

happened again in http://qa-proxy.ceph.com/teuthology/cbodley-2023-12-08_04:50:09-rgw-wip-rgw-sal-acl-owner-distro-default-smithi/7483734/teuthology.log

2023-12-08T19:50:30.381 INFO:tasks.rgw.client.0.smithi080.stdout:*** Caught signal (Segmentation fault) **
2023-12-08T19:50:30.381 INFO:tasks.rgw.client.0.smithi080.stdout: in thread 6dcbd640 thread_name:memcheck-amd64-
2023-12-08T19:50:30.435 INFO:tasks.rgw.client.0.smithi080.stdout: ceph version 19.0.0-42-gc2ece1ef (c2ece1efd0a18b9b4db4477c5aae273826a38988) reef (dev)
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x792ddb0]
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 2: _vgr20181ZZ_libcZdsoZa_memmove()
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 3: radosgw(+0x68c9a5) [0x7949a5]
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 4: radosgw(+0x6a371e) [0x7ab71e]
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 5: (RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)+0x7ba) [0x7b2e4a]
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 6: (RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)+0x507) [0x7b5887]
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 7: (RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)+0x3d6) [0x687676]
2023-12-08T19:50:30.436 INFO:tasks.rgw.client.0.smithi080.stdout: 8: (get_obj_data::flush(rgw::OwningList<rgw::AioResultEntry>&&)+0x7b8) [0x8b3c68]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 9: (RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)+0x2eb) [0x8b7e7b]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 10: (RGWGetObj::execute(optional_yield)+0x11cf) [0x6cf57f]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 11: (RGWSelectObj_ObjStore_S3::execute(optional_yield)+0xc1) [0x7b7ad1]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 12: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)+0xa6a) [0x582d7a]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 13: (process_request(RGWProcessEnv const&, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0xf7d) [0x58693d]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 14: radosgw(+0xc65d70) [0xd6dd70]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 15: radosgw(+0x3cf746) [0x4d7746]
2023-12-08T19:50:30.437 INFO:tasks.rgw.client.0.smithi080.stdout: 16: make_fcontext()

Notably, I'm seeing RGWGetObj_Decompress::handle_data in the backtraces, which shows that compression is enabled.

Actions #7

Updated by Gal Salomon 4 months ago

I have not succeeded in reproducing this issue.
However, QE did discover a similar issue; hopefully it is the same one.

Actions #8

Updated by Casey Bodley 4 months ago

Thanks, Gal. Are you enabling compression in your reproducer?

https://docs.ceph.com/en/latest/radosgw/compression/#configuration shows an example for zlib. You'd just need to restart radosgw after running that command.
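For reference, the configuration page linked above uses a command along these lines (zone and placement names here are the common defaults and depend on your deployment; restart radosgw afterwards for it to take effect):

```shell
# Enable zlib compression on the default placement target, per the linked
# Ceph docs; --rgw-zone and --placement-id values vary by deployment.
radosgw-admin zone placement modify \
      --rgw-zone default \
      --placement-id default-placement \
      --compression zlib
```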

Actions #9

Updated by Gal Salomon 4 months ago

Does that crash happen only when compression is used? Is that observed in the log?

If so, it is probably a different issue from the QE issue.

I will try to reproduce it as you described here.

Actions #10

Updated by Gal Salomon 4 months ago

With https://docs.ceph.com/en/latest/radosgw/compression/#configuration
it is not reproduced (in fact, the object was not compressed).

With

vstart .... --rgw_compression zlib

the crash is reproduced.

It seems similar to the QE bug (huge CSV objects).

Actions #11

Updated by Casey Bodley 3 months ago

Any updates here? It would be nice to fix this for squid.

Actions #12

Updated by Gal Salomon 2 months ago

https://github.com/ceph/ceph/pull/55891

This PR removes the assert in the CSV parser and replaces it with exceptions.
RGW will report an error and reject the request.
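A minimal sketch of the pattern described above (illustrative only; `next_line_or_throw` and `handle_request` are hypothetical names, not the PR's actual code): the parser throws instead of asserting, and the caller converts the exception into a rejected request rather than a process crash.

```cpp
#include <stdexcept>
#include <string>

// Throws instead of asserting when the buffer is exhausted.
const char* next_line_or_throw(const char* data_begin, const char* data_end) {
  if (!(data_begin < data_end)) {
    // previously: assert(data_begin < data_end), which aborted radosgw
    throw std::runtime_error("csv parser: buffer exhausted");
  }
  while (data_begin < data_end && *data_begin != '\n') ++data_begin;
  return data_begin;
}

// The request handler catches the exception and rejects the request.
std::string handle_request(const char* begin, const char* end) {
  try {
    next_line_or_throw(begin, end);
    return "ok";
  } catch (const std::runtime_error&) {
    return "error: malformed input rejected";  // no crash
  }
}
```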

Note: while testing the PR I noticed the following.
When compression is enabled in RGW, `len` and `bufferlist::it.length()` do not correlate (contrary to the uncompressed case); the callback `send_response_data` sometimes returns len > it.length().
This may cause wrong pointer calculations in the CSV parser, followed by an assert (and crash).

It seems that `it.length()` is the correct size; this needs to be verified.

Actions #13

Updated by Gal Salomon 2 months ago

Ignore the previous comment.

The exception (previously an assert) was caused by a small chunk: the chunk size was smaller than the row size, which led to wrong pointer arithmetic.
(The assumption was that a row may be split across two chunks, not more.)

Actions #14

Updated by Casey Bodley about 2 months ago

  • Status changed from New to Fix Under Review
  • Tags set to s3select
  • Backport set to quincy reef squid
  • Pull request ID set to 55891
Actions #15

Updated by Gal Salomon about 2 months ago

  • Status changed from Fix Under Review to New

The PR fixes the too-small-chunk issue (the flow was changed to append these small chunks).
Thus, when a compression setup produces chunks smaller than a row, the parser aggregates chunks until it has a complete row, and then processes it.
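The aggregation described above can be sketched as follows (illustrative only; `RowAggregator` is a hypothetical class, not the PR's code): chunks are buffered until at least one newline-terminated row is complete, so no assumption about how many chunks a row spans is needed.

```cpp
#include <string>
#include <vector>

// Buffers incoming chunks and emits only complete (newline-terminated) rows.
class RowAggregator {
  std::string pending;  // bytes carried over until a full row arrives
public:
  // Append a chunk; return every complete row accumulated so far.
  std::vector<std::string> feed(const std::string& chunk) {
    pending += chunk;
    std::vector<std::string> rows;
    size_t start = 0, nl;
    while ((nl = pending.find('\n', start)) != std::string::npos) {
      rows.push_back(pending.substr(start, nl - start));
      start = nl + 1;
    }
    pending.erase(0, start);  // keep only the incomplete tail
    return rows;
  }
};
```

With this flow, a row split across three or more tiny chunks is handled the same way as one split across two.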

Actions #16

Updated by Casey Bodley about 2 months ago

  • Status changed from New to Pending Backport
Actions #17

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64692: quincy: rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv added
Actions #18

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64693: reef: rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv added
Actions #19

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64694: squid: rgw/s3select: crashes in test_progress_expressions in run_s3select_on_csv added
Actions #20

Updated by Backport Bot about 2 months ago

  • Tags changed from s3select to s3select backport_processed