Bug #61638

closed

RGW crashed on getObject::range-request for some of the TPCDS queries

Added by Gal Salomon 11 months ago. Updated 11 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following crash appears consistently for some of the TPCDS queries:

FILE :: ./ceph/src/rgw/rgw_aio_throttle.h:48
ASSERT :: ceph::__ceph_assert_fail (assertion=0x559c23db6145 "pending.empty()", file=<optimized out>, line=48, func=0x559c23db9db0 "virtual rgw::Throttle::~Throttle()")
at /home/gsalomon/work/ceph-ws2/ceph/src/common/assert.cc:75

stack trace:

[Current thread is 1 (Thread 0x7f58bb76a640 (LWP 590450))]
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f5a4948ec53 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f5a4943e956 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#3 0x0000559c237e2d21 in reraise_fatal (signum=signum@entry=6) at /home/gsalomon/work/ceph-ws2/ceph/src/global/signal_handler.cc:88
#4 0x0000559c237e45bd in handle_oneshot_fatal_signal (signum=6) at /home/gsalomon/work/ceph-ws2/ceph/src/global/signal_handler.cc:363
#5 <signal handler called>
#6 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7 0x00007f5a4948ec53 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#8 0x00007f5a4943e956 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9 0x00007f5a494287f4 in __GI_abort () at abort.c:79
#10 0x00007f5a4adabf05 in ceph::__ceph_assert_fail (assertion=0x559c23db6145 "pending.empty()", file=<optimized out>, line=48, func=0x559c23db9db0 "virtual rgw::Throttle::~Throttle()")
at /home/gsalomon/work/ceph-ws2/ceph/src/common/assert.cc:75
#11 0x00007f5a4adabfd0 in ceph::__ceph_assert_fail (ctx=...) at /home/gsalomon/work/ceph-ws2/ceph/src/common/assert.cc:80
#12 0x0000559c231311be in rgw::Throttle::~Throttle (this=this@entry=0x7f578c040f88, __in_chrg=<optimized out>) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_aio_throttle.h:48
#13 0x0000559c231a7352 in rgw::YieldingAioThrottle::~YieldingAioThrottle (this=0x7f578c040f80, __in_chrg=<optimized out>) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_aio_throttle.h:102
#14 0x0000559c231a737d in rgw::YieldingAioThrottle::~YieldingAioThrottle (this=0x7f578c040f80, __in_chrg=<optimized out>) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_aio_throttle.h:102
#15 0x0000559c23425518 in std::default_delete<rgw::Aio>::operator() (__ptr=<optimized out>, this=<optimized out>) at /usr/include/c++/12/bits/unique_ptr.h:89
#16 std::unique_ptr<rgw::Aio, std::default_delete<rgw::Aio> >::~unique_ptr (this=0x7f5a0c36d488, __in_chrg=<optimized out>) at /usr/include/c++/12/bits/unique_ptr.h:396
#17 0x0000559c2340baf2 in RGWRados::Object::Read::iterate (this=<optimized out>, dpp=<optimized out>, ofs=335544320, end=402654207, cb=0x7f5a0c36d600,
y=<error reading variable: DWARF-2 expression error: DW_OP_GNU_uninit must always be the very last op.>) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/driver/rados/rgw_rados.cc:6648
#18 0x0000559c2344b7fa in rgw::sal::RadosObject::RadosReadOp::iterate (this=<optimized out>, dpp=<optimized out>, ofs=<optimized out>, end=<optimized out>, cb=<optimized out>, y=...)
at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/driver/rados/rgw_sal_rados.cc:2370
#19 0x0000559c2322e318 in RGWGetObj::execute (this=this@entry=0x7f57cc0d18c0, y=...) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_op.cc:2393
#20 0x0000559c232de54c in RGWSelectObj_ObjStore_S3::range_request (this=this@entry=0x7f57cc0d18c0, ofs=335544320, len=67109888, buff=buff@entry=0x0, y=...)
at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_s3select.cc:693
#21 0x0000559c232de9dc in RGWSelectObj_ObjStore_S3::execute (this=0x7f57cc0d18c0, y=...) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_s3select.cc:737
#22 0x0000559c230c4570 in rgw_process_authenticated (handler=handler@entry=0x7f57cc01d7c0, op=@0x7f5a0c36da78: 0x7f57cc0d18c0, req=req@entry=0x7f5a0c36e710, s=0x7f5a0c36db80, y=...,
driver=0x559c25fae6e0, skip_retarget=false) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_process.cc:255
#23 0x0000559c230c5f1c in process_request (penv=..., req=req@entry=0x7f5a0c36e710, frontend_prefix="", client_io=client_io@entry=0x7f5a0c36e7c0, yield=..., scheduler=0x559c260cec68,
user=0x7f5a0c36e920, latency=0x7f5a0c36e6e8, http_ret=0x7f5a0c36e6e4) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_process.cc:392
#24 0x0000559c23030125 in (anonymous namespace)::handle_connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > > (context=..., env=..., stream=..., timeout=..., header_limit=<optimized out>, buffer=..., is_ssl=<optimized out>, pause_mutex=..., scheduler=<optimized out>, uri_prefix=..., ec=...,
yield=...) at /home/gsalomon/work/ceph-ws2/ceph/src/rgw/rgw_asio_frontend.cc:284

Actions #1

Updated by Gal Salomon 11 months ago

The crash is not related to get-object by range.
It is related to s3select memory management, which throws an out-of-memory exception. This should end with the query execution being aborted.
s3select does not show up in the crash stack trace.

The resolution is to modify the memory management (the allocator).
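
For readers of the trace, here is a minimal sketch of how an exception thrown by a consumer could end in a destructor check like the `pending.empty()` assertion, without the thrower appearing in the backtrace. This is an assumption about the mechanism, not the actual RGW code: `ToyThrottle`, `process_chunk`, `iterate_without_drain` and `iterate_with_drain` are hypothetical names, and the toy destructor records the violation instead of asserting so the sketch can run to completion. The second function shows only one way to keep the invariant; it is not the allocator change referenced in this ticket.

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical stand-in for a throttle that expects its owner to drain all
// in-flight requests before it is destroyed.
class ToyThrottle {
 public:
  explicit ToyThrottle(bool* destroyed_with_pending)
      : destroyed_with_pending_(destroyed_with_pending) {}

  ~ToyThrottle() {
    // A real implementation might assert here; the sketch only records
    // whether requests were still pending at destruction time.
    *destroyed_with_pending_ = !pending_.empty();
  }

  void submit(int id) { pending_.push_back(id); }
  void drain() { pending_.clear(); }

 private:
  std::vector<int> pending_;
  bool* destroyed_with_pending_;
};

// Hypothetical consumer callback; throws the way an allocator that ran out
// of memory would.
inline void process_chunk(bool fail) {
  if (fail) throw std::bad_alloc();
}

// Returns true if the throttle was destroyed while requests were pending.
// The exception skips drain(), so stack unwinding runs the destructor early.
inline bool iterate_without_drain(bool fail) {
  bool violated = false;
  try {
    ToyThrottle aio(&violated);
    aio.submit(1);
    process_chunk(fail);
    aio.drain();
  } catch (const std::bad_alloc&) {
    // The query would be aborted here.
  }
  return violated;
}

// Same flow, but pending requests are drained before the exception leaves
// the scope that owns the throttle.
inline bool iterate_with_drain(bool fail) {
  bool violated = false;
  try {
    ToyThrottle aio(&violated);
    aio.submit(1);
    try {
      process_chunk(fail);
    } catch (...) {
      aio.drain();
      throw;
    }
    aio.drain();
  } catch (const std::bad_alloc&) {
    // The query would be aborted here.
  }
  return violated;
}
```

In this sketch, `iterate_without_drain(true)` reports a violation while `iterate_with_drain(true)` does not. By the time the destructor runs during unwinding, the throwing frame has already been unwound, which would be consistent with s3select not showing up in the backtrace.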

Actions #2

Updated by Gal Salomon 11 months ago

  • Status changed from New to Resolved
  • Pull request ID set to 52186