Bug #23910

closed

objector: rgw aborts during watch notify when trying to lock

Added by Abhishek Lekshmanan almost 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen this in my vstart cluster of late: rgw aborts when running a task like s3-tests, with a stack trace like the one below:

#1  0x00007fffeaef033a in abort () from /lib64/libc.so.6
#2  0x00007fffeb80c2e5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fffeb80a0d6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007fffeb80a121 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007fffeb80a363 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=tinfo@entry=0x7ffff7264258 <typeinfo for boost::thread_interrupted>, dest=dest@entry=0x0)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6  0x00007ffff6fb7fc0 in boost::detail::interruption_checker::check_for_interruption (this=0x7fffc4dc68d0)
    at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/thread_data.hpp:192
#7  boost::detail::interruption_checker::interruption_checker (this=0x7fffc4dc68d0, cond_mutex=0x5555567e4768, cond=0x5555567e4790)
    at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/thread_data.hpp:206
#8  0x00007ffff7005d48 in boost::condition_variable::wait (m=..., this=0x5555567e4768) at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/condition_variable.hpp:78
#9  boost::shared_mutex::lock (this=this@entry=0x5555567e46e0) at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/shared_mutex.hpp:293
#10 0x00007ffff6fded01 in std::unique_lock<boost::shared_mutex>::lock (this=0x7fffc4dc6ab0) at /usr/include/c++/7/bits/std_mutex.h:267
#11 std::unique_lock<boost::shared_mutex>::unique_lock (__m=..., this=0x7fffc4dc6ab0) at /usr/include/c++/7/bits/std_mutex.h:197
#12 Objecter::linger_cancel (this=0x5555567e4600, info=0x555557b97f80) at /ssd/builds/cpp/ceph_mimic/src/osdc/Objecter.cc:747
#13 0x00007ffff6fb04a9 in librados::IoCtxImpl::notify (this=0x555556812c00, oid=..., bl=..., timeout_ms=<optimized out>, preply_bl=<optimized out>, preply_buf=preply_buf@entry=0x0,
    preply_buf_len=0x0) at /ssd/builds/cpp/ceph_mimic/src/librados/IoCtxImpl.cc:1869
#14 0x00007ffff6f6e2d4 in librados::IoCtx::notify2 (this=this@entry=0x555556944258, oid="notify.0", bl=..., timeout_ms=timeout_ms@entry=0, preplybl=preplybl@entry=0x0)
    at /ssd/builds/cpp/ceph_mimic/src/librados/librados.cc:2124
#15 0x00005555558ef51f in RGWRados::distribute (this=this@entry=0x555556944000,
    key="default.rgw.meta+root+.bucket.meta.60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", bl=...) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_rados.cc:12579
#16 0x000055555596b696 in RGWCache<RGWRados>::distribute_cache (this=this@entry=0x555556944000,
    normal_name="default.rgw.meta+root+.bucket.meta.60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", obj=..., obj_info=..., op=op@entry=0)
    at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_cache.h:574
#17 0x000055555596c4d8 in RGWCache<RGWRados>::put_system_obj_impl (this=0x555556944000, obj=..., size=302, mtime=0x0, attrs=std::map with 2 elements = {...}, flags=1, data=...,
    objv_tracker=0x7fffc4dc8400, set_mtime=...) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_cache.h:455
#18 0x0000555555a0eefb in RGWRados::put_system_obj (set_mtime=..., objv_tracker=0x7fffc4dc8400, attrs=std::map with 2 elements = {...}, mtime=0x0, exclusive=<optimized out>, data=...,
    obj=..., ctx=0x0, this=0x555556944000) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_rados.h:3110
#19 rgw_put_system_obj (rgwstore=0x555556944000, pool=..., oid=".bucket.meta.60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", data=...,
    exclusive=exclusive@entry=false, objv_tracker=objv_tracker@entry=0x7fffc4dc8400, set_mtime=..., pattrs=0x7fffc4dc7860) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_tools.cc:30
#20 0x000055555586f1b1 in RGWMetadataManager::put_entry (this=0x555556733980, handler=0x555556717598, key="60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158",
    bl=..., exclusive=exclusive@entry=false, objv_tracker=objv_tracker@entry=0x7fffc4dc8400, mtime=..., pattrs=0x7fffc4dc7860) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_metadata.cc:1103
#21 0x00005555557f4727 in rgw_bucket_instance_store_info (store=store@entry=0x555556944000, entry="60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", bl=...,
    exclusive=exclusive@entry=false, pattrs=pattrs@entry=0x7fffc4dc7860, objv_tracker=objv_tracker@entry=0x7fffc4dc8400, mtime=...) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_bucket.cc:295
#22 0x00005555557f5668 in rgw_bucket_set_attrs (store=0x555556944000, bucket_info=..., attrs=std::map with 2 elements = {...}, objv_tracker=0x7fffc4dc8400)
    at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_bucket.cc:424
#23 0x00005555558b713c in RGWPutACLs::execute (this=0x5555570ae000) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_op.cc:4997
#24 0x00005555558e517d in rgw_process_authenticated (handler=<optimized out>, op=@0x7fffc4dc7ea0: 0x5555570ae000, req=0x7fffc4dc88d0, s=0x7fffc4dc81b0, skip_retarget=<optimized out>)
    at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_process.cc:104
#25 0x00005555558e60cc in process_request (store=0x555556944000, rest=0x7fffffffd710, req=req@entry=0x7fffc4dc88d0, frontend_prefix="", auth_registry=...,
    client_io=client_io@entry=0x7fffc4dc8900, olog=0x0, http_ret=0x7fffc4dc88cc) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_process.cc:207
#26 0x000055555576618c in RGWCivetWebFrontend::process (this=0x555556af6820, conn=<optimized out>) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_civetweb_frontend.cc:36
#27 0x00005555557d509e in handle_request (conn=conn@entry=0x555556b993b0) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:12530
#28 0x00005555557d6d88 in process_new_connection (conn=conn@entry=0x555556b993b0) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:15943
#29 0x00005555557d7228 in worker_thread_run (thread_args=<optimized out>) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:16269
#30 worker_thread (thread_func_param=0x55555671ca20) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:16312
#31 0x00007ffff6aec724 in start_thread () from /lib64/libpthread.so.0
#32 0x00007fffeafa6e8d in clone () from /lib64/libc.so.6

Casey mentioned on IRC that boost::shared_mutex may support interruption (https://www.boost.org/doc/libs/1_67_0/doc/html/thread/thread_management.html#thread.thread_management.tutorial.interruption), which matches frames #5 to #9 above, where boost::thread_interrupted is thrown from inside boost::shared_mutex::lock.
Switching to std::shared_mutex in Objecter.h seems to make these errors go away. Still not sure where we're calling the interruptible op that would trigger this error.
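
For illustration, here is a minimal sketch of the locking pattern after such a switch, using only the C++17 standard library. It is not the Objecter code: `LingerRegistry`, `add`, `cancel`, `size` and `run_demo` are hypothetical names invented for this example. The point it shows is that std::shared_mutex has no interruption points, so std::unique_lock<std::shared_mutex> cannot throw boost::thread_interrupted the way the lock in frame #12 did.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a table of registered ops; NOT Ceph's Objecter.
class LingerRegistry {
 public:
  // Writer path: exclusive lock, same shape as the unique_lock in frame #12.
  void add(std::uint64_t id) {
    std::unique_lock<std::shared_mutex> wl(rwlock_);
    ops_[id] = true;
  }

  // Writer path: returns true if the op was registered and is now removed.
  bool cancel(std::uint64_t id) {
    std::unique_lock<std::shared_mutex> wl(rwlock_);
    return ops_.erase(id) > 0;
  }

  // Reader path: shared lock, many readers may hold it at once.
  std::size_t size() const {
    std::shared_lock<std::shared_mutex> rl(rwlock_);
    return ops_.size();
  }

 private:
  mutable std::shared_mutex rwlock_;
  std::map<std::uint64_t, bool> ops_;
};

// Registers n ops, cancels them from nthreads threads (each thread takes a
// disjoint stride of ids), and returns how many ops remain afterwards.
std::size_t run_demo(std::uint64_t n, unsigned nthreads) {
  LingerRegistry reg;
  for (std::uint64_t i = 0; i < n; ++i) {
    reg.add(i);
  }

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; ++t) {
    workers.emplace_back([&reg, n, nthreads, t] {
      for (std::uint64_t i = t; i < n; i += nthreads) {
        bool removed = reg.cancel(i);
        assert(removed);   // each id is cancelled by exactly one thread
        (void)removed;     // silence unused warning when NDEBUG is set
        (void)reg.size();  // interleave reader-side locking with the writers
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return reg.size();
}
```

Calling `run_demo(1000, 4)` from a `main` and checking that it returns 0 exercises both the exclusive and the shared lock paths. This only illustrates the lock type; it says nothing about which call interrupts the thread in the first place.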

Actions #1

Updated by Abhishek Lekshmanan almost 6 years ago

  • Project changed from rgw to Ceph
  • Subject changed from rgw/objector: rgw aborts during watch notify when trying to lock to objector: rgw aborts during watch notify when trying to lock
Actions #2

Updated by Sage Weil almost 6 years ago

  • Status changed from New to Resolved