Bug #23910
objector: rgw aborts during watch notify when trying to lock
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I have been seeing this in my vstart cluster lately: rgw aborts when running a task like s3-tests, with a stack trace like the one below:
#1 0x00007fffeaef033a in abort () from /lib64/libc.so.6
#2 0x00007fffeb80c2e5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007fffeb80a0d6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x00007fffeb80a121 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00007fffeb80a363 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=tinfo@entry=0x7ffff7264258 <typeinfo for boost::thread_interrupted>, dest=dest@entry=0x0) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6 0x00007ffff6fb7fc0 in boost::detail::interruption_checker::check_for_interruption (this=0x7fffc4dc68d0) at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/thread_data.hpp:192
#7 boost::detail::interruption_checker::interruption_checker (this=0x7fffc4dc68d0, cond_mutex=0x5555567e4768, cond=0x5555567e4790) at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/thread_data.hpp:206
#8 0x00007ffff7005d48 in boost::condition_variable::wait (m=..., this=0x5555567e4768) at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/condition_variable.hpp:78
#9 boost::shared_mutex::lock (this=this@entry=0x5555567e46e0) at /ssd/builds/cpp/ceph_mimic/build/boost/include/boost/thread/pthread/shared_mutex.hpp:293
#10 0x00007ffff6fded01 in std::unique_lock<boost::shared_mutex>::lock (this=0x7fffc4dc6ab0) at /usr/include/c++/7/bits/std_mutex.h:267
#11 std::unique_lock<boost::shared_mutex>::unique_lock (__m=..., this=0x7fffc4dc6ab0) at /usr/include/c++/7/bits/std_mutex.h:197
#12 Objecter::linger_cancel (this=0x5555567e4600, info=0x555557b97f80) at /ssd/builds/cpp/ceph_mimic/src/osdc/Objecter.cc:747
#13 0x00007ffff6fb04a9 in librados::IoCtxImpl::notify (this=0x555556812c00, oid=..., bl=..., timeout_ms=<optimized out>, preply_bl=<optimized out>, preply_buf=preply_buf@entry=0x0, preply_buf_len=0x0) at /ssd/builds/cpp/ceph_mimic/src/librados/IoCtxImpl.cc:1869
#14 0x00007ffff6f6e2d4 in librados::IoCtx::notify2 (this=this@entry=0x555556944258, oid="notify.0", bl=..., timeout_ms=timeout_ms@entry=0, preplybl=preplybl@entry=0x0) at /ssd/builds/cpp/ceph_mimic/src/librados/librados.cc:2124
#15 0x00005555558ef51f in RGWRados::distribute (this=this@entry=0x555556944000, key="default.rgw.meta+root+.bucket.meta.60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", bl=...) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_rados.cc:12579
#16 0x000055555596b696 in RGWCache<RGWRados>::distribute_cache (this=this@entry=0x555556944000, normal_name="default.rgw.meta+root+.bucket.meta.60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", obj=..., obj_info=..., op=op@entry=0) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_cache.h:574
#17 0x000055555596c4d8 in RGWCache<RGWRados>::put_system_obj_impl (this=0x555556944000, obj=..., size=302, mtime=0x0, attrs=std::map with 2 elements = {...}, flags=1, data=..., objv_tracker=0x7fffc4dc8400, set_mtime=...) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_cache.h:455
#18 0x0000555555a0eefb in RGWRados::put_system_obj (set_mtime=..., objv_tracker=0x7fffc4dc8400, attrs=std::map with 2 elements = {...}, mtime=0x0, exclusive=<optimized out>, data=..., obj=..., ctx=0x0, this=0x555556944000) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_rados.h:3110
#19 rgw_put_system_obj (rgwstore=0x555556944000, pool=..., oid=".bucket.meta.60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", data=..., exclusive=exclusive@entry=false, objv_tracker=objv_tracker@entry=0x7fffc4dc8400, set_mtime=..., pattrs=0x7fffc4dc7860) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_tools.cc:30
#20 0x000055555586f1b1 in RGWMetadataManager::put_entry (this=0x555556733980, handler=0x555556717598, key="60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", bl=..., exclusive=exclusive@entry=false, objv_tracker=objv_tracker@entry=0x7fffc4dc8400, mtime=..., pattrs=0x7fffc4dc7860) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_metadata.cc:1103
#21 0x00005555557f4727 in rgw_bucket_instance_store_info (store=store@entry=0x555556944000, entry="60bkr2i0-s3testbucket-60bkr2i0167:fc6c99fe-958f-40a6-8357-b2022a0d7df2.4271.158", bl=..., exclusive=exclusive@entry=false, pattrs=pattrs@entry=0x7fffc4dc7860, objv_tracker=objv_tracker@entry=0x7fffc4dc8400, mtime=...) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_bucket.cc:295
#22 0x00005555557f5668 in rgw_bucket_set_attrs (store=0x555556944000, bucket_info=..., attrs=std::map with 2 elements = {...}, objv_tracker=0x7fffc4dc8400) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_bucket.cc:424
#23 0x00005555558b713c in RGWPutACLs::execute (this=0x5555570ae000) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_op.cc:4997
#24 0x00005555558e517d in rgw_process_authenticated (handler=<optimized out>, op=@0x7fffc4dc7ea0: 0x5555570ae000, req=0x7fffc4dc88d0, s=0x7fffc4dc81b0, skip_retarget=<optimized out>) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_process.cc:104
#25 0x00005555558e60cc in process_request (store=0x555556944000, rest=0x7fffffffd710, req=req@entry=0x7fffc4dc88d0, frontend_prefix="", auth_registry=..., client_io=client_io@entry=0x7fffc4dc8900, olog=0x0, http_ret=0x7fffc4dc88cc) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_process.cc:207
#26 0x000055555576618c in RGWCivetWebFrontend::process (this=0x555556af6820, conn=<optimized out>) at /ssd/builds/cpp/ceph_mimic/src/rgw/rgw_civetweb_frontend.cc:36
#27 0x00005555557d509e in handle_request (conn=conn@entry=0x555556b993b0) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:12530
#28 0x00005555557d6d88 in process_new_connection (conn=conn@entry=0x555556b993b0) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:15943
#29 0x00005555557d7228 in worker_thread_run (thread_args=<optimized out>) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:16269
#30 worker_thread (thread_func_param=0x55555671ca20) at /ssd/builds/cpp/ceph_mimic/src/civetweb/src/civetweb.c:16312
#31 0x00007ffff6aec724 in start_thread () from /lib64/libpthread.so.0
#32 0x00007fffeafa6e8d in clone () from /lib64/libc.so.6
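Casey mentioned on IRC that boost::shared_mutex may support interruption (https://www.boost.org/doc/libs/1_67_0/doc/html/thread/thread_management.html#thread.thread_management.tutorial.interruption).
That fits the trace: boost::thread_interrupted is thrown from the condition_variable::wait inside boost::shared_mutex::lock (frames #5 to #9), apparently nothing catches it, and std::terminate aborts the process.
Switching to std::shared_mutex in Objecter.h seems to make these errors go away. I'm still not sure where we're calling the interruptible op that would trigger this error.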
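For illustration only, here is a minimal sketch of the locking pattern after a switch to std::shared_mutex. It is not Ceph code: `LingerRegistry`, its methods, and `register_then_cancel` are hypothetical names made up for this example, and it assumes a C++17 compiler. The exclusive lock is taken through std::unique_lock, as in frames #10 to #12 of the trace, but over std::shared_mutex, which has no thread-interruption mechanism and so cannot throw boost::thread_interrupted.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a structure guarded by a reader/writer lock.
// This is NOT the real Objecter; it only illustrates the lock types.
class LingerRegistry {
 public:
  // Writers take the lock exclusively via std::unique_lock.
  void add(int id, std::string name) {
    std::unique_lock<std::shared_mutex> wl(rwlock_);
    ops_.emplace(id, std::move(name));
  }

  // Returns true if an entry with this id existed and was removed.
  bool cancel(int id) {
    std::unique_lock<std::shared_mutex> wl(rwlock_);
    return ops_.erase(id) > 0;
  }

  // Readers share the lock via std::shared_lock.
  std::size_t size() const {
    std::shared_lock<std::shared_mutex> rl(rwlock_);
    return ops_.size();
  }

 private:
  mutable std::shared_mutex rwlock_;
  std::map<int, std::string> ops_;
};

// Each worker registers one entry and cancels it again, so the registry
// is empty once all workers have been joined.
std::size_t register_then_cancel(int n_threads) {
  LingerRegistry reg;
  std::vector<std::thread> workers;
  for (int i = 0; i < n_threads; ++i) {
    workers.emplace_back([&reg, i] {
      reg.add(i, "op-" + std::to_string(i));
      reg.cancel(i);
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  return reg.size();
}
```

Calling `register_then_cancel(8)` exercises the exclusive lock from eight threads. This only shows the replacement lock types; it does not explain where the interruption in the original trace comes from.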
Updated by Abhishek Lekshmanan almost 6 years ago
- Project changed from rgw to Ceph
- Subject changed from rgw/objector: rgw aborts during watch notify when trying to lock to objector: rgw aborts during watch notify when trying to lock
Updated by Greg Farnum almost 6 years ago