Bug #16817
[closed] multisite segfault on ~RGWRealmWatcher if realm was deleted
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In a running multisite cluster:
$ radosgw-admin realm delete --rgw-realm=dev
The command succeeds and deletes the realm's control object. This disconnects the watchers, as seen in the radosgw log:
2016-07-26 12:22:48.688087 7f3618e40700 1 -- 10.17.151.122:0/1703825118 <== osd.0 10.17.151.122:6800/13262 10 ==== watch-notify(disconnect (3) cookie 41364480 notify 0 ret 0) v3 ==== 42+0+0 (1292098676 0 0) 0x2744800 con 0x2ad4c60
2016-07-26 12:22:48.688175 7f3619742700 4 rgw realm watcher: Disconnected watch on realms.fdde081d-bfeb-48c4-81d7-a1fac0e355f4.control
2016-07-26 12:22:48.690730 7f3619742700 -1 rgw realm watcher: Failed to unwatch on realms.fdde081d-bfeb-48c4-81d7-a1fac0e355f4.control with (2) No such file or directory
2016-07-26 12:22:48.693381 7f3619742700 -1 rgw realm watcher: Failed to restart watch on realms.fdde081d-bfeb-48c4-81d7-a1fac0e355f4.control with (2) No such file or directory
On shutdown, we try to clean up the disconnected watch and hit this segfault:
(gdb) bt
#0 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7ffcac9b99a0, __str=<error reading variable: Cannot access memory at address 0x30>) at /usr/src/debug/gcc-5.3.1-20160406/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:400
#1 0x00007f367833809f in object_t::object_t (this=0x7ffcac9b99a0) at /home/cbodley/ceph/src/include/object.h:32
#2 0x00007f367837003a in Objecter::Op::Op (this=0x2815600, o=..., ol=..., op=std::vector of length 1, capacity 1 = {...}, f=32, ac=0x0, co=0x7ffcac9b9ca0, ov=0x7ffcac9b9c00, offset=0x0) at /home/cbodley/ceph/src/osdc/Objecter.h:1295
#3 0x00007f36783708f2 in Objecter::prepare_mutate_op (this=0x2824700, oid=..., oloc=..., op=..., snapc=..., mtime=..., flags=0, onack=0x0, oncommit=0x7ffcac9b9ca0, objver=0x7ffcac9b9c00, reqid=...) at /home/cbodley/ceph/src/osdc/Objecter.h:2166
#4 0x00007f3678370a24 in Objecter::mutate (this=0x2824700, oid=..., oloc=..., op=..., snapc=..., mtime=..., flags=0, onack=0x0, oncommit=0x7ffcac9b9ca0, objver=0x7ffcac9b9c00, reqid=...) at /home/cbodley/ceph/src/osdc/Objecter.h:2181
#5 0x00007f367836c92d in librados::IoCtxImpl::unwatch (this=0x2854960, cookie=0) at /home/cbodley/ceph/src/librados/IoCtxImpl.cc:1508
#6 0x00007f3678329388 in librados::IoCtx::unwatch2 (this=0x7ffcac9ba4f8, handle=0) at /home/cbodley/ceph/src/librados/librados.cc:1897
#7 0x0000000000954810 in RGWRealmWatcher::watch_stop (this=0x7ffcac9ba4e0) at /home/cbodley/ceph/src/rgw/rgw_realm_watcher.cc:142
#8 0x0000000000953366 in RGWRealmWatcher::~RGWRealmWatcher (this=0x7ffcac9ba4e0, __in_chrg=<optimized out>) at /home/cbodley/ceph/src/rgw/rgw_realm_watcher.cc:35
#9 0x000000000087ab17 in main (argc=11, argv=0x7ffcac9ba9f8) at /home/cbodley/ceph/src/rgw/rgw_main.cc:445
librados::IoCtxImpl::unwatch() casts cookie=0 with reinterpret_cast<Objecter::LingerOp*>(cookie) and then tries to operate on the resulting null pointer.
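One way to picture the failure and a possible guard is the minimal sketch below. It is not the Ceph code or the fix that was merged: MockIoCtx, LingerOp, RealmWatcherSketch and simulate_failed_restart() are hypothetical stand-ins invented for illustration, and the mock throws an exception where the real code would dereference the null pointer. The only idea it demonstrates is that watch_stop() can skip the unwatch call when the stored handle is 0.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Objecter::LingerOp; only the field needed here.
struct LingerOp {
  std::string oid;
};

// Hypothetical stand-in for librados::IoCtx. unwatch2() mimics the cast
// described in the report, but throws instead of touching a null pointer.
struct MockIoCtx {
  uint64_t watch2(const std::string& oid) {
    return reinterpret_cast<uint64_t>(new LingerOp{oid});
  }

  int unwatch2(uint64_t handle) {
    auto* op = reinterpret_cast<LingerOp*>(handle);
    if (op == nullptr) {
      // The real code has no such check; this models the crash site.
      throw std::runtime_error("unwatch2 called with handle 0");
    }
    delete op;
    return 0;
  }
};

// Hypothetical watcher that keeps a handle and cleans it up on destruction.
class RealmWatcherSketch {
 public:
  explicit RealmWatcherSketch(MockIoCtx& ioctx) : ioctx_(ioctx) {}
  ~RealmWatcherSketch() { watch_stop(); }

  void watch_start(const std::string& oid) { handle_ = ioctx_.watch2(oid); }

  // Models the log sequence above: the old watch is gone and the restart
  // failed, so no valid handle remains.
  void simulate_failed_restart() {
    if (handle_ != 0) {
      ioctx_.unwatch2(handle_);
    }
    handle_ = 0;
  }

  // Guard: nothing to unwatch if there is no valid handle.
  int watch_stop() {
    if (handle_ == 0) {
      return 0;
    }
    int r = ioctx_.unwatch2(handle_);
    handle_ = 0;
    return r;
  }

 private:
  MockIoCtx& ioctx_;
  uint64_t handle_ = 0;
};
```

With the guard in place, destroying the watcher after a failed restart never reaches unwatch2(0). Without it, the destructor would pass handle 0 straight through, which is the path shown in frames #5 to #8 of the backtrace.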
Updated by Casey Bodley over 7 years ago
- Status changed from New to In Progress
- Assignee set to Casey Bodley
- Backport set to jewel
Updated by Casey Bodley over 7 years ago
- Status changed from In Progress to Pending Backport
Updated by Nathan Cutler over 7 years ago
- Copied to Backport #16864: jewel: multisite segfault on ~RGWRealmWatcher if realm was deleted added
Updated by Loïc Dachary over 7 years ago
- Status changed from Pending Backport to Resolved