Bug #16817

multisite segfault on ~RGWRealmWatcher if realm was deleted

Added by Casey Bodley over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 07/26/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

In a running multisite cluster:

$ radosgw-admin realm delete --rgw-realm=dev

The command succeeds and deletes the realm's control object. This disconnects its watchers, as seen in the radosgw log:

2016-07-26 12:22:48.688087 7f3618e40700  1 -- 10.17.151.122:0/1703825118 <== osd.0 10.17.151.122:6800/13262 10 ==== watch-notify(disconnect (3) cookie 41364480 notify 0 ret 0) v3 ==== 42+0+0 (1292098676 0 0) 0x2744800 con 0x2ad4c60
2016-07-26 12:22:48.688175 7f3619742700  4 rgw realm watcher: Disconnected watch on realms.fdde081d-bfeb-48c4-81d7-a1fac0e355f4.control
2016-07-26 12:22:48.690730 7f3619742700 -1 rgw realm watcher: Failed to unwatch on realms.fdde081d-bfeb-48c4-81d7-a1fac0e355f4.control with (2) No such file or directory
2016-07-26 12:22:48.693381 7f3619742700 -1 rgw realm watcher: Failed to restart watch on realms.fdde081d-bfeb-48c4-81d7-a1fac0e355f4.control with (2) No such file or directory

On shutdown, we try to clean up the disconnected watch and hit this segfault:

(gdb) bt
#0  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7ffcac9b99a0, 
    __str=<error reading variable: Cannot access memory at address 0x30>)
    at /usr/src/debug/gcc-5.3.1-20160406/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:400
#1  0x00007f367833809f in object_t::object_t (this=0x7ffcac9b99a0) at /home/cbodley/ceph/src/include/object.h:32
#2  0x00007f367837003a in Objecter::Op::Op (this=0x2815600, o=..., ol=..., op=std::vector of length 1, capacity 1 = {...}, f=32, ac=0x0, co=0x7ffcac9b9ca0, 
    ov=0x7ffcac9b9c00, offset=0x0) at /home/cbodley/ceph/src/osdc/Objecter.h:1295
#3  0x00007f36783708f2 in Objecter::prepare_mutate_op (this=0x2824700, oid=..., oloc=..., op=..., snapc=..., mtime=..., flags=0, onack=0x0, 
    oncommit=0x7ffcac9b9ca0, objver=0x7ffcac9b9c00, reqid=...) at /home/cbodley/ceph/src/osdc/Objecter.h:2166
#4  0x00007f3678370a24 in Objecter::mutate (this=0x2824700, oid=..., oloc=..., op=..., snapc=..., mtime=..., flags=0, onack=0x0, oncommit=0x7ffcac9b9ca0, 
    objver=0x7ffcac9b9c00, reqid=...) at /home/cbodley/ceph/src/osdc/Objecter.h:2181
#5  0x00007f367836c92d in librados::IoCtxImpl::unwatch (this=0x2854960, cookie=0) at /home/cbodley/ceph/src/librados/IoCtxImpl.cc:1508
#6  0x00007f3678329388 in librados::IoCtx::unwatch2 (this=0x7ffcac9ba4f8, handle=0) at /home/cbodley/ceph/src/librados/librados.cc:1897
#7  0x0000000000954810 in RGWRealmWatcher::watch_stop (this=0x7ffcac9ba4e0) at /home/cbodley/ceph/src/rgw/rgw_realm_watcher.cc:142
#8  0x0000000000953366 in RGWRealmWatcher::~RGWRealmWatcher (this=0x7ffcac9ba4e0, __in_chrg=<optimized out>) at /home/cbodley/ceph/src/rgw/rgw_realm_watcher.cc:35
#9  0x000000000087ab17 in main (argc=11, argv=0x7ffcac9ba9f8) at /home/cbodley/ceph/src/rgw/rgw_main.cc:445

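As frames #5 through #7 show, RGWRealmWatcher::watch_stop() passes handle=0 to librados::IoCtx::unwatch2(), and librados::IoCtxImpl::unwatch() casts that cookie with reinterpret_cast<Objecter::LingerOp*>(cookie) and then operates on the resulting null pointer.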


Related issues

Copied to rgw - Backport #16864: jewel: multisite segfault on ~RGWRealmWatcher if realm was deleted Resolved

History

#1 Updated by Casey Bodley over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to Casey Bodley
  • Backport set to jewel

#2 Updated by Casey Bodley over 2 years ago

  • Status changed from In Progress to Pending Backport

#3 Updated by Nathan Cutler over 2 years ago

  • Copied to Backport #16864: jewel: multisite segfault on ~RGWRealmWatcher if realm was deleted added

#4 Updated by Loic Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved
