Bug #18963

closed

rbd-mirror: forced failover does not function when peer is unreachable

Added by Jason Dillaman about 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

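When a local image is force-promoted to primary, the local rbd-mirror daemon should detect that the local images are now primary, shut down the image replayers, and release the exclusive lock. However, if the remote peer is unreachable, the daemon can deadlock and the image replayers never shut down correctly. In the backtraces below, two threads are blocked in synchronous librados calls (mirror_mode_get and mirror_uuid_get) while the main thread waits on a mutex in Replayer::is_blacklisted.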

#0  0x00007f96db88b6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f96dc6c7ad1 in Wait (mutex=..., this=0x7f9636ff9da0) at common/Cond.h:56
#2  librados::IoCtxImpl::operate_read (this=this@entry=0x7f96efdfb050, oid=..., o=o@entry=0x7f9636ff9fc0, pbl=pbl@entry=0x7f9636ffa180, flags=flags@entry=0) at librados/IoCtxImpl.cc:725
#3  0x00007f96dc6d25d3 in librados::IoCtxImpl::exec (this=0x7f96efdfb050, oid=..., cls=cls@entry=0x7f96e649f4c7 "rbd", method=method@entry=0x7f96e64e42e7 "mirror_mode_get", inbl=..., outbl=...) at librados/IoCtxImpl.cc:1135
#4  0x00007f96dc681a74 in librados::IoCtx::exec (this=this@entry=0x7f96efdfb710, oid="rbd_mirroring", cls=cls@entry=0x7f96e649f4c7 "rbd", method=method@entry=0x7f96e64e42e7 "mirror_mode_get", inbl=..., outbl=...) at librados/librados.cc:1273
#5  0x00007f96e638ec7d in librbd::cls_client::mirror_mode_get (ioctx=ioctx@entry=0x7f96efdfb710, mirror_mode=mirror_mode@entry=0x7f9636ffa21c) at cls/rbd/cls_rbd_client.cc:1042
#6  0x00007f96e623bf10 in librbd::mirror_mode_get (io_ctx=..., mirror_mode=mirror_mode@entry=0x7f9636ffa3dc) at librbd/internal.cc:3445
#7  0x00007f96e61d471a in rbd::mirror::PoolWatcher::refresh (this=this@entry=0x7f96efdfb710, image_ids=image_ids@entry=0x7f9636ffa680) at tools/rbd_mirror/PoolWatcher.cc:90
#8  0x00007f96e61d54df in rbd::mirror::PoolWatcher::refresh_images (this=0x7f96efdfb710, reschedule=<optimized out>) at tools/rbd_mirror/PoolWatcher.cc:65
#9  0x00007f96e61b0c9a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#10 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at include/Context.h:460
#11 0x00007f96e61aeb89 in Context::complete (this=0x7f954c00d530, r=<optimized out>) at include/Context.h:64
#12 0x00007f96e63ccd24 in SafeTimer::timer_thread (this=0x7f96efdfb730) at common/Timer.cc:105
#13 0x00007f96e63ce75d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#14 0x00007f96db887dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f96da77073d in clone () from /lib64/libc.so.6

#0  0x00007f96db88b6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f96dc6c7ad1 in Wait (mutex=..., this=0x7f9596ffa120) at common/Cond.h:56
#2  librados::IoCtxImpl::operate_read (this=this@entry=0x7f96efe66fb0, oid=..., o=o@entry=0x7f9596ffa340, pbl=pbl@entry=0x7f9596ffa500, flags=flags@entry=0) at librados/IoCtxImpl.cc:725
#3  0x00007f96dc6d25d3 in librados::IoCtxImpl::exec (this=0x7f96efe66fb0, oid=..., cls=cls@entry=0x7f96e649f4c7 "rbd", method=method@entry=0x7f96e64e42c7 "mirror_uuid_get", inbl=..., outbl=...) at librados/IoCtxImpl.cc:1135
#4  0x00007f96dc681a74 in librados::IoCtx::exec (this=this@entry=0x7f96efe2d3f8, oid="rbd_mirroring", cls=cls@entry=0x7f96e649f4c7 "rbd", method=method@entry=0x7f96e64e42c7 "mirror_uuid_get", inbl=..., outbl=...) at librados/librados.cc:1273
#5  0x00007f96e638e8dd in librbd::cls_client::mirror_uuid_get (ioctx=ioctx@entry=0x7f96efe2d3f8, uuid=uuid@entry=0x7f9596ffa650) at cls/rbd/cls_rbd_client.cc:1010
Python Exception <type 'exceptions.ValueError'> Cannot find type const rbd::mirror::Replayer::ImageIds::_Rep_type: 
#6  0x00007f96e61ac49f in rbd::mirror::Replayer::set_sources (this=this@entry=0x7f96efe2d2d0, image_ids=std::set with 4 elements) at tools/rbd_mirror/Replayer.cc:631
#7  0x00007f96e61adc47 in rbd::mirror::Replayer::run (this=0x7f96efe2d2d0) at tools/rbd_mirror/Replayer.cc:453
#8  0x00007f96e61b15fd in rbd::mirror::Replayer::ReplayerThread::entry (this=<optimized out>) at tools/rbd_mirror/Replayer.h:125
#9  0x00007f96db887dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f96da77073d in clone () from /lib64/libc.so.6

#0  0x00007f96db88e1bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f96db889d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f96db889c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f96e63c5458 in Mutex::Lock (this=this@entry=0x7f96efdf5ad8, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007f96e61a6767 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  rbd::mirror::Replayer::is_blacklisted (this=0x7f96efdf5ab0) at tools/rbd_mirror/Replayer.cc:263
Python Exception <type 'exceptions.ValueError'> Cannot find type const rbd::mirror::Mirror::PoolPeers::_Rep_type: 
#6  0x00007f96e61a218b in rbd::mirror::Mirror::update_replayers (this=this@entry=0x7f96efdbcbe0, pool_peers=std::map with 3 elements) at tools/rbd_mirror/Mirror.cc:368
#7  0x00007f96e61a2cf6 in rbd::mirror::Mirror::run (this=0x7f96efdbcbe0) at tools/rbd_mirror/Mirror.cc:237
#8  0x00007f96e619a592 in main (argc=<optimized out>, argv=0x7ffe3e072c68) at tools/rbd_mirror/main.cc:74