Bug #45716
[rbd-mirror] image replayer stop might race with remove and instance replayer shut down
Description
http://qa-proxy.ceph.com/teuthology/jdillaman-2020-04-09_09:42:22-rbd-wip-jd-testing-distro-basic-smithi/4938679/teuthology.log
http://qa-proxy.ceph.com/teuthology/jdillaman-2020-04-09_09:42:22-rbd-wip-jd-testing-distro-basic-smithi/4938684/teuthology.log
A notification to remove an image was received:
-199> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::InstanceReplayer: 0x5601b046de00 remove_peer_image: global_image_id=cda5e6c3-dc44-445a-b2db-6bcb8717a165, peer_mirror_uuid=5c6a4b9c-fb17-4d0e-97ff-8b6996184ee9
-198> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: on_finish=0x5601b46a1060, manual=0, desc=
-197> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: canceling start
-196> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: canceling bootstrap
This was followed shortly by a SIGTERM from the thrasher test, which triggered a second stop request (which failed):
-17> 2020-04-11T01:06:43.921+0000 7f5caff34680 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: on_finish=0x5601b04095a0, manual=1, desc=
-16> 2020-04-11T01:06:43.921+0000 7f5c9f274700 10 rbd::mirror::NamespaceReplayer: 0x5601b1d981a0 handle_stop_instance_replayer: r=-22
-15> 2020-04-11T01:06:43.921+0000 7f5c9f274700 -1 rbd::mirror::NamespaceReplayer: 0x5601b1d981a0 handle_stop_instance_replayer: error stopping instance replayer: (22) Invalid argument
-14> 2020-04-11T01:06:43.921+0000 7f5c9f274700 10 rbd::mirror::NamespaceReplayer: 0x5601b1d981a0 shut_down_instance_watcher:
However, the NamespaceReplayer ignored the error and continued to shut down the InstanceWatcher while it still had registered callbacks from the ImageReplayer that was shutting down:
-1> 2020-04-11T01:06:43.927+0000 7f5c9f274700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-418-g83b5036/rpm/el8/BUILD/ceph-16.0.0-418-g83b5036/src/tools/rbd_mirror/InstanceWatcher.cc: In function 'rbd::mirror::InstanceWatcher<ImageCtxT>::~InstanceWatcher() [with ImageCtxT = librbd::ImageCtx]' thread 7f5c9f274700 time 2020-04-11T01:06:43.927299+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-418-g83b5036/rpm/el8/BUILD/ceph-16.0.0-418-g83b5036/src/tools/rbd_mirror/InstanceWatcher.cc: 340: FAILED ceph_assert(m_requests.empty())
ceph version 16.0.0-418-g83b5036 (83b50362f2e3cb2eb00db134ab87c51b5452223e) octopus (rc)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f5ca69c4030]
2: (()+0x27b24a) [0x7f5ca69c424a]
3: (rbd::mirror::InstanceWatcher<librbd::ImageCtx>::~InstanceWatcher()+0x145) [0x5601aef1fe45]
4: (rbd::mirror::InstanceWatcher<librbd::ImageCtx>::~InstanceWatcher()+0xd) [0x5601aef1fe8d]
5: (rbd::mirror::NamespaceReplayer<librbd::ImageCtx>::handle_shut_down_instance_watcher(int)+0x85) [0x5601aeef8d55]
6: (ThreadPool::PointerWQ<Context>::_void_process(void*, ThreadPool::TPHandle&)+0x148) [0x5601aeeb8458]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xe64) [0x7f5ca6ab0ea4]
8: (ThreadPool::WorkThread::entry()+0x15) [0x7f5ca6ab1705]
9: (()+0x82de) [0x7f5ca5e002de]
10: (clone()+0x43) [0x7f5ca436f133]
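The failure mode described above can be illustrated with a small, single-threaded sketch. This is not the rbd-mirror code and not necessarily how the fix was implemented. ToyWatcher and ToyOwner are hypothetical stand-ins for a watcher that asserts on outstanding requests in its destructor and for the component that shuts it down. The assumption shown here is that the owner releases the watcher only once no requests remain, even when the preceding stop step reported an error such as -EINVAL.

```cpp
#include <cassert>
#include <cerrno>
#include <memory>
#include <set>

// Hypothetical stand-in for a watcher that tracks in-flight requests.
class ToyWatcher {
public:
  // Same spirit as the failed assertion in the log: destroying the watcher
  // while requests are still registered is treated as a programming error.
  ~ToyWatcher() { assert(m_requests.empty()); }

  void add_request(int id) { m_requests.insert(id); }
  void complete_request(int id) { m_requests.erase(id); }
  bool idle() const { return m_requests.empty(); }

private:
  std::set<int> m_requests;
};

// Hypothetical owner playing the role of the component that shuts down.
class ToyOwner {
public:
  ToyOwner() : m_watcher(std::make_unique<ToyWatcher>()) {}

  ToyWatcher* watcher() { return m_watcher.get(); }
  bool watcher_alive() const { return m_watcher != nullptr; }
  int stop_result() const { return m_stop_result; }

  // Called with the result of the preceding stop step. The result is
  // recorded, but an error does not short-circuit the wait: the watcher
  // is released only when it has no outstanding requests.
  void shut_down(int stop_result) {
    m_stop_result = stop_result;
    m_shut_down_requested = true;
    maybe_release_watcher();
  }

  // Called when an in-flight request finishes.
  void handle_request_complete(int id) {
    if (m_watcher) {
      m_watcher->complete_request(id);
    }
    maybe_release_watcher();
  }

private:
  void maybe_release_watcher() {
    if (m_shut_down_requested && m_watcher && m_watcher->idle()) {
      m_watcher.reset();
    }
  }

  std::unique_ptr<ToyWatcher> m_watcher;
  bool m_shut_down_requested = false;
  int m_stop_result = 0;
};
```

With one request registered, calling shut_down(-EINVAL) leaves the watcher alive. It is released only after handle_request_complete() removes the last request, so the destructor assertion cannot fire on this path.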
Updated by Jason Dillaman almost 4 years ago
- Copied from Bug #45072: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down added
Updated by Jason Dillaman almost 4 years ago
- Status changed from New to Pending Backport
- Pull request ID changed from 34615 to 34931
This tracker ticket covers additional fixes related to #45072
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45763: octopus: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45764: nautilus: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down added
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".