Bug #45716

closed

[rbd-mirror] image replayer stop might race with remove and instance replayer shut down

Added by Jason Dillaman almost 4 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/jdillaman-2020-04-09_09:42:22-rbd-wip-jd-testing-distro-basic-smithi/4938679/teuthology.log
http://qa-proxy.ceph.com/teuthology/jdillaman-2020-04-09_09:42:22-rbd-wip-jd-testing-distro-basic-smithi/4938684/teuthology.log

A notification to remove an image was received:

  -199> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::InstanceReplayer: 0x5601b046de00 remove_peer_image: global_image_id=cda5e6c3-dc44-445a-b2db-6bcb8717a165, peer_mirror_uuid=5c6a4b9c-fb17-4d0e-97ff-8b6996184ee9
  -198> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: on_finish=0x5601b46a1060, manual=0, desc=
  -197> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: canceling start
  -196> 2020-04-11T01:06:43.872+0000 7f5c9f274700 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: canceling bootstrap

Shortly afterwards, a SIGTERM from the thrasher test triggered a second stop request, which failed with EINVAL (r=-22):

   -17> 2020-04-11T01:06:43.921+0000 7f5caff34680 10 rbd::mirror::ImageReplayer: 0x5601b4a7db80 [3/cda5e6c3-dc44-445a-b2db-6bcb8717a165] stop: on_finish=0x5601b04095a0, manual=1, desc=
   -16> 2020-04-11T01:06:43.921+0000 7f5c9f274700 10 rbd::mirror::NamespaceReplayer: 0x5601b1d981a0 handle_stop_instance_replayer: r=-22
   -15> 2020-04-11T01:06:43.921+0000 7f5c9f274700 -1 rbd::mirror::NamespaceReplayer: 0x5601b1d981a0 handle_stop_instance_replayer: error stopping instance replayer: (22) Invalid argument
   -14> 2020-04-11T01:06:43.921+0000 7f5c9f274700 10 rbd::mirror::NamespaceReplayer: 0x5601b1d981a0 shut_down_instance_watcher:

However, the NamespaceReplayer ignored the error and continued to shut down the InstanceWatcher while it still had registered callbacks from the ImageReplayer that was shutting down:

    -1> 2020-04-11T01:06:43.927+0000 7f5c9f274700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-418-g83b5036/rpm/el8/BUILD/ceph-16.0.0-418-g83b5036/src/tools/rbd_mirror/InstanceWatcher.cc: In function 'rbd::mirror::InstanceWatcher<ImageCtxT>::~InstanceWatcher() [with ImageCtxT = librbd::ImageCtx]' thread 7f5c9f274700 time 2020-04-11T01:06:43.927299+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-418-g83b5036/rpm/el8/BUILD/ceph-16.0.0-418-g83b5036/src/tools/rbd_mirror/InstanceWatcher.cc: 340: FAILED ceph_assert(m_requests.empty())


 ceph version 16.0.0-418-g83b5036 (83b50362f2e3cb2eb00db134ab87c51b5452223e) octopus (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f5ca69c4030]
 2: (()+0x27b24a) [0x7f5ca69c424a]
 3: (rbd::mirror::InstanceWatcher<librbd::ImageCtx>::~InstanceWatcher()+0x145) [0x5601aef1fe45]
 4: (rbd::mirror::InstanceWatcher<librbd::ImageCtx>::~InstanceWatcher()+0xd) [0x5601aef1fe8d]
 5: (rbd::mirror::NamespaceReplayer<librbd::ImageCtx>::handle_shut_down_instance_watcher(int)+0x85) [0x5601aeef8d55]
 6: (ThreadPool::PointerWQ<Context>::_void_process(void*, ThreadPool::TPHandle&)+0x148) [0x5601aeeb8458]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xe64) [0x7f5ca6ab0ea4]
 8: (ThreadPool::WorkThread::entry()+0x15) [0x7f5ca6ab1705]
 9: (()+0x82de) [0x7f5ca5e002de]
 10: (clone()+0x43) [0x7f5ca436f133]
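
The sequence above can be modelled with a small single-threaded sketch. Everything in it is hypothetical: ToyWatcher, ToyReplayer, stop_or_wait and the two shut-down helpers are illustrative stand-ins, not the real rbd-mirror classes, and the deferred variant is only one possible way to avoid this class of failure, not necessarily what the actual fix does.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for InstanceWatcher: it must have no outstanding requests
// when it is torn down (the real destructor asserts m_requests.empty()).
struct ToyWatcher {
  std::set<int> requests;
  bool idle() const { return requests.empty(); }
};

// Toy stand-in for ImageReplayer with an asynchronous stop.
class ToyReplayer {
public:
  using Callback = std::function<void(int)>;

  explicit ToyReplayer(ToyWatcher& watcher) : m_watcher(watcher) {
    m_watcher.requests.insert(1);  // pretend one request is in flight
  }

  // Rejects a stop while another stop is in flight, as seen in the logs.
  void stop(Callback on_finish) {
    if (m_stopping) {
      on_finish(-EINVAL);
      return;
    }
    stop_or_wait(std::move(on_finish));
  }

  // Hypothetical alternative: join the in-flight stop instead of failing.
  void stop_or_wait(Callback on_finish) {
    if (m_stopped) {
      on_finish(0);
      return;
    }
    m_stopping = true;
    m_waiters.push_back(std::move(on_finish));
  }

  // Simulates the canceled start/bootstrap finally unwinding.
  void complete_stop() {
    m_watcher.requests.erase(1);
    m_stopping = false;
    m_stopped = true;
    std::vector<Callback> waiters;
    waiters.swap(m_waiters);
    for (auto& cb : waiters) {
      cb(0);
    }
  }

private:
  ToyWatcher& m_watcher;
  bool m_stopping = false;
  bool m_stopped = false;
  std::vector<Callback> m_waiters;
};

// Models the failure: the result of the second stop is dropped and the
// watcher is inspected (in the real code: destroyed) immediately.
// Returns true if the watcher was idle at that point.
bool shut_down_ignoring_error(ToyReplayer& replayer, ToyWatcher& watcher) {
  replayer.stop([](int /*r*/) { /* error dropped */ });
  return watcher.idle();
}

// Defers the watcher check until the in-flight stop has completed.
void shut_down_after_stop(ToyReplayer& replayer, ToyWatcher& watcher,
                          bool& was_idle) {
  replayer.stop_or_wait(
      [&watcher, &was_idle](int) { was_idle = watcher.idle(); });
}

// Walks through both sequences.
void run_demo() {
  ToyWatcher w1;
  ToyReplayer r1(w1);
  r1.stop([](int) {});                        // stop from the image removal
  assert(!shut_down_ignoring_error(r1, w1));  // watcher is still busy

  ToyWatcher w2;
  ToyReplayer r2(w2);
  bool was_idle = false;
  r2.stop([](int) {});
  shut_down_after_stop(r2, w2, was_idle);
  assert(!was_idle);  // callback has not run yet
  r2.complete_stop();
  assert(was_idle);   // watcher was idle when the shut down proceeded
}
```

Calling run_demo() exercises both paths. In the first, the watcher still holds a request when the shut down proceeds, which is the condition the assert in ~InstanceWatcher() catches. In the second, the shut down only continues once the earlier stop has released its request.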


Related issues 3 (0 open, 3 closed)

Copied from rbd - Bug #45072: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down - Resolved - Jason Dillaman
Copied to rbd - Backport #45763: octopus: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down - Resolved - Nathan Cutler
Copied to rbd - Backport #45764: nautilus: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down - Resolved - Mykola Golub

#1

Updated by Jason Dillaman almost 4 years ago

  • Copied from Bug #45072: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down added

#2

Updated by Jason Dillaman almost 4 years ago

  • Status changed from New to Pending Backport
  • Pull request ID changed from 34615 to 34931

This tracker ticket covers additional fixes related to #45072

#3

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45763: octopus: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down added

#4

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45764: nautilus: [rbd-mirror] image replayer stop might race with remove and instance replayer shut down added

#5

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
