Project

General

Profile

Actions

Bug #56154

closed

[rbd-mirror] resume pending snapshot replayer shut down when an error is encountered

Added by Ilya Dryomov almost 2 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If a shutdown is requested by a higher layer and Replayer::shut_down() pends it on completion of snapshot replay, rbd-mirror daemon hangs if replayer encounters an error in the interim. This is particularly likely in case of blocklisting: a higher layer may detect that client got blocklisted and request a shutdown but then replayer actually encounters the blocklist as well, e.g. in Replayer::handle_copy_image():

2022-08-27T17:37:22.894-0400 7f98895f5700  5 rbd::mirror::PoolWatcher: 0x55ed1fc10b40 handle_rewatch_complete: r=-108
2022-08-27T17:37:22.894-0400 7f98895f5700  0 rbd::mirror::PoolWatcher: 0x55ed1fc10b40 handle_rewatch_complete: detected client is blocklisted
2022-08-27T17:37:22.906-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=429, object_count=512
2022-08-27T17:37:23.718-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_update_non_primary_snapshot: r=0
2022-08-27T17:37:23.846-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=430, object_count=512
2022-08-27T17:37:23.846-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 update_non_primary_snapshot: 
2022-08-27T17:37:24.610-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x55ed1fc9a0c0 finish: r=-110
2022-08-27T17:37:24.610-0400 7f9888df4700 -1 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x55ed1fc9a0c0 finish: resending after timeout
2022-08-27T17:37:24.610-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x55ed1fc9a0c0 send
2022-08-27T17:37:24.610-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x55ed1fc9a0c0 send: sending to 4249
2022-08-27T17:37:24.610-0400 7f98845eb700  5 librbd::Watcher: 0x55ed1e6e4400 notifications_blocked: blocked=0
2022-08-27T17:37:24.610-0400 7f98845eb700 10 librbd::Watcher::C_NotifyAck 0x55ed1e38a6e0 C_NotifyAck: id=128849018930, handle=94476928963072
2022-08-27T17:37:24.610-0400 7f98845eb700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 handle_notify: notify_id=128849018930, handle=94476928963072, notifier_id=4249
2022-08-27T17:37:24.610-0400 7f98845eb700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 handle_payload: sync_start: instance_id=4249, request_id=5
2022-08-27T17:37:24.610-0400 7f98845eb700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 prepare_request: instance_id=4249, request_id=5
2022-08-27T17:37:24.610-0400 7f98845eb700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 prepare_request: duplicate for in-progress request
2022-08-27T17:37:24.890-0400 7f9885dee700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_task: 
2022-08-27T17:37:24.890-0400 7f9885dee700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 handle_timer_task: 
2022-08-27T17:37:24.890-0400 7f9885dee700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 schedule_timer_task: 
2022-08-27T17:37:24.890-0400 7f9885dee700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 queue_update_task: 
2022-08-27T17:37:24.890-0400 7f98895f5700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 get_mirror_uuid: 
2022-08-27T17:37:24.890-0400 7f98895f5700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 update_task: 
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_get_mirror_uuid: r=-108
2022-08-27T17:37:24.890-0400 7f987cddc700 -1 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_get_mirror_uuid: failed to retrieve remote mirror uuid: (108) Cannot send after transport endpoint shutdown
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_get_mirror_uuid: remote_mirror_uuid=
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 mirror_peer_ping: 
2022-08-27T17:37:24.890-0400 7f987c5db700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_mirror_peer_ping: r=-108
2022-08-27T17:37:24.890-0400 7f987c5db700 -1 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_mirror_peer_ping: failed to ping remote mirror peer: (108) Cannot send after transport endpoint shutdown
2022-08-27T17:37:24.890-0400 7f987c5db700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 mirror_peer_list: 
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_mirror_peer_list: r=-108
2022-08-27T17:37:24.890-0400 7f987cddc700 -1 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_mirror_peer_list: failed to retrieve mirror peers: (108) Cannot send after transport endpoint shutdown
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 handle_mirror_peer_list: remote_mirror_peer_uuid=
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 notify_listener: 
2022-08-27T17:37:24.890-0400 7f987cddc700  5 rbd::mirror::PoolReplayer: 0x55ed192f8000 handle_remote_pool_meta_updated: remote_pool_meta=mirror_uuid=, mirror_peer_uuid=
2022-08-27T17:37:24.890-0400 7f987cddc700 -1 rbd::mirror::PoolReplayer: 0x55ed192f8000 handle_remote_pool_meta_updated: remote pool metadata updated unexpectedly
2022-08-27T17:37:24.890-0400 7f987cddc700 10 rbd::mirror::RemotePoolPoller: 0x55ed1e2a64e0 schedule_task: 
2022-08-27T17:37:24.890-0400 7f98785d3700 20 rbd::mirror::PoolReplayer: 0x55ed192f8000 update_namespace_replayers: 
2022-08-27T17:37:24.890-0400 7f98785d3700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 list_instances: 
2022-08-27T17:37:24.890-0400 7f98785d3700 20 rbd::mirror::Instances: 0x55ed1ecca300 list: 
2022-08-27T17:37:24.890-0400 7f98785d3700 20 rbd::mirror::PoolReplayer: 0x55ed192f8000 update_namespace_replayers: 
2022-08-27T17:37:24.890-0400 7f98785d3700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 list_instances: 
2022-08-27T17:37:24.890-0400 7f98785d3700 20 rbd::mirror::Instances: 0x55ed1ecca300 list: 
2022-08-27T17:37:24.938-0400 7f9885dee700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 handle_timer_task: 
2022-08-27T17:37:24.938-0400 7f9885dee700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 schedule_timer_task: 
2022-08-27T17:37:24.938-0400 7f9885dee700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 queue_update_task: 
2022-08-27T17:37:24.938-0400 7f9888df4700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 update_task: 
2022-08-27T17:37:24.938-0400 7f987c5db700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 handle_update_task: 
2022-08-27T17:37:24.938-0400 7f987c5db700 -1 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 handle_update_task: failed to update mirror image statuses: (108) Cannot send after transport endpoint shutdown
2022-08-27T17:37:24.986-0400 7f9885dee700 10 rbd::mirror::InstanceReplayer: 0x55ed1ecc8b60 schedule_image_state_check_task: scheduling image state check after 30 sec (task 0x55ed192908d0)
2022-08-27T17:37:24.986-0400 7f9885dee700 10 rbd::mirror::InstanceReplayer: 0x55ed1ecc8b60 queue_start_image_replayers: 
2022-08-27T17:37:24.986-0400 7f98895f5700 10 rbd::mirror::InstanceReplayer: 0x55ed1ecc8b60 start_image_replayers: 
2022-08-27T17:37:24.986-0400 7f98895f5700 20 rbd::mirror::ServiceDaemon: 0x55ed192d0900 add_or_update_attribute: pool_id=2, key=image_assigned_count, value=1
2022-08-27T17:37:24.986-0400 7f98895f5700 20 rbd::mirror::ServiceDaemon: 0x55ed192d0900 add_or_update_attribute: pool_id=2, key=image_warning_count, value=0
2022-08-27T17:37:24.986-0400 7f98895f5700 20 rbd::mirror::ServiceDaemon: 0x55ed192d0900 add_or_update_attribute: pool_id=2, key=image_error_count, value=0
2022-08-27T17:37:25.170-0400 7f988ce23900 20 rbd::mirror::ClusterWatcher:0x55ed192d2c60 refresh_pools: enter
2022-08-27T17:37:25.170-0400 7f988ce23900 10 rbd::mirror::ClusterWatcher:0x55ed192d2c60 read_pool_peers: mirroring is disabled for pool .mgr
2022-08-27T17:37:25.298-0400 7f9885dee700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 execute_timer_task: 
2022-08-27T17:37:25.298-0400 7f9885dee700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 is_leader: 1
2022-08-27T17:37:25.298-0400 7f9885dee700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 notify_heartbeat: 
2022-08-27T17:37:25.298-0400 7f9885dee700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 is_leader: 1
2022-08-27T17:37:25.298-0400 7f98845eb700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 handle_notify: notify_id=128849019096, handle=94476928961792, notifier_id=4249
2022-08-27T17:37:25.298-0400 7f98845eb700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 handle_notify: our own notification, ignoring
2022-08-27T17:37:25.298-0400 7f98845eb700  5 librbd::Watcher: 0x55ed1e6e4880 notifications_blocked: blocked=0
2022-08-27T17:37:25.298-0400 7f98845eb700 10 librbd::Watcher::C_NotifyAck 0x55ed1e591a40 C_NotifyAck: id=128849019096, handle=94476928961792
2022-08-27T17:37:25.298-0400 7f98845eb700 10 librbd::Watcher::C_NotifyAck 0x55ed1e591a40 finish: r=0
2022-08-27T17:37:25.302-0400 7f98895f5700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 handle_notify_heartbeat: r=0
2022-08-27T17:37:25.302-0400 7f98895f5700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 is_leader: 1
2022-08-27T17:37:25.302-0400 7f98895f5700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 handle_notify_heartbeat: 1 acks received, 0 timed out
2022-08-27T17:37:25.302-0400 7f98895f5700 10 rbd::mirror::Instances: 0x55ed1ecca300 acked: instance_ids=[4249]
2022-08-27T17:37:25.302-0400 7f98895f5700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 schedule_timer_task: scheduling heartbeat after 5 sec (task 0x55ed1e37d260)
2022-08-27T17:37:25.302-0400 7f98895f5700  5 rbd::mirror::Instances: 0x55ed1ecca300 handle_acked: instance_ids=[4249]
2022-08-27T17:37:25.438-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=431, object_count=512
2022-08-27T17:37:25.438-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_update_non_primary_snapshot: r=0
2022-08-27T17:37:25.518-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=432, object_count=512
2022-08-27T17:37:25.518-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 update_non_primary_snapshot: 
2022-08-27T17:37:25.518-0400 7f988ce23900 10 rbd::mirror::ClusterWatcher:0x55ed192d2c60 resolve_peer_site_config_keys: retrieving config-key: pool_id=2, pool_name=data, peer_uuid=b94ab5ee-b81c-4890-bd00-801a7b8df174
2022-08-27T17:37:25.518-0400 7f9884dec700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 handle_update_task: 
2022-08-27T17:37:25.518-0400 7f988ce23900 10 rbd::mirror::ClusterWatcher:0x55ed192d2c60 read_site_name: 
2022-08-27T17:37:25.522-0400 7f988ce23900 10 librbd::ManagedLock: 0x55ed1e295a00 shut_down: 
2022-08-27T17:37:25.522-0400 7f988ce23900 10 librbd::ManagedLock: 0x55ed1e295a00 send_shutdown: 
2022-08-27T17:37:25.522-0400 7f988ce23900 20 rbd::mirror::Mirror: 0x55ed19368f20 update_pool_replayers: enter
2022-08-27T17:37:25.522-0400 7f988ce23900 -1 rbd::mirror::Mirror: 0x55ed19368f20 update_pool_replayers: restarting blocklisted pool replayer for uuid: b94ab5ee-b81c-4890-bd00-801a7b8df174 cluster: site-a client: client.rbd-mirror-peer
2022-08-27T17:37:25.522-0400 7f988ce23900 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 shut_down: 
2022-08-27T17:37:25.522-0400 7f988ce23900 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 cancel_timer_task: 0x55ed1e37d260
2022-08-27T17:37:25.522-0400 7f988ce23900 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 shut_down_leader_lock: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 librbd::ManagedLock: 0x55ed1e295a00 send_shutdown_release: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 librbd::Watcher: 0x55ed192f9500 unregister_watch: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 handle_pre_release_leader_lock: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 notify_listener: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::LeaderWatcher: 0x55ed1e6e4880 is_leader: 0
2022-08-27T17:37:25.522-0400 7f9888df4700 20 rbd::mirror::PoolReplayer: 0x55ed192f8000 handle_pre_release_leader: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::PoolReplayer: 0x55ed192f8000 operator(): handle_pre_release_leader
2022-08-27T17:37:25.522-0400 7f9888df4700 20 rbd::mirror::ServiceDaemon: 0x55ed192d0900 remove_attribute: pool_id=2, key=leader
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 handle_release_leader: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 handle_release_leader: 
2022-08-27T17:37:25.522-0400 7f9888df4700 20 rbd::mirror::Throttler:: 0x55ed1e3919e0 drain: ns=
2022-08-27T17:37:25.522-0400 7f9888df4700 20 rbd::mirror::Throttler:: 0x55ed1e3919e0 drain: inflight_op ,1031ea7c6103
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 shut_down_image_deleter: 
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::ImageDeleter: 0x55ed1e1ba500 shut_down: 
2022-08-27T17:37:25.522-0400 7f9888df4700 20 rbd::mirror::Throttler:: 0x55ed1e08c5a0 drain: ns=
2022-08-27T17:37:25.522-0400 7f9888df4700 10 rbd::mirror::ImageDeleter: 0x55ed1e1ba500 shut_down_trash_watcher: 
2022-08-27T17:37:25.522-0400 7f9888df4700  5 rbd::mirror::image_deleter::TrashWatcher: 0x55ed192f9500 shut_down: 
2022-08-27T17:37:25.522-0400 7f9888df4700  5 rbd::mirror::image_deleter::TrashWatcher: 0x55ed192f9500 unregister_watcher: 
2022-08-27T17:37:25.986-0400 7f9885dee700 20 rbd::mirror::ServiceDaemon: 0x55ed192d0900 update_status: 
2022-08-27T17:37:26.714-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_update_non_primary_snapshot: r=0
2022-08-27T17:37:26.714-0400 7f9884dec700  5 rbd::mirror::image_deleter::TrashWatcher: 0x55ed192f9500 handle_unregister_watcher: unregister_watcher: r=0
2022-08-27T17:37:26.714-0400 7f9884dec700 10 rbd::mirror::ImageDeleter: 0x55ed1e1ba500 cancel_retry_timer: 
2022-08-27T17:37:26.714-0400 7f9884dec700 20 rbd::mirror::Throttler:: 0x55ed1e08c5a0 drain: ns=
2022-08-27T17:37:26.714-0400 7f98895f5700 10 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 handle_shut_down_image_deleter: r=0
2022-08-27T17:37:26.714-0400 7f98895f5700 10 librbd::Watcher: 0x55ed1fc11680 unregister_watch: 
2022-08-27T17:37:26.714-0400 7f98895f5700 10 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 shut_down_pool_watchers: 
2022-08-27T17:37:26.714-0400 7f98895f5700  5 rbd::mirror::PoolWatcher: 0x55ed1fc11440 shut_down: 
2022-08-27T17:37:26.714-0400 7f98895f5700  5 rbd::mirror::PoolWatcher: 0x55ed1fc11440 unregister_watcher: 
2022-08-27T17:37:26.714-0400 7f98895f5700  5 rbd::mirror::PoolWatcher: 0x55ed1fc10b40 shut_down: 
2022-08-27T17:37:26.714-0400 7f98895f5700  5 rbd::mirror::PoolWatcher: 0x55ed1fc10b40 unregister_watcher: 
2022-08-27T17:37:26.714-0400 7f98895f5700  5 rbd::mirror::PoolWatcher: 0x55ed1fc10b40 operator(): unregister_watcher: r=0
2022-08-27T17:37:27.602-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=433, object_count=512
2022-08-27T17:37:27.602-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 update_non_primary_snapshot: 
2022-08-27T17:37:27.666-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=434, object_count=512
2022-08-27T17:37:27.666-0400 7f9884dec700  5 rbd::mirror::PoolWatcher: 0x55ed1fc11440 operator(): unregister_watcher: r=0
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 handle_shut_down_pool_watchers: r=0
2022-08-27T17:37:27.666-0400 7f9888df4700  5 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 shut_down_image_map: 
2022-08-27T17:37:27.666-0400 7f9888df4700 20 rbd::mirror::ImageMap: 0x55ed1ecc8820 shut_down: 
2022-08-27T17:37:27.666-0400 7f9888df4700  5 rbd::mirror::NamespaceReplayer: 0x55ed192b4d20 handle_shut_down_image_map: r=0
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::InstanceReplayer: 0x55ed1ecc8b60 release_all: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::InstanceReplayer: 0x55ed1ecc8b60 stop_image_replayer: 0x55ed1e4d7200 global_image_id=8a5b8be0-43ec-47df-9ee3-d97948e055c6, on_finish=0x55ed1e6063e0
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] stop: on_finish=0x55ed1e2a88c0, manual=0, restart=0
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] stop: interrupting replay
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] on_stop_journal_replay: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] cancel_update_mirror_image_replay_status: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] set_state_description: r=0, desc=
2022-08-27T17:37:27.666-0400 7f9888df4700 15 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] update_mirror_image_status: force=1, state=--
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] shut_down: r=0
2022-08-27T17:37:27.666-0400 7f9888df4700 15 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] shut_down: waiting for in-flight operations to complete
2022-08-27T17:37:27.666-0400 7f9888df4700 15 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] set_mirror_image_status_update: force=1, state=--
2022-08-27T17:37:27.666-0400 7f9888df4700 15 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] set_mirror_image_status_update: status={state=up+stopping_replay, description=stopping replay, last_update=0.000000]}
2022-08-27T17:37:27.666-0400 7f9888df4700 15 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 set_mirror_image_status: global_image_id=8a5b8be0-43ec-47df-9ee3-d97948e055c6, mirror_image_site_status={state=up+stopping_replay, description=stopping replay, last_update=0.000000]}
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 queue_update_task: 
2022-08-27T17:37:27.666-0400 7f9888df4700 15 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 set_mirror_image_status: global_image_id=8a5b8be0-43ec-47df-9ee3-d97948e055c6, mirror_image_site_status={state=up+stopping_replay, description=stopping replay, last_update=0.000000]}
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 queue_update_task: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::ImageReplayer: 0x55ed1e4d7200 [2/8a5b8be0-43ec-47df-9ee3-d97948e055c6] shut_down: r=0
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 update_task: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 update_task: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 shut_down: 
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 cancel_sync_request: sync_id=1031ea7c6103
2022-08-27T17:37:27.666-0400 7f9888df4700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 shut_down: shut down pending on completion of snapshot replay
2022-08-27T17:37:27.666-0400 7f987cddc700 10 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 handle_update_task: 
2022-08-27T17:37:27.666-0400 7f987cddc700 -1 rbd::mirror::MirrorStatusUpdater 0x55ed19369760 handle_update_task: failed to update mirror image statuses: (108) Cannot send after transport endpoint shutdown
2022-08-27T17:37:27.714-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=435, object_count=512
2022-08-27T17:37:27.838-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_update_non_primary_snapshot: r=0
2022-08-27T17:37:28.278-0400 7f9884dec700 10 rbd::mirror::MirrorStatusUpdater 0x55ed1e67ef20 handle_update_task: 
2022-08-27T17:37:28.426-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image_progress: object_number=436, object_count=512
2022-08-27T17:37:28.426-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 update_non_primary_snapshot: 
2022-08-27T17:37:28.426-0400 7f98845eb700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image: r=-108
2022-08-27T17:37:28.426-0400 7f98845eb700 -1 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_copy_image: failed to copy remote image to local image: (108) Cannot send after transport endpoint shutdown
2022-08-27T17:37:28.426-0400 7f98845eb700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 notify_sync_complete: sync_id=1031ea7c6103
2022-08-27T17:37:28.426-0400 7f98895f5700 10 rbd::mirror::InstanceWatcher: C_SyncRequest: 0x55ed1e38b5e0 finish: r=0
2022-08-27T17:37:28.426-0400 7f98895f5700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 handle_notify_sync_complete: sync_id=1031ea7c6103, r=0
2022-08-27T17:37:28.426-0400 7f98895f5700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 complete_request: instance_id=4249, request_id=5
2022-08-27T17:37:28.426-0400 7f98895f5700 10 librbd::Watcher::C_NotifyAck 0x55ed1e38a6e0 finish: r=0
2022-08-27T17:37:28.426-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest: 0x55ed1fc9a0c0 finish: r=0
2022-08-27T17:37:28.430-0400 7f9888df4700 10 rbd::mirror::InstanceWatcher: 0x55ed1e6e4400 operator(): finish: sync_id=1031ea7c6103, r=0
2022-08-27T17:37:28.466-0400 7f9884dec700 10 rbd::mirror::image_replayer::snapshot::Replayer: 0x55ed1e6e2480 handle_update_non_primary_snapshot: r=0

At this point rbd-mirror daemon hangs forever, because Mirror::update_pool_replayers() is blocking on shutdown with Mirror::m_lock held. The shutdown never completes because Replayer::handle_replay_complete() makes no attempt to resume a pending shutdown like Replayer::is_replay_interrupted() does.


Related issues 2 (0 open2 closed)

Copied to rbd - Backport #57353: pacific: [rbd-mirror] resume pending snapshot replayer shut down when an error is encounteredResolvedIlya DryomovActions
Copied to rbd - Backport #57354: quincy: [rbd-mirror] resume pending snapshot replayer shut down when an error is encounteredResolvedIlya DryomovActions
Actions #1

Updated by Ilya Dryomov over 1 year ago

  • Description updated (diff)
  • Backport set to pacific,quincy
Actions #2

Updated by Ilya Dryomov over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 47856
Actions #3

Updated by Ilya Dryomov over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #4

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57353: pacific: [rbd-mirror] resume pending snapshot replayer shut down when an error is encountered added
Actions #5

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57354: quincy: [rbd-mirror] resume pending snapshot replayer shut down when an error is encountered added
Actions #6

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #7

Updated by Backport Bot about 1 year ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions

Also available in: Atom PDF