Project

General

Profile

Bug #45714

Updated by Jason Dillaman almost 4 years ago

If The rbd-mirror attempts to shut down thrasher test case got stuck waiting for quiesce / unquiesce while releasing its journal replayer, it might shut down after a op has started but before lock -- the OpFinish event is processed. In that case, the op will never finish point where it was blacklisted by peers and eventually aborted by the wait-for-events state will never advance. This reverts to the pre-Octopus behavior prior to rbd-mirror being refactored. trasher for not completing in a timely fashion. 

 <pre> 
 2020-05-22T17:36:33.055+0000 7f3c5b3a8700 10 librbd::ImageWatcher: 0x5564c65f8240 notify_quiesce: request_id=3 
 2020-05-22T17:36:33.055+0000 7f3c5b3a8700 10 librbd::ImageWatcher: 0x5564c65f8240 notify_quiesce: async_request_id=[4367,93891297478144,3] attempts=10 
 2020-05-22T17:36:33.055+0000 7f3c5b3a8700 20 librbd::watcher::Notifier: 0x5564c65f82c0 notify: pending=1 
 2020-05-22T17:36:33.055+0000 7f3c743da700    5 librbd::Watcher: 0x5564c96cf680 notifications_blocked: blocked=0 
 2020-05-22T17:36:33.055+0000 7f3c743da700 10 librbd::Watcher::C_NotifyAck 0x5564c1b0fbd0 C_NotifyAck: id=1258425422260, handle=93891333330944 
 2020-05-22T17:36:33.055+0000 7f3c743da700 10 librbd::ImageWatcher: 0x5564c96cf680 quiesce request: [4367,93891333330944,3] 
 2020-05-22T17:36:33.055+0000 7f3c743da700 20 librbd::ImageState: 0x5564c1f94580 notify_quiesce 
 2020-05-22T17:36:33.055+0000 7f3c5b3a8700 10 librbd::ImageWatcher: 0x5564c8fdb2b0 C_ResponseMessage: r=0 
 2020-05-22T17:36:33.055+0000 7f3c5b3a8700 10 librbd::Watcher::C_NotifyAck 0x5564c1b0fbd0 finish: r=0 
 2020-05-22T17:36:33.055+0000 7f3c7fbf1700 15 rbd::mirror::image_replayer::util::decode_client_meta: 
 2020-05-22T17:36:33.055+0000 7f3c7fbf1700 15 rbd::mirror::image_replayer::util::decode_client_meta: client found: client_meta=[image_id=110f23aef24a, state=Replaying, sync_object_count=120, sync_points=[], snap_seqs=[[local_snap_seq=458, peer_snap_seq618]] 
 2020-05-22T17:36:33.055+0000 7f3c7fbf1700    5 rbd::mirror::image_replayer::journal::Replayer: 0x5564c5a2e700 validate_remote_client_state: image_id=110f23aef24a, remote_client_meta.image_id=110f23aef24a, remote_client.state=connected 
 2020-05-22T17:36:33.055+0000 7f3c743da700    5 librbd::Watcher: 0x5564c65f8240 notifications_blocked: blocked=0 
 2020-05-22T17:36:33.055+0000 7f3c743da700 10 librbd::Watcher::C_NotifyAck 0x5564c9b29a90 C_NotifyAck: id=1258425419227, handle=93891297478144 
 2020-05-22T17:36:33.055+0000 7f3c743da700 10 librbd::ImageWatcher: 0x5564c65f8240 quiesce request: [4367,93891297478144,3] 

 ... snip ... 

 2020-05-22T17:36:37.648+0000 7f3c7fbf1700 15 rbd::mirror::ImageReplayer: 0x5564bed39900 [3/6171e4b9-eb13-49a4-a4c3-b3ae89c31304] unregister_admin_socket_hook: 
 2020-05-22T17:36:43.054+0000 7f3c43b79700 10 librbd::ImageWatcher: 0x5564c65f8240 quiesce request [4367,93891297478144,3] timed out 
 2020-05-22T17:36:43.054+0000 7f3c43b79700 20 librbd::ImageState: 0x5564c177d600 notify_unquiesce 
 2020-05-22T17:36:45.871+0000 7f3c43b79700 10 librbd::ImageWatcher: 0x5564be582b40 quiesce request [4367,93891187611648,3] timed out 
 2020-05-22T17:36:45.871+0000 7f3c43b79700 20 librbd::ImageState: 0x5564c8c47a80 notify_unquiesce 
 </pre> 

 http://qa-proxy.ceph.com/teuthology/jdillaman-2020-05-22_12:40:24-rbd-wip-jd-testing-distro-basic-smithi/5082994/teuthology.log

Back