Bug #36074
closed[rbd-mirror] sporadic deep-copy failure possible during fsx stress test
0%
Description
If an image is being synced by an rbd-mirror daemon and is then blacklisted from the local cluster while it's deep-copying an object (specifically after it's written the last snapshot version but before it updates the object map), it might silently fail without passing the -ESHUTDOWN error code back to "ImageCopyRequest<I>::handle_object_copy". Without the failure error code, the sync point object position can get improperly updated so that the now the object map remains incorrect yet not flagged as invalid (due to the blacklisting).
Updated by Jason Dillaman over 5 years ago
2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 ObjectCopyRequest: dst_oid=rbd_data.106d34a0242b.0000000000000000 2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_src_object_extents: 2 src extents 2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_list_snaps: src_oid=rbd_data.10659609c337.0000000000000000 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_list_snaps: r=0 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: start_src_snap_id=0, end_src_snap_id=97, clone_end_snap_id=97, end_dst_snap_id=122, diff=[0~4194304], end_size=4194304, exists=1 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 0~4194304, dst_object_offset=0, read: [0~4194304] 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: read/write op: 0~4194304 dst: 0 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 4194304~4194304, dst_object_offset=4194304, read: [] 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: start_src_snap_id=97, end_src_snap_id=171, clone_end_snap_id=0, end_dst_snap_id=125, diff=[0~4194304], end_size=0, exists=0 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: clearing truncate diff: [0~4194304] 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: extent 0~4194304 intersects truncation 0~4194304, inserting zero 0~4194304 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 0~4194304, dst_object_offset=0, read: [] 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 4194304~4194304, dst_object_offset=4194304, read: [] 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: src_snap_seq=97 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: read op: 0~4194304 2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: read rbd_data.10659609c337.0000000000000000 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_read_object: r=0 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 merge_write_ops: 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_from_parent_ops: no parent 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_read_from_parent: r=0 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: src_snap_seq=97, end_size=4194304 2018-09-18 13:15:04.809 7febfdffb700 5 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: object DNE for snap_id: 125 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: dst_snap_seq=0, dst_snaps=[] 2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: write op: 0~4194304 2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_write_object: r=0 2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: dst_snap_seq=122, dst_snaps=[122] 2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: remove op 2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_write_object: r=0 2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_update_object_map: dst_snap_id=122, object_state=1 2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::ObjectMap: 0x7feb78050410 aio_update: start=0, end=1, ->1 2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::object_map::UpdateRequest: 0x7febf00e6db0 update_object_map: ictx=0x7feb800236e0, oid=rbd_object_map.106d34a0242b.000000000000007a, [0,1) = ->1 2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::object_map::UpdateRequest: 0x7febf00e6db0 handle_update_object_map: r=-108 2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_update_object_map: r=0 2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 finish: r=0
Updated by Jason Dillaman over 5 years ago
2018-09-18 15:01:14.577 7f1cb37fe700 10 librbd::object_map::RefreshRequest: 0x7f1ca8009770 send_load: oid=rbd_object_map.106d34a0242b.000000000000007a ... snip ... 2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::io::ObjectDispatch: 0x2401390 read: rbd_data.106d34a0242b.0000000000000000 0~8388608 2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::io::ObjectRequest: 0x7f1ca800f8f0 send: 2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::ObjectMap: 0x7f1ca8009400 object_may_exist: object_no=0 r=0
Updated by Jason Dillaman over 5 years ago
- Status changed from In Progress to Fix Under Review
Updated by Mykola Golub over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #36122: mimic: [rbd-mirror] sporadic deep-copy failure possible during fsx stress test added
Updated by Mykola Golub about 5 years ago
- Pull request ID set to 24179
A small mock test fixup, which would be nice to backport too: https://github.com/ceph/ceph/pull/26322
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".