Project

General

Profile

Bug #36074

[rbd-mirror] sporadic deep-copy failure possible during fsx stress test

Added by Jason Dillaman 11 months ago. Updated 7 months ago.

Status:
Pending Backport
Priority:
Normal
Target version:
-
Start date:
09/18/2018
Due date:
% Done:

0%

Source:
Tags:
Backport:
mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

If an image is being synced by an rbd-mirror daemon and is then blacklisted from the local cluster while it's deep-copying an object (specifically after it's written the last snapshot version but before it updates the object map), it might silently fail without passing the -ESHUTDOWN error code back to "ImageCopyRequest<I>::handle_object_copy". Without the failure error code, the sync point object position can get improperly updated so that the now the object map remains incorrect yet not flagged as invalid (due to the blacklisting).

http://qa-proxy.ceph.com/teuthology/jdillaman-2018-09-18_08:50:35-rbd-wip-rbd-mirror-distro-basic-smithi/3038017/teuthology.log


Related issues

Copied to rbd - Backport #36122: mimic: [rbd-mirror] sporadic deep-copy failure possible during fsx stress test In Progress

History

#1 Updated by Jason Dillaman 11 months ago

2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 ObjectCopyRequest: dst_oid=rbd_data.106d34a0242b.0000000000000000
2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_src_object_extents: 2 src extents
2018-09-18 13:15:04.761 7feb857fa700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_list_snaps: src_oid=rbd_data.10659609c337.0000000000000000
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_list_snaps: r=0
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: start_src_snap_id=0, end_src_snap_id=97, clone_end_snap_id=97, end_dst_snap_id=122, diff=[0~4194304], end_size=4194304, exists=1
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 0~4194304, dst_object_offset=0, read: [0~4194304]
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: read/write op: 0~4194304 dst: 0
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 4194304~4194304, dst_object_offset=4194304, read: []
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: start_src_snap_id=97, end_src_snap_id=171, clone_end_snap_id=0, end_dst_snap_id=125, diff=[0~4194304], end_size=0, exists=0
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: clearing truncate diff: [0~4194304]
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: extent 0~4194304 intersects truncation 0~4194304, inserting zero 0~4194304
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 0~4194304, dst_object_offset=0, read: []
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_ops: src_object_extent: 4194304~4194304, dst_object_offset=4194304, read: []
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: src_snap_seq=97
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: read op: 0~4194304
2018-09-18 13:15:04.761 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_read_object: read rbd_data.10659609c337.0000000000000000
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_read_object: r=0
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 merge_write_ops:
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_read_from_parent_ops: no parent
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_read_from_parent: r=0
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops:
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: src_snap_seq=97, end_size=4194304
2018-09-18 13:15:04.809 7febfdffb700  5 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 compute_zero_ops: object DNE for snap_id: 125
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: dst_snap_seq=0, dst_snaps=[]
2018-09-18 13:15:04.809 7febfdffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: write op: 0~4194304
2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_write_object: r=0
2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: dst_snap_seq=122, dst_snaps=[122]
2018-09-18 13:15:06.317 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_write_object: remove op
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_write_object: r=0
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 send_update_object_map: dst_snap_id=122, object_state=1
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::ObjectMap: 0x7feb78050410 aio_update: start=0, end=1, ->1
2018-09-18 13:15:11.377 7fec1dffb700 20 librbd::object_map::UpdateRequest: 0x7febf00e6db0 update_object_map: ictx=0x7feb800236e0, oid=rbd_object_map.106d34a0242b.000000000000007a, [0,1) = ->1
2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::object_map::UpdateRequest: 0x7febf00e6db0 handle_update_object_map: r=-108
2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 handle_update_object_map: r=0
2018-09-18 13:15:11.721 7fec1dffb700 20 librbd::deep_copy::ObjectCopyRequest: 0x7feb78040ef0 finish: r=0

#2 Updated by Jason Dillaman 11 months ago

2018-09-18 15:01:14.577 7f1cb37fe700 10 librbd::object_map::RefreshRequest: 0x7f1ca8009770 send_load: oid=rbd_object_map.106d34a0242b.000000000000007a
... snip ...
2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::io::ObjectDispatch: 0x2401390 read: rbd_data.106d34a0242b.0000000000000000 0~8388608
2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::io::ObjectRequest: 0x7f1ca800f8f0 send:
2018-09-18 15:01:14.577 7f1cb37fe700 20 librbd::ObjectMap: 0x7f1ca8009400 object_may_exist: object_no=0 r=0

#3 Updated by Jason Dillaman 11 months ago

  • Description updated (diff)

#4 Updated by Jason Dillaman 11 months ago

  • Status changed from In Progress to Need Review

#5 Updated by Mykola Golub 11 months ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Nathan Cutler 11 months ago

  • Copied to Backport #36122: mimic: [rbd-mirror] sporadic deep-copy failure possible during fsx stress test added

#7 Updated by Mykola Golub 7 months ago

  • Pull request ID set to 24179

A small mock test fixup, which would be nice to backport too: https://github.com/ceph/ceph/pull/26322

Also available in: Atom PDF